CS 6900: Information Retrieval
Fall 2013


Time and Location: Mon, Wed, Fri 10:45am – 11:40am, ARC 121
Instructor: Razvan Bunescu
Office: Stocker 337
Office Hours: Mon, Wed, Fri 2:05pm – 3:00pm, or by email appointment
Email: bunescu @ ohio edu

Textbook:
  • Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008.

Supplementary Texts:
  • Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Addison-Wesley, 1999, 2011.

Course description:
    This course covers the design, implementation, and evaluation of modern information retrieval systems, such as Web search engines. It focuses on the underlying retrieval models, algorithms, and system implementations, including vector-space and probabilistic retrieval models and the PageRank algorithm used by Google. The course also covers more advanced topics in information retrieval, including document categorization and clustering, recommender systems, collaborative filtering, and personalized search.

Prerequisites:
    Students are expected to be comfortable with programming and to have a basic level of mathematical dexterity. Relevant background material in linear algebra and probability theory will be made available during the course.

Lecture notes:
    1. Syllabus & Introduction
    2. Boolean Retrieval
    3. Word Distributions
    4. From Text to Tokens to Terms
    5. Text Processing with Python and NLTK
    6. Retrieval with Vector Space Models
    7. Probabilistic Information Retrieval
    8. Web Search and Web Crawling
    9. Scalable, High-performance IR using Lucene
    10. Evaluation Measures and Benchmark Datasets for IR
    11. Retrieval with Language Models
    12. Link Analysis for IR
    13. Web Interfaces for IR
    14. Latent Semantic Indexing
    Some of these slides are based on material from IR classes taught at UT Austin and Stanford.

Homework Assignments:
Final Project:
Other online reading materials:
Python and NLTK resources: