CS 6900: Information Retrieval
Time and Location: Mon, Wed, Fri 10:45am – 11:40am, ARC 121
Instructor: Razvan Bunescu
Office: Stocker 337
Office Hours: Mon, Wed, Fri 2:05pm – 3:00pm, or by email appointment
Email: bunescu @ ohio edu
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Cambridge University Press, 2008.
Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Addison-Wesley, 1999 (1st edition), 2011 (2nd edition).
This course covers the design, implementation, and evaluation of modern information retrieval systems, such as Web search engines. It focuses on the underlying retrieval models, algorithms, and system implementations, including vector-space and probabilistic retrieval models and the PageRank algorithm used by Google. The course also covers more advanced topics, including document categorization and clustering, recommender systems, collaborative filtering, and personalized search.
Students are expected to be comfortable with programming and to have a basic level of mathematical proficiency. Relevant background material in linear algebra and probability theory will be made available during the course.
Some of the lecture slides are based on material from IR classes taught at UT Austin and Stanford.
- Syllabus & Introduction
- Boolean Retrieval
- Word Distributions
- From Text to Tokens to Terms
- Text Processing with Python and NLTK
- Retrieval with Vector Space Models
- Probabilistic Information Retrieval
- Web Search and Web Crawling
- Scalable, High-performance IR using Lucene
- Evaluation Measures and Benchmark Datasets for IR
- Retrieval with Language Models
- Link Analysis for IR
- Who is the most important person on the English-language Wikipedia? Newspaper version and arXiv version.
- Project: modify the ranking algorithm to account for "diversity" of links/topics?
- Web Interfaces for IR
- Latent Semantic Indexing
Other online reading materials:
Python and NLTK resources: