CS 753/853 Topics / Information Retrieval and Web Search

Note

This course will be managed on MyCourses and Piazza.

Overview

This course covers basic and advanced algorithms and techniques for Web search engines as well as text-based information retrieval in general.

After this course you will be able to develop your own web search engine or customize existing retrieval frameworks such as Apache Lucene. Every week we will carefully examine a different component of a web search engine system.

The course focuses on index building, query processing, and document ranking. We will further touch on text-based machine learning methods, such as classification and clustering, as well as crawling and link-based algorithms such as Google’s PageRank.

The course will cover several algorithms and data structures with application to web search, thereby building on CS 515 “Data Structures”. Theoretical analyses of run-time performance, hands-on programming assignments, and a class project are all part of the course.

Information retrieval methods are an essential component of any text-based data analytics system, ranging from text mining and machine learning to natural language processing and knowledge management applications.

Prerequisites: Data Structures (CS 515) or permission of instructor. Ability to independently write programs in Java, Python, Haskell, or Scala.

Grading Policy

Your grade will be based on written exams (50%) and a project (50%). Note that you need to obtain a passing grade in both the exams and the project to pass this course!

The project will be carried out in teams of up to three people. The project will be implemented in a programming language of your choice. The projects need to be presented in class and will be graded based on a final report. It is necessary to document the individual contribution of each team member.

Both a midterm and a final exam will be offered, together constituting 50% of the final grade.

Bi-weekly homework assignments can be carried out in teams of up to three students. These assignments will be graded on a pass-fail-excellent scale. A pass or excellent needs to be obtained on at least four (out of five) homework assignments in order to be eligible for participation in the final exam and submission of a final project report. Students with two or more excellents obtain an upgrade to the next higher letter grade (e.g., B- to B, or B+ to A-).

The same policy applies to students taking the course as either CS 753 or CS 853. However, expectations for students taking the course for graduate credit under CS 853 are higher.

Late homework and project report submissions will generally not be accepted. Any activity missed due to a medical or family emergency requires supporting documentation.

Academic Integrity

The instructor is strongly committed to upholding the standards of academic integrity. These standards, at a minimum, require that students never present the work of others as their own. Any dishonest behavior, once discovered, will be penalized according to the University’s Student Code of Conduct.

Mutual Expectations

Students are expected to:

The instructor is expected to:

Note that it is not sufficient to just be present in class and submit homework. Obtaining an A requires that you study and review materials from lecture notes, assignments, and discussions with the help of the book. If you get stuck, please see the instructor.

Textbooks

The lecture is based on “Introduction to Information Retrieval”. Other books are recommendations for further reading.

C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008 (available at http://nlp.stanford.edu/IR-book).

B. Croft, D. Metzler, T. Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2009 (available at http://ciir.cs.umass.edu/irbook/ ).

C. Zhai and S. Massung, Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining, ACM and Morgan & Claypool Publishers, 2016 (obtain through http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=944 ).

R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 2011 (2nd Edition).

Schedule

Note that this schedule is preliminary and may change as the course progresses. Chapter references are based on the book Introduction to Information Retrieval (IIR).

An earlier edition of a similar course was taught at Mannheim University.

Important Dates

Midterm exam: TBD

Final exam: TBD