CS410 Text Information Systems (Fall 2022)

Instructor: ChengXiang Zhai Contact Information: Email: czhai AT illinois DOT edu; Phone: (217) 244-4943
Teaching Assistants:

Time & Place: This is an online course offered via the Coursera platform. All class activities happen online, mostly on the Coursera CS410 Course Site (link only works for students enrolled in the class) and Campuswire (click to join).

Note: This page provides basic information to help students decide whether they would be interested in taking the course. More up-to-date information about the course is available on the Course Space on Campuswire.

About the Course (Course Introduction Slides)

The growth of "big data" has created unprecedented opportunities to leverage computational and statistical approaches that turn raw data into actionable knowledge to support various application tasks. This is especially true for the optimization of decision making in virtually all application domains, such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. This course covers general computational techniques for building intelligent text information systems that help users manage and make use of large amounts of text data in all kinds of applications.

Text data include all data in the form of natural language text (e.g., English text or Chinese text), including all web pages, social media data such as tweets, news, scientific literature, emails, government documents, and many other kinds of enterprise data. Text data play an essential role in our lives. Since we communicate using natural languages, we produce and consume a large amount of text data every day covering all kinds of topics. The explosive growth of text data makes it impossible for people to consume all the relevant text data in a timely manner.

The two main techniques to assist people in consuming, digesting, and making use of the text data are

  1. Text retrieval, which helps identify the text data most relevant to a particular problem from a large collection of text documents, thus avoiding the need to process a large number of non-relevant documents, and
  2. Text mining, which helps users further analyze and digest the relevant text data found and extract actionable knowledge for finishing a task.

This course covers both text retrieval and text mining, so as to provide you with the opportunity to see the complete spectrum of techniques used in building an intelligent text information system. Building on two MOOCs on Coursera that cover the same topics, and including a course project, this course enables you to learn the basic concepts, principles, and general techniques in text retrieval and mining, as well as gain hands-on experience using software tools to develop interesting text data applications.

Textbook

ChengXiang Zhai, Sean Massung, Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining, ACM and Morgan & Claypool Publishers, 2016. (click here to read the book online)

Prerequisites

Students should come with good programming skills. CS225 or CS400 or an equivalent course is required. Knowledge of basic probability and statistics is a plus. If you are not sure whether you have the right background, please contact the instructor.

Format

Online Lectures: All the lectures have been pre-recorded and are available via the CS410 Course Site on Coursera. Students enrolled in the online Master of Computer Science (MCS) program should already have access to the materials on Coursera. On-campus students taking this course will need to go through a special process in order to access the materials on Coursera. (The likely process is that they will be added to a "generic" Illinois Onboarding course on Coursera, which is short but is part of the enrollment mapping process. Students have to complete the Onboarding course in order to access their courses. Detailed instructions should be sent to those of you who have registered.)

Weekly Quizzes and Two Exams: Students are expected to complete a weekly quiz on Coursera after finishing each module. There are 12 modules: 6 on Text Retrieval and 6 on Text Mining. There will be two 1-hour exams, covering Text Retrieval and Text Mining, respectively. The first exam, which covers the first 6 modules on Text Retrieval, will be given in the middle of the semester; the second, which covers the 6 modules on Text Mining, will be given a couple of weeks before the end of the semester. Both exams will be managed through Proctor-U.

Programming Assignments: There will be a few programming assignments spread over the semester to enable students to gain practical skills by working with software toolkits and experimenting with data sets and ideas for improving algorithms.

Course Project: Students are also expected to finish a course project. Group projects are highly encouraged. While project activities may spread over the entire semester, the main period when students are expected to work intensively on the course project is the last couple of weeks of the semester, after finishing both exams.

Technology Review: Students who take the course for 4 credit hours are required to finish a Technology Review, which is graded as Pass or Fail.

Office Hours

The Instructor and TAs will hold weekly online office hours on Zoom.

Grading

Grading will be based on the following weighting scheme:

For students taking the course for 4 credit hours, if they complete the Technology Review satisfactorily, the weighting scheme is applied in the same way as for those taking the course for 3 credit hours. If they fail to complete the Technology Review, their maximum grade is 75 points (out of 100 points), and the weighting scheme above is applied to a total of 75 points (instead of 100 points).

The letter grades are determined based on the following mapping:
A+: [95, 100]
A:  [90, 94]
A-: [85, 89]
B+: [80, 84]
B:  [75, 79]
B-: [70, 74]
C:  [60, 69]
D:  [55, 59]
F:  <55

Students are strongly encouraged to help each other by actively answering each other's questions on Piazza. The most active contributors on Piazza will receive up to 5 points of extra credit, which would help move your grade up by one bracket.
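
As an illustration only, the letter-grade mapping and the Technology Review cap described above can be sketched in a few lines of Python. This is not an official grade calculator. The function names (`letter_grade`, `final_points`) are hypothetical, and two points are assumptions rather than stated policy: fractional totals are compared against each bracket's lower bound, and the 75-point cap is modeled by scaling the weighted total by 0.75.

```python
# Lower bound of each bracket, copied from the letter-grade mapping above.
GRADE_FLOORS = [
    (95, "A+"), (90, "A"), (85, "A-"), (80, "B+"),
    (75, "B"), (70, "B-"), (60, "C"), (55, "D"),
]


def letter_grade(points: float) -> str:
    """Return the letter for a point total on the 100-point scale."""
    # Assumption: a fractional total such as 94.5 falls in the bracket
    # whose lower bound it meets (here A); the page lists integers only.
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"


def final_points(weighted_points: float,
                 four_credit: bool = False,
                 review_passed: bool = True) -> float:
    """Apply the Technology Review rule to a weighted total out of 100."""
    # Assumption: "applied to the total 75 points" is read as scaling
    # the weighted total by 75/100 when the review is not completed.
    if four_credit and not review_passed:
        return weighted_points * 0.75
    return weighted_points
```

For example, under these assumptions a 4-credit-hour student with a weighted total of 92 who does not complete the Technology Review ends up with `final_points(92, four_credit=True, review_passed=False)`, that is 69 points, which `letter_grade` maps to C.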