Instructor: ChengXiang Zhai
Contact Information: Email: czhai AT illinois DOT edu; Phone: (217) 244-4943
Teaching Assistants:
Note: This page provides basic information to help students decide whether they would be interested in taking the course. More up-to-date information about the course is available on the Course Space on Campuswire.
The growth of "big data" has created unprecedented opportunities to leverage computational and statistical approaches, which turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision making in virtually all application domains, such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. This course covers general computational techniques for building intelligent text information systems to help users manage and make use of large amounts of text data in all kinds of applications.
Text data include all data in the form of natural language text (e.g., English text or Chinese text), including all web pages, social media data such as tweets, news, scientific literature, emails, government documents, and many other kinds of enterprise data. Text data play an essential role in our lives. Since we communicate using natural languages, we produce and consume a large amount of text data every day covering all kinds of topics. The explosive growth of text data makes it impossible for people to consume all the relevant text data in a timely manner.
The two main techniques to assist people in consuming, digesting, and making use of the text data are text retrieval and text mining.
ChengXiang Zhai, Sean Massung, Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining, ACM and Morgan & Claypool Publishers, 2016. (click here to read the book online)
Students should come with good programming skills. CS225 or CS400 or an equivalent course is required. Knowledge of basic probability and statistics is a plus. If you are not sure whether you have the right background, please contact the instructor.
Online Lectures: All lectures have been pre-recorded and are available via the CS410 Course Site on Coursera. Students enrolled in the online Master of Computer Science (MCS) program should already have access to the materials on Coursera. On-campus students taking this course will need to go through a special process to gain access to the materials on Coursera. (The likely process is that they will be added to a "generic" Illinois Onboarding course on Coursera, which is short but is part of the enrollment mapping process. Students will have to complete the Onboarding Courses in order to access their courses. Detailed instructions about this should be sent to those of you who have registered.)
Weekly Quizzes and two Exams: Students are expected to complete a weekly quiz on Coursera after finishing each module. There are 12 modules with 6 modules on Text Retrieval and 6 modules on Text Mining. There will be two 1-hour exams, covering Text Retrieval and Text Mining, respectively. The first exam, which covers the first 6 modules on Text Retrieval, will be given in the middle of the semester; the second, which covers the 6 modules on Text Mining, will be given a couple of weeks before the end of the semester. Both exams will be managed through Proctor-U.
Programming Assignments: There will be a few programming assignments spread over the semester to enable students to gain practical skills by working on software toolkits and experimenting with data sets and ideas for improving algorithms.
Course Project: Students are also expected to finish a course project. Group projects are highly encouraged. While project activities may spread over the entire semester, the main period when students are expected to work intensively on the course project is the last couple of weeks of the semester, after finishing both exams.
Technology Review: Students who take the course for 4 credit-hours are required to finish a Technology Review, which would be graded as Pass or Fail.
A+: [95, 100]; A: [90, 94]; A-: [85, 89]; B+: [80, 84]; B: [75, 79]; B-: [70, 74]; C: [60, 69]; D: [55, 59]; F: <55
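As an illustration only, the scale above can be read as a lookup on the lower bound of each bracket. The sketch below is not official course policy: it assumes a final score between 0 and 100 and assumes that a fractional score (for example 94.5) falls into the bracket whose lower bound it has reached, which the scale itself does not specify. The names `CUTOFFS`, `LETTERS`, and `letter_grade` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds copied from the grading scale above.
# ASSUMPTION: a score belongs to the highest bracket whose lower bound it
# reaches; the scale does not say how fractional scores are handled.
CUTOFFS = [55, 60, 70, 75, 80, 85, 90, 95]
LETTERS = ["F", "D", "C", "B-", "B", "B+", "A-", "A", "A+"]


def letter_grade(score: float) -> str:
    """Map a final score in [0, 100] to a letter grade (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many lower bounds are <= score,
    # which is exactly the index of the matching letter.
    return LETTERS[bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    for s in (54, 55, 69, 70, 89, 90, 95):
        print(s, letter_grade(s))
```

Under these assumptions a score of 69 maps to C and a score of 70 maps to B-, matching the brackets listed above.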