Required readings refer to chapters in Jurafsky and Martin (2008), Speech and Language Processing, 2nd edition, unless stated otherwise.
Optional readings are often more advanced. "MS" refers to chapters in Manning and Schütze (1999), Foundations of Statistical Natural Language Processing (you may need to use a campus machine to access these links); other optional readings are original research papers (you can find many more in the ACL Anthology). I also recommend the Handbook of Computational Linguistics and Natural Language Processing (you also need to be on the campus network to access this site).
Week | Date | Lecture | Topic | Slides | Notes
01 | 08/25 | 01 | Introduction | 1up 4up | HW0 out
What is NLP? What will you learn in this class?
Required reading: Ch. 1
Optional reading: Python tutorial (sec. 1-5), Jelinek (2009), Ferrucci et al. (2010)
Links: NLTK
01 | 08/27 | 02 | Finite-state methods for morphology | 1up 4up
What is the structure of words, and how can we model it? Review of finite-state automata; finite-state transducers.
Required reading: Ch. 3.1-7
Optional reading: Karttunen and Beesley (2005), Mohri (1997), the Porter stemmer, Sproat et al. (1996)
02 | 09/01 | 03 | N-gram language models | 1up 4up
The most basic probabilistic models of language. Also: review of basic probability (see the sketch below).
Required reading: Ch. 4.1-4
Optional reading: MS, Ch. 2
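As a preview (not part of the course materials), here is a minimal sketch of the core idea on an invented toy corpus: bigram probabilities estimated by relative frequency.

```python
# Minimal sketch (toy corpus invented for illustration): maximum-likelihood
# bigram estimates, P(w2 | w1) = count(w1 w2) / count(w1).
from collections import Counter

corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(w2, w1):
    """Relative-frequency estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(p_mle("cat", "the"))  # 0.5 on this toy corpus
```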
02 | 09/03 | 04 | Smoothing | 1up 4up
How can we predict what we haven't seen before? (See the sketch below.)
Required reading: Ch. 4.5-7
Optional reading: MS, Ch. 6, Chen and Goodman (1998)
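Continuing the hypothetical `bigrams`/`unigrams` counters from the sketch above, add-one (Laplace) smoothing is the simplest fix for zero counts; the readings cover more effective methods.

```python
# Minimal sketch: add-one (Laplace) smoothing of the bigram estimate above.
# `bigrams` and `unigrams` are the hypothetical counters from the previous
# sketch; V is the vocabulary size.
def p_laplace(w2, w1, bigrams, unigrams, V):
    """Add-one smoothed estimate of P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)
```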
03 | 09/08 | 05 | Evaluating language models | 1up 4up
Perplexity, task-based evaluation (see the formula below).
Required reading: Ch. 4.5
Optional reading: TBD
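For reference, the perplexity of a test sequence $w_1 \dots w_N$ under a model is its inverse probability, normalized by length:

$$\mathrm{PP}(W) = P(w_1 w_2 \dots w_N)^{-1/N} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \dots w_{i-1})}}$$

Lower perplexity means the model assigns higher probability to held-out text.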
03 | 09/10 | 06 | Part-of-speech tagging | 1up 4up | HW1 out
What are parts of speech? How many are there? Basic intro to HMMs.
Required reading: Ch. 5.1-5
Optional reading: Merialdo (1994), Christodoulopoulos et al. (2010), Roche & Schabes (1995)
04 | 09/15 | 07 | Part-of-speech tagging with Hidden Markov Models | 1up 4up
The Viterbi algorithm (see the sketch below).
Required reading: Ch. 5.1-5
Optional reading: Merialdo (1994), Christodoulopoulos et al. (2010), Roche & Schabes (1995)
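A minimal sketch of Viterbi decoding on a toy HMM. The states, transition, and emission probabilities here are invented for illustration and are not taken from the textbook or the homeworks.

```python
# Minimal sketch: Viterbi decoding (log-space) for an invented two-state HMM.
import math

states = ["DT", "NN"]
start = {"DT": 0.7, "NN": 0.3}
trans = {"DT": {"DT": 0.1, "NN": 0.9}, "NN": {"DT": 0.4, "NN": 0.6}}
emit  = {"DT": {"the": 0.9, "dog": 0.1}, "NN": {"the": 0.1, "dog": 0.9}}

def viterbi(words):
    # V[t][s]: log probability of the best tag sequence ending in state s at time t
    V = [{s: math.log(start[s]) + math.log(emit[s][words[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t-1][p] + math.log(trans[p][s]))
            V[t][s] = V[t-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][words[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["the", "dog"]))  # ['DT', 'NN'] with these toy parameters
```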
04 | 09/17 | 08 | Learning Hidden Markov Models | 1up 4up
The Forward-Backward algorithm (see the recurrences below).
Required reading: Ch. 6.1-5
Optional reading: MS, Ch. 9
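The E-step of Forward-Backward rests on two dynamic-programming recurrences (standard notation, with $a_{ij}$ for transition and $b_j(o_t)$ for emission probabilities; the textbook's notation may differ slightly):

$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t) \qquad\qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$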
05 | 09/22 | 09 | Sequence labeling tasks | 1up 4up
Chunking, shallow parsing, named entity recognition, MEMMs
Required reading: Ch. 6.6-8
Optional reading: Sutton & McCallum (2008) (introduction to conditional random fields), Berger et al. (1996), Etzioni et al. (2008) (web-scale information extraction)
05 | 09/24 | 10 | Brown clusters | 1up 4up
How can we learn to group words based on their context?
Required reading: Ch. 4.10
Optional reading: MS, Ch. 14.1, Brown et al. (1992b)
06 | 09/29 | 11 | Vector-space semantics | 1up 4up
"You shall know a word by the company it keeps" (Firth, 1957). (See the sketch below.)
Required reading: Ch. 20.7
Optional reading: Schütze (1998), Turney and Pantel (2010), Mikolov et al. (2013)
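A minimal sketch of the distributional idea: represent each word by a vector of context counts (the counts below are invented) and compare vectors with cosine similarity.

```python
# Minimal sketch: cosine similarity between invented co-occurrence count vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# context dimensions: counts of co-occurrence with [eat, drive, bark]
dog = [3, 0, 8]
cat = [4, 0, 1]
car = [0, 9, 0]
print(cosine(dog, cat), cosine(dog, car))  # dog is closer to cat than to car
```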
06 | 10/01 | 12 | Word Sense Disambiguation | 1up 4up | HW1 due. HW2 out (handout).
How do we know what is meant by "the plant next to the bank"?
Required reading: Ch. 20.1-5
Optional reading: Yarowsky (1995), Abney (2004)
07 | 10/06 | 13 | Review for Midterm | 1up 4up
07 | 10/08 | 14 | Midterm
Good luck!
08 | 10/13 | 15 | Formal grammars for English | 1up 4up
What is the structure of sentences, and how can we model it? Phrase-structure grammar and dependency grammar. Review of basic English grammar and context-free grammars.
Required reading: Ch. 12.1-3, Ch. 12.7
Optional reading: MS, Ch. 3, Woods (2010)
08 | 10/15 | 16 | (Probabilistic) Context-Free Grammar parsing | 1up 4up
How can we represent and deal with syntactic ambiguity?
Required reading: Ch. 13.1-4, Ch. 14.1
Optional reading: Chi (1999)
09 | 10/20 | 17 | Probabilistic Context-Free Grammars | 1up 4up
Algorithms for learning and parsing with PCFGs (see the CKY sketch below).
Required reading: Ch. 14.1-3
Optional reading: Collins' notes, Chi & Geman (1998), Schabes et al. (1993), Schabes & Pereira (1992), Stolcke (1995)
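A minimal sketch of probabilistic CKY with a tiny hand-written PCFG in Chomsky normal form; the grammar and probabilities are invented, and this is not the course's parser. It returns the probability of the best parse of the whole sentence.

```python
# Minimal sketch: probabilistic CKY for an invented toy PCFG in Chomsky normal form.
binary = [  # (parent, left child, right child, rule probability)
    ("S", "NP", "VP", 1.0),
    ("NP", "DT", "NN", 1.0),
]
lexical = {  # (word, tag) -> rule probability
    ("the", "DT"): 1.0, ("dog", "NN"): 0.5, ("cat", "NN"): 0.5, ("barks", "VP"): 1.0,
}

def cky(words):
    n = len(words)
    # chart[i][j] maps a nonterminal to the best probability of deriving words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (word, tag), p in lexical.items():
            if word == w:
                chart[i][i + 1][tag] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right, p in binary:
                    if left in chart[i][k] and right in chart[k][j]:
                        prob = p * chart[i][k][left] * chart[k][j][right]
                        if prob > chart[i][j].get(parent, 0.0):
                            chart[i][j][parent] = prob
    return chart[0][n].get("S", 0.0)

print(cky("the dog barks".split()))  # 0.5 with this toy grammar
```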
09 | 10/22 | 18 | Treebanks and statistical parsing | 1up 4up | HW2 due. HW3 out
Going beyond simple PCFGs; Penn Treebank parsing
Required reading: Ch. 14.4-7, Ch. 12.4
Optional reading: Marcus et al. (1993), Collins (1997), Johnson (1998), Klein & Manning (2003), Petrov & Klein (2007), Hindle & Rooth
10 | 10/27 | 19 | Dependency parsing | 1up 4up
Dependency treebanks and parsing
Required reading: McDonald & Nivre (2007)
Optional reading: Nivre & Scholz (2004), Kübler et al. (2009), Nivre (2010), McDonald & Nivre (2011)
10 | 10/29 | 20 | Feature structure grammars | 1up 4up
Feature structures and unification
Required reading: Ch. 15.1-4
Optional reading: Abney (1997), Miyao & Tsujii (2008)
11 | 11/03 | 21 | Expressive Grammars | 1up 4up
Mildly context-sensitive grammars: Tree-Adjoining Grammar, Combinatory Categorial Grammar
Required reading: Ch. 16.1, Ch. 16.3
Optional reading: Joshi and Schabes (1997), Steedman & Baldridge (2011), Schabes & Shieber, Schabes & Waters (1995), Bangalore & Joshi (1999), Hockenmaier & Steedman (2007), Clark & Curran (2007)
11 | 11/05 | 22 | Introduction to machine translation | 1up 4up
Why is MT difficult? Non-statistical approaches to MT (Vauquois triangle); the noisy channel model (see the formulation below).
Required reading: Ch. 25.1-4
Optional reading: Brown et al. (1990), Lopez (2008)
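In the noisy channel formulation, translating a foreign sentence $f$ into English $e$ means choosing the English sentence that best balances a translation model $P(f \mid e)$ against a language model $P(e)$:

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$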
12 | 11/10 | 23 | Word Alignment | 1up 4up
The prerequisite for building a translation model (see the IBM Model 1 formulation below).
Required reading: Ch. 25.5-6
Optional reading: Brown et al. (1993)
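As a pointer into the Brown et al. (1993) reading, IBM Model 1 scores a sentence pair by summing over all word alignments (notation roughly follows that paper; $l_e$ and $l_f$ are the sentence lengths, $e_0$ is the NULL word, and $t(f_j \mid e_i)$ are word-translation probabilities learned with EM):

$$P(f \mid e) = \frac{\epsilon}{(l_e + 1)^{l_f}} \prod_{j=1}^{l_f} \sum_{i=0}^{l_e} t(f_j \mid e_i)$$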
12 | 11/12 | 24 | Phrase-based Machine Translation | 1up 4up | HW3 due. HW4 out (handout).
Training and using a statistical MT system
Required reading: Ch. 25.4, 25.7-9
Optional reading: Koehn et al., Och & Ney (2004), Wu (1997), Chiang (2007)
Links: www.statmt.org
13 | 11/17 | 25 | Compositional Semantics | 1up 4up
What is the meaning of a sentence, and how can we represent it? Basic predicate logic and lambda calculus (see the example below).
Required reading: Ch. 17.2-3
Optional reading: Blackburn & Bos (2003)
Links: Penn Lambda Calculator
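One standard illustration of the compositional idea (the predicate names here are just for illustration): a transitive verb denotes a two-place function, and function application assembles the sentence meaning.

$$\textit{likes} \;\Rightarrow\; \lambda x.\,\lambda y.\,\mathit{likes}(y, x)$$
$$\textit{likes Mary} \;\Rightarrow\; (\lambda x.\,\lambda y.\,\mathit{likes}(y, x))(\mathit{Mary}) \;=\; \lambda y.\,\mathit{likes}(y, \mathit{Mary})$$
$$\textit{John likes Mary} \;\Rightarrow\; \mathit{likes}(\mathit{John}, \mathit{Mary})$$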
13 | 11/19 | 26 | Lexical Semantics | 1up 4up
What is the meaning of a word, and how can we represent it? (See the WordNet example below.)
Required reading: Ch. 19.1-4
Optional reading: Palmer et al. (2005), Gildea & Jurafsky (2002), Punyakanok et al. (2008)
Links: WordNet
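If you want to explore WordNet programmatically, NLTK (linked under Lecture 01) ships an interface to it. A small sketch, assuming the WordNet data has been installed via nltk.download('wordnet'):

```python
# Minimal sketch: listing a few WordNet senses of "bank" with NLTK.
# Assumes the WordNet corpus has been downloaded: nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank")[:3]:
    print(synset.name(), "-", synset.definition())
```

The exact sense inventory and ordering depend on the installed WordNet version.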
14 | 12/01 | 27 | Natural language generation | 1up 4up
Very brief intro to NLG, summarization, dialog
Required reading:
Optional reading: Reiter & Belz (2012), Stoyanov et al. (2009), Ng (2010)
14 | 12/03 | 28 | Review and outlook | 1up 4up
Very brief intro to deep learning for NLP
Optional reading: Goldberg (2015); see also Stanford's Deep Learning for NLP class
15 | 12/08 | 29 | Review for final exam | 1up 4up | HW4 due