Required readings refer to chapters in Jurafsky and Martin (2008), Speech and Language Processing, 2nd edition, unless stated otherwise.

Optional readings are often more advanced. "MS" refers to chapters in Manning and Schütze (1999), Foundations of Statistical Natural Language Processing (you may need to use a campus machine to access these links); other optional readings are original research papers (you can find many more in the ACL Anthology). I also recommend the Handbook of Computational Linguistics and Natural Language Processing (you also need to be on the campus network to access this site).

Week  Date  Lecture  Topic  Slides  Assignments
 
01 08/25 01 Introduction 1up 4up HW0 out
  What is NLP? What will you learn in this class?
  Required reading: Ch.1
  Optional reading: Python tutorial (sec. 1-5), Jelinek (2009), Ferrucci et al. (2010)
  Links: NLTK
 
01 08/27 02 Finite-state methods for morphology 1up 4up
  What is the structure of words, and how can we model it? Review of finite-state automata; introduction to finite-state transducers. (A toy FSA sketch appears below.)
  Required reading: Ch. 3.1-7
  Optional reading: Karttunen and Beesley (2005), Mohri (1997), the Porter stemmer, Sproat et al. (1996)
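  A minimal sketch (my own toy example, not code from the readings) of running a deterministic finite-state automaton in Python; the invented transition table accepts "cat" and "cats", illustrating how FSAs can model plural morphology:

    # Toy deterministic FSA: states are ints, transitions a dict
    # keyed by (state, input symbol).
    TRANS = {(0, "c"): 1, (1, "a"): 2, (2, "t"): 3,  # stem "cat"
             (3, "s"): 4}                            # optional plural -s
    ACCEPT = {3, 4}

    def accepts(word):
        state = 0
        for ch in word:
            state = TRANS.get((state, ch))
            if state is None:        # no transition defined: reject
                return False
        return state in ACCEPT

    print(accepts("cat"), accepts("cats"), accepts("ca"))  # True True False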
 
02 09/01 03 N-gram language models 1up 4up
  The most basic probabilistic models of language, plus a review of basic probability. (A toy bigram sketch appears below.)
  Required reading: Ch. 4.1-4
  Optional reading: MS, Ch. 2
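  To make this concrete, a toy sketch (mine, not from the readings) of the maximum-likelihood bigram estimate P(w2|w1) = c(w1,w2) / c(w1):

    from collections import Counter

    def bigram_mle(corpus):
        """Relative-frequency bigram model over a list of sentences."""
        contexts, bigrams = Counter(), Counter()
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            contexts.update(tokens[:-1])             # c(w1)
            bigrams.update(zip(tokens, tokens[1:]))  # c(w1, w2)
        return {bg: c / contexts[bg[0]] for bg, c in bigrams.items()}

    p = bigram_mle([["the", "cat", "sat"], ["the", "dog", "sat"]])
    print(p[("the", "cat")])  # 0.5: "the" is followed by "cat" half the time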
 
02 09/03 04 Smoothing 1up 4up
  How can we predict what we haven't seen before? (An add-one sketch appears below.)
  Required reading: Ch. 4.5-7
  Optional reading: MS, Ch. 6, Chen and Goodman (1998)
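  A minimal sketch of the simplest method, add-one (Laplace) smoothing, with invented counts (the chapter covers better-behaved alternatives such as Good-Turing discounting, interpolation, and backoff):

    from collections import Counter

    unigrams = Counter({"the": 2, "cat": 1})   # toy context counts c(w1)
    bigrams = Counter({("the", "cat"): 1})     # toy bigram counts c(w1, w2)
    V = 4                                      # toy vocabulary size

    def laplace(w1, w2):
        """Add-one estimate (c(w1,w2) + 1) / (c(w1) + V): every bigram,
        seen or unseen, receives nonzero probability mass."""
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(laplace("the", "cat"))  # seen:   (1 + 1) / (2 + 4) = 0.333...
    print(laplace("the", "dog"))  # unseen: (0 + 1) / (2 + 4) = 0.166...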
 
03 09/08 05 Evaluating language models 1up 4up
  Perplexity and task-based evaluation. (A perplexity sketch appears below.)
  Required reading: Ch. 4.5
  Optional reading: TBD
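  A tiny sketch of the perplexity computation (toy numbers, my own example): perplexity is 2 raised to the negative average log2 probability the model assigns per token, so lower is better:

    import math

    def perplexity(probs):
        """probs: the model's probability for each test token."""
        return 2 ** (-sum(math.log2(p) for p in probs) / len(probs))

    # A model assigning each of four test tokens probability 1/8 has
    # perplexity 8, i.e. an effective branching factor of 8:
    print(perplexity([0.125] * 4))  # 8.0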
 
03 09/10 06 Part-of-speech tagging 1up 4up HW1 out
  What are parts of speech? How many are there? Basic intro to HMMs.
  Required reading: Ch. 5.1-5
  Optional reading: Merialdo (1994), Christodoulopoulos et al. (2010), Roche & Schabes (1995)
 
04 09/15 07 Part-of-speech tagging with Hidden Markov Models 1up 4up
  The Viterbi algorithm. (A decoding sketch appears below.)
  Required reading: Ch. 5.1-5
  Optional reading: Merialdo (1994), Christodoulopoulos et al. (2010), Roche & Schabes (1995)
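  A compact sketch of Viterbi decoding over a two-state toy HMM (the states, transition, and emission probabilities are invented for illustration):

    def viterbi(words, states, start_p, trans_p, emit_p):
        """best[t][s]: probability of the most likely tag sequence ending
        in state s at position t; back[t][s]: its predecessor state."""
        best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
        back = [{}]
        for t in range(1, len(words)):
            best.append({})
            back.append({})
            for s in states:
                prev, p = max(((r, best[t - 1][r] * trans_p[r][s])
                               for r in states), key=lambda x: x[1])
                best[t][s] = p * emit_p[s].get(words[t], 0.0)
                back[t][s] = prev
        last = max(states, key=lambda s: best[-1][s])
        path = [last]                  # follow backpointers right-to-left
        for t in range(len(words) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    states = ["N", "V"]
    start = {"N": 0.8, "V": 0.2}
    trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}
    print(viterbi(["dogs", "bark"], states, start, trans, emit))  # ['N', 'V']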
 
04 09/17 08 Learning Hidden Markov Models 1up 4up
  The Forward-Backward algorithm. (A forward-pass sketch appears below.)
  Required reading: Ch. 6.1-5
  Optional reading: MS, Ch. 9
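  For contrast with Viterbi's max, a sketch of the forward pass, which sums over all state paths; together with a symmetric backward pass, these quantities form the E-step of Baum-Welch re-estimation (the toy tables are the same invented ones as in the Viterbi sketch above):

    def forward(words, states, start_p, trans_p, emit_p):
        """alpha[t][s]: total probability of emitting words[:t+1] and
        ending in state s (a sum where Viterbi takes a max)."""
        alpha = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
        for t in range(1, len(words)):
            alpha.append({s: emit_p[s].get(words[t], 0.0)
                             * sum(alpha[t - 1][r] * trans_p[r][s]
                                   for r in states)
                          for s in states})
        return sum(alpha[-1].values())   # P(words) under the HMM

    states = ["N", "V"]
    start = {"N": 0.8, "V": 0.2}
    trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}
    print(forward(["dogs", "bark"], states, start, trans, emit))  # 0.254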
 
05 09/22 09 Sequence labeling tasks 1up 4up
  Chunking, shallow parsing, named entity recognition, and maximum-entropy Markov models (MEMMs)
  Required reading: Ch. 6.6-8
  Optional reading: Sutton & McCallum (2008) (Introduction to Conditional Random Fields), Berger et al. (1996), Etzioni et al. (2008) (web-scale information extraction)
 
05 09/24 10 Brown clusters 1up 4up
  How can we learn to group words based on their context?
  Required reading: Ch. 4.10
  Optional reading: MS, Ch. 14.1, Brown et al. (1992b)
 
06 09/29 11 Vector-space semantics 1up 4up
  "You shall know a word by the company it keeps" (Firth, 1957)
  Required reading: Ch. 20.7
  Optional reading: Schütze (1998), Turney and Pantel (2010), Mikolov et al. (2013)
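  A minimal sketch of the distributional idea: represent each word by counts of its neighbors and compare words by the cosine of the angle between their count vectors (all counts below are invented):

    import math
    from collections import Counter

    def norm(v):
        return math.sqrt(sum(c * c for c in v.values()))

    def cosine(u, v):
        """Cosine similarity of two sparse count vectors."""
        dot = sum(c * v.get(w, 0) for w, c in u.items())
        return dot / (norm(u) * norm(v))

    # Each word is represented by the company it keeps:
    cat = Counter({"pet": 4, "furry": 3, "meow": 2})
    dog = Counter({"pet": 5, "furry": 2, "bark": 3})
    car = Counter({"drive": 5, "road": 4, "fuel": 2})
    print(cosine(cat, dog))  # ~0.78: overlapping contexts
    print(cosine(cat, car))  # 0.0: no shared contexts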
 
06 10/01 12 Word Sense Disambiguation 1up 4up HW1 due. HW2 out (handout).
  How do we know what is meant by "the plant next to the bank"? (A simplified-Lesk sketch appears below.)
  Required reading: Ch.20.1-5
  Optional reading: Yarowsky (1995), Abney (2004)
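  One classic baseline, the simplified Lesk algorithm (a sketch with invented glosses, not code from the readings), picks the sense whose dictionary gloss shares the most words with the surrounding context:

    def simplified_lesk(context, senses):
        """Return the sense whose gloss overlaps the context the most."""
        context = set(context)
        return max(senses, key=lambda s: len(context & set(senses[s].split())))

    senses = {  # toy glosses for "bank"
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a body of water",
    }
    ctx = "the plant grew on the sloping land near the water".split()
    print(simplified_lesk(ctx, senses))  # bank/river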
 
07 10/06 13 Review for Midterm 1up 4up
 
07 10/08 14 Midterm  
  Good luck!
 
08 10/13 15 Formal grammars for English 1up 4up
  What is the structure of sentences, and how can we model it? Phrase-structure grammar and dependency grammar; review of basic English grammar and context-free grammars.
  Required reading: Ch. 12.1-3, Ch. 12.7
  Optional reading: MS, Ch. 3, Woods (2010)
 
08 10/15 16 (Probabilistic) Context-Free Grammar parsing 1up 4up
  How can we represent and deal with syntactic ambiguity? (A CKY sketch appears below.)
  Required reading: Ch. 13.1-4, Ch. 14.1
  Optional reading: Chi (1999)
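  A sketch of CKY recognition for a toy grammar in Chomsky normal form (the grammar and sentence are invented; Ch. 13 develops the full chart-parsing algorithm):

    def cky_recognize(words, lexical, binary):
        """chart[i][j]: every nonterminal that derives words[i:j]."""
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):                 # fill the diagonal
            chart[i][i + 1] = {A for A, word in lexical if word == w}
        for span in range(2, n + 1):                  # widen spans
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):             # try each split point
                    for A, B, C in binary:            # rule A -> B C
                        if B in chart[i][k] and C in chart[k][j]:
                            chart[i][j].add(A)
        return "S" in chart[0][n]

    lexical = [("NP", "she"), ("V", "eats"), ("NP", "fish")]
    binary = [("S", "NP", "VP"), ("VP", "V", "NP")]
    print(cky_recognize("she eats fish".split(), lexical, binary))  # True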
 
09 10/20 17 Probabilistic Context-Free Grammars 1up 4up
  Algorithms for learning and parsing with PCFGs
  Required reading: Ch. 14.1-3
  Optional reading: Collins' notes, Chi & Geman (1998), Schabes et al. (1993), Schabes & Pereira (1992), Stolcke (1995)
 
09 10/22 18 Treebanks and statistical parsing 1up 4up HW2 due. HW3 out
  Going beyond simple PCFGs; Penn Treebank parsing
  Required reading: Ch. 14.4-7, Ch. 12.4
  Optional reading: Marcus et al. (1993), Collins (1997), Johnson (1998), Klein & Manning (2003), Petrov & Klein (2007), Hindle & Rooth (1993)
 
10 10/27 19 Dependency parsing 1up 4up
  Dependency treebanks and parsing
  Required reading: McDonald & Nivre (2007)
  Optional reading: Nivre & Scholz (2004), Kübler et al. (2009), Nivre (2010), McDonald & Nivre (2011)
 
10 10/29 20 Feature structure grammars 1up 4up
  Feature structures and unification
  Required reading: Ch. 15.1-4
  Optional reading: Abney (1997), Miyao & Tsujii (2008)
 
11 11/03 21 Expressive Grammars 1up 4up
  Mildly context-sensitive grammars: Tree-Adjoining Grammar and Combinatory Categorial Grammar
  Required reading: Ch. 16.1, Ch. 16.3
  Optional reading: Joshi and Schabes (1997), Steedman & Baldridge (2011), Schabes & Shieber (1994), Schabes & Waters (1995), Bangalore & Joshi (1999), Hockenmaier & Steedman (2007), Clark & Curran (2007)
 
11 11/05 22 Introduction to machine translation 1up 4up
  Why is MT difficult? Non-statistical approaches to MT (the Vauquois triangle); the noisy-channel model
  Required reading: Ch. 25.1-4
  Optional reading: Brown et al. (1990), Lopez (2008)
 
12 11/10 23 Word Alignment 1up 4up
  Word alignment is the prerequisite for building a translation model. (An IBM Model 1 sketch appears below.)
  Required reading: Ch. 25.5-6
  Optional reading: Brown et al. (1993)
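  A compact sketch of IBM Model 1 trained with EM on a two-sentence toy bitext (simplified: no NULL word, and a fixed number of iterations):

    from collections import defaultdict

    def ibm1(bitext, iterations=10):
        """Learn word-translation probabilities t(f|e) from sentence
        pairs alone, treating alignments as hidden variables."""
        f_vocab = {f for fs, _ in bitext for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform init of t(f|e)
        for _ in range(iterations):
            count = defaultdict(float)                # expected c(f, e)
            total = defaultdict(float)                # expected c(e)
            for fs, es in bitext:                     # E-step
                for f in fs:
                    norm = sum(t[(f, e)] for e in es)
                    for e in es:
                        delta = t[(f, e)] / norm      # P(f aligns to e)
                        count[(f, e)] += delta
                        total[e] += delta
            for (f, e), c in count.items():           # M-step: renormalize
                t[(f, e)] = c / total[e]
        return t

    bitext = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split())]
    t = ibm1(bitext)
    print(round(t[("haus", "house")], 3))  # high: co-occurrence pins it down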
 
12 11/12 24 Phrase-based Machine Translation 1up 4up HW3 due. HW4 out (handout).
  Training and using a statistical MT system
  Required reading: Ch. 25.4, 25.7-9
  Optional reading: Koehn et al. (2003), Och & Ney (2004), Wu (1997), Chiang (2007)
  Links: www.statmt.org
 
13 11/17 25 Compositional Semantics 1up 4up
  What is the meaning of a sentence, and how can we represent it? Basic predicate logic and lambda calculus. (A lambda sketch appears below.)
  Required reading: Ch. 17.2-3
  Optional reading: Blackburn & Bos (2003)
  Links: Penn Lambda Calculator
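  Python's own lambdas can stand in for the lambda calculus of the readings; a toy sketch of building a sentence meaning by function application (beta-reduction):

    # A transitive verb maps an object to a property of subjects.
    loves = lambda y: lambda x: f"loves({x},{y})"
    vp = loves("mary")      # "loves Mary" = lambda x. loves(x, mary)
    print(vp("john"))       # loves(john,mary)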
 
13 11/19 26 Lexical Semantics 1up 4up
  What is the meaning of a word, and how can we represent it? (A WordNet example appears below.)
  Required reading: Ch. 19.1-4
  Optional reading: Palmer et al. (2005), Gildea & Jurafsky (2002), Punyakanok et al. (2008)
  Links: WordNet
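  A short example of querying WordNet through NLTK (linked above); it assumes the WordNet data has been installed once via nltk.download("wordnet"):

    from nltk.corpus import wordnet as wn

    for syn in wn.synsets("plant")[:3]:           # some senses of "plant"
        print(syn.name(), "-", syn.definition())
    print(wn.synset("dog.n.01").hypernyms())      # place in the taxonomy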
 
14 12/01 27 Natural language generation 1up 4up
  A very brief introduction to NLG, summarization, and dialog
  Optional reading: Reiter & Belz (2012), Stoyanov et al. (2009), Ng (2010)
 
14 12/03 28 Review and outlook 1up 4up
  A very brief introduction to deep learning for NLP
  Optional reading: Goldberg (2015); see also Stanford's Deep Learning for NLP class
 
15 12/08 29 Review for final exam 1up 4up HW4 due