Almost every lecture will be accompanied by an article. Exams will contain two parts, with roughly half the points for each part: (1) short-answer or multiple-choice questions about the assigned articles, and (2) long-answer questions based on the homework.
- Topic 1: Supervised Learning
- August 23: SVD, pseudo-inverse, linear regression. M. Planitz, Inconsistent Systems of Linear Equations, The Mathematical Gazette 63(425):181-185, 1979.
- August 25: logistic regression. D.R. Cox, The Regression Analysis of Binary Sequences, Journal of the Royal Statistical Society, Series B, 20(2):215-242, 1958.
- August 30: no lecture.
- September 1: perceptron. Mehryar Mohri and Afshin Rostamizadeh, Perceptron Mistake Bounds, arXiv:1305.0208, 2013.
- September 6: support vector machine. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery 2(2), 1998.
- September 8, 13, 15: no lecture.
- September 20: back-propagation. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams, Learning representations by back-propagating errors, Nature 323:533-536, 1986
- September 22: time-delay neural networks. Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano and Kevin J. Lang, Phoneme Recognition Using Time-Delay Neural Networks, IEEE Trans. ASSP 37:328-339, 1989.
- September 27: convolutional neural networks. Yoshua Bengio and Yann LeCun, Scaling Learning Algorithms Towards AI, in Large-Scale Kernel Machines, L. Bottou, O. Chapelle, D. Decoste and J. Weston, Eds., 2007
- September 29: Exam 1 review
- October 4: Exam 1
- Topic 2: Unsupervised and Semi-supervised Learning
- October 6: Expectation maximization. A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B 39(1):1-38, 1977
- October 11: Gaussian mixture models. Jeff Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, International Computer Science Institute TR-97-021, 1997.
- October 13: Parzen windows. Emanuel Parzen, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics 33(3):1065-1076, 1962.
- October 18: smart PCA. Sam Roweis, EM Algorithms for PCA and SPCA, Caltech Research Note, 1999.
- October 20: restricted Boltzmann machines. Geoffrey E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14(8):1771-1800, 2002.
- October 25: Exam 2 review
- October 27: Exam 2
- Topic 3: Time
- November 1: back-propagation through time. Paul Werbos, Back-propagation through time: What it is and why we do it, Proceedings of the IEEE 78(10):1550-1560, 1990.
- November 3: LSTM (long short-term memory network). Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9(8):1735-1780, 1997.
- November 8: encoder-decoder networks. William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals, Listen, Attend and Spell, arXiv:1508.01211, 2015.
- November 10: generative adversarial networks. Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson and Minh N. Do, Semantic Image Inpainting with Perceptual and Contextual Losses, arXiv preprint, 26 July 2016.
- November 15: hidden Markov models. Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE 77(2):257-286, 1989.
- November 17: Hybrid LSTM-HMM. Alex Graves, Navdeep Jaitly and Abdel-rahman Mohamed, Hybrid Speech Recognition with Deep Bidirectional LSTM, Proc. IEEE ASRU, 2013
- November 29: finite state transducers. Mehryar Mohri, Fernando Pereira and Michael Riley, Weighted finite-state transducers in speech recognition, Computer Speech and Language 16:69-88, 2002
- December 1: spiking neurons. Fred Rieke, D. Warland, and W. Bialek, Coding efficiency and information rates in sensory neurons, Europhys. Lett. 22:151-6, 1993.
- December 6: polychronization. Eugene M. Izhikevich, Polychronization: Computation with Spikes, Neural Computation 18:245-282, 2006.
- December 8: Exam 3 review.
- Final exam week: Exam 3
Extra Readings
Here are some papers that are not critical to the course material (not covered in any quiz) but might be of interest to some of you.
- self-training. H.J. Scudder, Probability of Error of Some Adaptive Pattern-Recognition Machines, IEEE Trans. Information Theory 11:363-371, 1965
- deep neural network acoustic models. G. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine 29(6):82-97, 2012
- simulated annealing. Bruce Hajek, A Tutorial Survey of Theory and Applications of Simulated Annealing, Proc. 24th Conference on Decision and Control, 755-760, 1985
- higher-order learning. Diederik P. Kingma and Jimmy Lei Ba, Adam: A method for stochastic optimization, arXiv:1412.6980, 2014
- transfer learning. Y. Bengio, "Deep learning of representations for unsupervised and transfer learning," JMLR: Proceedings of Unsupervised and Transfer Learning Challenge and Workshop, pp. 17-36, 2012
- multi-task learning. R. Caruana, "Multitask learning", Machine Learning 28(1):41-75, 1997
- hybrid neural net-HMMs. Nelson Morgan and Hervé Bourlard, Neural networks for statistical recognition of continuous speech, Proceedings of the IEEE 83(5):742-770, 1995.
