CS 440/ECE 448
Fall 2019
Margaret Fleck
Midterm 2 skills list
This exam covers material through the perceptrons part of the October 28th lecture.
You're expected to still remember material from the first midterm,
but we'll focus on new concepts. Also expect new questions on
topics on which you've done an MP since the last exam.
Historical tidbits
Very briefly, who/what/when were the following?
- Julie Lovins
- SHRDLU
- Pattern Playback
- Parsey McParseface
Modelling text data
- Word types vs. word tokens
- The Bag of Words model
- Bigrams, ngrams
- Data cleaning:
- tokenization
- stemming (Porter stemmer)
- making units of useful size: dividing words or grouping characters
- Special types of words and how we might handle them
- stop words
- rare words
- filler
- backchannel
- function vs. content
Naive Bayes
- MAP and MLE versions of the estimation equations
- Estimating probabilities from data
- Avoiding underflow (log transforms)
- Avoiding overfitting
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- Ngram smoothing (high level ideas only)
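The estimation and smoothing ideas above can be sketched in a few lines. This is a minimal illustration, not the MP's reference implementation: it uses Laplace (add-alpha) smoothing for the word likelihoods and sums log probabilities at classification time to avoid underflow.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(class) and Laplace-smoothed log P(word | class)."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c, wc in word_counts.items():
        denom = sum(wc.values()) + alpha * len(vocab)  # add alpha per vocab word
        log_like[c] = {w: math.log((wc[w] + alpha) / denom) for w in vocab}
    return log_prior, log_like, vocab

def classify(doc, log_prior, log_like, vocab):
    """MAP decision: sum logs instead of multiplying raw probabilities."""
    scores = {c: lp + sum(log_like[c][w] for w in doc if w in vocab)
              for c, lp in log_prior.items()}
    return max(scores, key=scores.get)
```

Dropping the prior term from the score turns the MAP decision into the MLE version.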
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does geometry relate to conditional independence assumptions?
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Reconstructing the joint distribution from a Bayes net
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Topological sort of a partial order
- Inference, aka calculating specific probability values using the geometry and tables in a
Bayes net (very simple examples only)
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (aka probably exponential) in the general case
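Reconstructing the joint distribution and doing simple inference both follow from the chain rule: the joint probability of a full assignment is the product of each node's table entry given its parents. A small sketch on a hypothetical three-node net (Rain and Sprinkler as roots, WetGrass depending on both; all table values are made up):

```python
from itertools import product

# Hypothetical net: Rain -> WetGrass <- Sprinkler (roots are independent).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def joint(r, s, w):
    """Chain rule: P(r, s, w) = P(r) * P(s) * P(w | r, s)."""
    pw = P_wet[(r, s)] if w else 1 - P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[s] * pw

# Inference by enumeration: marginalize out Rain and Sprinkler.
p_wet_true = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
```

Note the size comparison: this net stores 2 + 2 + 4 numbers, versus 2^3 entries for the full joint table; the gap grows quickly with more variables.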
Natural Language
Shallow knowledge only, but be familiar with the vocabulary.
- Some sample tasks (e.g. translation, question answering, ..)
- Speech: waveform, spectrogram, formants, phones
- Word segmentation (e.g. suffixes)
- Part of speech (POS) tagging
- Parsing
- Deep vs. shallow system
- Dialog coherency
- Formant synthesis vs. concatenative synthesis
- Phonological changes
- Constituency vs. dependency trees
- Attachment of prepositional phrases (why is it difficult?)
- Semantics (e.g. semantic role labelling, sentiment analysis, co-reference resolution, ...)
POS tagging
- General familiarity with common POS tags (e.g. Noun, Determiner)
- Approximate size of typical POS tag sets
- Single word with multiple possible tags
- Baseline tagging algorithm
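One common baseline (assuming here the most-frequent-tag baseline: give each word its commonest training tag, and give unseen words the overall commonest tag) can be sketched as:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Remember each word's most frequent training tag."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    best = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return best, default

def tag_baseline(words, best, default):
    return [best.get(w, default) for w in words]
```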
HMMs
- Markov assumptions
- k-word context can be simulated using 1-word context
- Graphical model picture, what variables depend on which other ones in the HMM
- Component probabilities (initial, emission, transition)
- Equations for computing probability of a given tag sequence
- Tag transition possibilities as a finite-state automaton
- Viterbi (trellis) decoding algorithm
- Building the trellis
- Extracting the output tag sequence from the trellis
- Big-O running time
- At what points does an HMM need smoothing?
- Unseen words, also words seen with novel tags
- Estimating distribution of unseen words using ...
- Words that appear in development data
- Words in training data that are "hapax," i.e. appear only once
Computer vision
Shallow knowledge only, but be familiar with the vocabulary.
Basic concepts
- Pinhole camera
- Pixels
- Edge detection, segmentation
- Texture, shading
- Why might an object look different in two pictures (deformation, lighting, aspect,
occlusion, ...)
Applications
- Reconstructing 3D geometry
- why is it useful?
- from multiple 2D views of a scene
- from a single picture (based on training data)
- Image generation
- Identifying/naming objects in a picture
- Localizing/registering objects within a picture
- Visual question answering, captioning, semantic role labelling for a picture
Problems
- What set of categories should we recognize?
- Limitations of training data
- Adversarial examples
Classifiers
General design
- Uses for classification: labelling objects, making decisions
- Types of supervision (supervised, unsupervised, semi-supervised, self-supervised)
- What can we tune? (probabilities, weights, tuning constants, general design features)
- Batch vs. incremental training
- Direct vs. indirect feedback, immediate vs. delayed feedback
- Noise in "correct" answers/annotation
Specific techniques
- Nearest neighbor classifiers
- Decision trees, random forests
- Entropy: definition, how it relates to evaluating possible splits in a decision tree
- k-nearest neighbors (how it works, what happens if you change k)
- L1 vs. L2 norm
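The entropy definition and its use in evaluating a candidate decision-tree split can be written out directly: H = -sum p_i log2(p_i) over the label distribution, and a split is scored by how much it reduces the weighted entropy of the children.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p_i * log2(p_i) over the label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(labels, left, right):
    """Entropy reduction from splitting `labels` into `left` + `right`."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A 50/50 label mix has entropy 1 bit; a pure node has entropy 0, so a split that perfectly separates the classes has information gain equal to the parent's entropy.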
Perceptrons
- Linearly separable dataset
- Basics of how perceptrons work
- Overall training algorithm (e.g. epochs, random processing order)
- Rule for updating perceptron weights
- Limitations of perceptrons and ways to address them
- Multi-class perceptrons
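The training algorithm and update rule can be sketched as below (binary case, labels in {-1, +1}, bias folded in as an extra weight; a fixed epoch count and in-order processing are simplifications of the bullets above). On a mistake, the update is w += y * x; on a linearly separable dataset the algorithm converges.

```python
def train_perceptron(data, epochs=10):
    """Mistake-driven perceptron training over several epochs."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)  # last weight is the bias
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]  # append constant bias input
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if pred != y:
                w = [wi + y * xi for wi, xi in zip(w, xb)]  # w += y * x
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
```

A multi-class version keeps one weight vector per class and, on a mistake, adds x to the correct class's vector and subtracts it from the wrongly predicted one.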