CS 440/ECE 448
Fall 2019
Margaret Fleck
Midterm 2 skills list
This exam covers material through the perceptrons part of the October 28th lecture.
You're expected to still remember material from the first midterm,
but we'll focus on new concepts. Also expect new questions on
topics on which you've done an MP since the last exam.
Historical tidbits
Very briefly, who/what/when were the following?
- Julie Lovins
- SHRDLU
- Pattern Playback
- Parsey McParseface
Modelling text data
- Word types vs. word tokens
- The Bag of Words model
- Bigrams, ngrams
- Data cleaning:
- tokenization
- stemming (Porter stemmer)
- making units of useful size: dividing words or grouping characters
- Special types of words and how we might handle them
- stop words
- rare words
- filler
- backchannel
- function vs. content
Naive Bayes
- MAP and MLE versions of the estimation equations
- Estimating probabilities from data
- Avoiding underflow (log transforms)
- Avoiding overfitting
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- Ngram smoothing (high level ideas only)
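The estimation and smoothing ideas above can be sketched in a few lines. This is a minimal illustration, not the MP's reference implementation: it uses Laplace (add-alpha) smoothing for the word likelihoods and sums log probabilities at classification time to avoid underflow.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(class) and Laplace-smoothed log P(word | class)."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c, wc in word_counts.items():
        denom = sum(wc.values()) + alpha * len(vocab)  # add alpha per vocab word
        log_like[c] = {w: math.log((wc[w] + alpha) / denom) for w in vocab}
    return log_prior, log_like, vocab

def classify(doc, log_prior, log_like, vocab):
    """MAP decision: sum logs instead of multiplying raw probabilities."""
    scores = {c: lp + sum(log_like[c][w] for w in doc if w in vocab)
              for c, lp in log_prior.items()}
    return max(scores, key=scores.get)
```

Dropping the prior term from the score turns the MAP decision into the MLE version.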
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does geometry relate to conditional independence assumptions?
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Reconstructing the joint distribution from a Bayes net
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Topological sort of a partial order
- Inference, aka calculating specific probability values using the geometry and tables in a
Bayes net (very simple examples only)
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (aka probably exponential) in the general case
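Reconstructing the joint distribution and doing simple inference both follow from the chain rule: the joint probability of a full assignment is the product of each node's table entry given its parents. A small sketch on a hypothetical three-node net (Rain and Sprinkler as roots, WetGrass depending on both; all table values are made up):

```python
from itertools import product

# Hypothetical net: Rain -> WetGrass <- Sprinkler (roots are independent).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def joint(r, s, w):
    """Chain rule: P(r, s, w) = P(r) * P(s) * P(w | r, s)."""
    pw = P_wet[(r, s)] if w else 1 - P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[s] * pw

# Inference by enumeration: marginalize out Rain and Sprinkler.
p_wet_true = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
```

Note the size comparison: this net stores 2 + 2 + 4 numbers, versus 2^3 entries for the full joint table; the gap grows quickly with more variables.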
Natural Language
Shallow knowledge only, but be familiar with the vocabulary.
- Some sample tasks (e.g. translation, question answering, ..)
- Speech: waveform, spectrogram, formants, phones
- Word segmentation (e.g. suffixes)
- Part of speech (POS) tagging
- Parsing
- Deep vs. shallow system
- Dialog coherency
- Formant synthesis vs. concatenative synthesis
- Phonological changes
- Constituency vs. dependency trees
- Attachment of prepositional phrases (why is it difficult?)
- Semantics (e.g. semantic role labelling, sentiment analysis, co-reference resolution, ...)
POS tagging
- General familiarity with common POS tags (e.g. Noun, Determiner)
- Approximate size of typical POS tag sets
- Single word with multiple possible tags
- Baseline tagging algorithm
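One common baseline (assuming here the most-frequent-tag baseline: give each word its commonest training tag, and give unseen words the overall commonest tag) can be sketched as:

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Remember each word's most frequent training tag."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word][tag] += 1
            all_tags[tag] += 1
    best = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return best, default

def tag_baseline(words, best, default):
    return [best.get(w, default) for w in words]
```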
HMMs
- Markov assumptions
- k-word context can be simulated using 1-word context
- Graphical model picture, what variables depend on which other ones in the HMM
- Component probabilities (initial, emission, transition)
- Equations for computing probability of a given tag sequence
- Tag transition possibilities as a finite-state automaton
- Viterbi (trellis) decoding algorithm
- Building the trellis
- Extracting the output tag sequence from the trellis
- Big-O running time
- At what points does an HMM need smoothing?
- Unseen words, also words seen with novel tags
- Estimating distribution of unseen words using ...
- Words that appear in development data
- Words in training data that are "hapax," i.e. appear only once
Computer vision
Shallow knowledge only, but be familiar with the vocabulary.
Basic concepts
- Pinhole camera
- Pixels
- Edge detection, segmentation
- Texture, shading
- Why might an object look different in two pictures (deformation, lighting, aspect,
occlusion, ...)
Applications
- Reconstructing 3D geometry
- why is it useful?
- from multiple 2D views of a scene
- from a single picture (based on training data)
- Image generation
- Identifying/naming objects in a picture
- Localizing/registering objects within a picture
- Visual question answering, captioning, semantic role labelling for a picture
Problems
- What set of categories should we recognize?
- Limitations of training data
- Adversarial examples
Classifiers
General design
- Uses for classification: labelling objects, making decisions
- Types of supervision (supervised, unsupervised, semi-supervised, self-supervised)
- What can we tune? (probabilities, weights, tuning constants, general design features)
- Batch vs. incremental training
- Direct vs. indirect feedback, immediate vs. delayed feedback
- Noise in "correct" answers/annotation
Specific techniques
- Nearest neighbor classifiers
- Decision trees, random forests
- Entropy: definition, how it relates to evaluating possible splits in a decision tree
- k-nearest neighbors (how it works, what happens if you change k)
- L1 vs. L2 norm
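The entropy definition and its use in evaluating a candidate decision-tree split can be written out directly: H = -sum p_i log2(p_i) over the label distribution, and a split is scored by how much it reduces the weighted entropy of the children.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p_i * log2(p_i) over the label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(labels, left, right):
    """Entropy reduction from splitting `labels` into `left` + `right`."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A 50/50 label mix has entropy 1 bit; a pure node has entropy 0, so a split that perfectly separates the classes has information gain equal to the parent's entropy.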
Perceptrons
- Linearly separable dataset
- Basics of how perceptrons work
- Overall training algorithm (e.g. epochs, random processing order)
- Rule for updating perceptron weights
- Limitations of perceptrons and ways to address them
- Multi-class perceptrons
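The training algorithm and update rule can be sketched as below (binary case, labels in {-1, +1}, bias folded in as an extra weight; a fixed epoch count and in-order processing are simplifications of the bullets above). On a mistake, the update is w += y * x; on a linearly separable dataset the algorithm converges.

```python
def train_perceptron(data, epochs=10):
    """Mistake-driven perceptron training over several epochs."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)  # last weight is the bias
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]  # append constant bias input
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if pred != y:
                w = [wi + y * xi for wi, xi in zip(w, xb)]  # w += y * x
    return w

def predict(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
```

A multi-class version keeps one weight vector per class and, on a mistake, adds x to the correct class's vector and subtracts it from the wrongly predicted one.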