CS 440/ECE 448
Fall 2023
Margaret Fleck
Quiz 3 skills list
The quiz will cover material through Hidden Markov Models.
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does the geometry (i.e. nodes and edges) of a Bayes net relate to conditional independence among the variables it represents?
  - What do local node connections imply about independence? About conditional independence?
- Comparing size of model to full joint probability table
- Reconstructing the joint distribution from a Bayes net (see the sketch after this list)
- Building a Bayes net from a joint probability distribution
  - How is this affected by the order in which we choose variables?
- Topological sort of a partial order
- Inference, aka calculating specific probability values using the geometry and tables in a Bayes net (very simple examples only)
- Efficiency results for inference
  - Efficient for polytrees (and what is a polytree?)
  - Some other geometries can be solved efficiently via the junction tree algorithm
  - NP-complete (aka probably exponential) in the general case
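
The following is a minimal Python sketch of the two reconstruction items above (the Rain/Sprinkler/WetGrass network and all probability values are made up for illustration, not course data). It orders the nodes with a topological sort of the DAG's partial order, then multiplies one table entry per node to recover a single entry of the joint distribution:

    # A Bayes net is a DAG with a conditional probability table (CPT) at
    # each node.  The joint distribution factors as
    # P(x1..xn) = product over i of P(xi | parents(xi)),
    # so one joint entry is just a product of table lookups.
    parents = {
        "Rain": [],
        "Sprinkler": ["Rain"],
        "WetGrass": ["Rain", "Sprinkler"],
    }

    # CPT at each node: (node value, tuple of parent values) -> probability.
    cpt = {
        "Rain": {(True, ()): 0.2, (False, ()): 0.8},
        "Sprinkler": {(True, (True,)): 0.01, (False, (True,)): 0.99,
                      (True, (False,)): 0.4, (False, (False,)): 0.6},
        "WetGrass": {(True, (True, True)): 0.99, (False, (True, True)): 0.01,
                     (True, (True, False)): 0.8, (False, (True, False)): 0.2,
                     (True, (False, True)): 0.9, (False, (False, True)): 0.1,
                     (True, (False, False)): 0.0, (False, (False, False)): 1.0},
    }

    def topological_order(parents):
        """Topological sort of the DAG's partial order: repeatedly emit any
        node whose parents have all already been emitted."""
        order, emitted = [], set()
        while len(order) < len(parents):
            for node, pars in parents.items():
                if node not in emitted and all(p in emitted for p in pars):
                    order.append(node)
                    emitted.add(node)
        return order

    def joint_probability(assignment):
        """P(assignment): one entry of the reconstructed joint distribution."""
        prob = 1.0
        for node in topological_order(parents):
            parent_values = tuple(assignment[p] for p in parents[node])
            prob *= cpt[node][(assignment[node], parent_values)]
        return prob

    # P(Rain, not Sprinkler, WetGrass) = 0.2 * 0.99 * 0.8 = 0.1584
    print(joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True}))

With only three variables the two representations happen to be the same size (1 + 2 + 4 = 7 independent numbers in the net, 2^3 - 1 = 7 in the full joint table); the space savings referred to above appear once there are many variables, each with only a few parents.
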
Natural Language and HMM history
Very briefly, who/what/when were the following?
- Julie Lovins
- SHRDLU
- Zork
- Pattern Playback
- John Kelly and Louis Gerstman
- Parsey McParseface
- Baum-Welch algorithm
- Viterbi algorithm
- GPT-3
- ILEX system
Natural Language
Types of systems
- Sample tasks (e.g. translation, question answering, ...)
- Deep vs. shallow
  - Examples of shallow systems
  - Examples of deep systems
- Danger of seeming too fluent
Processing pipeline
- Speech: waveform, spectrogram, formants, phones, synthesis (formant, concatenative)
- Dividing text into words, normalizing words, segmenting words (e.g. suffixes), phonological changes
- Finding morphemes
- Part of speech (POS) tagging
- Parsing
  - Constituency vs. dependency trees
  - Shallow parsers
  - Unlexicalized, class-based, lexicalized
- Semantics
  - Shallow semantics
  - Sentiment analysis
  - Semantic role labelling
- Co-reference resolution
- Dialog coherency
POS tagging
- General familiarity with common POS tags (e.g. Noun, Determiner)
- Approximate size of typical POS tag sets
- Single word with multiple possible tags
- Baseline tagging algorithm (see the sketch after this list)
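
A minimal sketch of the baseline algorithm (the toy tagged sentences and tag names are my own illustration): tag each known word with the tag it received most often in training, and fall back to the overall most frequent tag for unseen words.

    from collections import Counter, defaultdict

    def train_baseline(tagged_sentences):
        """tagged_sentences: list of sentences, each a list of (word, tag).
        Returns a function mapping a list of words to a list of tags."""
        tag_counts = Counter()                  # how often each tag occurs
        word_tag_counts = defaultdict(Counter)  # per-word tag frequencies
        for sentence in tagged_sentences:
            for word, tag in sentence:
                tag_counts[tag] += 1
                word_tag_counts[word][tag] += 1
        default_tag = tag_counts.most_common(1)[0][0]
        best_tag = {word: counts.most_common(1)[0][0]
                    for word, counts in word_tag_counts.items()}
        # Known words get their most frequent training tag; unseen words
        # get the most frequent tag overall.
        return lambda words: [best_tag.get(w, default_tag) for w in words]

    tagger = train_baseline([
        [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
        [("dogs", "NOUN"), ("can", "AUX"), ("run", "VERB")],
        [("the", "DET"), ("can", "NOUN")],
    ])
    print(tagger(["the", "can", "flibbertigibbet"]))  # ['DET', 'NOUN', 'NOUN']

Despite its simplicity, this baseline tags a large fraction of tokens correctly, which is the number an HMM tagger has to beat.
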
HMMs
- Baseline algorithm
- Markov assumptions
- Graphical model picture, what variables depend on which other ones in the HMM
- Component probabilities (initial, emission, transition)
- Equations for computing the probability of a given tag sequence (see the first sketch after this list)
- Tag transition possibilities as a finite-state automaton
- At what points does an HMM need smoothing?
- Estimating the probability of unseen word/tag pairs
  - Unseen words, also words seen with novel tags
  - Words that appear in development data (but not in training data)
  - Words in training data that are "hapax," i.e. appear only once
- Open-class vs. closed-class words
- Viterbi (trellis) decoding algorithm (see the second sketch after this list)
  - Building the trellis
  - Extracting the output tag sequence from the trellis
  - Big-O running time
- Extensions: trigram transition model, guessing tags from word form
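
Two sketches for the HMM items above, both under assumed inputs: p_init, p_trans, and p_emit are dictionaries holding the three component probabilities, with smoothing already folded into p_emit so that unseen word/tag pairs get a small nonzero value. First, scoring a given tag sequence (the "equations" item), written in log space:

    import math

    def tag_sequence_log_prob(words, tags, p_init, p_trans, p_emit):
        """log P(words, tags)
             = log P_init(t1) + log P_emit(w1 | t1)
               + sum over i >= 2 of [log P_trans(ti | t(i-1))
                                     + log P_emit(wi | ti)]
        Working in log space avoids floating-point underflow on long
        sentences."""
        lp = math.log(p_init[tags[0]]) + math.log(p_emit[tags[0]][words[0]])
        for i in range(1, len(words)):
            lp += math.log(p_trans[tags[i - 1]][tags[i]])
            lp += math.log(p_emit[tags[i]][words[i]])
        return lp
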
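Second, a minimal Viterbi decoder over the same tables: it builds the trellis one column per word, then extracts the output tag sequence by following backpointers from the best final tag.

    import math

    def viterbi(words, tags, p_init, p_trans, p_emit):
        """Returns the most probable tag sequence for `words`, where `tags`
        is the list of possible tags.
        v[i][t] = log probability of the best tag sequence for words[0..i]
        ending in tag t; back[i][t] = previous tag on that best path."""
        v = [{t: math.log(p_init[t]) + math.log(p_emit[t][words[0]])
              for t in tags}]
        back = [{}]
        for i in range(1, len(words)):      # one trellis column per word
            v.append({})
            back.append({})
            for t in tags:
                prev = max(tags,
                           key=lambda s: v[i - 1][s] + math.log(p_trans[s][t]))
                v[i][t] = (v[i - 1][prev] + math.log(p_trans[prev][t])
                           + math.log(p_emit[t][words[i]]))
                back[i][t] = prev
        # Extract the output: best final tag, then follow backpointers.
        t = max(tags, key=lambda s: v[-1][s])
        best = [t]
        for i in range(len(words) - 1, 0, -1):
            t = back[i][t]
            best.append(t)
        return list(reversed(best))

Each of the n columns considers every (previous tag, current tag) pair, so decoding takes O(n * T^2) time for T tags, which is the Big-O item above.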