CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 2 skills list
The quiz will be on Wednesday 29 September covering material through the
Bayes Nets lectures.
It will be available on Moodle from 7 AM to noon Central time, and you will have 30 minutes to complete it.
Probability
- Random variables, axioms of probability
- Joint, marginal, conditional probability
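The probability items above can be illustrated with a tiny joint distribution; a minimal sketch, with variable names and numbers invented for illustration:

```python
# A toy joint distribution P(Rain, Traffic) as a dict; entries sum to 1.
joint = {
    ("rain", "jam"): 0.30,
    ("rain", "clear"): 0.10,
    ("dry", "jam"): 0.15,
    ("dry", "clear"): 0.45,
}

# Marginal: P(Rain=rain) is the sum over all Traffic values.
p_rain = sum(p for (r, t), p in joint.items() if r == "rain")  # 0.40

# Conditional: P(Traffic=jam | Rain=rain) = P(rain, jam) / P(rain).
p_jam_given_rain = joint[("rain", "jam")] / p_rain             # 0.75
```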
Modelling text data
- Word types vs. word tokens
- The Bag of Words model
- Bigrams, n-grams
- Data cleaning:
- tokenization
- stemming (including Julie Lovins and Martin Porter)
- making units of useful size: dividing words or grouping characters
- Special types of words and how we might handle them
- stop words
- rare words
- filler
- backchannel
- function vs. content
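Several of the text-modelling distinctions above (types vs. tokens, bag of words, bigrams) can be sketched in a few lines; the sentence is a made-up example:

```python
from collections import Counter

text = "the cat sat on the mat"
tokens = text.split()          # very naive whitespace tokenization

# Word tokens are occurrences; word types are distinct words.
num_tokens = len(tokens)       # 6
num_types = len(set(tokens))   # 5 ("the" occurs twice)

# Bag of words: unordered counts of word types.
bag = Counter(tokens)          # bag["the"] == 2

# Bigrams: pairs of adjacent tokens.
bigrams = list(zip(tokens, tokens[1:]))  # ("the", "cat"), ("cat", "sat"), ...
```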
Testing
- Roles of training, development, test datasets.
- Evaluation metrics for classification (true positive rate, accuracy, recall, confusion matrix, ...)
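The evaluation metrics above come straight from the confusion-matrix counts; a minimal sketch with invented counts for a binary classifier:

```python
# Toy confusion-matrix counts: true/false positives and negatives.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.85
recall = tp / (tp + fn)                     # aka true positive rate
precision = tp / (tp + fp)                  # 0.8
```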
Naive Bayes
Basic definitions and mathematical model:
- Bayes rule
- Likelihood, prior, posterior
- argmax operator
- Independence and conditional independence
- Maximum a posteriori (MAP) estimate, Maximum likelihood (ML) estimate,
factoring out P(evidence)
- How does prior affect these estimates?
- How do we combine several conditionally independent pieces of evidence
into one estimate of P(cause | evidence)?
- How do we choose the best value for the cause/class?
- How does the size of a Naive Bayes model compare to a full joint distribution?
- Why does it matter that Naive Bayes reduces the number of parameters we need to estimate?
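The ML vs. MAP distinction, and how the prior affects the estimate, can be seen in a two-class toy example; all numbers here are invented to show the prior flipping the decision:

```python
# Toy priors and likelihoods P(evidence | class) for two classes.
priors = {"spam": 0.1, "ham": 0.9}
likelihood = {"spam": 0.6, "ham": 0.2}   # e.g. P(word "free" | class)

# ML estimate: argmax over classes of the likelihood alone.
ml = max(likelihood, key=likelihood.get)                        # "spam"

# MAP estimate: argmax of prior * likelihood. P(evidence) is the same
# for every class, so it factors out of the argmax.
map_est = max(priors, key=lambda c: priors[c] * likelihood[c])  # "ham"
```

Here the strong prior toward "ham" (0.9 vs. 0.1) overrides the higher likelihood for "spam", so the MAP and ML answers disagree.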
Applying Naive Bayes to text classification
- MAP and MLE versions of the estimation equations
- Estimating probabilities from data
- Avoiding underflow (log transforms)
- Avoiding overfitting
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- N-gram smoothing (high-level ideas only)
- Headline results: spam detection (SpamCop, Pantel and Lin),
gender classification (Boulis and Ostendorf)
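A minimal sketch of Naive Bayes text classification, combining Laplace smoothing and log probabilities (to avoid underflow); the training "documents" and priors are invented for illustration:

```python
import math
from collections import Counter

# Toy training data: one tiny word list per class.
docs = {
    "spam": "win money win prize".split(),
    "ham": "meeting notes attached".split(),
}
vocab = set(w for words in docs.values() for w in words)

def log_p_word(word, cls, k=1):
    """Laplace (add-k) smoothed log P(word | cls); k=0 would be the MLE,
    which assigns zero probability to unseen words."""
    counts = Counter(docs[cls])
    return math.log((counts[word] + k) / (len(docs[cls]) + k * len(vocab)))

def classify(words, prior={"spam": 0.5, "ham": 0.5}):
    # Sum log prior and log likelihoods instead of multiplying raw
    # probabilities, so long documents do not underflow to 0.0.
    score = lambda c: math.log(prior[c]) + sum(log_p_word(w, c) for w in words)
    return max(docs, key=score)
```

For example, `classify("win money".split())` returns `"spam"` with this toy data.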
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does geometry relate to conditional independence assumptions
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Reconstructing the joint distribution from a Bayes net
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Topological sort of a partial order
- Inference, aka calculating specific probability values using the geometry and tables in a
Bayes net (very simple examples only)
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (i.e., probably requires exponential time) in the general case
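Two of the Bayes-net skills above, topologically sorting the variables and reconstructing the joint distribution from the node tables, fit in one short sketch; the net (A -> B) and its conditional probability tables are invented for illustration:

```python
from collections import deque

# A tiny Bayes net A -> B: each node lists its parents and stores a CPT.
parents = {"A": [], "B": ["A"]}
cpt = {
    "A": {(): {True: 0.2, False: 0.8}},          # P(A)
    "B": {(True,): {True: 0.9, False: 0.1},      # P(B | A)
          (False,): {True: 0.3, False: 0.7}},
}

def topo_order(parents):
    """Kahn's algorithm: order nodes so every parent precedes its children."""
    children = {v: [] for v in parents}
    indeg = {v: len(ps) for v, ps in parents.items()}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

def joint(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    prob = 1.0
    for v in topo_order(parents):
        key = tuple(assignment[p] for p in parents[v])
        prob *= cpt[v][key][assignment[v]]
    return prob

# Sanity check: the reconstructed joint sums to 1 over all assignments.
total = sum(joint({"A": a, "B": b}) for a in (True, False) for b in (True, False))
```

Note the size comparison from the list above: this net stores 2 + 4 table entries, versus 4 entries (minus one for normalization) for the full joint, a gap that grows exponentially with more variables.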