CS 440/ECE 448
Fall 2021
Margaret Fleck
Quiz 2 skills list
The quiz will be on Wednesday 29 September covering material through the
Bayes Nets lectures.
It will be available on Moodle from 7 AM to noon Central time, and you will have 30 minutes to complete it.
Probability
- Random variables, axioms of probability
- Joint, marginal, conditional probability
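The probability items above can be illustrated with a tiny joint distribution; a minimal sketch, with variable names and numbers invented for illustration:

```python
# A toy joint distribution P(Rain, Traffic) as a dict; entries sum to 1.
joint = {
    ("rain", "jam"): 0.30,
    ("rain", "clear"): 0.10,
    ("dry", "jam"): 0.15,
    ("dry", "clear"): 0.45,
}

# Marginal: P(Rain=rain) is the sum over all Traffic values.
p_rain = sum(p for (r, t), p in joint.items() if r == "rain")  # 0.40

# Conditional: P(Traffic=jam | Rain=rain) = P(rain, jam) / P(rain).
p_jam_given_rain = joint[("rain", "jam")] / p_rain             # 0.75
```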
Modelling text data
- Word types vs. word tokens
- The Bag of Words model
- Bigrams, n-grams
- Data cleaning:
- tokenization
- stemming (including Julie Lovins and Martin Porter)
- making units of useful size: dividing words or grouping characters
- Special types of words and how we might handle them
- stop words
- rare words
- filler
- backchannel
- function vs. content
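Several of the text-modelling distinctions above (types vs. tokens, bag of words, bigrams) can be sketched in a few lines; the sentence is a made-up example:

```python
from collections import Counter

text = "the cat sat on the mat"
tokens = text.split()          # very naive whitespace tokenization

# Word tokens are occurrences; word types are distinct words.
num_tokens = len(tokens)       # 6
num_types = len(set(tokens))   # 5 ("the" occurs twice)

# Bag of words: unordered counts of word types.
bag = Counter(tokens)          # bag["the"] == 2

# Bigrams: pairs of adjacent tokens.
bigrams = list(zip(tokens, tokens[1:]))  # ("the", "cat"), ("cat", "sat"), ...
```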
Testing
- Roles of training, development, test datasets.
- Evaluation metrics for classification (true positive rate, accuracy, recall, confusion matrix, ...)
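The evaluation metrics above come straight from the confusion-matrix counts; a minimal sketch with invented counts for a binary classifier:

```python
# Toy confusion-matrix counts: true/false positives and negatives.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.85
recall = tp / (tp + fn)                     # aka true positive rate
precision = tp / (tp + fp)                  # 0.8
```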
Naive Bayes
Basic definitions and mathematical model:
- Bayes rule
- Likelihood, prior, posterior
- argmax operator
- Independence and conditional independence
- Maximum a posteriori (MAP) estimate, Maximum likelihood (ML) estimate,
factoring out P(evidence)
- How does prior affect these estimates?
- How do we combine several conditionally independent pieces of evidence
into one estimate of P(cause | evidence)?
- How do we choose the best value for the cause/class?
- How does the size of a Naive Bayes model compare to a full joint distribution?
- Why does it matter that Naive Bayes reduces the number of parameters we need to estimate?
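The ML vs. MAP distinction, and how the prior affects the estimate, can be seen in a two-class toy example; all numbers here are invented to show the prior flipping the decision:

```python
# Toy priors and likelihoods P(evidence | class) for two classes.
priors = {"spam": 0.1, "ham": 0.9}
likelihood = {"spam": 0.6, "ham": 0.2}   # e.g. P(word "free" | class)

# ML estimate: argmax over classes of the likelihood alone.
ml = max(likelihood, key=likelihood.get)                        # "spam"

# MAP estimate: argmax of prior * likelihood. P(evidence) is the same
# for every class, so it factors out of the argmax.
map_est = max(priors, key=lambda c: priors[c] * likelihood[c])  # "ham"
```

Here the strong prior toward "ham" (0.9 vs. 0.1) overrides the higher likelihood for "spam", so the MAP and ML answers disagree.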
Applying Naive Bayes to text classification
- MAP and MLE versions of the estimation equations
- Estimating probabilities from data
- Avoiding underflow (log transforms)
- Avoiding overfitting
- Smoothing
- Why is it important?
- Laplace smoothing
- Deleted estimation
- N-gram smoothing (high-level ideas only)
- Headline results: spam detection (SpamCop, Pantel and Lin),
gender classification (Boulis and Ostendorf)
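A minimal sketch of Naive Bayes text classification, combining Laplace smoothing and log probabilities (to avoid underflow); the training "documents" and priors are invented for illustration:

```python
import math
from collections import Counter

# Toy training data: one tiny word list per class.
docs = {
    "spam": "win money win prize".split(),
    "ham": "meeting notes attached".split(),
}
vocab = set(w for words in docs.values() for w in words)

def log_p_word(word, cls, k=1):
    """Laplace (add-k) smoothed log P(word | cls); k=0 would be the MLE,
    which assigns zero probability to unseen words."""
    counts = Counter(docs[cls])
    return math.log((counts[word] + k) / (len(docs[cls]) + k * len(vocab)))

def classify(words, prior={"spam": 0.5, "ham": 0.5}):
    # Sum log prior and log likelihoods instead of multiplying raw
    # probabilities, so long documents do not underflow to 0.0.
    score = lambda c: math.log(prior[c]) + sum(log_p_word(w, c) for w in words)
    return max(docs, key=score)
```

For example, `classify("win money".split())` returns `"spam"` with this toy data.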
Bayes nets
- Required structure (e.g. it's a DAG, probability table at each node)
- How does geometry relate to conditional independence assumptions
- What do local node connections imply about independence? conditional independence?
- Comparing size of model to full joint probability table
- Reconstructing the joint distribution from a Bayes net
- Building a Bayes net from a joint probability distribution
- How is this affected by the order in which we choose variables?
- Topological sort of a partial order
- Inference, aka calculating specific probability values using the geometry and tables in a
Bayes net (very simple examples only)
- Efficiency results for inference
- Efficient for polytrees
- Some other geometries can be solved efficiently via junction tree algorithm
- NP-complete (i.e., probably requires exponential time) in the general case
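Two of the Bayes-net skills above, topologically sorting the variables and reconstructing the joint distribution from the node tables, fit in one short sketch; the net (A -> B) and its conditional probability tables are invented for illustration:

```python
from collections import deque

# A tiny Bayes net A -> B: each node lists its parents and stores a CPT.
parents = {"A": [], "B": ["A"]}
cpt = {
    "A": {(): {True: 0.2, False: 0.8}},          # P(A)
    "B": {(True,): {True: 0.9, False: 0.1},      # P(B | A)
          (False,): {True: 0.3, False: 0.7}},
}

def topo_order(parents):
    """Kahn's algorithm: order nodes so every parent precedes its children."""
    children = {v: [] for v in parents}
    indeg = {v: len(ps) for v, ps in parents.items()}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return order

def joint(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    prob = 1.0
    for v in topo_order(parents):
        key = tuple(assignment[p] for p in parents[v])
        prob *= cpt[v][key][assignment[v]]
    return prob

# Sanity check: the reconstructed joint sums to 1 over all assignments.
total = sum(joint({"A": a, "B": b}) for a in (True, False) for b in (True, False))
```

Note the size comparison from the list above: this net stores 2 + 4 table entries, versus 4 entries (minus one for normalization) for the full joint, a gap that grows exponentially with more variables.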