CS 440/ECE 448
Fall 2019
Margaret Fleck

Lecture 36: Vector Semantics 1

Word meanings

What do words mean?
Why do we care?
What are they useful for?

In Russell and Norvig style: what concepts does a rational agent use to do its reasoning?

Seen from a neural net perspective, the key issue is that one-hot representations of words don't scale well. E.g. if we have a 100,000 word vocabulary, we use vectors of length 100,000. We should be able to give each word a distinct representation using a lot fewer parameters.
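To make the scaling issue concrete, here is a minimal sketch of one-hot vectors in plain Python. The four-word vocabulary and the helper names (one_hot, dot) are made up for illustration. It shows that two different one-hot vectors never overlap, and that a binary code for 100,000 distinct words needs only a handful of dimensions.

```python
# Toy vocabulary (hypothetical); a realistic one might have 100,000 words.
vocab = ["apple", "orange", "bird", "fly"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector of length len(vocab) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


print(one_hot("bird"))  # [0, 0, 1, 0]

# Distinct one-hot vectors are always orthogonal, so the representation
# carries no information about which words are similar.
print(dot(one_hot("apple"), one_hot("orange")))  # 0

# With binary features, the number of dimensions needed just to give each of
# 100,000 words a distinct code is tiny compared to 100,000.
vocab_size = 100_000
min_binary_dims = (vocab_size - 1).bit_length()
print(min_binary_dims)  # 17
```

The 17 here is only a lower bound for telling words apart. Embeddings used in practice typically have more dimensions (often a few hundred), because we also want similar words to get similar vectors.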

We can use models of word meaning for a variety of practical tasks.

model human intelligence
answering factual questions
finding word classes (e.g. for parsing)
- parts of speech (e.g. noun vs. verb)
- semantic classes: person? surface? soft material we can spread?
word prediction (e.g. for speech recognition)
judging fluency of text (e.g. translation, speech recognition)

Word meaning is complex

For each word, we'd like to learn

information about its basic sense (e.g. "bird", "apple")
distinctions between similar words (e.g. "apple" vs. "orange")
grammatical properties (e.g. noun vs. verb, singular vs. plural)
connotations (e.g. complimentary or pejorative, fancy vs. common word)

Many meanings can be expressed by a word that's more fancy or more plain. E.g. bellicose (fancy) and warlike (plain) mean the same thing. Words can also describe the same property in more or less complimentary terms, e.g. plump (nice) vs. fat (not nice). This leads to jokes about "irregular conjugations" such as

I'm confident
You're assertive
She's aggressive

Old-style logic-based representations

Definition of "bird" from Oxford Living Dictionaries (Oxford University Press)

"A warm-blooded egg-laying vertebrate animal distinguished by the possession of feathers, wings, a beak, and typically by being able to fly."

Back in the Day, people tried to turn such definitions into representations with the look and feel of formal logic:

isa(bird, animal)
AND has(bird, wings)
AND flies(bird)
AND if female(bird), then lays(bird,eggs) ....

Problems:

Very hard to build (mostly by experts)
Capture the meaning poorly
Context matters
Words fit into a larger vocabulary

The above example is not only incomplete, but also has a bug: not all birds fly. It also contains very little information on what birds look like or how they act, so it is not much help for recognition.
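The bug can be seen in a toy version of such a representation. This is only a sketch: the tuple encoding, the instance table, and the can_fly helper are all assumptions made for illustration, not a description of any real knowledge-base system.

```python
# Category-level facts mirroring the bird example (tuple encoding is an assumption).
facts = {
    ("isa", "bird", "animal"),
    ("has", "bird", "wings"),
    ("flies", "bird"),
}

# Hypothetical instances and the category each one belongs to.
instance_of = {"robin": "bird", "penguin": "bird"}


def can_fly(creature):
    """Conclude that a creature flies if its category is asserted to fly."""
    category = instance_of.get(creature, creature)
    return ("flies", category) in facts


# The rule "birds fly" is inherited by every bird, with no way to state exceptions.
print(can_fly("robin"))    # True
print(can_fly("penguin"))  # True, which is wrong for real penguins
```

Patching this requires hand-written exceptions for each flightless bird, which is one reason such representations are very hard to build.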

When labelling objects, people use context as well as intrinsic properties. A famous experiment by William Labov (1975, "The boundaries of words and their meanings") showed that the relative probabilities of two labels (e.g. cup vs. bowl) change gradually as properties (e.g. aspect ratio) are varied. Imagining food (e.g. rice) in the container makes the bowl label more likely. Flowers would encourage a container with ambiguous shape to be called a vase. The figures below show some of the pictures used and a graph of how often subjects used "bowl" vs "cup" in neutral and food contexts.

People are also sensitive to the overall structure of the vocabulary. The "Principle of Contrast" states that differences in form imply differences in meaning (Eve Clark 1987, though the idea goes back earlier). For example, kids will say things like "that's not an animal, that's a dog." Adults have a better model that words can refer to more or less general categories of objects, but still make mistakes like "that's not a real number, it's an integer." Apparent synonyms seem to inspire a search for some small difference in meaning. For example, how is "graveyard" different from "cemetery"? Perhaps cemeteries are prettier, or not adjacent to a church, or fancier, or ...

People can also be convinced that two meanings are distinct because an expert says they are, even when they cannot explain how they differ (an observation originally due to Hilary Putnam). For example, pewter (used to make food dishes) and nickel silver (used to make keys for instruments) are similar looking dull silver-colored metals used as replacements for silver. The difference in name tells people that they must be different. But most people would have to trust an expert because they can't tell them apart. Even fewer could tell you that they are alloys with these compositions:

nickel silver: 60% copper, 20% nickel and 20% zinc
pewter: 91% tin, 7.5% antimony, and 1.5% copper

Context-based representations

A different approach to representing word meanings, which has recently been working better in practice, is to observe how people use each word.

"You shall know a word by the company it keeps" (J. R. Firth, 1957)

People have a lot of information to work with, e.g. what's going on when the word was said, how other people react to it, even actual explanations from other people. Most computer algorithms only get to see running text. So we approximate this idea by observing which words occur in the same textual context. That is, words that often occur together share elements of meaning.

hour test, syllabus, assignment, lecture
milk, flour, spoon, oven, cook
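One simple way to observe textual context is to count which words appear within a few positions of each other. The sketch below does this in plain Python. The four-sentence corpus, the window size of 2, and the function name cooccurrence_counts are all assumptions chosen for illustration. Real systems use far more text and usually reweight the raw counts.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (hypothetical), echoing the two word groups above.
corpus = [
    "the lecture covered the syllabus and the assignment",
    "the assignment is due after the lecture",
    "cook the flour and milk in the oven",
    "stir the milk and flour with a spoon",
]


def cooccurrence_counts(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[word][tokens[j]] += 1
    return counts


counts = cooccurrence_counts(corpus)

# "milk" appears near "flour" in both cooking sentences, and never near "lecture".
print(counts["milk"]["flour"])
print(counts["milk"]["lecture"])
```

Each row counts["word"] can be read as a (very sparse) context vector for that word. Function words like "the" dominate the raw counts, which is why some form of normalization is normally applied.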

Figuring out an unfamiliar word from examples in context:

Authentic biltong is flavored with coriander.
John took biltong on his hike.
Antelope biltong is better than ostrich biltong.

The first context suggests it's food. The second context suggests that it's not perishable. The third suggests it involves meat.

Word Embeddings (vector semantics)

Idea: let's represent each word as a vector of numerical feature values.

These feature vectors are called word embeddings. In an ideal world, the embeddings might look like this embedding of words into 2D, except that the space has a lot more dimensions. (In practice, a lot of massaging would be required to get a picture this clean.)

from Jurafsky and Martin
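Once words are vectors, similarity of meaning can be approximated by a geometric measure such as the cosine of the angle between two vectors. The sketch below uses hand-picked 2D vectors that are purely hypothetical (they are not taken from the figure or from any trained model), just to show the computation.

```python
import math

# Hand-picked 2D "embeddings" (hypothetical values, for illustration only).
embeddings = {
    "apple": (0.9, 0.1),
    "orange": (0.8, 0.2),
    "lecture": (0.1, 0.9),
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


fruit_sim = cosine_similarity(embeddings["apple"], embeddings["orange"])
mixed_sim = cosine_similarity(embeddings["apple"], embeddings["lecture"])

# With these toy vectors, the two fruit words are much closer to each other
# than either is to "lecture".
print(round(fruit_sim, 3), round(mixed_sim, 3))
```

Unlike one-hot vectors, where every pair of distinct words has similarity 0, dense vectors allow graded similarity between words.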

AI in action

Apparently Tesla's software doesn't know much about police, leaving them with an interesting puzzle: how to persuade an autonomous vehicle to pull over.

Go master quits because AI 'cannot be defeated'