Artificial Intelligence

The term artificial intelligence (AI) was coined in 1956, five years before the term computer science. These terms long represented two philosophies of what computers could do: computer science represented what we know computers can do, while artificial intelligence represented our aspiration that computers might one day seem intelligent.

To this day, there is no formal boundary between the two terms, but artificial intelligence started to develop an identity as a field within the discipline of computer science in the 1970s. It has been a somewhat fluid field, with problems and algorithms entering and exiting its domain more often than happens with many other fields. That said, the following classes of problems and algorithms are often called artificial intelligence:

It is worth noting that not every problem or algorithm in the above categories is commonly called AI; some algorithms are seen as too simple to be called AI, and others have been absorbed by other fields of computing instead.

1 Artificial General Intelligence

Artificial general intelligence (AGI) is the ill-defined idea that an algorithm might be able to handle the same variety of tasks as a human, or more broadly be as intelligent as a human. Given the lack of clear definitions of human intelligence, the infinite variety of tasks that could be contemplated, and the wide range of human ability, it is common for people to disagree about whether a given system has achieved AGI and about what would be needed to move a non-AGI system towards AGI.

The notion of AGI brings up a host of difficult moral and ethical questions. If an algorithm were to reach sufficient AGI that it had all the characteristics we associate with humans, would it have rights? Would powering it down be murder and programming it be slavery? Thus far we have not created any algorithm that has forced us to resolve these questions; whether we ever will is a matter on which AI researchers disagree.

As a general observation (with exceptions), I have found that

2 Classical AI

Classical AI uses algorithms designed by humans who understand, decompose, and solve the AI problem. While these algorithms may use some data, the bulk of their behavior is programmed, not extracted from data.

An AI-controlled character in an open-world game might follow a set of programmed rules like “if far away, move toward the human-controlled character” or “when fighting, randomly pick one of these attacks.”

This is an example of a purely human-created AI algorithm.
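Such a rule set can be sketched directly in code. A minimal sketch; the distance threshold, attack names, and actions below are all invented for illustration:

```python
import random

# Hypothetical hand-written rules for a game character; every threshold
# and action name here is invented for illustration.
ATTACKS = ["slash", "lunge", "parry"]

def npc_action(distance_to_player, in_combat):
    """Choose an action from programmed rules, not from data."""
    if in_combat:
        return random.choice(ATTACKS)    # "randomly pick one of these attacks"
    if distance_to_player > 10:
        return "move toward player"      # "if far away, move toward ..."
    return "idle"
```

Every behavior here was decided by a programmer in advance; no training data is involved.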

A chess-playing AI might have programmed rules for looking ahead several moves, with the desirability of a board position at the end of that forecasting based on data derived from playing many games.

This is an example of a mostly human-created AI algorithm with some data used to inform details of its behavior.
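One common shape for that kind of look-ahead is depth-limited minimax. The sketch below is generic rather than a real chess engine; `moves`, `apply_move`, and `evaluate` are hypothetical stand-ins, and `evaluate` is where data derived from many played games could inform the score:

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Look ahead `depth` moves; score the resulting positions with `evaluate`."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)      # data-derived scoring can plug in here
    results = [minimax(apply_move(state, m), depth - 1, not maximizing,
                       moves, apply_move, evaluate)
               for m in legal]
    return max(results) if maximizing else min(results)
```

The structure of the search is fully human-designed; only the leaf evaluation need come from data.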

As of 2026, classical AI is widely used when the problem to be solved is well-defined and the computational resources available to solve the problem are limited.

While classical AI is powerful and widespread, it is not readily summarized with broad patterns. Each algorithm is based on a specific context and how humans model that context’s patterns, leading to each looking and operating differently.

3 Machine Learning

Machine learning refers to programs that define a large family of functions and then pick the one function from that family that best matches a large pool of data.

The data used to pick a function from the family of functions is called training data.

3.1 Types of machine-learning problems

Problems that machine learning solves can be broadly categorized into the following groups:

Suppose we have data about student study habits and grades in INFO 102.


Regression might come up with a function mapping study habits to grade, something like $\text{grade} = 50 + \dfrac{\text{hours of study}}{3}$. Regression won’t be fully accurate, but hopefully it will match much of the variation in grades.
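One way such a function could be picked from the family of lines is ordinary least squares. A minimal sketch, with invented study-habit data that happens to follow grade = 50 + hours/3 exactly:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b: the line that best matches the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

hours = [0, 3, 6, 9, 12]          # invented training data
grades = [50, 51, 52, 53, 54]     # follows grade = 50 + hours/3
m, b = fit_line(hours, grades)    # m comes out near 1/3, b near 50
```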


Clustering might come up with something like

There are 3 ways students study. You use the 2nd way.

By itself, clustering provides no additional information about the clusters, but we can often augment it with some descriptive differences between the clusters such as studies a little every week or studies for many hours the day before a quiz.
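Clustering can be sketched with one-dimensional k-means on weekly study hours. The data, the choice of three clusters, and the starting centers are all invented for illustration:

```python
def kmeans_1d(xs, centers, iters=20):
    """Alternate assigning points to the nearest center and re-averaging."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # move each center to the mean of its group (keep it if group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

hours = [0.5, 1, 1.5, 5, 6, 7, 14, 15, 16]   # invented weekly study hours
centers, groups = kmeans_1d(hours, centers=[0, 8, 20])
# the three groups settle into light, moderate, and heavy studiers
```

Note that the algorithm only produces group membership; the "light/moderate/heavy" reading is a human-added description, as the text above notes.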


Classification might come up with something like

Your study habits suggest you’ll earn an A.

Classification is similar to clustering in that it puts each datum into one of several groups; but unlike clustering, those groups have to be provided in the training data. I could train a classifier to say you study like a math major because majors are available to me in the course roster, but I couldn’t train it to say you study like a future leader because what students will do after graduation is not available to me.
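A minimal classification sketch using a one-nearest-neighbor rule; notice that the labels (“A”, “B”, “C”) must already appear in the training data, unlike in clustering. The (hours studied, grade) pairs are invented:

```python
def classify(hours, training):
    """Return the label of the nearest labeled training example."""
    nearest = min(training, key=lambda pair: abs(pair[0] - hours))
    return nearest[1]

# invented (hours studied, letter grade) training pairs
training = [(1, "C"), (4, "B"), (9, "A"), (12, "A")]
prediction = classify(10, training)    # nearest example is (9, "A")
```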


Generative AI might come up with something like

You probably won’t study at all next Wednesday.

It takes the patterns present in the data thus far and predicts what is likely to come next.
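In miniature, that kind of next-item prediction can be sketched as a first-order Markov model: count which study day followed which in past weeks, then predict the most common follower. This is a toy stand-in for generative AI, and the study history is invented:

```python
from collections import Counter

def next_after(history, current):
    """Most common item that followed `current` in the history, if any."""
    followers = Counter(b for a, b in zip(history, history[1:]) if a == current)
    return followers.most_common(1)[0][0] if followers else None

history = ["Mon", "Thu", "Mon", "Thu", "Mon", "Fri"]   # invented study days
prediction = next_after(history, "Mon")   # "Thu" followed "Mon" most often
```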

3.2 Function families

Within machine learning research, it is common to refer to machine learning algorithms by the function families they use.

There are many function families with many names, and the names don’t follow any single schema or structure:

Function families are chosen broadly based on three criteria:

  1. Efficient algorithms to perform the training.

    We could always pick a random function from the family, see how well it matches our objective, then pick another; but such a random search is very inefficient. We prefer function families where we can make educated guesses instead.

    One common way to make educated guesses is called hill climbing. Instead of just saying that a function doesn’t match the training data, hill climbing uses how it fails to match the training data to inform which function to try next.

  2. Flexibility to express the kinds of patterns the data contains.

    If we use the function family of lines ($y = mx + b$) to try to match data that is curved, we’re guaranteed to fail because the function family is too limited.

    One form of flexibility is the number of parameters, typically in the form of numbers that can be chosen to separate one function in the family from another. A line has two parameters ($m$ and $b$ in $y = mx + b$), a parabola has three ($a$, $b$, and $c$ in $y = ax^2 + bx + c$), and so on.

    Another form of flexibility is if adding more parameters adds the kind of differences between functions that the data needs. This is harder to define than parameter count but does mean that some function families are better (for some applications) than others.

  3. Resistance to over-fitting.

    Over-fitting occurs when the function selected from the function family matches the details of the training data instead of the patterns the training data is meant to exemplify. Over-fitting is common when the function family has a large number of parameters, and is more common in some function families than others.
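The hill climbing mentioned under criterion 1 can be sketched on the simplest family, lines through the origin ($y = mx$): nudge the parameter and keep the nudge only when it lowers the error on the training data. This is one simple stochastic variant, not the only form of hill climbing, and the data, step size, and iteration count are invented:

```python
import random

def squared_error(m, data):
    """Total squared mismatch between y = m*x and the training data."""
    return sum((y - m * x) ** 2 for x, y in data)

def hill_climb(data, m=0.0, step=0.1, iters=1000):
    """Keep a random nudge to m only when it reduces the training error."""
    random.seed(0)                      # reproducible for this sketch
    for _ in range(iters):
        candidate = m + random.choice([-step, step])
        # how the current guess fails (its error) guides the next guess
        if squared_error(candidate, data) < squared_error(m, data):
            m = candidate
    return m

data = [(1, 2), (2, 4), (3, 6)]         # invented data with true slope 2
best_m = hill_climb(data)               # climbs to roughly 2.0
```

Because downhill moves are always rejected, the guess can only improve, which is what makes this search far more efficient than trying random functions.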

3.3 Ways machine learning happens

It is typically much more computationally difficult to find a function from the function family that matches the training data than it is to evaluate that function once it is selected. This difference in difficulty leads to three broad categories of training.