This is an archived copy of a previous semester's site.
Please see the current semester's site.
It is common for CS programs like ours (ones that teach you C++ and Java before Python) to introduce Python with a few minutes of hand-wavy examples and then let you figure it out from there. And this works, in the sense that you’ll write code that runs. But it also tends to make programmers who write ugly Python code.
There are many “Python for X programmers” tutorials online. The following is not a tutorial, but rather a list of facts. These are not exhaustive and are roughly in order of how important I personally think they are to C++ programmers moving to Python.
Learning Python Properly
If you’re interested in truly mastering Python, I strongly recommend following the official Python tutorial. By “following” I mean reading it front to back, running each example yourself, and experimenting with changes to it.
Official tutorials are usually a good way to learn a language, not just because they tend to be complete but also because they tend to be written in a way that reflects how the language designers think about the language.
All values in Python are stored as a pair (typecode, other). Typecode
is an enumeration value telling us what type the other
is: an integer, floating-point number, array, array list, and so on. For a few built-in types (numbers and the special values True
, False,
None
, and ...
) the value is stored directly in the other
spot; for all the rest other
stores a pointer to dynamically-allocated memory storing the value (e.g. what Java and C++ do with the new
operator).
Dynamic typing means you can re-use a variable for many different types, which sometimes creates useful flexibility and sometimes results in buggy code going undetected. Because of the undetected bugs, Python has released a series of optional type annotation syntaxes and toolchains that can be used to detect some typing bugs, but as of 2024 these are still uncommon in Python code I see in the wild.
Python does not have a special variable declaration syntax: declaring a variable is the same as using the variable. A statement like x = 3
either creates a variable named x
and gives it the value 3
, or it replaces the value previously stored in x
with 3
. This makes code succinct, but it also means that typos in variable names are hard to detect: if code uses a variable named example
everywhere except one place where it uses a variable named examle
instead, Python will happily assume this was an intentional decision to use two different variable names.
Both Python’s syntactic succinctness and the risk of typos help reinforce a culture of Python programmers using very short variable names. The difficulty of reading and understanding code with low-meaning variable names encourages some programmers to disagree with that culture.
This same rule applies to members of objects too. thing.whatsit = 3
either changes the value of the whatsit
member field in the object thing
or creates such a field and gives it the value 3
. As a corollary to this, two objects created from the same class may have a different set of member fields.
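As a small sketch of the last two paragraphs, the following uses a hypothetical, empty class named Thing (my own illustration, not something from a library) to show assignment creating member fields on the fly, so that two objects of the same class end up with different fields.

```python
class Thing:
    """A hypothetical class that declares no fields of its own."""
    pass

first = Thing()
second = Thing()

first.whatsit = 3      # creates the field on this one object
first.whatsit = 4      # replaces the value of the existing field
second.other = "hi"    # a different field on a different object

# vars() returns the dict of fields each object currently has
first_fields = vars(first)
second_fields = vars(second)
print(first_fields)    # {'whatsit': 4}
print(second_fields)   # {'other': 'hi'}
```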
It is likely that every language you’ve used other than Python uses lexical scoping: that is, a variable is accessible from its point of declaration until the end of the scope in which it is declared, and that scope is indicated by curly braces, files, indentation, or some other lexical property of the language.
Python doesn’t have that kind of scoping. Instead, variable scopes belong to modules and functions. If a variable is assigned a value outside of a function, it is placed in the module scope; if it is assigned a value inside a function or is a parameter of that function, it is placed in the function’s scope; and if it is assigned a value in both places there are two different variables with the same name, one in each scope. Classes also have a scope, though that’s different enough that we’ll ignore it here and come back to it later. And that’s it: there are no loop scopes, if-statement scopes, or the like.
One module is created per .py
file; one function is created per def
and lambda
keyword.
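Here is a minimal sketch of what the lack of loop and if-statement scopes looks like in practice. The function name scan and the variable names are made up for illustration, and the code assumes it is run as a file of its own so that the final loop sits in module scope.

```python
def scan(values):
    for v in values:           # v lives in the function scope, not a loop scope
        if v % 2 == 0:
            found = v          # so does found, despite being assigned inside an if
    return v, found            # both are still accessible after the loop ends

last_seen, last_even = scan([1, 2, 3, 4, 5])

total = 0
for i in range(3):             # not inside a function, so i is a module-scope variable
    total += i

print(last_seen, last_even, i, total)   # 5 4 2 3
```

Note that scan would fail with an UnboundLocalError if no even value were present, because found would never have been assigned.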
Because modules can import other modules and functions can invoke other functions, there may be many module and function scopes active at a single time, but only three are available to any given code statement: the local
scope, which is the scope of the function or module that contains the code; the global
scope, which is the scope of the module that contains the code; and the builtins
scope, which is a module of several important non-keyword language features. Variable assignments use the local scope; variable accesses try the local scope first and if the variable is not found there they fall back on the global scope, then the builtins scope.
There are various ways to adjust scopes:
The global
keyword can pick specific variable names that a function should look for and/or create in the global scope instead of the local scope.
The import
keyword assigns a module to a name; variables in the module’s scope can be modified as members of that name.
Each scope is actually a dict
type, and the current scopes can be retrieved using the locals()
and globals()
functions in the builtins scope and the __builtins__
variable in the global scope. These dict
s can be manipulated like any other dict
, including passing them between functions, modifying their contents, and so on.
Note that the scope adjustments are part of a common theme. Python generally has the simple path where you just write code; the advanced path where something simple-looking like global x
changes what other code means; and the under-the-hood path where you mess about with the dict
s that underlie everything.
In many languages, the brief-and-easy way to write code using library functions and short expressions is less performant than carefully-written detailed code. In general, the inverse is true in Python.
Python (at least in its most popular implementation) is implemented in C, and many of its best known libraries are as well. The parts that are written in C generally run much more quickly than other parts, in some tests by as much as 100-fold. If you can write a single Python statement that defers a loop over a large number of inputs to a library, that will tend to be much faster to execute than writing out loops in Python itself.
This leads to a coding style in Python that favors brevity and cunning tricks over verbosity and explicit operations. In general, well-written Python code will look quite different from well-written code in C-like languages.
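As a rough illustration of that style difference, here are two ways to add up a list: an explicit loop, and a single call that hands the loop to the built-in sum. I make no specific timing claim here; the standard timeit module is one way to measure the difference on your own machine.

```python
values = list(range(100_000))

# Explicit loop, in the style natural to C-like languages
total_loop = 0
for v in values:
    total_loop += v

# One statement that defers the loop to a built-in implemented in C (in CPython)
total_builtin = sum(values)
largest = max(values)

print(total_loop == total_builtin, largest)   # True 99999
```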
Values of any immutable type can never be changed. We can derive new values from them, but never change anything about existing values.
Common immutable types in Python include bool
, float
, int
, str
, bytes
, and tuple
; the types of the special values None
and ...
; and the types of things you might not be used to thinking about as values like functions and classes.
Values of any mutable type can have details about them change, and if one part of the code changes something about one such value all other parts of the code that know about that value see the change too.
Common mutable types in Python include list
, dict
, and set
; and most user- and library-defined classes.
Immutable values behave like values, like Platonic ideals, like ideas. They have no identity
beyond their value: if you and I are both thinking about 7, it’s not meaningful to ask “but are we thinking about the same 7?”
They can be used to create new values, but cannot be changed: 3 + 7
creates 10 without modifying either 3 or 7, and nothing foo(7)
can do will change what 7 means. Update operators change which value a variable or spot in a structure is referring to; they do not change the values themselves: if after age = 18
we run age += 1
, 18
hasn’t changed; rather, age
now stores a different number (19
) instead.
Mutable values behave like entities, like specific objects, like real things. They have identity that goes beyond their properties and appearance: if you and I are both planning to purchase a replica Model-T Ford Motorcar, it is worth asking “are we competing on the same one or looking at different ones?”
Changes made to one entity are visible to everyone who knows about it: if I dress up one of my assistants in a funny costume, your tutor gets that funny costume too because they’re the same person. Update operators change the value itself, not which value the variable is referring to: after a = [1, 2]
and b = a
the two variables refer to the same list; thus a += [3]
will change that single list (referred to by both a
and b
) to [1, 2, 3]
.
You can check to see if two variables refer to the same mutable value using the is
operator, as in if x is y:
. This is distinct from the equality operator ==
which checks to see if two values are equal
as defined by their types.
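The difference between the two kinds of types, and between is and ==, can be sketched like this (the variable names are only for illustration):

```python
a = [1, 2]
b = a            # b refers to the same list as a
c = [1, 2, 3]    # an equal-looking but separate list

a += [3]         # changes the one list that both a and b refer to

age = 18
old_age = age
age += 1         # int is immutable: age now refers to a different number

print(b)                 # [1, 2, 3]
print(a is b)            # True: one list, two names
print(a == c, a is c)    # True False: equal values, different lists
print(old_age, age)      # 18 19
```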
A sequence of n values can be assigned to n variables by putting the variables in a syntactic sequence; for example (x,y) = [2,4]
or [a,b,c] = range(3)
.
Tuples can be written without parentheses; thus x,y = y,x
means (x, y) = (y, x)
and swaps the values stored in variables x
and y
.
Returning tuples from functions and unwrapping the results is a common way of implementing the appearance of functions with multiple return values. An example builtin function using this pattern is divmod
: running d,m = divmod(345,10)
puts 34
in d
and 5
in m
.
Unwrapping also works for implicit assignments such as to the parameters of a function or the variable of a for
loop. These are often used to iterate over pairs of matching items. For example, a dict
’s .items()
method returns a sequence of (key, value) pairs, meaning for k,v in d.items():
iterates through all the pairs in dict
d
.
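The common uses of unwrapping described so far look like this in practice. The prices dict is made-up sample data.

```python
x, y = 2, 4
x, y = y, x                      # swap: the right-hand tuple is built before assigning

d, m = divmod(345, 10)           # unwraps the returned (quotient, remainder) pair

prices = {"tea": 3, "cake": 5}   # hypothetical data
lines = []
for name, cost in prices.items():     # each (key, value) pair is unwrapped per iteration
    lines.append(f"{name} costs {cost}")

print(x, y, d, m)    # 4 2 34 5
print(lines)         # ['tea costs 3', 'cake costs 5']
```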
Unwrapping allows non-variable assignable expressions, such as member access and list and dict indexing; and it works by first evaluating the full right-hand side, then assigning to the elements on the left-hand side in a left-to-right order. Thus you can write (w, w[0], w[0]['yes']) = ([-1,-2,-3], {'no':0}, 1)
and end up with w
being [{'no': 0, 'yes': 1}, -2, -3]
. Taking advantage of anything in this paragraph is often seen as confusing and is quite rare in practice.
In Python, a string of bytes (such as C’s char *
) has type bytes
. The type str
stores a string of Unicode characters instead. Conversion between these two must be explicit, using the decode
method of bytes
and the encode
method of str
.
When you write literal strings in your source code, they are automatically decoded using the source file’s charset. If you put a b in front of the opening quote, like b"example", this decoding will be skipped and the bytes used to store that string in the source file will become a bytes object.
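A short sketch of the explicit conversions, assuming UTF-8 as the encoding (the choice of encoding and the sample word are mine):

```python
text = "na\u00efve"               # str: 5 Unicode characters (the third is i with a diaeresis)
raw = text.encode("utf-8")        # bytes: 6 bytes, since that character needs two in UTF-8
back = raw.decode("utf-8")        # str again

literal = b"example"              # a bytes literal
first_byte = literal[0]           # indexing bytes gives an int, here the code for 'e'

print(len(text), len(raw), back == text, first_byte)   # 5 6 True 101
```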
Strings may be delimited using either single or double quotes: 'same' == "same"
. Both allow the same set of backslash escapes, such as \n
and \\
, and neither may contain a newline.
Strings may also be delimited by three consecutive single or double quotes: '''same''' == """same"""
. These allow newlines inside the literal string, as well as backslash escapes, and are sometimes called docstrings
because of one of their common uses.
Any string may be preceded by an r
to prevent backslashes from escaping things: r'\n' == '\\n'
.
If a function definition, class definition, or Python file opens with a string literal that is not part of any statement or expression, then that string is saved in the special __doc__
attribute of the function, class, or module and is called a docstring
. This is the appropriate place to document your code and is used in various tooling, including Python’s built-in help()
function, the command-line pydoc
application, and many IDEs’ hints and documentation browsers.
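For example, with a hypothetical function named area:

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height

# The opening string literal was saved as the function's __doc__ attribute
print(area.__doc__)    # Return the area of a width-by-height rectangle.
```

Running help(area) in an interactive session displays the same text along with the function's signature.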
Python’s dict
type is a hash map; its keys must be hashable, which for the built-in types generally means immutable.
Python’s set
type is a hash map with only keys, no values. Python also has a frozenset
type which is an immutable, hashable version of a set
.
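A quick sketch of what hashability allows and forbids; the grid and groups data are invented for the example.

```python
grid = {(0, 0): "start", (2, 3): "goal"}   # tuples of ints are hashable, so they can be keys

try:
    grid[[1, 1]] = "wall"                   # lists are mutable and unhashable
    error = None
except TypeError as e:
    error = e

# frozensets are hashable, so they can be members of a set (plain sets cannot)
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

print(type(error).__name__, len(groups))    # TypeError 2
```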
In Python, int
isn’t a keyword, it’s a variable name in the __builtins__
scope. The value of that variable is a type object describing integers. You can replace its meaning with something like int = "nonsense"
and thereafter no longer have access to the int
type under that name (though you could still get it by doing type(3)
).
This is true of every type and function: they’re just values, stored in variables. abs = 4
isn’t an error, but after doing abs = 4
you can’t do abs(x)
anymore.
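To see this without permanently losing anything, note that the standard builtins module still holds the original function after the name is reassigned:

```python
import builtins

abs = 4                      # reassigns the name abs in the current scope
try:
    abs(-2)
    outcome = "called"
except TypeError:
    outcome = "not callable"     # the int 4 cannot be called like a function

still_there = builtins.abs(-2)   # the original function is untouched in the builtins module

print(outcome, still_there)      # not callable 2
```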
The def
keyword does two things: it defines a new function value and it assigns that function to a variable. There’s also a keyword lambda
that defines a function value without giving it a name. The class
keyword likewise both defines a new type value and assigns that type to a variable.
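Because functions are just values stored in variables, they can be copied to other names and placed in collections. The names double, triple, and twice below are only illustrative.

```python
def double(x):               # defines a function value and assigns it to the name double
    return 2 * x

triple = lambda x: 3 * x     # a function value with no name of its own, assigned by hand

twice = double               # a second variable referring to the same function
operations = [double, triple, twice]
results = [op(5) for op in operations]

print(double.__name__, triple.__name__, twice.__name__)   # double <lambda> double
print(results)                                            # [10, 15, 10]
```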
int
is a type. int(3.5)
converts or casts
the value 3.5
to a value of that type.
Some built-in functions
are also actually types. Two notable examples assist with creating low-overhead collections. range
is a type that represents a sequence of integers by generating the integers in the sequence only as they are needed. zip
is a type that represents a sequence of pairs of values taken from two other sequences by checking those two other sequences and generating the pairs only as they are needed. But while these are both commonly used, they are often referred to as functions, not types, because that more naturally matches how most programmers use them.
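A small sketch showing that both are types and that neither builds its contents up front:

```python
r = range(0, 10**12, 3)           # a huge sequence, but no integers are generated yet
pairs = zip("abc", [1, 2, 3])     # pairs are produced only when asked for

kinds = (type(r).__name__, type(pairs).__name__)
sixth = r[5]                      # computed on demand
pair_list = list(pairs)           # asking for everything forces the pairs to be generated

print(kinds)        # ('range', 'zip')
print(sixth)        # 15
print(pair_list)    # [('a', 1), ('b', 2), ('c', 3)]
```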
ints have no upper bound
If you ask Python to compute $1234^{4567}$ by running 1234 ** 4567
it will quite happily generate a 14,119-digit number (but likely refuse to show it to you because it’s too big to safely convert to a string to display). If you try to convert it to a floating-point number with float(1234 ** 4567)
or (1234 ** 4567) + 0.0
you’ll get an OverflowError: int too large to convert to float
because the largest float is about 1.8e308, which has 309 digits, not 14,119.
Be aware: just because int
allows you to have gargantuan numbers doesn’t mean doing so is a good idea. Most other languages, and some popular Python libraries like numpy
, have trouble with big integers, so having them in your code might break some operations. Also, big integers are much less efficient than small integers, and most well-known algorithms that appear to use huge integers (like some forms of encryption) actually use small integers that mirror certain properties of big integers instead.
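The big-integer behavior can be checked with a few lines. To sidestep the string-conversion limit, this sketch counts digits with math.log10, which accepts arbitrarily large ints.

```python
import math

big = 1234 ** 4567
digit_count = math.floor(math.log10(big)) + 1   # counts digits without converting to str

try:
    float(big)
    result = "converted"
except OverflowError as e:
    result = str(e)

print(digit_count)   # 14119
print(result)        # int too large to convert to float
```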
complex numbers
Python’s float is a standard IEEE 754 64-bit floating-point value, just like the double value in C, C++, and Java. It has all the gotchas of this format: for example, 0.1 + 0.1 + 0.1 == 0.3 gives False (not True) and 1e308 * -2 gives $-\infty$ (which Python displays as -inf).
Python has a built-in exponentiation operator, **
, and this works with negative bases and fractional exponents because Python has a built-in complex
type. A complex
number is stored as a pair of float
s, one for the real and one for the imaginary part of the number. Imaginary numbers have a literal syntax with a trailing j
; thus (1j)**2
means $i^2$ and yields (-1+0j)
, the complex-number form of $-1$.
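The float and complex behaviors described in this section can be tried directly:

```python
not_equal = (0.1 + 0.1 + 0.1 == 0.3)     # False, due to binary rounding
overflowed = 1e308 * -2                   # too large in magnitude for a float

root = (-8) ** (1 / 3)                    # negative base, fractional exponent: a complex result
squared = (1j) ** 2

print(not_equal, overflowed)              # False -inf
print(type(root).__name__)                # complex
print(squared, squared == -1)             # (-1+0j) True
```

Note that root is the principal complex cube root, roughly 1 + 1.732j, not the real number -2.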
Double-slash division always creates a whole-number result, stored as an int when both operands are ints and as a float otherwise; thus 3.40 // 1.36
gives 2.0
not 2.5
. It always rounds towards negative infinity; thus 5 // 3 == 1
but -5 // 3 == -2
.
Single-slash division always creates a float
(or complex
) result; thus 4 / 2
gives 2.0
not 2
.
Percent-sign modulus gives a float
if either operand is a float
, matches the sign of the right-hand operand, and for positive right-hand operands obeys the rule x % n == x - (x // n) * n
.
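The three operators side by side, with values chosen by me to show the rounding and sign rules:

```python
floor_pos = 5 // 3          # 1
floor_neg = -5 // 3         # -2: rounds towards negative infinity, not towards zero
floor_float = 7.5 // 2      # 3.0: a whole number, but stored as a float
true_div = 4 / 2            # 2.0: single-slash division of ints still gives a float

mod_neg_left = -7 % 3       # 2: sign matches the right-hand operand
mod_neg_right = 7 % -3      # -2
mod_float = 7.5 % 2         # 1.5

x, n = -7, 3
rule_holds = (x % n == x - (x // n) * n)

print(floor_pos, floor_neg, floor_float, true_div)          # 1 -2 3.0 2.0
print(mod_neg_left, mod_neg_right, mod_float, rule_holds)   # 2 -2 1.5 True
```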
Every function invocation returns a value. If it doesn’t have a meaningful value to return, it returns the special value None
. return
on a line by itself and return None
are fully equivalent statements.
When invoking a function, arguments may be passed either as positional arguments, like f(3, x)
; or as keyword arguments, like f(a=3, y=x)
. In general an invocation may mix any number of either, but all positional arguments must come before any keyword argument.
When defining a function, you can have
Required parameters, given with a name and no default value like def f(x,y):
.
Optional parameters, given with a name and a default value, like def f(x=3,y=4):
.
If both required and optional parameters are part of the same function definition, the required must precede the optional in the parameters list, like def f(x, y=4):
.
Positional-only parameters, which precede a special marker /
in the parameters list, like def f(x,y,/):
which can be called as f(1,2)
but not as f(y=2,x=1)
.
Keyword-only parameters, which follow either a special marker *
in the parameters list or the variadic positional parameter, like def f(*,x,y):
which can be called as f(y=2,x=1)
but not as f(1,2)
.
Parameters that can be provided either by position or by name, which neither precede a /
nor follow a *
, like x
and y
in def f(x,y=2):
and def f(w,/,x,y=2,*,z=3):
.
Variadic positional parameters, indicated by a variable name with a *
in front of it, like def f(*args):
. That variable will be given a tuple of any positional arguments not assigned to other parameters.
A function cannot have both the bare * marker and a variadic positional parameter; any parameters listed after the variadic positional parameter are already keyword-only.
Variadic keyword parameters, indicated by a variable name with a **
in front of it, like def f(**kwargs):
. That variable will be given a dict of any keyword arguments not assigned to other parameters.
This may seem needlessly complicated, but each option was added to address a specific need of some programmers.
Consider the function
def g(a, /, b, c=30, *d, e=50, **f):
    print(a,b,c,d,e,f)
g()
is an error; in particular, a TypeError for missing a required positional argument.
g(1)
is an error; in particular, a TypeError for missing a required positional argument.
g(1,2)
prints 1 2 30 () 50 {}
g(1,2,3)
prints 1 2 3 () 50 {}
g(1,2,3,4)
prints 1 2 3 (4,) 50 {}
g(1,2,3,4,5)
prints 1 2 3 (4, 5) 50 {}
g(1,2,3,4,5,6)
prints 1 2 3 (4, 5, 6) 50 {}
g(a=1)
is an error; in particular, a TypeError for missing a required positional argument.
g(a=1,b=2)
is an error; in particular, a TypeError for missing a required positional argument.
g(1,b=2)
prints 1 2 30 () 50 {}
g(1,b=2,c=3)
prints 1 2 3 () 50 {}
g(1,c=3,b=2)
prints 1 2 3 () 50 {}
g(1,2,c=3)
prints 1 2 3 () 50 {}
g(1,2,d=4)
prints 1 2 30 () 50 {'d': 4}
g(1,2,b=3)
is an error; in particular, a TypeError for providing multiple values for argument b (both positional and keyword).
g(1,2,3,4,d=5)
prints 1 2 3 (4,) 50 {'d': 5}
g(1,2,3,4,5,6,d=7,e=8,f=9)
prints 1 2 3 (4, 5, 6) 8 {'d': 7, 'f': 9}
Python classes are declared with the class
keyword, much like Java or C++. They create types that can be used to create objects. They can have fields and methods and superclasses. They can overload any operator, including .
(and also their behavior when passed as the argument of some built-in functions like str
, repr
, and len
) so object.x
defaults to accessing field x
but might do anything if .
was overloaded. Python classes can be implemented in C or Python and have several variants with different rules about assigning to missing fields and so on.
this keyword
In many languages, class methods have access to the object they are called on using the this
keyword. In Python, the object is instead passed in as the first argument of the method and can have any name you choose to give it, just like any other argument. It is traditional to name it self
, but that’s just a tradition.
This also applies to the constructor. A constructor is just a method named __init__
, and gets the object as its first argument like any other method.
A class defined as
class A:
    def __init__(self, b):
        self.c = b
can be created by x = A(3)
, which will make a new A
-type object and call its .__init__(3)
method, passing in the object itself as an additional first argument.
In general, method names with two underscores at the beginning and end of the name are used by Python to interact with language-level features, including operator overloading and handling how built-in functions like str
and len
treat objects.
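As a minimal sketch of those double-underscore methods, here is a hypothetical Playlist class (my own example) that customizes len, str, and the + operator:

```python
class Playlist:
    """A hypothetical class used only to illustrate special methods."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):               # used by len(...)
        return len(self.songs)

    def __str__(self):               # used by str(...) and print(...)
        return ", ".join(self.songs)

    def __add__(self, other):        # used by the + operator
        return Playlist(self.songs + other.songs)

mix = Playlist(["a", "b"]) + Playlist(["c"])
print(len(mix), str(mix))            # 3 a, b, c
```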
Python has a set-builder-notation-inspired syntax for looping over a collection, performing some work on each value, and making a new collection of the results. Called comprehensions
, this family of related syntaxes has several forms and has expanded its functionality several times over the life of the Python language. I’ve never seen a comparable syntax in another programming language.
The general form of this syntax is as follows:
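An opening bracket: [ to make a list, { to make a set or dict, ( to make a generator
An expression that computes each result (written key: value to make a dict)
for, followed by one or more variable names
in, followed by the collection to loop over
Optionally, if, followed by a condition that selects which values to use
The matching closing bracket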
Mapping a list of values:
x = [1,2,3,4,5]
y = [2*e for e in x]
# y == [2, 4, 6, 8, 10]
Filtering a list of values:
x = [1,2,3,4,5]
y = [e for e in x if (e%2) == 0]
# y == [2, 4]
A dict from a list:
x = [1,2,3,4]
y = {e:e**2 for e in x}
# y == {1:1, 2:4, 3:9, 4:16}
Reversing a dict:
x = {1:2, 3:4, 5:6, 7:2}
y = {v:k for k,v in x.items()}
# y == {2:7, 4:3, 6:5}
Note that a dict can only have each key once, but this example had a repeated key which Python resolved by keeping the last value for that key.
A generator is an object that can generate the values of a collection on demand instead of all at once. Any object that implements the .__next__()
method and raises a StopIteration
exception when all values are done functions like a generator, but the dedicated generator
type is created by two kinds of code:
A comprehension written with parentheses
A function containing a yield statement
Generators are used in many places in Python’s library to gain some of the benefits of lazy evaluation.
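Both ways of creating a generator can be sketched briefly. The function name countdown is hypothetical.

```python
def countdown(n):
    """A hypothetical generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n                   # pauses here until the next value is requested
        n -= 1

gen = countdown(3)
kind = type(gen).__name__         # calling the function ran none of its body yet
first = next(gen)                 # runs the body up to the first yield
rest = list(gen)                  # consumes whatever is left

squares = (e * e for e in range(4))   # parenthesized comprehension: nothing computed yet
total = sum(squares)                  # values are generated as sum asks for them

print(kind, first, rest, total)       # generator 3 [2, 1] 14
```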
Python usually operates like an imperative programming language, but it has several components common in functional programming languages too. Notably, it includes closures: functions with attached copies of all the variables the function references.
Explaining the value of closures is beyond the scope of this document, but it’s worth knowing that they exist and that a function might have state, almost like an object.
def a():
    b = []
    def c():
        b.append(len(b))
        return tuple(b)
    return c

x = a()
y = a()

x()
x()
y()
print(x()) # prints (0, 1, 2)
print(y()) # prints (0, 1)
Functions can be defined with async def
instead of def
; if so defined, they return coroutines instead of regular values. Coroutines are somewhat like promises, tickets that promise that eventually the result will be available. The await
keyword turns a promise into a value, but does so by pausing execution until a value is available, which is something only async
functions are allowed to do.
Writing code that effectively uses async
functions requires some async-specific design and often involves a driver like asyncio.run
.
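A minimal sketch of these pieces working together, using asyncio.sleep as a stand-in for real waiting. The function names fetch_number and main are hypothetical.

```python
import asyncio

async def fetch_number(delay, value):
    """A hypothetical coroutine function that pretends to wait for a result."""
    await asyncio.sleep(delay)
    return value

async def main():
    pending = fetch_number(0.01, 7)      # a coroutine object, not yet the number 7
    kind = type(pending).__name__
    value = await pending                # pauses main until the result is available
    return kind, value

kind, value = asyncio.run(main())        # the driver that actually runs the coroutines
print(kind, value)                       # coroutine 7
```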