This is an archived copy of a previous semester's site.

Python
Facts about Python that surprise C++ developers

It is common for CS programs like ours (ones that teach you C++ and Java before Python) to introduce Python with a few minutes of hand-wavy examples and then let you figure it out from there. And this works, in the sense that you’ll write code that runs. But it also tends to make programmers who write ugly Python code.

There are many “Python for X programmers” tutorials online. The following is not a tutorial, but rather a list of facts. The list is not exhaustive and is roughly ordered by how important I personally think each fact is to C++ programmers moving to Python.

Learning Python Properly

If you’re interested in truly mastering Python, I strongly recommend following the official Python tutorial. By following I mean reading it front to back, running each example yourself and experimenting with changes to it.

Official tutorials are usually a good way to learn a language, not just because they tend to be complete but also because they tend to be written in a way that reflects how the language designers think about the language.

1 Dynamically typed

All values in Python are stored as a pair (typecode, other). Typecode is an enumeration value telling us what type the other is: an integer, floating-point number, array, array list, and so on. For a few built-in types (numbers and the special values True, False, None, and ...) the value is stored directly in the other spot; for all the rest other stores a pointer to dynamically-allocated memory storing the value (e.g. what Java and C++ do with the new operator).

Dynamic typing means you can re-use a variable for many different types, which sometimes creates useful flexibility and sometimes results in buggy code going undetected. Because of the undetected bugs, Python has released a series of optional type annotation syntaxes and toolchains that can be used to detect some typing bugs, but as of 2024 these are still uncommon in Python code I see in the wild.
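
A minimal sketch of both halves of that trade-off follows. The names kinds, double, and surprise are made up for illustration; the annotation on double uses the optional syntax mentioned above, which Python itself does not enforce at run time.

```python
x = 3                      # x refers to an int
kinds = [type(x).__name__]
x = "three"                # the same variable now refers to a str
kinds.append(type(x).__name__)
x = [3]                    # and now to a list
kinds.append(type(x).__name__)


def double(n: int) -> int:
    # The annotation documents intent; nothing checks it when the code runs.
    return n * 2


# A str slips through and is "doubled" by repetition rather than arithmetic.
surprise = double("ab")
```

An external type checker (mypy is one example) can flag the double("ab") call before the program runs; the interpreter alone will not.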

2 Implicit variable declaration

Python does not have a special variable declaration syntax: declaring a variable is the same as assigning to it. A statement like x = 3 either creates a variable named x and gives it the value 3, or it replaces the value previously stored in x with 3. This makes code succinct, but it also means that typos in variable names are hard to detect: if code uses a variable named example everywhere except one place where it uses a variable named examle instead, Python will happily assume this was an intentional decision to use two different variable names.

Both Python’s syntactic succinctness and the risk of typos help reinforce a culture of Python programmers using very short variable names. The difficulty of reading and understanding code with low-meaning variable names encourages some programmers to disagree with that culture.

This same rule applies to members of objects too. thing.whatsit = 3 either changes the value of the whatsit member field in the object thing or creates such a field and gives it the value 3. As a corollary to this, two objects created from the same class may have different sets of member fields.
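
The following sketch shows both effects under a hypothetical empty class named Thing: a typo that silently creates a second variable, and two objects of one class ending up with different fields.

```python
example = 10
examle = example + 1       # typo: silently creates a second variable

class Thing:
    pass

first = Thing()
second = Thing()
first.whatsit = 3          # creates the field on first only
first.whatsit = 4          # replaces the value of the existing field

has_first = hasattr(first, "whatsit")
has_second = hasattr(second, "whatsit")
```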

3 Non-lexical scoping

It is likely that every language you’ve used other than Python uses lexical scoping: that is, a variable is accessible from its point of declaration until the end of the scope in which it is declared, and that scope is indicated by curly braces, files, indentation, or some other lexical property of the language.

Python doesn’t have that kind of scoping. Instead, variable scopes belong to modules and functions. If a variable is assigned a value outside of a function, it is placed in the module scope; if it is assigned a value inside a function or is a parameter of that function, it is placed in the function’s scope; and if it is assigned a value in both places there are two different variables with the same name, one in each scope. Classes also have a scope, though that’s different enough that we’ll ignore it here and come back to it later. And that’s it: there are no loop scopes, if-statement scopes, or the like.

One module is created per .py file; one function is created per def and lambda keyword.

Because modules can import other modules and functions can invoke other functions, there may be many module and function scopes active at a single time, but only three are available to any given code statement: the local scope, which is the scope of the function or module that contains the code; the global scope, which is the scope of the module that contains the code; and the builtins scope, which is a module of several important non-keyword language features. Variable assignments use the local scope; variable accesses try the local scope first and if the variable is not found there they fall back on the global scope, then the builtins scope.
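
As a rough illustration of these lookup rules, the sketch below uses made-up names (count, shadow, lookup) and assumes it runs as the top level of a .py file.

```python
count = 0                        # assigned outside any function: module scope


def shadow():
    count = 99                   # assignment inside a function: a separate local
    return count


def lookup():
    # count is not local, so it is found in the global scope;
    # len is found in neither, so it comes from the builtins scope.
    return count + len("ab")


for i in range(3):
    last = i * 10                # loops have no scope of their own

after_loop = (i, last)           # both names are still visible here
results = (shadow(), count, lookup())
```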

There are various ways to adjust scopes:

Note that the scope adjustments are part of a common theme. Python generally has the simple path where you just write code; the advanced path where something simple-looking like global x changes what other code means; and the under-the-hood path where you mess about with the dicts that underlie everything.
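
The three paths can be sketched as follows. This assumes the code runs at the top level of a module, and every name in it (total, add_simple, add_global) is illustrative.

```python
total = 0


def add_simple(n):
    total = n                    # simple path: creates a local, module total untouched


def add_global(n):
    global total                 # advanced path: assignments now target the module scope
    total += n


add_simple(5)
after_simple = total

add_global(5)
after_global = total

globals()["total"] = 7           # under-the-hood path: the module scope is a dict
after_dict = total
```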

4 Performance favors library code

In many languages, the brief-and-easy way to write code using library functions and short expressions is less performant than carefully-written detailed code. In general, the inverse is true in Python.

Python (at least in its most popular implementation) is implemented in C, and many of its best known libraries are as well. The parts that are written in C generally run much more quickly than other parts, in some tests by as much as 100-fold. If you can write a single Python statement that defers a loop over a large number of inputs to a library, that will tend to be much faster to execute than writing out loops in Python itself.
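
One way to see this on your own machine is to time an explicit loop against the built-in sum, as in the sketch below. The function name loop_sum is made up, and the timings depend heavily on the interpreter version and hardware, so no particular ratio is claimed here.

```python
import timeit

values = list(range(10_000))


def loop_sum(items):
    # The loop runs one interpreted step at a time.
    total = 0
    for item in items:
        total += item
    return total


same_answer = loop_sum(values) == sum(values)

# Time 50 runs of each approach; compare the two numbers yourself.
loop_time = timeit.timeit(lambda: loop_sum(values), number=50)
builtin_time = timeit.timeit(lambda: sum(values), number=50)
```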

This leads to a coding style in Python that favors brevity and cunning tricks over verbosity and explicit operations. In general, well-written Python code will look quite different from well-written code in C-like languages.

5 Both immutable and mutable types

Values of any immutable type can never be changed. We can derive new values from them, but never change anything about existing values.

Common immutable types in Python include bool, float, int, str, bytes, and tuple; the types of the special values None and ...; and the types of things you might not be used to thinking about as values like functions and classes.

Values of any mutable type can have details about them change, and if one part of the code changes something about one such value all other parts of the code that know about that value see the change too.

Common mutable types in Python include list, dict, and set; and most user- and library-defined classes.

Immutable values behave like values, like Platonic ideals, like ideas. They have no identity beyond their value: if you and I are both thinking about 7 it’s not meaningful to ask but are we thinking about the same 7? They can be used to create new values, but cannot be changed: 3 + 7 creates 10 without modifying either 3 or 7, and nothing foo(7) can do will change what 7 means. Update operators change which value a variable or spot in a structure is referring to, not change the values themselves: if after age = 18 we run age += 1, 18 hasn’t changed; rather, age now stores a different number (19) instead.

Mutable values behave like entities, like specific objects, like real things. They have identity that goes beyond their properties and appearance: if you and I are both planning to purchase a replica Model-T Ford Motorcar, it is worth asking are we competing on the same one or looking at different ones? Changes made to one entity are visible to everyone who knows about it: if I dress up one of my assistants in a funny costume, your tutor gets that funny costume too because they’re the same person. Update operators change the value itself, not which value the variable is referring to: after a = [1, 2] and b = a the two variables refer to the same list; thus a += [3] will change that single list (referred to by both a and b) to [1, 2, 3].

You can check to see if two variables refer to the same mutable value using the is operator, as in if x is y:. This is distinct from the equality operator == which checks to see if two values are equal as defined by their types.
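
The examples from this section can be run together as a short sketch; the variable names older and c are additions for illustration.

```python
age = 18
older = age
age += 1                  # age now refers to 19; the value 18 is unchanged

a = [1, 2]
b = a                     # b refers to the same list as a
c = [1, 2]                # an equal but separate list
a += [3]                  # changes the one list that a and b share

same_object = a is b
equal_but_distinct = (c == [1, 2]) and (c is not a)
```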

6 Sequences can be unwrapped

A sequence of n values can be assigned to n variables by putting the variables in a syntactic sequence; for example (x,y) = [2,4] or [a,b,c] = range(3).

Tuples can be written without parentheses; thus x,y = y,x means (x, y) = (y, x) and swaps the values stored in variables x and y.

Returning tuples from functions and unwrapping the results is a common way of implementing the appearance of functions with multiple return values. An example builtin function using this pattern is divmod: running d,m = divmod(345,10) puts 34 in d and 5 in m.

Unwrapping also works for implicit assignments such as to the parameters of a function or the variable of a for loop. These are often used to iterate over pairs of matching items. For example, a dict’s .items() method returns a sequence of (key, value) pairs, meaning for k,v in d.items(): iterates through all the pairs in dict d.

Unwrapping allows non-variable assignable expressions, such as member access and list and dict indexing; and it works by first evaluating the full right-hand side, then assigning to the elements on the left-hand side in a left-to-right order. Thus you can write (w, w[0], w[0]['yes']) = ([-1,-2,-3], {'no':0}, 1) and end up with w being [{'no': 0, 'yes': 1}, -2, -3]. Taking advantage of anything in this paragraph is often seen as confusing and is quite rare in practice.
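
The cases described in this section, collected into one runnable sketch (the dict literal and the list named pairs are illustrative):

```python
x, y = 2, 4
x, y = y, x                       # right side is evaluated first, then assigned

d, m = divmod(345, 10)            # quotient and remainder as a tuple

pairs = []
for k, v in {"a": 1, "b": 2}.items():
    pairs.append((k, v))          # each (key, value) pair is unwrapped into k, v

# The rarely used form: targets are assigned left to right,
# so w exists before w[0] is assigned, and w[0] before w[0]['yes'].
(w, w[0], w[0]['yes']) = ([-1, -2, -3], {'no': 0}, 1)
```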

7 Strings of bytes or characters

In Python, a string of bytes (such as C’s char *) has type bytes. The type str stores a string of Unicode characters instead. Conversion between these two must be explicit, using the decode method of bytes and the encode method of str.

When you write literal strings in your source code, they are automatically decoded using the source file’s charset. If you put a b in front of the opening quote, like b"example", this decoding will be skipped and the bytes used to store that string in the source file will become a bytes object.
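
A small sketch of the explicit conversion, assuming UTF-8 as the encoding; the sample text is arbitrary.

```python
text = "na\u00efve"                 # five characters; the third is i with a diaeresis
data = text.encode("utf-8")         # str -> bytes
round_trip = data.decode("utf-8")   # bytes -> str

lengths = (len(text), len(data))    # characters versus bytes

try:
    text + data                     # no implicit conversion between the two types
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True
```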

8 Multiple string literals

Strings may be delimited using either single or double quotes: 'same' == "same". Both allow the same set of backslash escapes, such as \n and \\, and neither may contain a newline.

Strings may also be delimited by three consecutive single or double quotes: '''same''' == """same""". These allow newlines inside the literal string, as well as backslash escapes, and are sometimes called docstrings because of one of their common uses.

Any string may be preceded by an r to prevent backslashes from escaping things: r'\n' == '\\n'.
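
The literal forms above can be compared directly, as in this brief sketch:

```python
same_quotes = 'same' == "same"
same_triple = '''same''' == """same"""

multi = """first
second"""                          # triple quotes allow a literal newline
has_newline = multi == "first\nsecond"

raw = r'\n'                        # backslash followed by n: two characters
raw_matches = raw == '\\n'
raw_length = len(raw)
```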

9 Strings for documentation

If a function definition, class definition, or Python file opens with a string literal that is not part of any statement or expression, then that string is saved in the special __doc__ attribute of the function, class, or module and is called a docstring. This is the appropriate place to document your code and is used in various tooling, including Python’s built-in help() function, the command-line pydoc application, and many IDEs’ hints and documentation browsers.
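
For instance, with a hypothetical function named area:

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    return width * height


doc = area.__doc__        # the opening string literal, saved by Python
```

Calling help(area) in an interactive session displays the same text.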

10 Built-in hashes

Python’s dict type is a hash map; its keys must be hashable, which in practice generally means values of immutable types.

Python’s set type is a hash map with only keys, no values. Python also has a frozenset type which is an immutable, hashable version of a set.
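
A short sketch of what is and is not accepted as a key; the names grid and groups are illustrative.

```python
grid = {(0, 0): "origin"}            # a tuple of ints is hashable

try:
    grid[[1, 2]] = "list key"        # a list is mutable, hence unhashable
except TypeError:
    list_key_allowed = False
else:
    list_key_allowed = True

seen = {1, 2, 2, 3}                  # duplicates collapse in a set

# A frozenset can be a dict key, and element order does not matter.
groups = {frozenset({1, 2}): "pair"}
found = groups[frozenset({2, 1})]
```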

11 Functions and types are values

In Python, int isn’t a keyword, it’s a variable name in the __builtins__ scope. The value of that variable is a type object describing integers. You can replace its meaning with something like int = "nonsense" and thereafter no longer have access to the int type under that name (though you could still get it by doing type(3)).

This is true of every type and function: they’re just values, stored in variables. abs = 4 isn’t an error, but after doing abs = 4 you can’t do abs(x) anymore.

The def keyword does two things: it defines a new function value and it assigns that function to a variable. There’s also a keyword lambda that defines a function value without giving it a name. The class keyword likewise both defines a new type value and assigns that type to a variable.
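
The sketch below treats functions and types as ordinary values. All names are made up, and the shadowing of abs is confined to a function so that the built-in stays usable elsewhere.

```python
def square(n):
    return n * n


alias = square                     # the function value stored in a second variable
cube = lambda n: n ** 3            # a function value created without def
results = [f(3) for f in (square, alias, cube)]

int_type = type(3)                 # reaches the int type without using the name int
converted = int_type("42")


def shadowed():
    abs = 4                        # a local variable that hides the built-in
    try:
        abs(-1)
    except TypeError:
        return "not callable"
    return "callable"


outcome = shadowed()
```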

12 Types act like functions

int is a type. int(3.5) converts or casts the value 3.5 to a value of that type.

Some built-in functions are also actually types. Two notable examples assist with creating low-overhead collections. range is a type that represents a sequence of integers by generating the integers in the sequence only as they are needed. zip is a type that represents a sequence of pairs of values taken from two other sequences by checking those two other sequences and generating the pairs only as they are needed. But while these are both commonly used, they are often referred to as functions, not types, because that more naturally matches how most programmers use them.
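
This can be checked directly, as in the following sketch (variable names are illustrative):

```python
whole = int(3.5)                                   # calling the type converts the value

are_types = (isinstance(range, type), isinstance(zip, type))
abs_is_type = isinstance(abs, type)                # abs is a plain function instead

big_range = range(10**9)                           # no billion-element list is built
big_length = len(big_range)
near_end = big_range[-1]

pairs = list(zip("abc", [1, 2, 3]))                # pairs are generated on demand
```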

13 ints have no upper bound

If you ask Python to compute $1234^{4567}$ by running 1234 ** 4567 it will quite happily generate a 14,119-digit number (but likely refuse to show it to you because it’s too big to safely convert to a string to display). If you try to convert it to a floating-point number with float(1234 ** 4567) or (1234 ** 4567) + 0.0 you’ll get an OverflowError: int too large to convert to float because the largest float has only 309 digits, not 14,119.

Be aware: just because int allows you to have gargantuan numbers doesn’t mean doing so is a good idea. Most other languages, and some popular Python libraries like numpy, have trouble with big integers, so having them in your code might break some operations. Also, big integers are much less efficient than small integers, and most well-known algorithms that appear to use huge integers (like some forms of encryption) actually use small integers that mirror certain properties of big integers instead.
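
A sketch that checks the digit count without converting the number to a string, using only integer comparisons; the variable names are illustrative.

```python
big = 1234 ** 4567

# A number has 14,119 digits exactly when it lies in [10**14118, 10**14119).
has_14119_digits = 10**14118 <= big < 10**14119

try:
    float(big)
except OverflowError:
    fits_in_float = False
else:
    fits_in_float = True
```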

14 Built-in complex numbers

Python’s float is a standard IEEE 754 64-bit floating-point value, just like the double type in C, C++, and Java. It has all the gotchas of this format: for example, 0.1 + 0.1 + 0.1 == 0.3 gives False (not True) and 1e308 * -2 gives -inf, that is, $-\infty$.

Python has a built-in exponentiation operator, **, and this works with negative bases and fractional exponents because Python has a built-in complex type. A complex number is stored as a pair of floats, one for the real and one for the imaginary part of the number. Imaginary numbers have a literal syntax with a trailing j; thus (1j)**2 means $\big(\sqrt{-1}\big)^2$ and yields (-1+0j), the complex-number form of $-1$.
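
The float gotchas and the complex results can be reproduced with a few lines; the variable names are illustrative.

```python
tenths_equal = (0.1 + 0.1 + 0.1 == 0.3)   # rounding error makes this comparison fail
overflowed = 1e308 * -2                    # too large in magnitude for a float

squared = (1j) ** 2                        # the imaginary unit, squared
root = (-8) ** (1 / 3)                     # negative base, fractional exponent

is_complex = isinstance(root, complex)
cubes_back = abs(root ** 3 - (-8)) < 1e-9  # root really is a cube root of -8
```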

15 Two division operators

Double-slash division always creates an integer-valued result: an int if both operands are ints, otherwise a float with no fractional part. Thus 3.40 // 1.36 gives 2.0, not 2.5. It always rounds towards negative infinity; thus 5 // 3 == 1 but -5 // 3 == -2.

Single-slash division always creates a float (or complex) result; thus 4 / 2 gives 2.0 not 2.

Percent-sign modulus gives a float if either operand is a float, matches the sign of the right-hand operand, and for positive right-hand operands obeys the rule x % n == x - (x // n) * n.
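
The three operators side by side, as a quick sketch:

```python
floor_results = (5 // 3, -5 // 3)      # rounds towards negative infinity
floor_of_floats = 3.40 // 1.36         # whole-valued, but stored as a float
true_division = 4 / 2                  # always a float for int operands

mods = (5 % 3, -5 % 3, 5 % -3, 5.5 % 2)

x, n = -5, 3
rule_holds = (x % n == x - (x // n) * n)
```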

16 Every function returns a value

Every function invocation returns a value. If it doesn’t have a meaningful value to return, it returns the special value None. return on a line by itself and return None are fully equivalent statements.
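
A sketch with three hypothetical functions that all return None:

```python
def bare():
    return


def explicit():
    return None


def falls_off_end():
    pass                    # reaching the end of the body also returns None


results = (bare(), explicit(), falls_off_end())
```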

17 Many styles of arguments

When invoking a function, arguments may be passed either as positional arguments, like f(3, x); or as keyword arguments, like f(a=3, y=x). In general an invocation may mix any number of either, but all positional arguments must come before any keyword argument.

When defining a function, you can have

This may seem needlessly complicated, but each option was added to address a specific need of some programmers.

Consider the function

def g(a, /, b, c=30, *d, e=50, **f):
    print(a,b,c,d,e,f)
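
To experiment with how arguments land in each parameter, here is a sketch that uses the same signature but returns the values as a tuple instead of printing them. That change is made here only so the results can be compared; the particular calls are illustrative.

```python
def g(a, /, b, c=30, *d, e=50, **f):
    # a: positional-only; b: positional or keyword; c: has a default;
    # d: extra positional arguments; e: keyword-only; f: extra keyword arguments
    return (a, b, c, d, e, f)


plain = g(1, 2)
keyword_b = g(1, b=2)
extras = g(1, 2, 3, 4, 5)            # 3 fills c; 4 and 5 are collected in d
keywords = g(1, 2, e=6, z=7)         # z is not a parameter, so it lands in f
a_in_f = g(1, 2, a=3)                # a is positional-only, so a=3 lands in f

try:
    g(a=1, b=2)                      # nothing supplies the positional-only a
except TypeError:
    a_by_keyword = False
else:
    a_by_keyword = True
```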

18 Flexible classes and objects

Python classes are declared with the class keyword, much like Java or C++. They create types that can be used to create objects. They can have fields and methods and superclasses. They can overload any operator, including . (and also their behavior when passed as the argument to some built-in functions like str, repr, and len), so object.x defaults to accessing field x but might do anything if . was overloaded. Python classes can be implemented in C or Python and have several variants with different rules about assigning to missing fields and so on.

19 No this keyword

In many languages, class methods have access to the object they are called on using the this keyword. In Python, the object is instead passed in as the first argument of the method and can have any name you choose to give it, just like any other argument. It is traditional to name it self, but that’s just a tradition.

This also applies to the constructor. A constructor is just a method named __init__, and gets the object as its first argument like any other method.

A class defined as

class A:
    def __init__(self, b):
        self.c = b

can be instantiated with x = A(3), which will make a new A-type object and call its .__init__(3) method, passing in the object itself as an additional first argument.

In general, method names with two underscores at the beginning and end of the name are used by Python to interact with language-level features, including operator overloading and handling how built-in functions like str and len treat objects.
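
As a sketch, the hypothetical Stack class below defines a few of these methods. Its constructor deliberately names the first parameter this to show that the name self is only a convention.

```python
class Stack:
    def __init__(this, items):           # any name works for the first parameter
        this.items = list(items)

    def __len__(self):                   # used by len(obj)
        return len(self.items)

    def __str__(self):                   # used by str(obj)
        return "Stack" + str(self.items)

    def __add__(self, other):            # used by obj + other
        return Stack(self.items + other.items)


s = Stack([1, 2]) + Stack([3])
size = len(s)
text = str(s)
explicit = Stack.__len__(s)              # the object passed as the first argument by hand
```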

20 Comprehensions

Python has a set-builder-notation-inspired syntax for looping over a collection, performing some work on each value, and making a new collection of the results. Called comprehensions, this family of related syntaxes has several forms and has expanded its functionality several times over the life of the Python language. I’ve never seen a comparable syntax in another programming language.

The general form of this syntax is as follows:

  1. an opening container delimiter: [ to make a list, { to make a set or dict, ( to make a generator
  2. an expression generating a value; for a dict, this should be two expressions separated by a colon
  3. the keyword for
  4. a variable or other valid left-hand-side of an assignment
  5. the keyword in
  6. a collection
  7. Optionally, a filter of the following form
    1. the keyword if
    2. a Boolean expression
  8. the matching closing container delimiter

Mapping a list of values:

x = [1,2,3,4,5]
y = [2*e for e in x]
# y == [2, 4, 6, 8, 10]

Filtering a list of values:

x = [1,2,3,4,5]
y = [e for e in x if (e%2) == 0]
# y == [2, 4]

A dict from a list:

x = [1,2,3,4]
y = {e:e**2 for e in x}
# y == {1:1, 2:4, 3:9, 4:16}

Reversing a dict:

x = {1:2, 3:4, 5:6, 7:2}
y = {v:k for k,v in x.items()}
# y == {2:7, 4:3, 6:5}

Note that a dict can only have each key once, but this example had a repeated key which Python resolved by keeping the last value for that key.

21 Generators

A generator is an object that can generate the values of a collection on demand instead of all at once. Any object that implements the .__next__() method and raises a StopIteration exception when all values are done functions like a generator, but the dedicated generator type is created by two kinds of code:

Generators are used in many places in Python’s library to gain some of the benefits of lazy evaluation.
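
Two standard ways to obtain a generator object are a function whose body contains yield and a comprehension written with parentheses. A minimal sketch of each, with made-up names:

```python
def countdown(n):
    # Each yield hands back one value and pauses until the next is requested.
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
first = next(gen)                        # runs the body up to the first yield
rest = list(gen)                         # consumes the remaining values

squares = (e * e for e in range(4))      # generator expression: nothing computed yet
total = sum(squares)                     # values are produced as sum asks for them
leftover = list(squares)                 # a generator can be consumed only once
```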

22 Closures

Python usually operates like an imperative programming language, but it has several components common in functional programming languages too. Notably, it includes closures: functions with attached copies of all the variables the function references.

Explaining the value of closures is beyond the scope of this document, but it’s worth knowing that they exist and that a function might have state, almost like an object.

def a():
  b = []
  def c():
    b.append(len(b))
    return tuple(b)
  return c

x = a()
y = a()
x()
x()
y()
print(x()) # prints (0, 1, 2)
print(y()) # prints (0, 1)

23 Async functions

Functions can be defined with async def instead of def; if so defined, they return coroutines instead of regular values. Coroutines are somewhat like promises, tickets that promise that eventually the result will be available. The await keyword turns a promise into a value, but does so by stopping execution until a value is available, which is something only async functions are allowed to do.

Writing code that effectively uses async functions requires some async-specific design and often involves a driver like asyncio.run.
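
A minimal sketch of these pieces working together. The functions fetch and main are hypothetical, and asyncio.sleep(0) stands in for real waiting such as network I/O.

```python
import asyncio


async def fetch(n):
    await asyncio.sleep(0)               # hand control back to the driver briefly
    return n * 2


async def main():
    coro = fetch(21)                     # calling an async function makes a coroutine
    is_coroutine = asyncio.iscoroutine(coro)
    value = await coro                   # await turns the coroutine into its result
    both = await asyncio.gather(fetch(1), fetch(2))   # wait on several at once
    return is_coroutine, value, both


result = asyncio.run(main())             # the driver that runs the top-level coroutine
```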