C

This page outlines some of the key elements of the C language. You might also be interested in C without the ++ which introduces how C differs from C++.

1 How C became ubiquitous

In 1972, Dennis Ritchie introduced the C programming language. He wrote it as part of his work on the Unix operating system, specifically to support that work. Broadly speaking, his goal was to make a language that made common patterns in assembly programming easier to read and write without doing anything complicated. Each C construct can be converted to a few simple assembly instructions, and thus into a few simple machine instructions.

C has become one of the most popular programming languages ever designed. Language popularity depends on many factors, but a few advantages of C that have helped it keep this position include:

2 Standard defined, implementation defined, and undefined

The C standard fully defines some aspects of the language and the compiler ecosystem.

(x[3]) is defined in the standard to mean the same thing as (*(x+3)). This will hold on all computers across all C compilers.

The C standard requires that the compiler be named cc. Most implementations also give it another name, like gcc or clang or icc, but the standard requires it to be available under the name cc.

The C standard also leaves some aspects of the language to be implementation defined. They must be consistent across a single compiler compiling to a single target hardware, but may differ between compilers and targets.

The number of bytes in an int is implementation defined. Most modern compilers and targets set it to be 4 bytes, but the standard also allows it to be as few as 2 bytes. However, its size must be fully defined by a compiler; the compiler cannot chose to make int 2 bytes in one place and 4 bytes in another: it picks a number of bytes and sticks to that choice.

The signedness of char is implementation defined. On one computer char might mean the same as signed char, meaning ((char)-20) < 0 is true; on another it might mean the same as unsigned char, meaning ((char)-20) > 0 is true instead.

The C standard further leaves some behaviors to be undefined. This means that the compiler can do anything it wants when those situations arise: crash, pick something at random, pick the fastest of several options, and so on. Some undefined behaviors have defined constraints on what they can do (such as never crashing), while others do not. Undefined behaviors create flexibility for optimizing compilers and for porting the language to new hardware, but they also can lead to a variety of hard-to-track-down bugs.

The value of uninitialized variables is undefined by the C standard. Thus, the following code

int x;
printf("%d\n", x);

could print literally any legal int value. With one compiler on one computer it might always print 0; on another it might always print 340340340; on a third it might print different values each time it is run; and so on. But it will never crash or print a non-integer: the value in the variable is undefined, but it is a value and is an int.

The values stored in heap memory pointed to by the return value of malloc is undefined, but the values stored in heap memory pointed to by the return value of calloc is defined. Whether there is any value in heap addresses outside of memory allocated by malloc or calloc is undefined, and if there are values what they are is also undefined.

Thus, the following code

int *x = malloc(2 * sizeof(int));
int *y = calloc(2, sizeof(int));
printf("A: [%d, %d]\n", x[0], x[1]);
printf("B: [%d, %d]\n", y[0], y[1]);
printf("C: [%d, %d]\n", x[2], y[2]);

could print

  1. A: [2134, 928273640] or any other pair of legal int values.
  2. B: [0, 0] – calloc initialized its memory, so this is the only option for the B: line.
  3. C: [-234561341, 12131145] or any other pair of legal int values; or, this command might crash while trying to access unallocated memory.

3 Always pass-by-value

C only has one way of assigning variables or passing values into functions: by value. That means copying the bytes that make up the value; what some other languages call a shallow copy.

void foo(int a[3]) { // a gets a copy of the bytes stored in the array
    a[2] = 1; // this changes only the local copy
}

int main(int argc, char *argv[]) {
    int x[3] = {3,4,0};
    foo(x);
    printf("%d%d%d\n", x[0], x[1], x[2]); // prints 340 not 341
}

This also applies to pointers. A pointer is a value, represented by bytes. Those bytes happen to be the address of a different value, but for the purposes of assignment and function invocation they are treated like any other bytes. However, C will automatically convert any array into a pointer to its first element if the variable or argument has that type.

void foo(int *a) { // a gets a copy of a pointer, not of the pointed-to memory
    a += 1; // this updates the local pointer to point 1 int later in memory
    a[1] = 1; // this follows the pointer to change a number it points to
}

int main(int argc, char *argv[]) {
    int x[3] = {3,4,0};
    foo(x); // automatically converted to a pointer, as if we wrote foo(&(x[0]));
    printf("%d%d%d\n", x[0], x[1], x[2]); // prints 341, not 340 nor 310 nor 410
}

4 No string class

In C, string literals are arrays of char values; if assigned to a pointer, they are stored in global read-only memory before the pointer is created. String literals have a special char null '\0' added to the end, helping mark the end of the array. All functions in the C standard library that work with strings actually work with any pointer to a null-terminated array of chars.

The following code creates three arrays with the same contents:

const char *s1 = "CS@UIUC";
char s2[] = "CS@UIUC"; 
char s3[8] = {'C', 'S', '@', 'U', 'I', 'U', 'C', '\0'};

The contents are as follows:

C S @ U I U C \0 0 1 2 3 4 5 6 7
An 8-element 0-indexed array, containing {'C', 'S', '@', 'U', 'I', 'U', 'C', '\0'}

The array created for s1 will be stored in read-only global memory, with a pointer to that memory stored in s1. The arrays created for s2 and s3 will be stored in the stack if they are local variables or in read/write global memory if they are global variables.

The standard library header file string.h contains a wide set of convenience functions to use when working with strings, as well as some to use when working with other arrays. For example,

Because C does not manage memory for you, some common string operations are more complicated than you might expect. For example strcat concatenates strings, but requires that you already have allocated a single array big enough to hold the concatenated string before you call it; if you haven’t it will try to do the work anyway even though there’s not enough memory, resulting in undefined behavior1 This particular not-enough-memory undefined behavior is one source of a buffer overflow attack, a common source of security vulnerability that can result in complete loss of system control to an attacker..

The most common way to handle strings is to use pointers to them. If the value pointed to was initialized by a string literal, the array is in read-only memory and cannot be modified, making functions like strsep not work correctly (generally crashing at runtime); but if it was initialized in another way then it will be in read/write memory and strsep and the like will work.

The operations in string.h are all just convenience functions; you could implement your own set of functions with different behaviors if you wished. Because they are in the standard library and align with how string literals work, they are commonly used, but you will sometimes see code that uses its own, different string structures.

5 File descriptor and FILE *

The C standard library has two main ways of working with files.

5.1 File Descriptors

File descriptors are a thin wrapper around how the operating system manages files. We describe file descriptors more in the kernel mode page. File descriptors are used by the library functions open, read, write, and close, defined in fcntl.h and unistd.h.

5.2 File pointers

The C library provides a number of buffered I/O functions that interact with a FILE * argument, defined in stdio.h.

A FILE * is what is known as an opaque pointer: the compiler is not given any information about the data being pointed to besides its type name, preventing you from writing code that interacts with the contents of that structure. Each implementation of the standard library may have its own way of storing a FILE structure, but the usual idea is to store both a [file descriptor] and an array called a buffer. Most operations you do will interact with the buffer, which will only be synced to disk via the file descriptor when the buffer is full (if writing) or empty (if reading). This means that closing a FILE * is an important step in ensuring that what you did to it actually reaches the disk.

Each program initially has three open FILE *s in the global scope: stdout and stderr are opened in write-only mode and stdin is opened in read-only mode.

5.3 File modes and positions

A file may be opened in several different modes. The most common modes (with their fopen mode argument values) are:

Each file has a notion of a file position. Early files worked on magnetic tape held between two reels, with just one spot on the tape underneath the magnetic head that could read and write the contents of the tape. To read 20 bytes2 Bytes are a unit of digital information. We will dive more into this topic later in the course. of data, you also had to advance the tape by 20 bytes, meaning the next read would start after the first. This concept remains the dominant model for files today: they may be just large arrays of bytes under the hood, but the operating system APIs that let you access them have a notion of a position that is advanced every time you read or write. If you want to change that position in any other way, you have to explicitly seek a new position.

For FILE *s, fseek and ftell will let you manually manipulate the file position.

5.4 Byte-oriented I/O

Files store bytes. fread lets you read an array of bytes from a file, and fwrite lets you write an array of bytes to a file. You can also work one byte at a time using fgetc, fputc, though those are less efficient.

fread and fwrite use the void * type for the array they read to or write from. void * means a pointer to something, but the compiler can’t tell what. To help the functions know how to write those values, we have to tell it two more pieces of information: first, the size (in bytes) of a single element in the array pointed to by the void *; and second, the number of elements in that array.

To write the bytes of an array of longs to a file, we might do the following:

long x[4] = {124, 128, 225, 340};
FILE *myfile = fopen("demo.dat", "w");
fwrite(x, sizeof(long), 4, myfile);
fclose(myfile);

To read it back again we’d do

long y[4];
FILE *myfile = fopen("demo.dat", "r");
fread(y, sizeof(long), 4, myfile);
fclose(myfile);

Note that writing values in this way generally results in non-cross-platform files because the exact bytes used to store a value is implementation defined.

5.5 String-oriented I/O

The C standard library has various string-oriented reading and writing functions. The most versatile are fprintf and fscanf, but there are others like fgets, getline, and perror.

The only one of these functions that we’ll use in CS 340 is printf, which wraps fprintf to always use stdout as its FILE *.

printf is a variadic function, meaning it can be called with a variable number of arguments. It’s first argument is always a string, with the string being parsed to figure out how many other arguments they will be and their types. Arguments are noted in the string using conversion specifiers, which are a %, optionally some formatting information, and a letter. There are many conversion specifiers, but the following are particularly common:

%s
A string (i.e. a char *)
%d, %u, %x
An integer, to be displayed in base-10 signed (%d), base-10 unsigned (%u), or base-16 unsigned (%x)
%p
A pointer, displayed in hexadecimal
%f, %e, %g
A floating-point number, displayed in fixed point (%f), exponential notation (%e), or whichever is smaller (%g)

These often have additional information between the % and the letter. For example,

Mastering all printf conversion specifiers is unlikely to be worth your while, but learning enough to display basic information is.

The following shows how to display the value of three variables using printf.

void f(int x, const char *s, double y) {
    // ...
    printf("%s (%d): x=%d, s=\"%s\", y=%g\n", x, s, y);
    // ...
}

6 Heap memory

C does not have a new or delete operator. It also never implicitly creates or expands memory. If you want to allocate, adjust, or deallocate memory, you have to use a function to do so.

There are several operating-system-level memory allocation functions, such as brk, sbrk, and mmap, but they are challenging to use well and should generally be avoided. Instead, almost all C code uses four functions found in stdlib.h{man0p}: malloc, free, realloc, and calloc.

malloc allocates a contiguous range of bytes and returns a pointer to the first byte. It leaves undefined what the values of those bytes are.

calloc is much like malloc, except it sets all the bytes to 0 before returning the pointer. It also has two size arguments to make it less likely that programmers will make certain mistakes when allocating arrays.

free accepts a pointer returned by malloc (or calloc or realloc) and deallocates the memory it points to.

realloc resizes a region of memory. Roughly that means that it (a) mallocs a new region of memory; (b) memcpys byte values from the old to the new; (c) frees the old; and (d) returns a pointer to the new. In practice, it is often much more efficient than that set of steps, in some cases requiring almost no time at all.

When working with these functions, it is important to obey the following rules:

Some other standard library functions explicitly mention malloc and/or free in their documentation; for example, strdup says in part Memory for the new string is obtained with malloc(3), and can be freed with free(3). This is documented so that you know how the above rules apply to those functions too.