Stubbing Functions - CS 225 Fall 2018: Data Structures

This is a (not so) brief background to give some context (or motivation) to the purpose of ‘stubbing out’ a function. Feel free to skip it, but it is worth a read (at least) once.

Suppose you are given a task to implement a new function for an existing code base. Where do you start?

Where should the function go?

The first thing you might want to do is ask “Where should this new function go?” (i.e., where does it belong in the source-code tree?) To answer this question, you might consider all of the possible places it could go and then assess which one makes the most sense. So let’s start there: finding out where a new function could go. The following is a list of possible places to put the new function:

(Note for this example, I am limiting possible locations to those you might see in a 225 MP instead of what you might see in say, Google’s code base.)

In main.cpp (e.g. a global function)
In an existing class as a member function
In a new class as a member function
In a library of related functions

Now that we have a list of locations, let’s do as we said above, and assess the merit of each of them.

With respect to implementing global functions, the best default is to avoid them if possible. They are messy and pollute the global namespace. It is almost always possible to implement a function as part of a smaller, more descriptive namespace than the global one. With respect to MPs in CS 225, you will never actually hand in a main.cpp anyway, so adding a function to main.cpp is useless because it will not be graded. In fact, if your code depends on a function (or any other change) added to main.cpp, it is almost certain that your code will not compile when we grade it. This will result in a grade of zero.

Adding a new function to an existing class is a fine choice if it is the correct choice. That is, it is only a good idea to implement a given function in a given class if, first, the function is useful to that class and, second, it is not useful to any other class which does not inherit from that class. The same is true of implementing a new function in a new class, although asking “Does this function really need to be in its own class?” is a separate task in itself.

A very common but often overlooked option is to implement a new function as part of a library of functions. This library may be new or existing, but the consideration here is similar to that of adding the function to a class (i.e., “Does this function make sense in the context of the library I am adding it to?”).

This might be confusing at this point, so let’s look at an example. Let’s say we wanted to implement Merge Sort in C++ and we already had a main.cpp and an ArrayList class. I will skip the option of making Merge Sort a global function because it is almost always a bad choice. Should we add Merge Sort to our ArrayList class? We will certainly use it to sort ArrayLists, but what if we then implement a LinkedList that wants to use Merge Sort too? Should we implement a second Merge Sort for our LinkedList class? Probably not.

In this case, it might make sense to create a new library of sorting functions and add Merge Sort there. What would that look like? We would probably create a new file called sortingUtils.cpp or similar. We might then do the responsible thing and declare a new namespace in that file called cs225sortingutils or similar. Now, in our new file, in our new namespace, we can declare and define our new function to perform Merge Sort.

Of course, I could just as easily have chosen a function like “Expand List” instead of Merge Sort. In that case it would probably make more sense to add the function to our ArrayList class rather than to a new library, because Expand List most likely depends on our ArrayList implementation and is therefore not useful to another class like LinkedList.
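As a rough illustration of the sorting library described above, here is a minimal sketch. It assumes the list can be represented as a std::vector<int>, since the ArrayList and LinkedList classes are not shown here; the function names mergeSort, sortRange, and merge are hypothetical, and the header and source file are shown together for brevity.

```cpp
// sortingUtils.h (hypothetical header for the library)
#include <cstddef>
#include <vector>

namespace cs225sortingutils {

// Sorts `values` in ascending order using top-down Merge Sort.
// Assumption: std::vector<int> stands in for the course's list classes.
void mergeSort(std::vector<int>& values);

}  // namespace cs225sortingutils

// sortingUtils.cpp
namespace cs225sortingutils {
namespace {

// Merges the two sorted ranges [lo, mid) and [mid, hi) of `values`.
void merge(std::vector<int>& values, std::size_t lo, std::size_t mid,
           std::size_t hi) {
    std::vector<int> merged;
    merged.reserve(hi - lo);
    std::size_t left = lo;
    std::size_t right = mid;
    while (left < mid && right < hi) {
        // Taking from the left range on ties keeps the sort stable.
        if (values[right] < values[left]) {
            merged.push_back(values[right++]);
        } else {
            merged.push_back(values[left++]);
        }
    }
    while (left < mid) merged.push_back(values[left++]);
    while (right < hi) merged.push_back(values[right++]);
    for (std::size_t i = 0; i < merged.size(); ++i) {
        values[lo + i] = merged[i];
    }
}

// Recursively sorts the range [lo, hi).
void sortRange(std::vector<int>& values, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;  // zero or one element is already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    sortRange(values, lo, mid);
    sortRange(values, mid, hi);
    merge(values, lo, mid, hi);
}

}  // namespace

void mergeSort(std::vector<int>& values) {
    sortRange(values, 0, values.size());
}

}  // namespace cs225sortingutils
```

A caller would then write cs225sortingutils::mergeSort(myValues); and nothing is added to the global namespace except the library’s own namespace name.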

What exactly do we need to add to the code base?

Okay, so now we know that there are several choices for where we can put our new function, and we know how to roughly assess which one makes the most sense. What should we do next, given that we know where we want to add our function? It might make sense to now assess what it is, exactly, that we need to add to the code base. That is, what is the necessary set of changes we need to make to the code base such that our new function will work correctly? As it turns out, there are usually four places in any code base which need to be touched in order to add any new function:

The declaration of the function (goes in the .h file for a member function)
The definition of the function (goes in the .cpp file for a member function)
The function call itself (could go anywhere. Why implement a function if nobody calls it?)
The build process (usually goes in a Makefile. Why write code if you don’t compile it?)
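The four places listed above can be sketched in a single listing. Everything here is hypothetical and chosen only for illustration: the Counter class, its increment function, the countTo caller, the file names, and the Makefile rule shown in the final comment.

```cpp
// ---- Counter.h: (1) the declaration ----
class Counter {
public:
    Counter();
    void increment();   // the new function we are adding
    int value() const;

private:
    int count_;
};

// ---- Counter.cpp: (2) the definition ----
Counter::Counter() : count_(0) {}

void Counter::increment() {
    ++count_;
}

int Counter::value() const {
    return count_;
}

// ---- some caller: (3) the function call ----
// Calls increment() n times and returns the resulting count.
int countTo(int n) {
    Counter counter;
    for (int i = 0; i < n; ++i) {
        counter.increment();
    }
    return counter.value();
}

// ---- Makefile: (4) the build process ----
// A rule along these lines links the new object file into the program
// (recipe lines in a real Makefile must begin with a tab):
//
//   main: main.o Counter.o
//       $(CXX) main.o Counter.o -o main
```

In a real project the first three parts would live in separate files; they are combined here only so the sketch can be read top to bottom.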

All that is left now is deciding exactly what to write in each of the four locations above to complete our task of adding a new function to an existing code base. As this is merely an explanation of a useful process, we will stop here; the rest is straightforward.

How to stub out a function

The simple description of what it means to “stub out” a function is that a function stub is merely an empty function definition. Stubbing out a function is useful when you have a function or functions declared somewhere, and you have calls to those declared functions elsewhere, but you don’t have the functions defined yet. (We can assume that the code is all set to build properly if there are no compile errors.) So to recap, you have calls to a function or functions which are declared, but not defined. What happens when you try to build your code? Each file compiles on its own, but the linker cannot find the missing definition, so you’ll probably see an error similar to the following:

main.cpp:6: undefined reference to `TestClass::returnVoid()'

The error above says that, on line 6 of the file main.cpp, there is a reference to a function returnVoid, in the class TestClass, which is undefined. Bummer. All you wanted to do was see if the rest of your code builds, and now you get this error telling you that you must first define this additional function before you can build and test the code you are interested in. This is where stubbing functions is useful. Instead of implementing the entire function (which may introduce further compile errors), just write the minimal definition which will make the build succeed. For a function with a void return type, the minimal definition is the following:

void TestClass::returnVoid() {
    // nothing!
}

The above is as simple as a stub gets. If you had a function which had a non-void return type then you would have to add a return statement to the stub in order to avoid the following compiler warning:

hello.cpp: In member function 'int* TestClass::returnPtr()':
hello.cpp:13: warning: control reaches end of non-void function

The compiler warning above says that there is a function, returnPtr, in the class TestClass which has a non-void return type, and that at least one path through the function does not reach a return statement. This is one of those warnings which is best treated as an error. In the best case, it is a bad code smell; in the worst (and common) case, your code will crash or behave unexpectedly, because in C++ reaching the end of a non-void function without returning a value is undefined behavior.

The correct stub for this case would look like:

int* TestClass::returnPtr() {
    return NULL;
}
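Putting the two stubs together, a complete translation unit might look like the sketch below. The class declaration is an assumption (only the two member function names appear above), and callStubs is a hypothetical caller added to show that code using the stubs now builds and runs.

```cpp
#include <cstddef>  // for NULL

// Assumed declaration; normally this would live in TestClass.h.
class TestClass {
public:
    void returnVoid();
    int* returnPtr();
};

// Stub: does nothing, but gives the linker a definition to find.
void TestClass::returnVoid() {
    // nothing!
}

// Stub: returns a placeholder value of the declared return type.
int* TestClass::returnPtr() {
    return NULL;
}

// Hypothetical caller: exercises both stubs and reports whether the
// pointer stub returned its placeholder value.
bool callStubs() {
    TestClass test;
    test.returnVoid();
    int* result = test.returnPtr();
    return result == NULL;
}
```

Note that any caller must be prepared for the placeholder: dereferencing the NULL returned by the returnPtr stub would crash, so the stub only helps code that checks the pointer or does not use it yet.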

Of course, you could replace NULL with any valid value of type int* (that is, any pointer to an integer). The point is, stubbing out a function is a quick way to make the compiler happy without having to concern yourself with functions you are not ready to implement. An alternative to stubbing out a function would be to comment out every call to the undefined function, but there could be a lot of these, and it is not a very clean solution. This is an even bigger problem if you have several undefined functions.

You could make the argument that you should just not call a function which is not yet defined, but this is another bad solution for a number of reasons. In the simplest case, let’s say you are responsible for writing some code, and your code needs to call a function somebody else is responsible for implementing. Your code won’t be complete or correct until you add the necessary function calls, but if you do add them, your code won’t build. The easiest thing to do in this case, as in most others, is to just stub out the functions you need and worry about their implementations later.
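The shared-work scenario above can be sketched as follows. All names are hypothetical: loadScores stands for the function a teammate will eventually implement, and countPassing stands for your own code that depends on it.

```cpp
#include <cstddef>
#include <vector>

// Declared already, but its real implementation belongs to someone else.
std::vector<int> loadScores();

// Your code: counts the scores that are at or above `threshold`.
// It can be written, compiled, and reviewed before loadScores is finished.
int countPassing(int threshold) {
    std::vector<int> scores = loadScores();
    int passing = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] >= threshold) {
            ++passing;
        }
    }
    return passing;
}

// Temporary stub: returns an empty list so the program links.
// Delete this once the real implementation is available.
std::vector<int> loadScores() {
    return std::vector<int>();
}
```

With the stub in place, countPassing always returns 0, which is not the final behavior but is enough to confirm that the rest of the code builds and that the call is wired up correctly.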