Section 12.5 Modularity and Pure Functions

Modularity is the degree to which a system’s components may be separated and recombined. In programming, this often refers to the organization of code into distinct functions or modules that can be developed, tested, and maintained independently. One important way to make functions modular is to follow the single responsibility pattern. Another is to limit a function’s communication with other code to parameters and return values.

In Subsection 5.6.2, we discussed global variables. Global variables make functions less modular because they tie a function’s behavior to the global state of the program. If you see the code calculateTriangleArea(side1, side2, side3), it is pretty clear that the function is going to calculate the area of a triangle with those three sides. We could write a version of that function that instead uses global variables to store the data needed:

// global variables - BAD, BAD, BAD!!!!
double side1, side2, side3;
double area;


void calculateTriangleArea() {
    // I hope someone set these...
    double s = (side1 + side2 + side3) / 2;
    // We'll just put the answer in area and hope the caller knows to look for it
    area = sqrt(s * (s - side1) * (s - side2) * (s - side3));
}

int main() {
    cout << "Enter the length of the first side of the triangle: ";
    cin >> side1;
    cout << side1 << endl; // echo the input
    cout << "Enter the length of the second side of the triangle: ";
    cin >> side2;
    cout << side2 << endl; // echo the input
    cout << "Enter the length of the third side of the triangle: ";
    cin >> side3;
    cout << side3 << endl; // echo the input

    calculateTriangleArea();

    cout << format("The area of the triangle to one decimal is: {:.1f}", area) << endl;
}

Looking at the function call calculateTriangleArea() on line 24, it is completely unclear what data is going to be worked on. This function communicates with the rest of this program via the secret back channel of global variables. The caller has to know that they need to set the special variables side1, side2, and side3 before calling the function. And they need to know that the result will be stored in the global variable area.

A call to the original version of the function looked like this:

double area = calculateTriangleArea(side1, side2, side3);

It is clear exactly what information is being passed to the function and what result is produced. It is an example of a pure function, one that does not have side effects and always produces the same output for the same input. The call calculateTriangleArea(3, 4, 5) is pure because it always produces the same answer: 6.0. Furthermore, calling the function will not change the behavior of anything else in the program.

Now consider again the version of calculateTriangleArea() that relies on global variables. We can’t predict the answer without knowing what values the global variables hold. Its behavior thus relies on the overall state of the program (whether or not other code has set the global variables). Furthermore, the function changes a global variable, which may cause non-obvious changes to the behavior of code that comes after it.
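To make that dependence on program state concrete, here is a minimal sketch that reuses the global-variable version from above. The helper showStateDependence is hypothetical and exists only for this illustration: it makes the exact same call twice and gets two different answers, because other code changed the globals in between.

```cpp
#include <cassert>
#include <cmath>
using namespace std;

// Global state, as in the problematic version above
double side1, side2, side3;
double area;

void calculateTriangleArea() {
    double s = (side1 + side2 + side3) / 2;
    area = sqrt(s * (s - side1) * (s - side2) * (s - side3));
}

// Hypothetical helper, used only for this illustration
void showStateDependence() {
    side1 = 3; side2 = 4; side3 = 5;
    calculateTriangleArea();
    double first = area;   // area of a 3-4-5 triangle

    // Some other code changes the globals...
    side1 = 6; side2 = 8; side3 = 10;
    calculateTriangleArea();   // ...and the identical-looking call
    double second = area;      // now produces a different result

    assert(first != second);
}
```

Nothing in the text of the call calculateTriangleArea() hints at why the two results differ. A reader has to trace every assignment to the globals to find out.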

Pure functions like double calculateTriangleArea(double side1, double side2, double side3) are inherently more modular than functions that rely on global state or side effects. Changes to other code can’t affect the behavior of this version of calculateTriangleArea unless they change the parameter values passed into a call. And calling the function can’t change the behavior of other code.
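For comparison, a pure version might look like the following sketch. It uses the same formula as the global-variable version, but takes the sides as parameters and returns the area. The helper showRepeatableResult is hypothetical and only demonstrates that repeated calls with the same arguments agree.

```cpp
#include <cassert>
#include <cmath>
using namespace std;

// Pure: the result depends only on the three parameters,
// and nothing outside the function is modified.
double calculateTriangleArea(double side1, double side2, double side3) {
    double s = (side1 + side2 + side3) / 2;  // semi-perimeter
    return sqrt(s * (s - side1) * (s - side2) * (s - side3));
}

// Hypothetical helper, used only for this illustration
void showRepeatableResult() {
    double first = calculateTriangleArea(3, 4, 5);
    double second = calculateTriangleArea(3, 4, 5);
    assert(first == second);  // same arguments, same answer
}
```

Because everything the function needs arrives through its parameters, it can be copied into another program or tested on its own without setting up any surrounding state.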

Insight 12.5.1.

Functions should take input via parameters. They should return results via return values. All “communication” with the rest of the program should happen through these mechanisms.

Of course, every rule has exceptions. Functions that do input or output operations always rely on the overall state of the program and its input/output streams. But as discussed earlier, input and output should be thought of as distinct jobs and kept out of functions that are doing other work. Doing that will minimize the number of functions that depend on the global state of I/O streams.
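One possible way to keep those jobs apart is sketched below. The names readSide, readAndCalculate, and checkWithStringStreams are hypothetical, and passing the streams in as parameters is an assumption of this sketch rather than something required by the text. The calculation stays pure, and the only functions that touch a stream are the ones whose whole job is I/O.

```cpp
#include <cassert>
#include <cmath>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Calculation only: no input or output happens here
double calculateTriangleArea(double side1, double side2, double side3) {
    double s = (side1 + side2 + side3) / 2;
    return sqrt(s * (s - side1) * (s - side2) * (s - side3));
}

// Input only: writes the prompt to `out` and reads one number from `in`
double readSide(istream& in, ostream& out, const string& prompt) {
    out << prompt;
    double side = 0.0;
    in >> side;
    return side;
}

// Hypothetical driver that connects the I/O to the calculation
double readAndCalculate(istream& in, ostream& out) {
    double a = readSide(in, out, "Enter the length of the first side: ");
    double b = readSide(in, out, "Enter the length of the second side: ");
    double c = readSide(in, out, "Enter the length of the third side: ");
    return calculateTriangleArea(a, b, c);
}

// Because the streams are parameters, the I/O code can be exercised
// with string streams instead of the real console.
void checkWithStringStreams() {
    istringstream fakeInput("3 4 5");
    ostringstream capturedOutput;
    double area = readAndCalculate(fakeInput, capturedOutput);
    assert(fabs(area - 6.0) < 1e-9);
}
```

In a real program, main would call readAndCalculate(cin, cout). The functions that read input are still not pure, but the dependence on stream state is confined to them.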

Another exception to this rule is using reference parameters. When we are working with large data structures, it can be more efficient to modify the parameter in place rather than create a new value to return. Say we want to write a function that is supposed to capitalize a string. We could write a pure function to do so:

Listing 12.5.1.

// Take a constant reference
string capitalize(const string& s) {
    string copy = s;  //copy the original string
    for(char& c : copy) {
        c = toupper(c);
    }
    return copy;
}

To use that function we would call: string copy = capitalize(original);

We end up with two different strings, one capitalized and one not. This would be perfectly reasonable if we need to keep the original string unchanged or if the string is small. But if the string were the entire text of a book, and we did not care about keeping the original, it would be more efficient to modify the original string in place:

Listing 12.5.2.

void capitalize(string& s) {
    // loop through the original string and modify it
    for(char& c : s) {
        c = toupper(c);
    }
}

To use that function we would call: capitalize(original);

The advantage here is that there is only one string. We never make a copy of the original. The downside is that the function is no longer pure: whatever string variable was passed into the function has been changed. This changes the state of the calling function. Looking at the function call capitalize(original);, a reader can’t necessarily tell that the original string is being modified.

For a function we expect to only work on small pieces of text, the less efficient approach of returning a modified copy is probably the better design. However, if we expect to work with larger strings where performance is a concern, modifying the original in place may be justified. (Remember, design is about trade-offs!)
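If a program needs both behaviors, one option is to give the two versions different names so that the call site signals which one modifies its argument. The sketch below is only an illustration: the names capitalized, capitalizeInPlace, and showBothVersions are hypothetical, and the cast to unsigned char before calling toupper is a defensive habit added here, not part of the listings above.

```cpp
#include <cassert>
#include <cctype>
#include <string>
using namespace std;

// Pure version: returns a capitalized copy and leaves the argument alone
string capitalized(const string& s) {
    string copy = s;
    for (char& c : copy) {
        c = static_cast<char>(toupper(static_cast<unsigned char>(c)));
    }
    return copy;
}

// In-place version: the name warns the reader that the argument changes
void capitalizeInPlace(string& s) {
    for (char& c : s) {
        c = static_cast<char>(toupper(static_cast<unsigned char>(c)));
    }
}

// Hypothetical helper showing the difference at the call site
void showBothVersions() {
    string original = "hello";

    string copy = capitalized(original);
    assert(copy == "HELLO");
    assert(original == "hello");   // the original is untouched

    capitalizeInPlace(original);
    assert(original == "HELLO");   // the original itself was modified
}
```

The trade-off is the same as before: capitalized costs an extra copy, while capitalizeInPlace avoids the copy at the price of a side effect on the caller's variable.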

Checkpoint 12.5.1.

Which statements are true of pure functions?

They communicate back to the caller only via the return value.
They receive information from callers only via parameters.
They can change reference parameters.
    Feedback: Pure functions cannot change any parameters, including reference parameters. Sometimes we may design a
They can do input and output to the console.
    Feedback: Anything that communicates with the outside world (like printing to the console) makes a function depend on what has happened outside of it.
