Better abstractions in C

2010 July 22

Opaque structures

A Cartesian point could be represented with the following struct:


/* point.h */

typedef struct point
{
   int x;
   int y;
} Point;

Point variables are declared and used as:


#include "point.h"

Point p;
p.x = 100;
p.y = 200;

The problem with a traditional C struct is that it can't enforce data validation checks on its own. Imagine that the x and y members of a Point should contain only non-negative values. You want to enforce that restriction on the users of Point, but you can't, because the structure members are directly accessible to anyone who can declare an instance:


Point p;
p.x = -100; /* You don't want x to be negative, but you can't 
               prevent that from happening! */

Object-oriented languages like C++ have special syntactic support for enforcing access restrictions on user-defined types:


// Point struct in C++.

struct Point
{
public: 
    int x () { return x_; }
    int y () { return y_; }    
    void set_x (int x) 
    {
        if (x >= 0)    
            x_ = x;
    }
    void set_y (int y) 
    {
        if (y >= 0)    
            y_ = y;
    }
private:
    int x_;
    int y_;
};

Users of the Point struct can access x and y only through the functions declared public. These functions enforce the data validation checks. Does this mean that a C programmer is at a loss because the language does not support private member declarations? No! This is where an opaque structure definition comes in handy. Data privacy can be achieved by providing only a forward declaration of the struct in the header file and routing all operations on the member variables through functions:


/* point.h */

typedef struct Point Point;

/* Construction and destruction of points. */
Point* Point_new (int x, int y);
void Point_delete (Point* p);

/* accessor/mutator functions for Point */
int Point_x (Point* p);
int Point_y (Point* p);
void Point_set_x (Point* p, int x);
void Point_set_y (Point* p, int y);

The definition of the struct and its accessor/mutator functions will go in a separate implementation file:


/* point.c */

#include <stdlib.h> 
#include <assert.h>     
#include "point.h"

struct Point
{
    int x;
    int y;
};

Point* Point_new (int x, int y)
{
    Point* p = malloc (sizeof (Point));
    assert (p != NULL);
    p->x = x;
    p->y = y;
    return p;
}

void Point_delete (Point* p)
{
    if (p != NULL)
       free (p);
}

int Point_x (Point* p) { return p->x; }
int Point_y (Point* p) { return p->y; }

void Point_set_x (Point* p, int x)
{
    if (x >= 0)
        p->x = x;
}

void Point_set_y (Point* p, int y)
{
   if (y >= 0)
      p->y = y;
}

A user of Point will not be able to access its members directly, as 'point.h' has no real definition for the structure:


#include "point.h"

Point* p = Point_new (100, 200);
p->x = -100; /* COMPILER ERROR. At this point the compiler
                does not know what a Point consists of because
                point.h does not specify that. */
Point_set_x (p, 100); /* OK */
Point_set_y (p, 200); /* OK */
Point_set_x (p, -100); 
/* x is still 100 because of the check in Point_set_x (). */
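
One gap remains in the listing above: Point_new stores its arguments unchecked, so a caller can still create a Point with negative coordinates. Here is a minimal single-file sketch of one way to close that gap. The name Point_new_checked and the decision to return NULL on invalid input are assumptions made for this illustration; they are not part of the interface shown earlier.

```c
#include <stdlib.h>
#include <assert.h>

/* In a real project this declaration lives in point.h. */
typedef struct Point Point;

/* In a real project everything below lives in point.c. */
struct Point
{
    int x;
    int y;
};

/* Hypothetical constructor: returns NULL if a coordinate is
   negative or if the allocation fails. */
Point* Point_new_checked (int x, int y)
{
    Point* p;

    if (x < 0 || y < 0)
        return NULL;
    p = malloc (sizeof (Point));
    if (p == NULL)
        return NULL;
    p->x = x;
    p->y = y;
    return p;
}

void Point_delete (Point* p)
{
    free (p); /* free (NULL) is a no-op */
}

int Point_x (Point* p)
{
    assert (p != NULL);
    return p->x;
}

void Point_set_x (Point* p, int x)
{
    assert (p != NULL);
    if (x >= 0)
        p->x = x;
}
```

With this variant, Point_new_checked (-1, 5) hands back NULL instead of an invalid Point, so the caller has to test the result before using it.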

Hiding the implementation of the structure has another advantage: the actual representation of a Point can be changed without breaking existing client code. For instance, in the future you may decide to use an array instead of separate x and y variables:


/* point.c */

#include <stdlib.h>
#include <assert.h>
#include "point.h"

struct Point
{
   /* Using an array does not make much sense here.
      But this is just an example, right?! */
   int coords[2];
};

Point* Point_new (int x, int y)
{
    Point* p = malloc (sizeof (Point));
    assert (p != NULL);
    p->coords[0] = x;
    p->coords[1] = y;
    return p;
}

void Point_delete (Point* p)
{
    if (p != NULL)
        free (p);
}

int Point_x (Point* p) { return p->coords[0]; }
int Point_y (Point* p) { return p->coords[1]; }   

void Point_set_x (Point* p, int x)
{
   if (x >= 0)
      p->coords[0] = x;
}

void Point_set_y (Point* p, int y)
{
   if (y >= 0)
      p->coords[1] = y;
}

If x and y had been exposed to users, code that accesses them would no longer compile. But because we have hidden the implementation, programs using the Point structure only need to be re-linked with the new object file. Hiding the implementation details of a type is known as encapsulation in OOP parlance. Opaque structures are a way to achieve encapsulation in C.
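
To make this concrete, here is a single-file sketch in which the representation is chosen at compile time. The POINT_USE_ARRAY macro, the PX and PY helper macros and the Point_move_by function are hypothetical names introduced only for this illustration. Point_move_by stands in for client code: it touches the Point solely through the public functions, so it compiles unchanged under either representation.

```c
#include <stdlib.h>
#include <assert.h>

/* --- what point.h would expose --- */
typedef struct Point Point;
Point* Point_new (int x, int y);
void Point_delete (Point* p);
int Point_x (Point* p);
int Point_y (Point* p);
void Point_set_x (Point* p, int x);
void Point_set_y (Point* p, int y);

/* --- client code: sees only the declarations above --- */
/* Hypothetical helper: shifts a point by (dx, dy). */
void Point_move_by (Point* p, int dx, int dy)
{
    Point_set_x (p, Point_x (p) + dx);
    Point_set_y (p, Point_y (p) + dy);
}

/* --- what point.c would contain --- */
#ifdef POINT_USE_ARRAY
struct Point { int coords[2]; };
#define PX(p) ((p)->coords[0])
#define PY(p) ((p)->coords[1])
#else
struct Point { int x; int y; };
#define PX(p) ((p)->x)
#define PY(p) ((p)->y)
#endif

Point* Point_new (int x, int y)
{
    Point* p = malloc (sizeof (Point));
    assert (p != NULL);
    PX (p) = x;
    PY (p) = y;
    return p;
}

void Point_delete (Point* p) { free (p); }
int Point_x (Point* p) { return PX (p); }
int Point_y (Point* p) { return PY (p); }
void Point_set_x (Point* p, int x) { if (x >= 0) PX (p) = x; }
void Point_set_y (Point* p, int y) { if (y >= 0) PY (p) = y; }
```

Defining POINT_USE_ARRAY when compiling (with most compilers, by passing -DPOINT_USE_ARRAY) switches to the array layout, and Point_move_by behaves the same either way.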

Function pointers

The following function prints a table of Fahrenheit values and their corresponding Celsius conversions:


#include <stdio.h>

/* Based on a sample from "The C Programming Language" */
static void
fahrenheit_to_celsius (int lower, int upper, int step)
{
    float fahr, celsius;
    fahr = lower;
    while (fahr <= upper)
    {
        celsius = (5.0/9.0) * (fahr - 32.0);
        printf ("%3.0f %8.1f\n", fahr, celsius);
        fahr += step;
    }
}

Usage:


int lower, upper, step;
lower = 0;
upper = 300;
step = 20;

fahrenheit_to_celsius (lower, upper, step);

Output:


  0    -17.8
 20     -6.7
 40      4.4
 60     15.6
 .....

Now you are asked to write a function to do the reverse: convert Celsius to Fahrenheit. Well, that's easy too!


static void
celsius_to_fahrenheit (int lower, int upper, int step)
{
    float celsius, fahr;
    celsius = lower;
    while (celsius <= upper)
    {
        fahr = (9.0/5.0 * celsius) + 32;
        printf ("%3.0f %8.1f\n", celsius, fahr);
        celsius += step;
    }
}

Usage:


int lower, upper, step;
lower = 0;
upper = 300;
step = 20;

celsius_to_fahrenheit (lower, upper, step);

Output:


  0     32.0
 20     68.0
 40    104.0
 60    140.0
 ....

You may have observed that fahrenheit_to_celsius and celsius_to_fahrenheit have much in common. In fact, they differ only in the conversion formula. If there were a way to abstract away the formula and pass it as an argument to the function, we could have just one function to convert between a myriad of units! C provides this abstraction in the form of function pointers. A function pointer is a variable that holds the address of a function. It is declared to point to functions of a particular signature. For example, the following function pointer, fnptr, can point to any function that takes two int arguments and returns an int:


int (*fnptr) (int, int);
   
int add (int a, int b) { return a + b; } 
int sub (int a, int b) { return a - b; }

fnptr = add;
fnptr (10, 20); /* => 30 */
fnptr = sub;
fnptr (10, 20); /* => -10 */

With the help of typedefs, function pointer declarations can be made to look like any other type declaration:


/* Declares the function pointer type - FnPtr. */
typedef int (*FnPtr) (int, int);

FnPtr fp = add; /* fp is an instance of FnPtr */
fp = sub;
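
Because a function pointer is an ordinary value, it can also be handed to another function. A small sketch of that idea follows, assuming a hypothetical helper named apply that does not appear elsewhere in this article:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*FnPtr) (int, int);

int add (int a, int b) { return a + b; }
int sub (int a, int b) { return a - b; }

/* Hypothetical helper: calls whatever function fn points to. */
int apply (FnPtr fn, int a, int b)
{
    assert (fn != NULL); /* calling through a null pointer is undefined */
    return fn (a, b);
}
```

Here apply (add, 10, 20) yields 30 and apply (sub, 10, 20) yields -10, although apply itself knows nothing about addition or subtraction.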

Armed with this knowledge about function pointers, let us try to write a generic conversion function. This function takes an extra argument, which is a pointer to a function that consumes a float and returns a float:


typedef float (*conversion_fn) (float);

static void
conversion_table (int lower, int upper, 
                  int step, conversion_fn fn)
{
    float from, to;

    from = lower;
    while (from <= upper)
    {
        to = fn (from); /* Execute the user-defined formula */
        printf ("%3.0f %8.1f\n", from, to);
        from += step;
    }
}

The conversion_table function is agnostic of the units it is working on and the method of conversion. All it does is pass float values in a range to the conversion_fn object and print the result. The only restriction it imposes on the caller is that the function pointed to by fn should conform to a protocol: it should take a float as input and produce a float as output. Now you can define unit conversions based on this protocol.


/* Implementation of the "conversion protocol" 
   to convert fahrenheit to celsius. */ 
static float 
fahr_to_celsius (float fahr)
{
    return (5.0/9.0) * (fahr - 32.0);
}

/* Implementation of the "conversion protocol" 
   to convert celsius to fahrenheit. */ 
static float 
celsius_to_fahr (float celsius)
{
   return (9.0/5.0 * celsius) + 32;
}

conversion_table (lower, upper, step, fahr_to_celsius);

/* Output:
   0    -17.8
  20     -6.7
  40      4.4
  60     15.6
  .....
*/

conversion_table (lower, upper, step, celsius_to_fahr);

/* Output:
   0     32.0
  20     68.0
  40    104.0
  60    140.0
  ....
*/

Someone else may use conversion_table() to produce a table of "kilograms to pounds" conversions. They just have to write another simple one-line formula function and pass it to conversion_table():


static float
kg_to_pounds (float kg)
{
   return kg * 2.2;
}

conversion_table (lower, upper, step, kg_to_pounds);
/* Output:
   0      0.0 
  20     44.0
  40     88.0
  60    132.0
  80    176.0
  ....
*/

The ability to identify patterns in source code and abstract them as re-usable solutions is one of the key skills of a competent programmer. This is so important in writing maintainable code that some people took the trouble of identifying and documenting the most common of such patterns. These are popularly known as Design Patterns. One such pattern is the Strategy Pattern or the Policy Pattern, whereby algorithms are selected at runtime. Our conversion_table function makes use of the Strategy Pattern. The algorithm for performing the unit conversion is passed to it dynamically in the form of a function pointer.
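
As an illustration of selecting the algorithm at runtime, the sketch below looks up a conversion function by name. The conversion_entry table, the short names "f2c" and "c2f", and the find_conversion function are all hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef float (*conversion_fn) (float);

static float fahr_to_celsius (float fahr)
{
    return (5.0f / 9.0f) * (fahr - 32.0f);
}

static float celsius_to_fahr (float celsius)
{
    return (9.0f / 5.0f * celsius) + 32.0f;
}

/* Hypothetical registry that maps a name to a strategy. */
struct conversion_entry
{
    const char* name;
    conversion_fn fn;
};

static const struct conversion_entry conversions[] =
{
    { "f2c", fahr_to_celsius },
    { "c2f", celsius_to_fahr }
};

/* Returns the strategy registered under name, or NULL if
   there is none. */
conversion_fn find_conversion (const char* name)
{
    size_t i;
    size_t n = sizeof (conversions) / sizeof (conversions[0]);

    assert (name != NULL);
    for (i = 0; i < n; i++)
    {
        if (strcmp (conversions[i].name, name) == 0)
            return conversions[i].fn;
    }
    return NULL;
}
```

A program could take the name from user input, call find_conversion, check the result for NULL and then pass it on as the fn argument of the table-printing function.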

Function pointers are one of the most useful features of C. They provide simple solutions to many design problems. The same solutions would be much more complicated in languages that don't allow functions to take other functions as arguments. (Some higher-level languages like Lisp and Smalltalk don't have pointers, but they give functions the same status as other values, so that they can be passed as arguments and returned as results. Functions that take other functions as arguments or return them as results are called higher-order functions.)
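
The C standard library relies on the same technique: qsort from <stdlib.h> receives its comparison as a function pointer. A short sketch follows; the comparator names and the sort_ints wrapper are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Orders ints from smallest to largest. */
static int compare_ascending (const void* a, const void* b)
{
    int x = *(const int*) a;
    int y = *(const int*) b;
    return (x > y) - (x < y); /* avoids the overflow risk of x - y */
}

/* Orders ints from largest to smallest. */
static int compare_descending (const void* a, const void* b)
{
    return compare_ascending (b, a);
}

/* Hypothetical wrapper: sorts in place, in either direction,
   by handing qsort a different comparison function. */
void sort_ints (int* values, size_t count, int descending)
{
    assert (values != NULL);
    qsort (values, count, sizeof (int),
           descending ? compare_descending : compare_ascending);
}
```

The sorting algorithm inside qsort stays the same; only the comparison passed to it decides the resulting order.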

An identifying feature of OOP is polymorphism. It is the ability to call different implementations of a function, based on the runtime type of the object invoking the function. Some OOP languages use function pointers under the hood to implement polymorphic objects.
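
The idea can be imitated by hand in C. The sketch below is only an illustration with hypothetical names (Shape, Shape_rectangle, Shape_triangle, Shape_area); it is not a description of how any particular OOP language lays out its objects. Each Shape carries a pointer to its own area function, and the caller invokes it without knowing which implementation will run.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Shape Shape;

/* Every shape carries a pointer to its own area function. */
struct Shape
{
    int (*area) (const Shape* self);
    int width;
    int height;
};

static int rectangle_area (const Shape* self)
{
    return self->width * self->height;
}

static int triangle_area (const Shape* self)
{
    return (self->width * self->height) / 2;
}

Shape Shape_rectangle (int width, int height)
{
    Shape s;
    s.area = rectangle_area;
    s.width = width;
    s.height = height;
    return s;
}

Shape Shape_triangle (int base, int height)
{
    Shape s;
    s.area = triangle_area;
    s.width = base;
    s.height = height;
    return s;
}

/* Dispatches to whichever implementation the shape holds. */
int Shape_area (const Shape* s)
{
    assert (s != NULL && s->area != NULL);
    return s->area (s);
}
```

Calling Shape_area on a rectangle of 4 by 6 gives 24, while the same call on a triangle with base 4 and height 6 gives 12.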

Further reading

  1. The C Programming Language.
  2. Notes on Programming in C by Rob Pike.
  3. Structure and Interpretation of Computer Programs - shows some interesting abstraction mechanisms using functions.
  4. Design Patterns: Elements of Reusable Object-Oriented Software.