Is C simple?

28 October 2024

Linus Torvalds is quoted in a recent LWN article as saying:

C is, in the end, a very simple language.

Some of the commenters disagree:

roc: When you first write some programs which produce correct output, you think C is simple. Then you gradually learn the rules and get better at applying them, until it's not really simple at all, but you still think it's simple from your first impression. In the meantime, you have unfortunately written a lot of garbage C code.

And:

NYKevin: C is simple if you learn it wrong. For example, if you are told that pointers are integers representing the (arbitrary) memory address of a variable, then you will think that pointers are "simple." But pointers are not, on the abstract machine, integers referencing memory addresses. Rather, a pointer is an opaque handle given out by the abstract machine which may only be manipulated in very specific ways, or else it's UB.

And a similar sentiment elsewhere:

gilgoomesh: C sounds simple when you look through K&R C… But the reality is filled with a staggering number of weird special cases that exist because memory doesn't work like a simple flat address space; or the compiler needs to optimise field layouts, loops, functions, allocations, register assignments and local variables; or your CPU doesn't use the lower 4 bits or upper 24 bits when addressing.

I think there are actually two distinct viewpoints that are expressed in these objections:

C is simple, but its simplicity offloads complexity onto the programmer. For example, manual memory management is undeniably a simpler mechanism than automated garbage collection, but that does not make it any simpler to write correct programs.
C is not simple, because the language specification is much more complex and abstract than "portable assembly" and there are many strange and surprising edge cases.

The first point is really about semantics (in the linguistic sense), and – setting aside the larger question of language design – I don't think there's much substantive disagreement on the actual facts.

The second point is covered in depth by two excellent articles:

A Guide to Undefined Behavior in C and C++, Part 1 by John Regehr: Undefined behavior is an umbrella term for conditions that your program must not meet. Two common examples are signed integer overflow and dereferencing a null pointer. The key point that I did not understand for a long time was that it is not only an issue if at runtime your code "triggers" undefined behavior, e.g. by overflowing a signed integer. The compiler is allowed to assume that undefined behavior never happens, so the mere possibility of it in your program can cause the compiler to emit code that is quite different from your wishes, for example by treating a check that could only succeed after undefined behavior as "dead" code and compiling it into a no-op.
Pointers Are More Abstract Than You Might Expect in C by Stefan Schulze Frielinghaus: The key point here is that, while pointers are normally represented as integers on modern hardware, they do not behave like them. The article gives an especially surprising example: for two pointers p and q with the same numeric value, p == q may be false if p and q were derived from different objects and just happen to have the same address because of the stack-frame layout. More prosaically, adding 1 to an int* will increase its numeric value by sizeof(int), not by 1.

So, is C simple? I still don't know. I'd like to hold on to my fond impression of C as a straightforward language that does not get in your way. But I have to concede that these arguments have given me pause. ∎