The Not So Secret Life of C

The idea of this class is to understand how C++ is translated into assembly (how assembly is translated into object code is left as an exercise for the student).

So first we will talk about how C is translated into assembly, because C++ is mostly a superset of C, and because it is a good way to learn techniques for achieving this understanding.

Hello World

First let's look at what hello world looks like in assembly.

Our source code, as you might expect, looks like this (the source is also available as a file, hello-world.c):

#include <stdio.h>

int
main (int argc, char **argv)
{
    printf("Hello World.\n");
    return 0;
}
      
We turn that into assembly, and it looks like hello-world.s.

How do we do that? We run cc -S -o hello-world.s hello-world.c (note the capital -S, which tells the compiler to stop after generating assembly).

We can also see it with the source interleaved by running cc -g -c -Wa,-adhls=hello-world.listing hello-world.c. The output of that will look like hello-world.listing.

Some notes about memory

In a running process there are 5 parts to memory:
Text
The area where the program code is found.
Data
Area where initialized global variables are loaded.
BSS
Area for uninitialized global variables, which start out filled with zeros.
Heap
Area where memory allocators get their memory.
Stack
Place where stack frames and local variables are allocated.

Global Variables

We can look at how global variables are defined in assembly. They are basically symbols whose addresses point into these memory areas and never change. Sometimes they are exposed to the linker (ld), sometimes not. See this assembly or this listing based on global-variables.c for more detail.

Functions

Functions are chunks of code that get called using the call instruction. The basic idea is that we jump into that location, run the code, and jump back to where we came from.

We use the stack to store where we came from, so that we can get back. But since the code we call might trash the registers, we also want to save some registers on the stack. And we need to pass arguments, so we put those on the stack too, unless they fit in the registers.

We also pass return values back through a register, eax on Intel.

Calling Convention

A calling convention simply specifies which registers the caller saves, which registers the callee saves, where the arguments go, and where the return value goes.

See the example function.c (function.s, function.listing).

Control Structures

Control structures such as if/else, while and do-while are all implemented in terms of conditional jumps in assembly. This is fairly straightforward. Check out if-then.c (if-then.s, if-then.listing) for a simple example.

Pointers

You may have noticed that assembly already deals with pointers a great deal. So it's obvious how pointers and pointer arithmetic are dealt with when they are compiled.

Arrays

In assembly, an array is just the address of its first element. To access a value in it, you just calculate the offset.

TODO: Write a short example.

Structures

Like arrays, structures are just going to be pointers to the start of a memory area. To access the contents, you just calculate the offset.

Since they are of a fixed size, you can put them on the stack easily enough.

TODO: Write a short example.

Unions

And of course unions are like structures, except that every member starts at offset zero, so the members share the same memory.

TODO: Write a short example.


That's it for our introduction to how C is compiled. As you can see, C is very simple to compile, assuming you aren't doing any optimization.
Richard Tibbetts
Last modified: Mon Jan 19 19:06:34 EST 2004