The Not So Secret Life of C
The idea of this class is to understand how C++ is translated
into assembly (how assembly is translated into object code is
left as an exercise for the student).
So first we will talk about how C is translated into assembly,
because C++ is mostly a superset of C, and because it is a good way
to learn techniques for achieving this understanding.
Hello World
First, let's look at what hello world looks like in assembly.
Our source code, hello-world.c, looks just as you might expect:
#include <stdio.h>

int
main (int argc, char **argv)
{
  printf("Hello World.\n");
  return 0;
}
We turn that into assembly, and it looks like hello-world.s.
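How do we do that? We run cc -S -o hello-world.s hello-world.c
(-S tells the compiler to stop after generating assembly).
We can also see the assembly with the C source interleaved by running
cc -g -c -Wa,-adhls=hello-world.listing hello-world.c
(-Wa passes -adhls through to the GNU assembler, which writes a listing;
-g supplies the line information the listing needs to show the source).
The output of that will look like hello-world.listing.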
Some notes about memory
In a running process there are five main parts to memory:
- Text: the area where the program code is found.
- Data: the area where initialized global (and static) variables are loaded.
- BSS: the area for uninitialized global (and static) variables; it is zero-filled when the program starts and takes up no space in the executable file.
- Heap: the area where memory allocators such as malloc get their memory.
- Stack: the place where stack frames and local variables are allocated.
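A minimal sketch that puts one object in each of the regions listed above. The names are hypothetical, and which region each object lands in reflects the usual arrangement on Unix-like systems rather than anything the C standard guarantees.

```c
#include <stdio.h>
#include <stdlib.h>

/* Data: a global variable with an initializer. */
int initialized_global = 42;

/* BSS: a global without an initializer; it starts out as zero. */
int uninitialized_global;

/* Text: the machine code for these functions lives in the text area. */
int *make_heap_int(int value)
{
    /* Heap: storage obtained from the allocator. */
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}

void show_regions(void)
{
    /* Stack: a local variable in this call's stack frame. */
    int local = 7;
    int *heap_value = make_heap_int(99);

    printf("data  %p = %d\n", (void *)&initialized_global, initialized_global);
    printf("bss   %p = %d\n", (void *)&uninitialized_global, uninitialized_global);
    printf("stack %p = %d\n", (void *)&local, local);
    if (heap_value != NULL) {
        printf("heap  %p = %d\n", (void *)heap_value, *heap_value);
        free(heap_value);
    }
}
```

Calling show_regions() from main prints one line per region; the actual addresses vary between systems and, with address space randomization, between runs.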
Global Variables
We can look at how global variables are defined in assembly. They
are basically symbols that name fixed addresses in these memory areas;
the addresses never change, though the contents can. Global variables
declared static stay local to their file; the others are exposed to the
linker (ld). See the assembly and listing generated from global-variables.c for more detail.
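A small sketch of the two kinds of global variable, in a hypothetical file counters.c. With GCC, the assembly for the exported variable typically carries a .globl directive that the static one lacks, and running nm on the object file typically shows an uppercase symbol letter for the exported variable and a lowercase one for the static variable.

```c
/* counters.c: hypothetical file illustrating global-variable linkage. */

/* Exported: the assembler emits a global symbol, so ld can resolve
   references to it from other object files. */
int exported_counter = 1;

/* static: the symbol stays local to this object file; other files
   cannot refer to it by name, although it still lives at a fixed address
   for the whole run, like any other global. */
static int file_local_counter;

/* Only the contents of the two counters change; their addresses do not. */
int bump_counters(void)
{
    exported_counter++;
    file_local_counter++;
    return exported_counter + file_local_counter;
}
```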
Functions
Functions are chunks of code that get called using the call
instruction. The basic idea is that we jump to
that location, run the code, and jump back to where we came from.
We use the stack to store where we came from, so that we can get
back: call pushes the return address, and ret pops it. But since the
code we call might trash the registers, we also want to save some
registers on the stack. And we need to pass arguments, so we put those
on the stack too; some calling conventions pass the first few arguments
in registers instead and use the stack only for the ones that don't fit.
We also pass return values back through a register: eax on 32-bit Intel x86.
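A sketch showing that every call gets its own stack frame. The function name and MAX_DEPTH are hypothetical: a recursive function stores the address of a local variable at each depth and, while all the calls are still active, checks that no two locals share an address and that each still holds its own value.

```c
#include <stdbool.h>

#define MAX_DEPTH 4

bool frames_are_distinct(int depth, const int *outer[MAX_DEPTH])
{
    int local = depth;       /* lives in this call's stack frame */
    outer[depth] = &local;

    if (depth + 1 < MAX_DEPTH)
        /* call pushes a return address, so execution comes back here */
        return frames_are_distinct(depth + 1, outer);

    /* Deepest call: every frame above us is still alive. */
    for (int i = 0; i < MAX_DEPTH; i++) {
        if (*outer[i] != i)
            return false;
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (outer[i] == outer[j])
                return false;
    }
    return true;
}
```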
Calling Convention
Calling convention just means identifying which registers the
caller saves, which registers the callee saves, where to put the
arguments and where to put the return value.
For an example, see function.c (function.s, function.listing).
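A sketch of where arguments and the return value go under two common calling conventions (the function sum7 is hypothetical). On x86-64 System V (Linux, the BSDs, macOS), the first six integer arguments go in rdi, rsi, rdx, rcx, r8 and r9, later ones go on the stack, and rbx, rbp and r12 through r15 are callee-saved. With 32-bit x86 cdecl, the caller pushes every argument on the stack, right to left; eax, ecx and edx are caller-saved, while ebx, esi, edi and ebp are callee-saved.

```c
/* Seven integer arguments: on x86-64 System V the first six arrive in
   registers and the seventh (g) does not fit, so it is passed on the
   stack. Under 32-bit cdecl all seven would be on the stack. */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    /* The result goes back in rax (eax on 32-bit x86). */
    return a + b + c + d + e + f + g;
}
```

Compiling this with cc -S and reading the output is a quick way to see which registers and stack slots your compiler actually uses.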
Control Structures
Control structures such as if/else, while and do-while are all
implemented in terms of conditional jumps in assembly. This is
fairly straightforward. Check out if-then.c (if-then.s,
if-then.listing) for a simple
example.
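A sketch of the same loop written twice: once with while and if, and once using only if-goto (a conditional jump) and plain goto (an unconditional jump). The function names are hypothetical, and the goto version is one plausible translation; real compilers often move the loop test to the bottom of the loop.

```c
/* Counts the even numbers among 0 .. n-1 using structured control flow. */
int count_evens(int n)
{
    int count = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0)
            count++;
        i++;
    }
    return count;
}

/* The same logic with only jumps, the way assembly expresses it. */
int count_evens_goto(int n)
{
    int count = 0;
    int i = 0;
loop_test:
    if (!(i < n)) goto loop_end;    /* conditional jump out of the loop */
    if (!(i % 2 == 0)) goto skip;   /* conditional jump around the if body */
    count++;
skip:
    i++;
    goto loop_test;                 /* unconditional jump back to the test */
loop_end:
    return count;
}
```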
Pointers
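You may have noticed that assembly already deals with pointers a
great deal: a pointer is just an address held in a register or in memory.
So it's fairly obvious how pointers and pointer arithmetic are compiled;
the only twist is that adding n to a pointer to T adds n * sizeof(T) bytes to the address.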
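A sketch of pointer arithmetic as the compiler sees it (both function names are hypothetical): walking a string one char at a time, and measuring how many bytes "one int later" really is.

```c
#include <stddef.h>

/* Walks the string with a pointer; p - s counts elements, not bytes,
   though for char the two are the same. */
size_t string_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;                  /* advance by sizeof(char) == 1 byte */
    return (size_t)(p - s);
}

/* first + 1 is one int later, so the byte distance is sizeof(int). */
ptrdiff_t byte_step_of_int_pointer(void)
{
    int values[2] = {0, 0};
    int *first = &values[0];
    int *second = first + 1;
    return (const char *)second - (const char *)first;
}
```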
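Arrays
An array is just a block of consecutive elements, and its name stands
for a pointer to the first one. To get at an element, you just calculate
the offset: element i lives at the start address plus i * sizeof(element).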
TODO: Write a short example.
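A minimal sketch, with a hypothetical function element_at, that reads an array element by computing the byte offset by hand. This is the same address arithmetic that base[i] (equivalently *(base + i)) compiles to.

```c
#include <stddef.h>

/* Address of element i = base address + i * sizeof(int). */
int element_at(const int *base, size_t i)
{
    const char *bytes = (const char *)base;
    const int *element = (const int *)(bytes + i * sizeof(int));
    return *element;
}
```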
Structures
Like arrays, structures are accessed through a pointer to the start
of a memory area. To access a member, you just add that member's offset,
which the compiler fixes at compile time (including any padding for alignment).
Since they are of a fixed size, you can put them on the stack easily enough.
TODO: Write a short example.
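A sketch using a hypothetical struct point. Each member has a fixed byte offset from the start of the struct, which the standard offsetof macro reports, so p->y amounts to "start address plus offset of y". The local struct in make_and_read simply occupies sizeof(struct point) bytes of the current stack frame.

```c
#include <stddef.h>

struct point {
    char tag;   /* offset 0; padding usually follows before x */
    int x;
    int y;
};

/* Reads p->y the long way: start address plus y's offset. */
int point_y(const struct point *p)
{
    const char *base = (const char *)p;
    return *(const int *)(base + offsetof(struct point, y));
}

/* A fixed-size struct can live on the stack like any other local. */
int make_and_read(void)
{
    struct point local = {'a', 3, 4};
    return point_y(&local);
}
```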
Unions
Unions are like structures, except that every member starts at offset 0, so the members overlap; the union is as large as its largest member, plus any padding needed for alignment.
TODO: Write a short example.
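A sketch with a hypothetical union number. All members start at offset 0, so storing one member and then reading the bytes member shows the object representation of the value just stored. Which byte comes first depends on the machine's byte order.

```c
#include <stddef.h>
#include <stdint.h>

union number {
    int32_t as_int;
    double as_double;
    unsigned char bytes[sizeof(double)];
};

/* Stores value through as_int and reads back the first byte of the
   same storage (the low byte on a little-endian machine like x86). */
unsigned char first_byte_of(int32_t value)
{
    union number n;
    n.as_int = value;
    return n.bytes[0];
}
```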
That's it for our introduction to how C is compiled. As you can
see, C is very simple to compile, assuming you aren't doing any
optimization.
Richard Tibbetts
Last modified: Mon Jan 19 19:06:34 EST 2004