ARM Embedded C Tutorial : Compiling and linking your first embedded C Code:
-Aviral Mittal (avimit att yhaoo dat camm)


This tutorial uses KEIL tools to compile the C programs.
A limited version of KEIL is available free for download for anyone.
This is the link to armKEIL to download the free version.

The Compile Flow:

This section of the tutorial explains a bit about the machine code (shown as assembly code) that is produced by the compiler and linker.
Sections of the assembly code are shown and described.
What happens behind the scenes during compilation is also described.
OK, but what is 'Compile'?
A processor can only execute instructions that are in binary. A compiler converts a human-readable program written in a high-level language such as C into binary instructions. These binary instructions are then put into memory. When the processor is started, it fetches these binary instructions from memory and executes them. The process of converting human-readable code into binary is called 'compilation'.

The Simple C program introduced earlier is reproduced here:

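typedef unsigned long uint32_t;   /* 32 bits wide on this ARM target */
int main ()
{
    int ii;
    /* Write a fixed pattern to one memory-mapped address, 0x12345678 times */
    for(ii=0;ii<305419896;ii++) {
        *((uint32_t *)0x40E00018) = 0x87654321;
    }
}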

Compile it using KEIL uVision: Click Here to go to a Very simple Quick Tutorial

Compiling the above C code along with the 'startup.s' file results in an executable file called the 'axf' file. The axf file is a binary file which humans cannot read. However, it is possible to generate a human-readable version of this 'axf' file which shows the instructions in assembly language. This is done with a utility provided by Keil called 'fromelf'. The Keil Tutorial demonstrates how this conversion is done with this utility, including the exact command syntax.

The following is a section from the human-readable version of the 'axf' file. It shows how the 'main()' written in C has been converted to assembly instructions.
The code snippet also shows the address of each instruction, i.e. where in memory this section of the code is stored. For example, the first instruction of 'main()', MOVS r0,#0, is stored at location 0x0000_0134. It moves the value 0 into register r0.
Notice also that the constant values from the above C program have been stored at separate memory locations starting from 0x0000_014C, after the instructions. There are 3 such constants, as shown below. The address 0x40E00018 appears as the base 0x40e00000, with the offset 0x18 supplied by the STR instruction.

        0x00000134:    2000        .       MOVS     r0,#0
        0x00000136:    e004        ..      B        0x142 ; main + 14
        0x00000138:    4904        .I      LDR      r1,[pc,#16] ; [0x14c] = 0x87654321
        0x0000013a:    4a05        .J      LDR      r2,[pc,#20] ; [0x150] = 0x40e00000
        0x0000013c:    6191        .a      STR      r1,[r2,#0x18]
        0x0000013e:    bf00        ..      NOP
        0x00000140:    1c40        @.      ADDS     r0,r0,#1
        0x00000142:    4904        .I      LDR      r1,[pc,#16] ; [0x154] = 0x12345678
        0x00000144:    4288        .B      CMP      r0,r1
        0x00000146:    dbf7        ..      BLT      0x138 ; main + 4
        0x00000148:    bf00        ..      NOP
        0x0000014a:    e7fe        ..      B        0x14a ; main + 22
        0x0000014c:    87654321    !Ce.    DCD    2271560481
        0x00000150:    40e00000    ...@    DCD    1088421888          
        0x00000154:    12345678    xV4.    DCD    305419896
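
The addresses that appear in the comments of the listing can be reproduced by hand. The following is a minimal sketch in C, assuming the 16-bit Thumb encodings seen above (a PC-relative LDR, an unconditional B with an 11-bit immediate, and a conditional branch with an 8-bit immediate). The function names are hypothetical and are not part of any Keil tool.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a Thumb "LDR Rt,[pc,#imm]" instruction.
   In Thumb state the PC reads as the instruction address + 4,
   and for this instruction it is rounded down to a multiple of 4. */
uint32_t thumb_literal_address(uint32_t insn_addr, uint32_t imm)
{
    assert((imm & 3u) == 0);            /* the offset is a multiple of 4 */
    return ((insn_addr + 4u) & ~3u) + imm;
}

/* Target of a 16-bit unconditional branch (encoding 11100:imm11). */
uint32_t thumb_b_target(uint32_t insn_addr, uint16_t opcode)
{
    int32_t imm11 = opcode & 0x7FF;
    if (imm11 & 0x400)                  /* sign-extend the 11-bit field */
        imm11 -= 0x800;
    return insn_addr + 4u + (uint32_t)(imm11 * 2);
}

/* Target of a 16-bit conditional branch (encoding 1101:cond:imm8). */
uint32_t thumb_bcond_target(uint32_t insn_addr, uint16_t opcode)
{
    int32_t imm8 = opcode & 0xFF;
    if (imm8 & 0x80)                    /* sign-extend the 8-bit field */
        imm8 -= 0x100;
    return insn_addr + 4u + (uint32_t)(imm8 * 2);
}
```

For example, thumb_literal_address(0x138, 16) gives 0x14c, thumb_b_target(0x136, 0xe004) gives 0x142, and thumb_bcond_target(0x146, 0xdbf7) gives 0x138, matching the listing.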

Notice that the instructions of 'main()' above do not start at address 0x0000_0000; they start at 0x0000_0134.

If the full text version of the 'axf' file is analyzed, it shows a lot more going on. The 'main' code above is only a few lines from it.
So what is this extra stuff in the 'axf' file?
The 'axf' file contains a lot of debug information. When the code is downloaded to the target device and the device remains connected to the host PC, this debug information helps to debug your code. Only the object code is loaded onto the target itself, while both the code and the debug information are loaded into the memory of the development host PC.

When the debug information is removed using compile-time options, the axf file becomes smaller, but it still contains a lot of code which is surplus to your 'main' code.
This surplus code is required to put the binary axf file into a form which the ARM processor is able to execute.
Before the user's 'main()' is executed, the following functions are called.

__main -> this is not the user main(), but a function called at the start of the binary executable, which calls other functions.

    __rt_entry in turn will call
        User Code (your code inside main)

__main is the entry point of the user's program. This __main function is pre-defined (though users can write their own). Note that __main is different from the main() in the user's C program. Users who write their own start-up function can give it any name they like. However, they must then override the linker's '--startup' option, e.g. '--startup=my__main', since the linker default is '--startup=__main'. The user can also use '--no_startup' if they so wish. However, the consequences of doing so are beyond the scope of this tutorial.

__main then calls __scatterload.
To understand __scatterload, it is important to understand a bit more about how the code is stored in memory and how it is executed.
A microcontroller system typically has several types of memory, e.g. Flash memory, ROM, RAM etc.
This means that the same code may reside in one memory while it is not being executed, and be moved to another memory when it is executed. For example, code and its data can reside in ROM when not being executed, and be moved to RAM for execution.
In another example, the code may be executed directly from ROM, but its variables must be copied to RAM because the running code may need to update them. The Keil Tutorial 2 shows how a variable's initial values may be stored in read-only memory while the variable itself is stored in read-write memory; the initial values are copied to the read-write memory before the program executes. In the Keil Tutorial 2, the C program has 2 integer array variables, namely avar[10] and bvar[10], which have some initial values. The initial values for avar[10] are stored at addresses 0x0000_015c to 0x0000_0180, which could be ROM addresses.
However, when the program executes, the variables avar[10] and bvar[10] are stored somewhere in the stack memory. The stack memory is a read/write memory in the region which starts from 0x2000_0000. Before the user's main() executes, the initial values of these variables are already available in the stack. This means that somehow these initial values were copied from the region 0x0000_0xxx, which is a read-only region, to 0x2000_0yyy, which is a read-write region, before the user's main() is executed.
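
The copy described above can be pictured with a short C sketch. It assumes a host build in which two ordinary arrays stand in for the read-only load location and the read-write execution location. The names avar_init_rom, avar_ram, copy_initial_values and init_avar are hypothetical, the initial values are made up, and on a real target the copy is done by the start-up library code rather than by user code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the initial values stored in read-only memory
   (hypothetical values; on the target these would sit near 0x0000_015c). */
static const uint32_t avar_init_rom[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

/* Stand-in for the variable itself in read-write memory
   (on the target this would sit in the region starting at 0x2000_0000). */
static uint32_t avar_ram[10];

/* Copy 'count' words from the load location to the execution location. */
void copy_initial_values(uint32_t *exec, const uint32_t *load, size_t count)
{
    assert(exec != NULL && load != NULL);
    for (size_t i = 0; i < count; i++) {
        exec[i] = load[i];
    }
}

/* What start-up code does before main() runs, reduced to one variable. */
const uint32_t *init_avar(void)
{
    copy_initial_values(avar_ram, avar_init_rom, 10);
    return avar_ram;
}
```

After init_avar() has run, the read-write copy holds the same ten values as the read-only copy, which is the state the user's main() expects to find.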

Load Region Vs Execution Region:
This means that the code, or some parts of it, may have one address when loaded into memory (e.g. into ROM) and a different address when executed. The address where the program is loaded is called its 'load address' and the address from where the program is executed is called its 'execution address'.
It is now clear that if the load address of a program, or of a section of a program, is different from its execution address, then the code or section must be 'relocated' from its load address to its execution address.
The function __scatterload does exactly that.
In addition to relocating code or sections of code from load regions to execution regions, this function also initializes certain regions, such as the stack region. The stack region is normally initialized to zero, because the assembler directive 'SPACE' that reserves space for the stack memory also results in its zero initialization.
While Keil usually inserts all the background code to relocate the code automatically, it also allows the user to write specific files, called scatter-load files, to define the different memory regions. However, this is beyond the scope of this tutorial. An example of a scatter-load file is shown here. Using a custom scatter-load file, the user can effectively 'scatter' the code and data into multiple regions of the memory.

This raises an interesting point: what if the system RAM is not available? It is quite possible that the system RAM is unavailable at the time of system power-up. XIP (execute in place) is the answer to this scenario.
It is possible for the user to bypass the default execution flow by using their own entry point. This can be done by setting the linker option '--entry My_Reset_Handler'.
The other way is to use the assembler directive 'ENTRY'.

Random Info:

Actually the full sequence of the Execution of the program is something like this:
1.   Stack Pointer is loaded from whatever the contents of the memory are at 0x0000_0000
2.   Program Counter of the processor is loaded to the location of Reset_Handler, this location will be present at the memory location 0x0000_0004
3.   Reset_Handler is nothing but a jump to __main (this is because in the startup.s file the Reset_Handler has been coded to do so)
4.   __main calls __scatterload; Program jumps to __scatterload
5.   __scatterload -> Initialization, Zero Initialization regions to 0, load region relocation to execution addresses
6.   __scatterload_null
7.   __scatterload_zero_init
8.   __rt_entry
9.   __user_setup_stackheap (optional)
10. __user_libspace
11. __rt_entry_li
12. __rt_lib_init
13. main() -> User Code
14. __rt_lib_shutdown
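
Steps 1 and 2 of the sequence above can be sketched in C. This is an illustration only: it assumes a memory image whose first two words are the initial stack pointer and the address of Reset_Handler, and it assumes that bit 0 of the second word only marks Thumb state. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Processor state right after reset, reduced to the two registers
   that are loaded from memory. */
struct reset_state {
    uint32_t sp;   /* initial stack pointer          */
    uint32_t pc;   /* address where execution begins */
};

/* 'image' points at the words found at 0x0000_0000 and 0x0000_0004. */
struct reset_state read_reset_vectors(const uint32_t *image)
{
    struct reset_state s;
    assert(image != NULL);
    s.sp = image[0];          /* step 1: word at offset 0 -> SP */
    s.pc = image[1] & ~1u;    /* step 2: word at offset 4 -> PC;
                                 bit 0 only flags Thumb state   */
    return s;
}
```

With a made-up image whose first two words are 0x20000460 and 0x000000D5, the sketch yields a stack pointer of 0x20000460 and a first instruction address of 0x000000D4.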

The C program written above contains application code and data constants. When the compiled version of the application code and data is put into the memory of a microcontroller, it can be put into a 'root region' or a 'non-root region' of the memory. Root regions have the same load-time and execution-time addresses. Non-root regions have different load-time and execution-time addresses. The root region contains a region table output by the ARM linker. The region table contains the addresses of the non-root code and data regions that require initialization.
OK, but what is a Root Region?
A root region is a region in the memory space where the 'load address' of the program is the same as its 'execution address'.
OK, but what is 'load address' and what is 'execution address'?
The load address is simply where the user's code resides in memory, e.g. Flash memory, ROM or RAM.
However, at times the program may not be executed from where it resides in memory. Instead, before it is executed, it must be relocated into the region of memory from where it is executed. This is its 'execution address'.
Load Address Vs Execution Address: Why are these different at all?
There may be many reasons; one of them is explained here.
Consider an example where the user's code resides in Flash memory. Since Flash memory accesses are slow, it may be desirable to execute the code from some RAM which is closer to the processor, for faster processing. The Flash memory location where the program is stored is its 'load address', whereas the RAM location to which the program is copied, and from which it is eventually executed, is its 'execution address'.
In embedded products, it is very common for the software to reside in non-volatile memory, e.g. Flash memory, and for the code to be copied from Flash to RAM before it is executed.
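
The distinction between root and non-root regions comes down to comparing two addresses. A minimal sketch in C, with a hypothetical structure and hypothetical function names, might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses of one region of an image (hypothetical structure). */
struct region_addresses {
    uint32_t load_addr;  /* where the region is stored, e.g. in Flash */
    uint32_t exec_addr;  /* where the region is used while running    */
};

/* A root region runs from the place where it is stored. */
bool is_root_region(const struct region_addresses *r)
{
    assert(r != NULL);
    return r->load_addr == r->exec_addr;
}

/* A non-root region must be relocated before it can be used. */
bool needs_relocation(const struct region_addresses *r)
{
    return !is_root_region(r);
}
```

A region that is both loaded and executed at 0x0000_0134 counts as a root region under this test. A region loaded at 0x0000_015c but used at an illustrative RAM address such as 0x2000_0000 does not, so it has to be relocated first.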

The region table also contains a function pointer that indicates what initialization is needed for the region, for example a copying, zeroing, or decompressing function.

__scatterload goes through the region table and initializes the various execution-time regions. The function:
Initializes the Zero Initialized (ZI) regions to zero
Copies or decompresses the non-root code and data regions from their load-time locations to the execute-time regions.
__main always calls this function during startup before calling __rt_entry.
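
The walk over the region table that __scatterload performs can be sketched in C. The table layout below is hypothetical (the real format is internal to the ARM linker and library), only copying and zeroing handlers are shown, and all names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct region;
typedef void (*region_init_fn)(const struct region *r);

/* One entry of a simplified region table. */
struct region {
    const void    *load_addr;  /* where the bytes are stored (unused for ZI) */
    void          *exec_addr;  /* where the program expects them at run time */
    size_t         size;       /* region size in bytes                       */
    region_init_fn init;       /* handler: copy or zero in this sketch       */
};

/* Handler for a non-root region: copy from load to execution address. */
void region_copy(const struct region *r)
{
    memcpy(r->exec_addr, r->load_addr, r->size);
}

/* Handler for a ZI region: fill with zeros. */
void region_zero_init(const struct region *r)
{
    memset(r->exec_addr, 0, r->size);
}

/* Walk the table and call each region's handler, as start-up code
   would do before main() is reached. */
void scatterload_sketch(const struct region *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(table[i].init != NULL);
        table[i].init(&table[i]);
    }
}
```

Storing the handler in the table means the walker does not need to know what kind of region it is visiting, so a decompressing handler could be added as one more function without changing the loop.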

What is Scatter-Loading?
Scatter-loading is a mechanism that enables the user to specify the memory map of the compiled binary file to the linker using a text description.
It is a way to gain control over where the components of your binary file are placed in the memory map.
The word 'scatter' comes from the fact that if a complex program requires several regions of the compiled code to be placed in several regions of the memory, then it is effectively being 'scattered' in the memory. Hence this mechanism is usually suitable for complex programs, although it is equally applicable to simple programs.

As per Joseph Yiu:
Different development tools have different ways to specify the layout of the program and data memory in the microcontroller system. In ARM toolchains, you can use a file type called scatter-loading file, or in the case of Keil MDK-ARM, the scatter-loading file can be generated automatically by the uVision development environment.

When to use Scatter-Loading?
As per the ARM documentation:
Scatter-loading is usually required for implementing embedded systems because these use ROM, RAM, and memory-mapped peripherals.
Situations where scatter-loading is either required or very useful:
Complex memory maps
    Code and data that must be placed into many distinct areas of memory require detailed instructions on where to place the sections in the memory space.
Different types of memory
    Many systems contain a variety of physical memory devices such as flash, ROM, SDRAM, and fast SRAM. A scatter-loading description can match the code and data with the most appropriate type of memory. For example, interrupt code might be placed into fast SRAM to improve interrupt response time but infrequently-used configuration information might be placed into slower flash memory.
Memory-mapped peripherals
    The scatter-loading description can place a data section at a precise address in the memory map so that memory mapped peripherals can be accessed.
Functions at a constant location
    A function can be placed at the same location in memory even though the surrounding application has been modified and recompiled. This is useful for jump table implementation.
Using symbols to identify the heap and stack
    Symbols can be defined for the heap and stack location when the application is linked.

Example of custom scatter load file for Keil uVision


Now what are __scatterload, __rt_entry and __rt_lib_init, and why are they needed?

To answer all of the above, we need to understand the ARM executable file and a bit about the ARM architecture.
The binary executable file, or axf file, is also called the 'image file'.
The structure of the ARM 'image file' has several 'regions'.

        0x00000168:    00000178    x...    DCD    376
        0x0000016c:    20000000    ...     DCD    536870912
        0x00000170:    00000460    `...    DCD    1120
        0x00000174:    00000044    D...    DCD    68

The first entry, at 0x00000168, has the value 0x00000178. This value corresponds to where the code finishes.
The second entry, at 0x0000016c, has the value 0x20000000. This value corresponds to the base of the R/W memory.
The third entry, at 0x00000170, has the value 0x00000460. This value corresponds to the size of the stack.
The fourth entry, at 0x00000174, has the value 0x00000044. I am not sure about this value; it looks like it is the first address of the __scatterload_zeroinit function.

BTW, the above listing is produced by converting the 'axf' file produced by the KEIL compiler to a text version with the 'fromelf' utility:
fromelf --text -c -s -a -d -t -z --output firstproject.txt firstproject.axf
Here, 'firstproject.txt' is the output file from the 'fromelf' utility, and 'firstproject.axf' is its input. The above C program was called 'firstproject', hence 'firstproject' appears in the file names.
The above command can be fed into Keil at the time of compilation, as shown in the Keil Tutorial.
Or it can be used at the command line as shown above, and as shown in the Keil Tutorial.