**Contents**

[[!toc levels=3]]

# Introduction

This wiki page examines the transformation of a very simple C program into a running ELF executable containing PowerPC machine code. A NetBSD system (NetBSD 4.0.1/macppc in these examples) and its toolchain (gcc and GNU binutils) perform several steps in this transformation.

1. gcc translates our C code to assembly code.
2. gcc calls GNU as to translate the assembly code to machine code in an ELF relocatable object.
3. gcc calls GNU ld to link our relocatable object with the C runtime and the C library to form an ELF executable object.
4. The NetBSD kernel loads ld.elf_so, which loads our ELF executable and the C library (an ELF shared object) to run our program.

_So far, this wiki page examines only the first two steps._

# A very simple C program

This program is only one C file, which contains only one main function, which calls [[!template id=man name="printf" section="3"]] to print a single message, then returns 0 as the exit status.

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("%s", "Greetings, Earth!\n");
        return 0;
    }

The C compiler _gcc_ likes to use its knowledge of builtin functions to manipulate code. The version of gcc in NetBSD 4.0.1/macppc will simplify the printf statement to

    puts("Greetings, Earth!");

so the main function effectively calls [[!template id=man name="puts" section="3"]] once and then returns 0.

We can apply [[!template id=man name="gcc" section="1"]] in the usual way to compile this program. (With NetBSD, _cc_ or _gcc_ invokes the same command, so we use either name.) Then we can run our program:

    $ cc -o greetings greetings.c
    $ ./greetings
    Greetings, Earth!
    $

We can apply gcc with the _-v_ option to see some extra information. (Unlike most other commands, gcc does not allow combined options. Instead of _gcc -vo_, we must type _gcc -v -o_.) The gcc driver program actually runs three other commands. Here is the output from one run using my NetBSD 4.0.1 system.
I have put the three commands in **bold**.

    $ cc -v -o greetings greetings.c
    Using built-in specs.
    Target: powerpc--netbsd
    Configured with: /usr/src/tools/gcc/../../gnu/dist/gcc4/configure --enable-long-long --disable-multilib --enable-threads --disable-symvers --build=i386-unknown-netbsdelf4.99.3 --host=powerpc--netbsd --target=powerpc--netbsd
    Thread model: posix
    gcc version 4.1.2 20061021 prerelease (NetBSD nb3 20061125)
     **/usr/libexec/cc1 -quiet -v greetings.c -quiet -dumpbase greetings.c -auxbase greetings -version -o /var/tmp//ccVB1DcZ.s**
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/include
    End of search list.
    GNU C version 4.1.2 20061021 prerelease (NetBSD nb3 20061125) (powerpc--netbsd)
            compiled by GNU C version 4.1.2 20061021 (prerelease) (NetBSD nb3 20061125).
    GGC heuristics: --param ggc-min-expand=38 --param ggc-min-heapsize=77491
    Compiler executable checksum: 325f59dbd937debe20281bd6a60a4aef
     **as -mppc -many -V -Qy -o /var/tmp//ccMiXutV.o /var/tmp//ccVB1DcZ.s**
    GNU assembler version 2.16.1 (powerpc--netbsd) using BFD version 2.16.1
     **ld --eh-frame-hdr -dc -dp -e _start -dynamic-linker /usr/libexec/ld.elf_so -o greetings /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbegin.o /var/tmp//ccMiXutV.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/crtend.o /usr/lib/crtn.o**

The first command, _/usr/libexec/cc1_, is internal to gcc and is not for our direct use. The other two commands, _as_ and _ld_, are external to gcc. We could use _as_ and _ld_ without gcc, if we wanted to.

The first command, _/usr/libexec/cc1_, is the C compiler proper; it compiles C code and outputs assembly code, in a file with the _.s_ suffix. In the above example, it created the assembly version of our main function. The second command, _as_, assembles the _.s_ file to machine code, in a relocatable object file with the _.o_ suffix. It created the machine code version of our main function.
The third command, _ld_, links object files into an executable file. It combined our main function with the C runtime and the C library to create our program, _greetings_.

The _.s_ assembly file and the _.o_ object file were temporary files, so the gcc driver program deleted them. We keep only the final executable, _greetings_.

# The program in PowerPC assembly language

The manual page for [[!template id=man name="gcc" section="1"]] explains that we can use the _-S_ option to stop gcc after it produces the assembly code. For PowerPC targets, gcc outputs register numbers by default; the _-mregnames_ option tells gcc to output register names instead. If you are learning assembly language, then _cc -mregnames -S_ is a good way to produce examples of assembly code.

The command _cc -mregnames -S greetings.c_ produces the output file _greetings.s_ which contains the assembly version of our main function. (If you want _greetings.s_ to contain PowerPC assembly code, then you need to use a compiler that targets PowerPC.)

The assembly syntax allows for comments, assembler directives, instructions and labels.

* Comments begin with a '#' sign, though gcc never puts any comments in its generated code. PowerPC uses '#', unlike many other architectures that use ';' instead.
* Assembler directives have names that begin with a dot (like _.section_ or _.string_) and may take arguments.
* Instructions have mnemonics without a dot (like _li_ or _stw_) and may take operands.
* Labels end with a colon (like _.LC0:_ or _main:_) and save the current address into a symbol.

The PowerPC processor executes instructions. Most PowerPC instructions operate on the registers inside the processor. There are other instructions that load registers from memory or store registers to memory. Each of the general purpose registers (named r0 through r31) and the link register (named lr) holds a 32-bit integer. Assembly code may contain the register numbers (0 through 31) or the register names.
Register numbers become confusing when a 3 in the code might refer to the general purpose register r3, the floating point register f3, or the immediate value 3. The _cc -mregnames_ flag uses the assembly syntax for register names, which is to put a '%' sign before each name, as in _%r3_ or _%f3_. This is necessary to distinguish register _%r3_ from a symbol named _r3_.

## Commented copy of greetings.s

Here is a copy of _greetings.s_ (from the gcc of NetBSD 4.0.1/macppc) with added comments. Each instruction has a comment in pseudo-C to show the effect, if you know the C language. Pretend that the registers are (char *) for indexing, but int or (int *) for assignment.

    # This is a commented version of greetings.s, the 32-bit PowerPC
    # assembly code output from cc -mregnames -S greetings.c

            # .file takes the name of the original source file,
            # because this was a generated file. I guess that this
            # allows error messages or debuggers to blame the
            # original source file.
            .file "greetings.c"

    # Enter the .rodata section for read-only data. String constants
    # belong in this section.
            .section .rodata

            # For PowerPC, .align takes an exponent of 2.
            # So .align 2 gives an alignment of 4 bytes, so that
            # the current address is a multiple of 4.
            .align 2

            # .string inserts a C string, and the assembler provides
            # the terminating \0 byte. The label sets the symbol
            # .LC0 to the address of the string.
    .LC0:
            .string "Greetings, Earth!"

    # Enter the .text section for program text, which is the
    # executable part.
            .section ".text"

            # We need an alignment of 4 bytes for the following
            # PowerPC processor instructions.
            .align 2

            # We need to export main as a global symbol so that the
            # linker will see it. ELF wants to know that main is a
            # @function symbol, not an @object symbol.
            .globl main
            .type main, @function
    main:
            # The code for the main function begins here.
            # Passed in general purpose registers:
            #   r1 = stack pointer, r3 = argc, r4 = argv
            # Passed in link register:
            #   lr = return address
            # The int return value goes in r3.

            # Allocate 32 bytes for our stack frame. Use the
            # atomic instruction "store word with update" (stwu) so
            # that r1[0] always points to the previous stack frame.
            stwu %r1,-32(%r1)       # r1[-32] = r1; r1 -= 32

            # Save registers r31 and lr to the stack. We need to
            # save r31 because it is a nonvolatile register, and to
            # save lr before any function calls. Now r31 belongs in
            # the register save area at the top of our stack frame,
            # but lr belongs in the previous stack frame, in the
            # lr save word at (r1[0])[4] == r1[36].
            mflr %r0                # r0 = lr
            stw %r31,28(%r1)        # r1[28] = r31
            stw %r0,36(%r1)         # r1[36] = r0

            # Save argc, argv to the stack.
            mr %r31,%r1             # r31 = r1
            stw %r3,8(%r31)         # r31[8] = r3 /* argc */
            stw %r4,12(%r31)        # r31[12] = r4 /* argv */

            # Call puts(.LC0). First we need to load r3 = .LC0, but
            # each instruction can load only 16 bits.
            #   .LC0@ha = ((.LC0 + 0x8000) >> 16) & 0xffff
            #   .LC0@l = .LC0 & 0xffff
            # (@ha adds 0x8000 first because la sign-extends .LC0@l.)
            # This method uses "load immediate shifted" (lis) to
            # load r9 = (.LC0@ha << 16), then "load address" (la) to
            # load r3 = &(r9[.LC0@l]), same as r3 = (r9 + .LC0@l).
            lis %r9,.LC0@ha
            la %r3,.LC0@l(%r9)      # r3 = .LC0

            # The "bl" instruction calls a function; it also sets
            # the link register (lr) to the address of the next
            # instruction after "bl" so that puts can return here.
            bl puts                 # puts(r3)

            # Load r3 = 0 so that main returns 0.
            li %r0,0                # r0 = 0
            mr %r3,%r0              # r3 = r0

            # Point r11 to the previous stack frame.
            lwz %r11,0(%r1)         # r11 = r1[0]

            # Restore lr from r11[4]. Restore r31 from r11[-4],
            # same as r1[28].
            lwz %r0,4(%r11)         # r0 = r11[4]
            mtlr %r0                # lr = r0
            lwz %r31,-4(%r11)       # r31 = r11[-4]

            # Free the stack frame, then return.
            mr %r1,%r11             # r1 = r11
            blr                     # return r3

    # End of main function.

            # ELF wants to know the size of the function.
            # The dot symbol is the current address, now the end of
            # the function, and the "main" symbol is the start, so
            # we set the size to dot minus main.
            .size main, .-main

            # This is the tag of the gcc from NetBSD 4.0.1; the
            # assembler will put this string in the object file.
            .ident "GCC: (GNU) 4.1.2 20061021 prerelease (NetBSD nb3 20061125)"

The above code is not a complete, standalone assembly program! It only contains a main function, for linking with the C runtime and the C library. It obeys the ELF and PowerPC conventions for the use of registers. (These conventions require the code to save r31 but not r9.)

The _bl puts_ instruction is our evidence that the program calls [[!template id=man name="puts" section="3"]] instead of [[!template id=man name="printf" section="3"]].

The compiler did not optimize the above code. Some optimizations might be obvious! Consider the code that saves argc and argv to the stack. We could use r1 instead of copying r1 to r31. Going further, we could delete the code and never save argc and argv, because this main function never uses argc and argv!

## Optimizing the main function

Expect a compiler like gcc to write better assembly code than a human programmer who knows assembly language. The best way to optimize the assembly code is to enable some gcc optimization flags.

Released software often uses the -O2 flag, so here is a commented copy of greetings.s (from the gcc of NetBSD 4.0.1/macppc) with -O2 in use.

    # This is a commented version of the optimized assembly output
    # from cc -O2 -mregnames -S greetings.c

            .file "greetings.c"

    # Our string constant is now in a section that would allow an
    # ELF linker to remove duplicate strings. See the "info as"
    # documentation for the .section directive.
            .section .rodata.str1.4,"aMS",@progbits,1
            .align 2
    .LC0:
            .string "Greetings, Earth!"

    # Enter the .text section and declare main, as before.
            .section ".text"
            .align 2
            .globl main
            .type main, @function
    main:
            # We use registers as before:
            #   r1 = stack pointer, r3 = argc, r4 = argv,
            #   lr = return address, r3 = int return value

            # Set r0 = lr so that we can save lr later.
            mflr %r0                # r0 = lr

            # Allocate only 16 bytes for our stack frame, and
            # point r1[0] to the previous stack frame.
            stwu %r1,-16(%r1)       # r1[-16] = r1; r1 -= 16

            # Save lr in the lr save word at (r1[0])[4] == r1[20],
            # before calling puts(.LC0).
            lis %r3,.LC0@ha
            la %r3,.LC0@l(%r3)      # r3 = .LC0
            stw %r0,20(%r1)         # r1[20] = r0
            bl puts                 # puts(r3)

            # Restore lr, free stack frame, and return 0.
            lwz %r0,20(%r1)         # r0 = r1[20]
            li %r3,0                # r3 = 0
            addi %r1,%r1,16         # r1 = r1 + 16
            mtlr %r0                # lr = r0
            blr                     # return r3

            # This main function is smaller than before but ELF
            # wants to know the size.
            .size main, .-main
            .ident "GCC: (GNU) 4.1.2 20061021 prerelease (NetBSD nb3 20061125)"

The optimized version of the main function does not use the r9, r11 or r31 registers; and it does not save r31, argc or argv to the stack. The stack frame occupies only 16 bytes, not 32 bytes.

The main function barely uses the stack frame, only writing the frame pointer to 0(r1) and never reading anything. The main function must reserve 4 bytes of space at 4(r1) for an lr save word, in case the puts function saves its link register. The frame pointer and lr save word together occupy 8 bytes of stack space. The main function allocates 16 bytes, instead of only 8 bytes, because of a convention that the stack pointer is a multiple of 16.

# The relocatable object file

Now that we have the assembly code, there are two more steps before we have the final executable.

1. The first step is to run the assembler (as), which translates the assembly code to machine code, and stores the machine code in an ELF relocatable object.
2. The second step is to run the linker (ld), which combines some ELF relocatables into one ELF executable.

There are various tools that can examine ELF files.
The command [[!template id=man name="nm" section="1"]] lists the global symbols in an object file. The commands [[!template id=man name="objdump" section="1"]] and [[!template id=man name="readelf" section="1"]] show other information. These commands can examine both relocatables and executables. Though the executable is more interesting, the relocatable is simpler.

To continue our example, we can run the assembler with _greetings.s_ to produce _greetings.o_. We use the optimized code in _greetings.s_ from _cc -O2 -mregnames -S greetings.c_, because it was shorter. We feed our file _greetings.s_ to /usr/bin/as with a simple command.

    $ as -o greetings.o greetings.s

The output _greetings.o_ is a relocatable object file, and [[!template id=man name="file" section="1"]] confirms this.

    $ file greetings.o
    greetings.o: ELF 32-bit MSB relocatable, PowerPC or cisco 4500, version 1 (SYSV), not stripped

## List of sections

The source _greetings.s_ had assembler directives for two sections (_.rodata.str1.4_ and _.text_), so the ELF relocatable _greetings.o_ should contain those two sections. The command _objdump_ can list the sections.

    $ objdump
    Usage: objdump <option(s)> <file(s)>
     Display information from object <file(s)>.
     At least one of the following switches must be given:
    ...
      -h, --[section-]headers  Display the contents of the section headers
    ...
    $ objdump -h greetings.o

    greetings.o:     file format elf32-powerpc

    Sections:
    Idx Name           Size      VMA       LMA       File off  Algn
      0 .text          0000002c  00000000  00000000  00000034  2**2
                       CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      1 .data          00000000  00000000  00000000  00000060  2**0
                       CONTENTS, ALLOC, LOAD, DATA
      2 .bss           00000000  00000000  00000000  00000060  2**0
                       ALLOC
      3 .rodata.str1.4 00000014  00000000  00000000  00000060  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .comment       0000003c  00000000  00000000  00000074  2**0
                       CONTENTS, READONLY

This command verifies the presence of the _.text_ and _.rodata.str1.4_ sections. The _.text_ section begins at file offset 0x34 and has size 0x2c, in bytes.
The _.rodata.str1.4_ section begins at file offset 0x60 and has size 0x14.

Because the source _greetings.s_ does not have assembler directives for the _.data_ or _.bss_ or _.comment_ section, there must be another explanation for those three sections.

The _.data_ and _.bss_ sections have size 0x0. Perhaps for traditional reasons, the assembler puts these sections into every object file. Because the source _greetings.s_ never mentioned the _.data_ or _.bss_ section, nor allocated space in them, the assembler output them as empty sections. (The [[!template id=man name="a.out" section="5"]] format always had text, data and bss segments. The [[!template id=man name="elf" section="5"]] format distinguishes segments and sections, and also allows for arbitrary sections like _.rodata.str1.4_ and _.comment_.)

That leaves the mystery of the _.comment_ section. The _objdump_ command accepts _-j_ to select a section and _-s_ to show the contents, so _objdump -j .comment -s greetings.o_ dumps the 0x3c bytes in that section.

    $ objdump -j .comment -s greetings.o

    greetings.o:     file format elf32-powerpc

    Contents of section .comment:
     0000 00474343 3a202847 4e552920 342e312e  .GCC: (GNU) 4.1.
     0010 32203230 30363130 32312070 72657265  2 20061021 prere
     0020 6c656173 6520284e 65744253 44206e62  lease (NetBSD nb
     0030 33203230 30363131 32352900           3 20061125).

This is just the string from the _.ident_ assembler directive, between a pair of \0 bytes. So whenever gcc generates an _.ident_ directive, the assembler leaves this _.comment_ section to identify the compiler that produced this relocatable. (The "info as" documentation for the _.ident_ directive, shipped with NetBSD 4.0.1, continues to claim that the assembler "does not actually emit anything for it", but in fact the assembler emits this _.comment_ section.)

The _objdump_ tool also has a disassembler through its _-d_ option. Disassembly is the reverse process of assembly; it translates machine code to assembly code.
We could disassemble _greetings.o_ but the output would have a few defects, because of symbols that lack their final values.

## Of symbols and addresses

Our assembly code in _greetings.s_ had three symbols. The first symbol had the name _.LC0_ and pointed to our string.

    .LC0:
            .string "Greetings, Earth!"

The second symbol had the name _main_. It was a global symbol that pointed to a function.

            .globl main
            .type main, @function
    main:
            mflr %r0
            ...

The third symbol had the name _puts_. Our code used _puts_ in a function call, though it never defined the symbol.

            bl puts

A symbol has a name and an integer value. In assembly code, a symbol acts as an inline integer constant. The most common use of a symbol is to hold an address, pointing to either a function or a datum. When a symbol appears as an operand to an instruction, the assembler inlines the value of that symbol into the machine code.

The problem is that the assembler often does not know the final value of the symbol. So the assembler _as_ saves some information about symbols into the ELF file. The linker _ld_ can use this information to relocate symbols to their final values, resolve undefined symbols and inline the final values into the machine code. The fact that _ld_ relocates symbols is also the reason that _.o_ files are relocatable objects.

The _nm_ command shows the names of symbols in an object file. The output of nm shows that _greetings.o_ contains only two symbols. The _.LC0_ symbol is missing.

    $ nm greetings.o
    00000000 T main
             U puts

The _nm_ tool comes from Unix tradition, and remains a great way to check the list of symbols. For each symbol, _nm_ displays the hexadecimal value, a single letter for the type, then the name. The letter 'T' marks symbols that point into a text section, either the _.text_ section or some other arbitrary section of executable code. The letter 'U' marks undefined symbols, which do not have a value.
The _nm_ tool claims that symbol _main_ has address 0x00000000, which seems to be a useless value. The actual meaning is that _main_ points to offset 0x0 within section _.text_. A more detailed view of the symbol table would provide evidence of this.

## Fate of symbols

The machine code in _greetings.o_ is incomplete. If the address of the string "Greetings, Earth!" is not zero, then something must fix the instructions at 0x8 and 0xc. To avoid an infinite loop, something must fix the instruction at 0x14 to find the function _puts_. The linker will have the task of editing and finishing the machine code.

_(Because this part of the wiki page now comes before the part about machine code, this disassembly should probably not be here.)_

    00000000 <main>:
       0:   (31|00000|01000|02a6)   mflr    r0
       4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
       8:   (15|00011|00000|0000)   lis     r3,0
       c:   (14|00011|00011|0000)   addi    r3,r3,0
      10:   (36|00000|00001|0014)   stw     r0,20(r1)
      14:   (18|00000|00000|0001)   bl      14 <main+0x14>
      18:   (32|00000|00001|0014)   lwz     r0,20(r1)
      1c:   (14|00011|00000|0000)   li      r3,0
      20:   (14|00001|00001|0010)   addi    r1,r1,16
      24:   (31|00000|01000|03a6)   mtlr    r0
      28:   (19|10100|00000|0020)   blr

The above disassembly does not tell that the code at 0x8, 0xc and 0x14 is incomplete (though the infinite loop at 0x14 is a hint). There must be another way to find where the above code uses a symbol that lacks a final value. The ELF relocatable _greetings.o_ bears information about both relocating symbols and resolving undefined symbols; some uses of _objdump_ or _readelf_ can reveal this information.

ELF, like any object format, allows for a symbol table. The list of symbols from _nm greetings.o_ is only an incomplete view of this table.

    $ nm greetings.o
    00000000 T main
             U puts

The command _objdump -t_ shows the symbol table in more detail.

    $ objdump -t greetings.o

    greetings.o:     file format elf32-powerpc

    SYMBOL TABLE:
    00000000 l    df *ABS*           00000000 greetings.c
    00000000 l    d  .text           00000000 .text
    00000000 l    d  .data           00000000 .data
    00000000 l    d  .bss            00000000 .bss
    00000000 l    d  .rodata.str1.4  00000000 .rodata.str1.4
    00000000 l    d  .comment        00000000 .comment
    00000000 g     F .text           0000002c main
    00000000         *UND*           00000000 puts

The first column is the value of the symbol in hexadecimal. By some coincidence, every symbol in _greetings.o_ has the value zero. The second column gives letter 'l' for a local symbol or letter 'g' for a global symbol. The third column gives the type of symbol, with 'd' for a debugging symbol, lowercase 'f' for a filename or uppercase 'F' for a function symbol. The fourth column gives the section of the symbol, or '*ABS*' for an absolute symbol, or '*UND*' for an undefined symbol. The fifth column gives the size of the symbol in hexadecimal.
The sixth column gives the name of the symbol.

The filename symbol _greetings.c_ exists because the assembly code _greetings.s_ had a directive _.file "greetings.c"_. The symbol _main_ has a nonzero size because of the _.size_ directive.

Each section of this ELF relocatable has a symbol that points to address 0x0 in the section. Then every section of this relocatable must contain address 0x0. A view of the section headers in _greetings.o_ confirms that every section begins at address 0x0.

    $ objdump -h greetings.o

    greetings.o:     file format elf32-powerpc

    Sections:
    Idx Name           Size      VMA       LMA       File off  Algn
      0 .text          0000002c  00000000  00000000  00000034  2**2
                       CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      1 .data          00000000  00000000  00000000  00000060  2**0
                       CONTENTS, ALLOC, LOAD, DATA
      2 .bss           00000000  00000000  00000000  00000060  2**0
                       ALLOC
      3 .rodata.str1.4 00000014  00000000  00000000  00000060  2**2
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .comment       0000003c  00000000  00000000  00000074  2**0
                       CONTENTS, READONLY

The output of _objdump -h_ shows the address of each section in the VMA and LMA columns. (ELF files always use the same address for both VMA and LMA.) All five sections in _greetings.o_ begin at address 0x0. Thus, all four sections marked ALLOC would overlap in memory. The linker must relocate these four sections so that they do not overlap.

The value of each symbol is the address. The section of each symbol serves to disambiguate addresses where sections overlap. Thus symbol _main_ points to address 0x0 in the section _.text_, not any other section. Because every section begins at address 0x0, each address is relative to the beginning of the section. Therefore, symbol _main_ points to offset 0x0 into the section _.text_.
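
The instructions at 0x8 and 0xc together carry one 32-bit address, split into the two 16-bit halves that _greetings.s_ wrote as _.LC0@ha_ and _.LC0@l_. As a rough sketch of that arithmetic, the C functions below compute the two halves of an address and recombine them the way the _lis_ and _la_ pair would at run time. The function names are hypothetical and belong to no toolchain; the sketch assumes the usual PowerPC rule that the high half is adjusted by 0x8000 because _la_ sign-extends the low half.

```c
#include <stdint.h>

/* Low 16 bits of an address: the value an assembler writes for sym@l. */
uint16_t ppc_lo(uint32_t addr)
{
    return (uint16_t)(addr & 0xffff);
}

/* "High adjusted" 16 bits: the value written for sym@ha.  Adding
 * 0x8000 first compensates for la/addi sign-extending the low half. */
uint16_t ppc_ha(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000) >> 16);
}

/* What "lis rX,ha" followed by "la rX,lo(rX)" leaves in the register. */
uint32_t ppc_lis_la(uint16_t ha, uint16_t lo)
{
    uint32_t high = (uint32_t)ha << 16;            /* lis: shift left 16 */
    int32_t low = (int32_t)(lo ^ 0x8000) - 0x8000; /* sign-extend 16 bits */
    return high + (uint32_t)low;                   /* wraps modulo 2^32 */
}
```

For a made-up address such as 0x0180fff0, the low half is 0xfff0 (which _la_ reads as -16), so the high half must be 0x0181 rather than 0x0180 for the sum to come out right. While the symbol still has the value zero, both halves are zero, which matches the zero fields in _greetings.o_.
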

TODO: explain "relocation records"

    $ objdump -r greetings.o

    greetings.o:     file format elf32-powerpc

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE              VALUE
    0000000a R_PPC_ADDR16_HA   .rodata.str1.4
    0000000e R_PPC_ADDR16_LO   .rodata.str1.4
    00000014 R_PPC_REL24       puts

# Disassembly and machine code

## Disassembly

GNU binutils provide both assembly and the reverse process, disassembly. While _as_ does assembly, _objdump -d_ does disassembly. Both programs use the same library of opcodes.

By default, _objdump -d_ disassembles all executable sections. (The _-j_ option can select a section to disassemble. Our example relocatable _greetings.o_ has only one executable section, so _-j .text_ becomes optional.) The disassembler works better with linked executable files. It can also disassemble relocatables like _greetings.o_. The output will confuse the reader because of undefined symbols, and symbols not relocated to their final values.

    $ objdump -d greetings.o

    greetings.o:     file format elf32-powerpc

    Disassembly of section .text:

    00000000 <main>:
       0:   7c 08 02 a6     mflr    r0
       4:   94 21 ff f0     stwu    r1,-16(r1)
       8:   3c 60 00 00     lis     r3,0
       c:   38 63 00 00     addi    r3,r3,0
      10:   90 01 00 14     stw     r0,20(r1)
      14:   48 00 00 01     bl      14 <main+0x14>
      18:   80 01 00 14     lwz     r0,20(r1)
      1c:   38 60 00 00     li      r3,0
      20:   38 21 00 10     addi    r1,r1,16
      24:   7c 08 03 a6     mtlr    r0
      28:   4e 80 00 20     blr

The disassembled code has a slightly different syntax. Every instruction has a label, and each label is the hexadecimal address. The hex after each label is the machine code for that instruction. The syntax is more ambiguous, so register names do not begin with a '%' sign, "r3" can be a register instead of a symbol, and "14" can be a label instead of an immediate value.

The size of every PowerPC instruction is four bytes. PowerPC architecture requires this fixed width of four bytes or 32 bits for every instruction. It also requires an alignment of four bytes for every instruction, meaning that the address of every instruction is a multiple of four. The above code meets both requirements.

The disassembled code should resemble the assembly code in _greetings.s_. A comparison shows that every instruction is the same, except for three instructions.

* Address 0x8 has _lis r3,0_ instead of _lis %r3,.LC0@ha_.
* Address 0xc has _addi r3,r3,0_ instead of _la %r3,.LC0@l(%r3)_.
* Address 0x14 has _bl 14 <main+0x14>_ instead of _bl puts_.

The fate of the symbols _.LC0_ and _puts_ would explain these differences. The linker will inline the correct values of _.LC0_ and _puts_, so that these three instructions make sense.

Because we have the source file _greetings.s_, we know about the _.LC0_ and _puts_ symbols. If the reader of _objdump -d greetings.o_ did not know about these symbols, then the three instructions at 0x8, 0xc and 0x14 would seem strange, useless and wrong.

* The "load immediate shifted" (lis) instruction shifts the immediate value leftward by 16 bits, then loads the register with the shifted value. So _lis r3,0_ shifts zero leftward by 16 bits, but the shifted value remains zero.
  The unnecessary shift seems strange, but it is not a problem.
* The "add immediate" (addi) instruction does addition, so _addi r3,r3,0_ increments r3 by zero, which effectively does nothing! The instruction seems unnecessary and useless.
* The instruction at address 0x14 is _bl 14 <main+0x14>_, which branches to label 14, effectively forming an infinite loop because it branches to itself! Something is wrong.

If the reader understands that the infinite loop is actually a branch to function _puts_, then the function call still seems wrong, because _puts_ uses the argument in r3, and the code loads r3 with zero. Therefore the function call might be _puts(NULL)_ or _puts(&main)_. Zero is the null pointer, and also the address of the main function, but function _puts_ wants the address of some string. Until the linker relocates some things away from zero, zero has two or three different meanings. The reader cannot follow the code.

In the build of a large C program, there are many _.c_ and _.o_ files but no _.s_ files. (NetBSD and pkgsrc both provide many examples of this.) If someone used _objdump -d_ to read a _.o_ file, then the reader would be unable to follow and understand the disassembly, because of undefined symbols, and symbols not relocated to their final values, and overlapping addresses with several meanings. A better understanding of how symbols fit into machine code would help.

## Machine code in parts

The output of _objdump -d_ has the machine code in hexadecimal. This allows the reader to identify individual bytes. This is good with architectures that organize opcodes and operands into bytes. PowerPC packs the opcodes and operands more tightly into bits. Each instruction has 32 bits. The first 6 bits (at the big end) have the opcode. In a typical instruction, the next 5 bits pick the first register, the next 5 bits pick the second register, and the remaining 16 bits hold an immediate value.
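
To make the bit layout concrete, here is a small C sketch that splits one 32-bit instruction word into those four parts and also reads the immediate as a signed number. The struct and function names are invented for this illustration, and the sketch assumes the typical layout described above, which not every instruction follows.

```c
#include <stdint.h>

/* The four parts of a typical PowerPC instruction word. */
struct ppc_fields {
    unsigned opcode;    /* first 6 bits, at the big end */
    unsigned reg1;      /* next 5 bits: first register */
    unsigned reg2;      /* next 5 bits: second register */
    uint16_t imm;       /* last 16 bits: raw immediate value */
};

/* Split a 32-bit instruction word using shifts and masks. */
struct ppc_fields ppc_split(uint32_t insn)
{
    struct ppc_fields f;
    f.opcode = insn >> 26;
    f.reg1 = (insn >> 21) & 0x1f;
    f.reg2 = (insn >> 16) & 0x1f;
    f.imm = (uint16_t)(insn & 0xffff);
    return f;
}

/* Read the 16-bit immediate as a two's complement signed value. */
int ppc_simm(uint16_t imm)
{
    return (int)(imm ^ 0x8000) - 0x8000;
}
```

With the word 0x9421fff0 from address 0x4 of the disassembly (_stwu r1,-16(r1)_), this gives opcode 37, registers 1 and 1, and immediate 0xfff0, which _ppc_simm_ reads as -16.
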

A filter program that takes the hexadecimal machine code from _objdump -d_ and splits each instruction into these four parts would be helpful. One can write the filter program using a scripting language that provides both regular expressions and bit-shifting operations. Perl (available in [lang/perl5](http://pkgsrc.se/lang/perl5#main)) is such a language. Here follows _machine.pl_, such a script.

    #!/usr/bin/env perl
    # usage: objdump -d ... | perl machine.pl
    #
    # The output of objdump -d shows the machine code in hexadecimal. This
    # script converts the machine code to a format that shows the parts of a
    # typical PowerPC instruction such as "addi".
    #
    # The format is (opcode|register-1|register-2|immediate-value),
    # with digits in (decimal|binary|binary|hexadecimal).

    use strict;
    use warnings;

    my $byte = "[0-9a-f][0-9a-f]";
    my $word = "$byte $byte $byte $byte";

    while (defined(my $line = <STDIN>)) {
        chomp $line;
        if ($line =~ m/^([^:]*:\s*)($word)(.*)$/) {
            my ($before, $code, $after) = ($1, $2, $3);
            $code =~ s/ //g;
            $code = hex($code);

            my $opcode = $code >> (32-6);           # first 6 bits
            my $reg1 = ($code >> (32-11)) & 0x1f;   # next 5 bits
            my $reg2 = ($code >> (32-16)) & 0x1f;   # next 5 bits
            my $imm = $code & 0xffff;               # last 16 bits

            $line = sprintf("%s(%2d|%05b|%05b|%04x)%s", $before,
                $opcode, $reg1, $reg2, $imm, $after);
        }
        print "$line\n";
    }

Here follows the disassembly of _greetings.o_, with the machine code in parts.

    $ objdump -d greetings.o | perl machine.pl

    greetings.o:     file format elf32-powerpc

    Disassembly of section .text:

    00000000 <main>:
       0:   (31|00000|01000|02a6)   mflr    r0
       4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
       8:   (15|00011|00000|0000)   lis     r3,0
       c:   (14|00011|00011|0000)   addi    r3,r3,0
      10:   (36|00000|00001|0014)   stw     r0,20(r1)
      14:   (18|00000|00000|0001)   bl      14 <main+0x14>
      18:   (32|00000|00001|0014)   lwz     r0,20(r1)
      1c:   (14|00011|00000|0000)   li      r3,0
      20:   (14|00001|00001|0010)   addi    r1,r1,16
      24:   (31|00000|01000|03a6)   mtlr    r0
      28:   (19|10100|00000|0020)   blr

The disassembly now shows the machine code with the opcode in decimal, then the next 5 bits in binary, then another 5 bits in binary, then the remaining 16 bits in hexadecimal.

The "load word and zero" (lwz) instruction given _lwz X,N(Y)_ loads register X with a value from memory. It indexes memory using register Y as a pointer and value N as an offset in bytes. Thus the memory location is N bytes after where register Y points. The mnemonic _lwz_ uses opcode 32. The next 5 bits hold the register number for X. Another 5 bits hold the register number for Y. The remaining 16 bits hold the offset value N. Given _lwz r0,20(r1)_ then r0 is 00000 in binary, r1 is 00001 in binary, 20 is 0x14 in hexadecimal, so the filter script writes _(32|00000|00001|0014)_.

It becomes apparent that "store word" (stw) uses opcode 36, while "store word with update" (stwu) uses opcode 37. Given _stw r0,20(r1)_ then the filter script writes _(36|00000|00001|0014)_. Given _stwu r1,-16(r1)_ then -16 is 0xfff0 in hexadecimal two's complement, so the filter script writes _(37|00001|00001|fff0)_.

The "add immediate" (addi) instruction given _addi X,Y,Z_ loads register X with the sum of register Y and immediate value Z. The mnemonic _addi_ uses opcode 14. The next 5 bits hold the register number for X. Another 5 bits hold the register number for Y. The remaining 16 bits hold the immediate value Z. Given _addi r3,r3,0_ then r3 is 00011 in binary, so the filter script writes _(14|00011|00011|0000)_.
Given _addi r1,r1,16_, r1 is 00001 in binary and 16 is 0x10 in hexadecimal, so the filter script writes _(14|00001|00001|0010)_.

The addi instruction has one more quirk. The second operand Y is either the immediate value 0, or a register number 1 through 31. (Some other instructions have this same quirk and cannot read register 0.) Thus _addi 4,0,5_ actually computes _r4 = 0 + 5_ instead of _r4 = r0 + 5_: the operand 0 means the constant zero, not the contents of register 0. This quirk allows the "load immediate" (li) mnemonic, which also uses opcode 14, to load an immediate value by adding it to zero. So _li r3,0_ is the same as _addi 3,0,0_, which becomes _(14|00011|00000|0000)_.

The "load address" (la) instruction, written _la X,N(Y)_, loads register X with the address N bytes after where register Y points. This is the same as adding register Y to immediate value N, so _la X,N(Y)_ and _addi X,Y,N_ are the same, and both use opcode 14.

When machine code contains opcode 14, the disassembler tries to be smart about choosing an instruction mnemonic. Here follows a quick example.

    $ cat quick-example.s
            .section .text
            addi 4,0,5      # bad
            la 3,3(0)       # very bad
            la 3,0(3)
            la 5,2500(3)
    $ as -o quick-example.o quick-example.s
    $ objdump -d quick-example.o | perl machine.pl

    quick-example.o:     file format elf32-powerpc

    Disassembly of section .text:

    00000000 <.text>:
       0:   (14|00100|00000|0005)   li      r4,5
       4:   (14|00011|00000|0003)   li      r3,3
       8:   (14|00011|00011|0000)   addi    r3,r3,0
       c:   (14|00101|00011|09c4)   addi    r5,r3,2500

If the second register operand to opcode 14 is 00000, then the machine code looks like an instruction "li", so the disassembler uses the mnemonic "li". Otherwise the disassembler prefers the mnemonic "addi" to "la".

## Opcodes more strange

The filter script shows the four parts of a typical instruction, but not all instructions have those four parts. The instructions that do branching or access special registers are not typical instructions.
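As a recap of the typical layout, the four-part split that _machine.pl_ performs can be sketched in C. This is only an illustration under stated assumptions: the names `struct dform`, `split_word`, `signed_imm` and `opcode14_mnemonic` are made up for this page and belong to no library, and the mnemonic rule merely restates what the disassembler appeared to do in the quick example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of binutils or NetBSD. */
struct dform {
    unsigned opcode;    /* first 6 bits */
    unsigned reg1;      /* next 5 bits */
    unsigned reg2;      /* next 5 bits */
    unsigned imm;       /* last 16 bits, unsigned as the filter prints them */
};

/* Split one 32-bit instruction word the same way machine.pl does. */
struct dform split_word(uint32_t word)
{
    struct dform d;

    d.opcode = word >> 26;
    d.reg1 = (word >> 21) & 0x1f;
    d.reg2 = (word >> 16) & 0x1f;
    d.imm = word & 0xffff;
    return d;
}

/* Read the 16-bit immediate as a signed value, so 0xfff0 becomes -16. */
int signed_imm(struct dform d)
{
    return d.imm >= 0x8000 ? (int)d.imm - 0x10000 : (int)d.imm;
}

/*
 * Mnemonic choice for opcode 14 as observed in the quick example:
 * "li" when the second register field is 00000, otherwise "addi".
 * Returns NULL for any other opcode.
 */
const char *opcode14_mnemonic(struct dform d)
{
    if (d.opcode != 14)
        return NULL;
    return d.reg2 == 0 ? "li" : "addi";
}
```

For example, `split_word(0x80010014)`, the word that the listing shows as _(32|00000|00001|0014)_, yields opcode 32, register fields 0 and 1, and immediate 0x14. Feeding it 0x48000001, the word behind _(18|00000|00000|0001)_, still yields four numbers, but for a branch they do not correspond to two registers and an immediate value.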
Here again is the disassembly of the main function in _greetings.o_:

    00000000 <main>:
       0:   (31|00000|01000|02a6)   mflr    r0
       4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
       8:   (15|00011|00000|0000)   lis     r3,0
       c:   (14|00011|00011|0000)   addi    r3,r3,0
      10:   (36|00000|00001|0014)   stw     r0,20(r1)
      14:   (18|00000|00000|0001)   bl      14
      18:   (32|00000|00001|0014)   lwz     r0,20(r1)
      1c:   (14|00011|00000|0000)   li      r3,0
      20:   (14|00001|00001|0010)   addi    r1,r1,16
      24:   (31|00000|01000|03a6)   mtlr    r0
      28:   (19|10100|00000|0020)   blr

Assembly code uses "branch and link" (bl) to call functions and "branch to link register" (blr) to return from functions.

* The instruction _bl_ branches to the address of a function, and stores the return address in the link register.
* The instruction _blr_ branches to the address in the link register.
* The instructions "move from link register" (mflr) and "move to link register" (mtlr) access the link register, so that a function may save its return address while it uses _bl_ to call other functions.

Every processor has a program counter (pc) to hold the address of the current instruction. The branch instructions change the program counter. PowerPC uses a link register (lr) instead of a general purpose register (any of r0 through r31) or a memory location to hold the return address, seemingly to separate the branch processing unit (bpu) from the units that access general purpose registers or memory.

The instruction _bl_ takes one operand, an immediate value for the address. There is no way to fit an opcode of 6 bits and an address of 32 bits into an instruction of only 32 bits, so _bl_ has only 26 bits for its operand. Thus _bl_ actually takes a signed 26-bit relative address and adds it to the program counter. This provides a range of about 32 MB in either direction.

The assembler converts the operand to a relative address when it assembles the instruction _bl_. The instruction _bl puts_ would cause the assembler to convert the value of symbol _puts_ to a relative address; but _puts_ was an undefined symbol, so the assembler output a meaningless relative address of zero.
An instruction _bl_ with relative address zero would branch to itself for an infinite loop. (The address for PowerPC is relative to the branch instruction. The address for some other architectures would be relative to the next instruction after the branch.)

The address of every PowerPC instruction is a multiple of 4, so the low 2 bits of the 26-bit relative address are always zero and the branch does not need to store them. PowerPC uses those 2 bits instead to select the type of branch. Instruction _bl_ uses opcode 18 and sets the low 2 bits to binary 01. Mnemonic _bl_ shares opcode 18 with three other mnemonics that set the low 2 bits in other ways.

Given _bl puts_, the assembler began with opcode 18, ended with binary 01, and filled the intermediate 24 bits with zeros, so the filter script writes _(18|00000|00000|0001)_. A better filter script would write opcode 18 in decimal, the 26-bit relative address in hexadecimal, and the low 2 bits in binary.

Opcode 19 is for many types of branches; fields within the lower 26 bits specify the type of branch. Mnemonic _blr_ shares opcode 19 with many other mnemonics.

Opcode 31 is for operations with special purpose registers; fields within the lower 26 bits pick a special register and an action. In the listing, _mflr r0_ and _mtlr r0_ differ only in their last 16 bits (02a6 versus 03a6). Mnemonics _mflr_ and _mtlr_ share opcode 31 with many other mnemonics, including the more general "move from special purpose register" (mfspr) and "move to special purpose register" (mtspr).

The instructions _blr_, _mflr_ and _mtlr_ do not involve any symbols, so the lower 26 bits already have their final values.

The source file [/usr/src/gnu/dist/binutils/opcodes/ppc-opc.c](http://cvsweb.de.netbsd.org/cgi-bin/cvsweb.cgi/src/gnu/dist/binutils/opcodes/ppc-opc.c?rev=HEAD) contains a table of _powerpc_opcodes_ that lists the various mnemonics that use opcodes 18, 19 and 31.
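The better filter suggested for opcode 18 could split a word roughly as in the following C sketch. It is a sketch under assumptions, not anything taken from binutils: the names `struct iform`, `split_branch` and `opcode18_mnemonic` are hypothetical, and the four mnemonics in the table (b, bl, ba, bla) come from the PowerPC architecture rather than from the listing on this page. For _ba_ and _bla_ the address field is absolute rather than relative.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of binutils or NetBSD. */
struct iform {
    unsigned opcode;    /* first 6 bits, 18 for these branches */
    int32_t offset;     /* signed byte offset, low 2 bits taken as 0 */
    unsigned low2;      /* low 2 bits, which select the type of branch */
};

/* Split a 32-bit word as opcode, 26-bit address and 2 type bits. */
struct iform split_branch(uint32_t word)
{
    struct iform b;
    uint32_t addr26 = word & 0x03fffffcu;   /* address with low 2 bits cleared */

    b.opcode = word >> 26;
    b.low2 = word & 0x3;
    /* Sign-extend from 26 bits: bit 25 set means a negative offset. */
    if (addr26 & 0x02000000u)
        b.offset = (int32_t)addr26 - 0x04000000;
    else
        b.offset = (int32_t)addr26;
    return b;
}

/* The four mnemonics that share opcode 18, indexed by the low 2 bits. */
const char *opcode18_mnemonic(unsigned low2)
{
    static const char *const names[4] = { "b", "bl", "ba", "bla" };

    return names[low2 & 0x3];
}
```

With the word 0x48000001, which the filter script shows as _(18|00000|00000|0001)_, `split_branch` gives opcode 18, offset 0 and low bits 01, which is the branch-to-self placeholder that the assembler left for _bl puts_.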