Contents

  1. Introduction
  2. A very simple C program
  3. The program in PowerPC assembly language
    1. Commented copy of greetings.s
    2. Optimizing the main function
  4. The relocatable object file
    1. List of sections
    2. Of symbols and addresses
    3. Fate of symbols
  5. Disassembly and machine code
    1. Disassembly
    2. Machine code in parts
    3. Opcodes more strange

Introduction

This wiki page examines the transformation of a very simple C program into a running ELF executable containing PowerPC machine code. A NetBSD system (NetBSD 4.0.1/macppc in these examples) and its toolchain (gcc and GNU binutils) perform several steps in this transformation.

  1. gcc translates our C code to assembly code.
  2. gcc calls GNU as to translate the assembly code to machine code in an ELF relocatable object.
  3. gcc calls GNU ld to link our relocatable object with the C runtime and the C library to form an ELF executable object.
  4. The NetBSD kernel loads our ELF executable and the dynamic linker ld.elf_so, which loads the C library (an ELF shared object) and then runs our program.

So far, this wiki page examines only the first two steps.

A very simple C program

This program is a single C file containing a single main function, which calls printf(3) to print one message and then returns 0 as the exit status.

#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("%s", "Greetings, Earth!\n");
    return 0;
}

The C compiler gcc uses its knowledge of builtin functions to transform code. The version of gcc in NetBSD 4.0.1/macppc simplifies the printf statement to puts("Greetings, Earth!"); (puts(3) appends the newline itself, so the \n disappears from the string). The main function therefore effectively calls puts(3) once and then returns 0.

We can apply gcc(1) in the usual way to compile this program. (With NetBSD, cc or gcc invokes the same command, so we use either name.) Then we can run our program:

$ cc -o greetings greetings.c
$ ./greetings
Greetings, Earth!
$

We can apply gcc with the -v option to see some extra information. (Unlike most other commands, gcc does not allow combined options. Instead of gcc -vo, we must type gcc -v -o.) The gcc driver program actually runs three other commands. Here is the output from one run using my NetBSD 4.0.1 system. I have put the three commands in bold.

$ cc -v -o greetings greetings.c
Using built-in specs.
Target: powerpc--netbsd
Configured with: /usr/src/tools/gcc/../../gnu/dist/gcc4/configure --enable-long-long --disable-multilib --enable-threads --disable-symvers --build=i386-unknown-netbsdelf4.99.3 --host=powerpc--netbsd --target=powerpc--netbsd
Thread model: posix
gcc version 4.1.2 20061021 prerelease (NetBSD nb3 20061125)
 **/usr/libexec/cc1 -quiet -v greetings.c -quiet -dumpbase greetings.c -auxbase greetings -version -o /var/tmp//ccVB1DcZ.s**
#include "..." search starts here:
#include <...> search starts here:
 /usr/include
End of search list.
GNU C version 4.1.2 20061021 prerelease (NetBSD nb3 20061125) (powerpc--netbsd)
    compiled by GNU C version 4.1.2 20061021 (prerelease) (NetBSD nb3 20061125).
GGC heuristics: --param ggc-min-expand=38 --param ggc-min-heapsize=77491
Compiler executable checksum: 325f59dbd937debe20281bd6a60a4aef
 **as -mppc -many -V -Qy -o /var/tmp//ccMiXutV.o /var/tmp//ccVB1DcZ.s**
GNU assembler version 2.16.1 (powerpc--netbsd) using BFD version 2.16.1
 **ld --eh-frame-hdr -dc -dp -e _start -dynamic-linker /usr/libexec/ld.elf_so -o greetings /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbegin.o /var/tmp//ccMiXutV.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/crtend.o /usr/lib/crtn.o**

The first command, /usr/libexec/cc1, is internal to gcc and is not for our direct use. The other two commands, as and ld, are external to gcc. We could use as and ld without gcc, if we wanted to.

The first command, /usr/libexec/cc1, is the C compiler proper; it compiles C code and outputs assembly code in a file with the .s suffix. In the above example, it created the assembly version of our main function. The second command, as, assembles the .s file to machine code in a relocatable object file with the .o suffix. It created the machine code version of our main function. The third command, ld, links object files into an executable file. It combined our main function with the C runtime and the C library to create our greetings program.

The .s assembly file and the .o object file were temporary files, so the gcc driver program deleted them. Only the final executable, greetings, remains.

The program in PowerPC assembly language

The manual page for gcc(1) explains that the -S option stops gcc after it produces the assembly code. For PowerPC targets, gcc outputs register numbers by default; the -mregnames option tells gcc to output register names instead. If you are learning assembly language, then cc -mregnames -S is a good way to produce examples of assembly code.

The command cc -mregnames -S greetings.c produces the output file greetings.s, which contains the assembly version of our main function. (If you want greetings.s to contain PowerPC assembly code, then you need to use a compiler that targets PowerPC.) The assembly syntax allows for comments, assembler directives, instructions and labels.

The PowerPC processor executes instructions. Most PowerPC instructions operate on the registers inside the processor. Other instructions load registers from memory or store registers to memory. On 32-bit PowerPC, each of the general purpose registers (named r0 through r31) and the link register (named lr) holds a 32-bit integer.

Assembly code may contain either the register numbers (0 through 31) or the register names. Register numbers become confusing when a 3 in the code might refer to the general purpose register r3, the floating point register f3, or the immediate value 3. The -mregnames flag makes cc use the assembly syntax for register names, which puts a '%' sign before each name, as in %r3 or %f3. This is necessary to distinguish register %r3 from a symbol named r3.

Commented copy of greetings.s

Here is a copy of greetings.s (from the gcc of NetBSD 4.0.1/macppc) with added comments. Each instruction has a comment in pseudo-C to show its effect, if you know the C language. Pretend that the registers are (char *) for indexing, but int or (int *) for assignment.

# This is a commented version of greetings.s, the 32-bit PowerPC
# assembly code output from cc -mregnames -S greetings.c

    # .file takes the name of the original source file,
    # because this was a generated file. I guess that this
    # allows error messages or debuggers to blame the
    # original source file.
    .file   "greetings.c"

# Enter the .rodata section for read-only data. String constants
# belong in this section.
    .section    .rodata

    # For PowerPC, .align takes an exponent of 2.
    # So .align 2 gives an alignment of 4 bytes, so that
    # the current address is a multiple of 4.
    .align 2

    # .string inserts a C string, and the assembler provides
    # the terminating \0 byte. The label sets the symbol
    # .LC0 to the address of the string.
.LC0:
    .string "Greetings, Earth!"

# Enter the .text section for program text, which is the
# executable part.
    .section    ".text"

    # We need an alignment of 4 bytes for the following
    # PowerPC processor instructions.
    .align 2

    # We need to export main as a global symbol so that the
    # linker will see it. ELF wants to know that main is a
    # @function symbol, not an @object symbol.
    .globl main
    .type   main, @function
main:
    # The code for the main function begins here.
    # Passed in general purpose registers:
    #   r1 = stack pointer, r3 = argc, r4 = argv
    # Passed in link register:
    #   lr = return address
    # The int return value goes in r3.

    # Allocate 32 bytes for our stack frame. Use the
    # atomic instruction "store word with update" (stwu) so
    # that r1[0] always points to the previous stack frame.
    stwu %r1,-32(%r1)   # r1[-32] = r1; r1 -= 32

    # Save registers r31 and lr to the stack. We need to
    # save r31 because it is a nonvolatile register, and to
    # save lr before any function calls. Now r31 belongs in
    # the register save area at the top of our stack frame,
    # but lr belongs in the previous stack frame, in the
    # lr save word at (r1[0])[4] == r1[36].
    mflr %r0        # r0 = lr
    stw %r31,28(%r1)    # r1[28] = r31
    stw %r0,36(%r1)     # r1[36] = r0

    # Save argc, argv to the stack.
    mr %r31,%r1     # r31 = r1
    stw %r3,8(%r31)     # r31[8] = r3 /* argc */
    stw %r4,12(%r31)    # r31[12] = r4 /* argv */

    # Call puts(.LC0). First we need to load r3 = .LC0, but
    # each instruction can load only 16 bits.
    #   .LC0@ha = ((.LC0 + 0x8000) >> 16) & 0xffff
    #   .LC0@l = .LC0 & 0xffff
    # This method uses "load immediate shifted" (lis) to
    # load r9 = (.LC0@ha << 16), then "load address" (la) to
    # load r3 = &(r9[.LC0@l]), same as r3 = (r9 + .LC0@l).
    # Because la sign-extends .LC0@l, @ha adds 0x8000 first
    # to cancel the sign extension.
    lis %r9,.LC0@ha
    la %r3,.LC0@l(%r9)  # r3 = .LC0

    # The "bl" instruction calls a function; it also sets
    # the link register (lr) to the address of the next
    # instruction after "bl" so that puts can return here.
    bl puts         # puts(r3)

    # Load r3 = 0 so that main returns 0.
    li %r0,0        # r0 = 0
    mr %r3,%r0      # r3 = r0

    # Point r11 to the previous stack frame.
    lwz %r11,0(%r1)     # r11 = r1[0]

    # Restore lr from r11[4]. Restore r31 from r11[-4],
    # same as r1[28].
    lwz %r0,4(%r11)     # r0 = r11[4]
    mtlr %r0        # lr = r0
    lwz %r31,-4(%r11)   # r31 = r11[-4]

    # Free the stack frame, then return.
    mr %r1,%r11     # r1 = r11
    blr         # return r3
    # End of main function.

    # ELF wants to know the size of the function. The dot
    # symbol is the current address, now the end of the
    # function, and the "main" symbol is the start, so we
    # set the size to dot minus main.
    .size   main, .-main

    # This is the tag of the gcc from NetBSD 4.0.1; the
    # assembler will put this string in the object file.
    .ident  "GCC: (GNU) 4.1.2 20061021 prerelease (NetBSD nb3 20061125)"

The above code is not a complete, standalone assembly program! It only contains a main function, for linking with the C runtime and the C library. It obeys the ELF and PowerPC conventions for the use of registers. (These conventions require the code to save r31 but not r9.) The bl puts instruction is our evidence that the program calls puts(3) instead of printf(3).

The compiler did not optimize the above code. Some optimizations might be obvious! Consider the code that saves argc and argv to the stack. We could use r1 instead of copying r1 to r31. Going further, we could delete the code and never save argc and argv, because this main function never uses argc and argv!

Optimizing the main function

Expect a compiler like gcc to write better assembly code than a human programmer who knows assembly language. The best way to optimize the assembly code is to enable some gcc optimization flags.

Released software often uses the -O2 flag, so here is a commented copy of greetings.s (from the gcc of NetBSD 4.0.1/macppc) with -O2 in use.

# This is a commented version of the optimized assembly output
# from cc -O2 -mregnames -S greetings.c

    .file   "greetings.c"

# Our string constant is now in a section that would allow an
# ELF linker to remove duplicate strings. See the "info as"
# documentation for the .section directive.
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 2
.LC0:
    .string "Greetings, Earth!"

# Enter the .text section and declare main, as before.
    .section    ".text"
    .align 2
    .globl main
    .type   main, @function
main:
    # We use registers as before:
    #   r1 = stack pointer, r3 = argc, r4 = argv,
    #   lr = return address, r3 = int return value

    # Set r0 = lr so that we can save lr later.
    mflr %r0        # r0 = lr

    # Allocate only 16 bytes for our stack frame, and
    # point r1[0] to the previous stack frame.
    stwu %r1,-16(%r1)   # r1[-16] = r1; r1 -= 16

    # Save lr in the lr save word at (r1[0])[4] == r1[20],
    # before calling puts(.LC0).
    lis %r3,.LC0@ha
    la %r3,.LC0@l(%r3)  # r3 = .LC0
    stw %r0,20(%r1)     # r1[20] = r0
    bl puts         # puts(r3)

    # Restore lr, free stack frame, and return 0.
    lwz %r0,20(%r1)     # r0 = r1[20]
    li %r3,0        # r3 = 0
    addi %r1,%r1,16     # r1 = r1 + 16
    mtlr %r0        # lr = r0
    blr         # return r3

    # This main function is smaller than before but ELF
    # wants to know the size.
    .size   main, .-main
    .ident  "GCC: (GNU) 4.1.2 20061021 prerelease (NetBSD nb3 20061125)"

The optimized version of the main function does not use the r9, r11 or r31 registers; and it does not save r31, argc or argv to the stack. The stack frame occupies only 16 bytes, not 32 bytes.

The main function barely uses the stack frame, only writing the frame pointer to 0(r1) and never reading anything. The main function must reserve 4 bytes of space at 4(r1) for an lr save word, in case the puts function saves its link register. The frame pointer and lr save word together occupy 8 bytes of stack space. The main function allocates 16 bytes, instead of only 8 bytes, because of a convention that the stack pointer is a multiple of 16.

The relocatable object file

Now that we have the assembly code, there are two more steps before we have the final executable.

  1. The first step is to run the assembler (as), which translates the assembly code to machine code, and stores the machine code in an ELF relocatable object.
  2. The second step is to run the linker (ld), which combines some ELF relocatables into one ELF executable.

There are various tools that can examine ELF files. The command nm(1) lists the symbols in an object file. The commands objdump(1) and readelf(1) show other information. These commands can examine both relocatables and executables. Though the executable is more interesting, the relocatable is simpler.

To continue our example, we can run the assembler with greetings.s to produce greetings.o. We use the optimized code in greetings.s from cc -O2 -mregnames -S greetings.c, because it is shorter. We feed our file greetings.s to /usr/bin/as with a simple command.

$ as -o greetings.o greetings.s

The output greetings.o is a relocatable object file, and file(1) confirms this.

$ file greetings.o
greetings.o: ELF 32-bit MSB relocatable, PowerPC or cisco 4500, version 1 (SYSV) , not stripped

List of sections

The source greetings.s had assembler directives for two sections (.rodata.str1.4 and .text), so the ELF relocatable greetings.o should contain those two sections. The command objdump can list the sections.

$ objdump
Usage: objdump <option(s)> <file(s)>
 Display information from object <file(s)>.
 At least one of the following switches must be given:
 ...
 -h, --[section-]headers  Display the contents of the section headers
 ...
$ objdump -h greetings.o

greetings.o:     file format elf32-powerpc

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000002c  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000060  2**0
                  ALLOC
  3 .rodata.str1.4 00000014  00000000  00000000  00000060  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      0000003c  00000000  00000000  00000074  2**0
                  CONTENTS, READONLY

This command verifies the presence of the .text and .rodata.str1.4 sections. The .text section begins at file offset 0x34 and has size 0x2c, in bytes. The .rodata.str1.4 section begins at file offset 0x60 and has size 0x14.

Because the source greetings.s does not have assembler directives for the .data, .bss or .comment section, there must be another explanation for those three sections. The .data and .bss sections have size 0x0. Perhaps for traditional reasons, the assembler puts these sections into every object file. Because the source greetings.s never mentioned the .data or .bss section, nor allocated space in them, the assembler output them as empty sections. (The a.out(5) format always had text, data and bss segments. The elf(5) format distinguishes segments and sections, and also allows for arbitrary sections like .rodata.str1.4 and .comment.)

That leaves the mystery of the .comment section. The objdump command accepts -j to select a section and -s to show the contents, so objdump -j .comment -s greetings.o dumps the 0x3c bytes in that section.

$ objdump -j .comment -s greetings.o

greetings.o:     file format elf32-powerpc

Contents of section .comment:
 0000 00474343 3a202847 4e552920 342e312e  .GCC: (GNU) 4.1.
 0010 32203230 30363130 32312070 72657265  2 20061021 prere
 0020 6c656173 6520284e 65744253 44206e62  lease (NetBSD nb
 0030 33203230 30363131 32352900           3 20061125).    

This is just the string from the .ident assembler directive, between a pair of \0 bytes. So whenever gcc generates an .ident directive, the assembler emits this .comment section to identify the compiler that produced this relocatable. (The "info as" documentation for the .ident directive, shipped with NetBSD 4.0.1, continues to claim that the assembler "does not actually emit anything for it", but in fact the assembler emits this .comment section.)

The objdump tool also has a disassembler through its -d option. Disassembly is the reverse process of assembly; it translates machine code to assembly code. We could disassemble greetings.o, but the output would have a few defects, because some symbols lack their final values.

Of symbols and addresses

Our assembly code in greetings.s had three symbols. The first symbol had the name .LC0 and pointed to our string.

.LC0:
    .string "Greetings, Earth!"

The second symbol had the name main. It was a global symbol that pointed to a function.

    .globl main
    .type   main, @function
main:
    mflr %r0
    ...

The third symbol had the name puts. Our code used puts in a function call, though it never defined the symbol.

    bl puts

A symbol has a name and an integer value. In assembly code, a symbol acts as a constant inline integer. The most common use of a symbol is to hold an address, pointing to either a function or a datum. When a symbol appears as an operand to an instruction, the assembler inlines the value of that symbol into the machine code. The problem is that the assembler often does not know the final value of the symbol. So the assembler, as, saves some information about symbols into the ELF file. The linker, ld, can use this information to relocate symbols to their final values, resolve undefined symbols and inline the final values into the machine code. The fact that ld relocates symbols is also why .o files are called relocatable objects.

The nm command shows the names of symbols in an object file. The output of nm lists only two symbols from greetings.o. The .LC0 symbol is missing, because GNU as does not put local labels that begin with .L into the symbol table.

$ nm greetings.o
00000000 T main
         U puts

The nm tool comes from Unix tradition, and remains a great way to check the list of symbols. For each symbol, nm displays the hexadecimal value, a single letter for the type, then the name. The letter 'T' marks symbols that point into a text section, either the .text section or some other arbitrary section of executable code. The letter 'U' marks undefined symbols, which do not have a value.

The nm tool claims that symbol main has address 0x00000000, which seems to be a useless value. The actual meaning is that main points to offset 0x0 within section .text. A more detailed view of the symbol table would provide evidence of this.

Fate of symbols

The machine code in greetings.o is incomplete. If the address of the string "Greetings, Earth!" is not zero, then something must fix the instructions at 0x8 and 0xc. To avoid an infinite loop, something must fix the instruction at 0x14 to find the function puts. The linker has the task of editing and finishing the machine code.

(Because this part of the wiki page now comes before the part about machine code, this disassembly should probably not be here.)

00000000 <main>:
   0:   (31|00000|01000|02a6)   mflr    r0
   4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
   8:   (15|00011|00000|0000)   lis     r3,0
   c:   (14|00011|00011|0000)   addi    r3,r3,0
  10:   (36|00000|00001|0014)   stw     r0,20(r1)
  14:   (18|00000|00000|0001)   bl      14 <main+0x14>
  18:   (32|00000|00001|0014)   lwz     r0,20(r1)
  1c:   (14|00011|00000|0000)   li      r3,0
  20:   (14|00001|00001|0010)   addi    r1,r1,16
  24:   (31|00000|01000|03a6)   mtlr    r0
  28:   (19|10100|00000|0020)   blr

The above disassembly does not tell that the code at 0x8, 0xc and 0x14 is incomplete (though the infinite loop at 0x14 is a hint). There must be another way to find where the above code uses a symbol that lacks a final value. The ELF relocatable greetings.o bears information about both relocating symbols and resolving undefined symbols; some uses of objdump or readelf can reveal this information.

ELF, like any object format, allows for a symbol table. The list of symbols from nm greetings.o is only an incomplete view of this table.

$ nm greetings.o
00000000 T main
         U puts

The command objdump -t shows the symbol table in more detail.

$ objdump -t greetings.o

greetings.o:     file format elf32-powerpc

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 greetings.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .rodata.str1.4 00000000 .rodata.str1.4
00000000 l    d  .comment   00000000 .comment
00000000 g     F .text  0000002c main
00000000         *UND*  00000000 puts

The first column is the value of the symbol in hexadecimal. By some coincidence, every symbol in greetings.o has the value zero. The second column gives letter 'l' for a local symbol or letter 'g' for a global symbol. The third column gives the type of symbol, with 'd' for a debugging symbol, lowercase 'f' for a filename or uppercase 'F' for a function symbol. The fourth column gives the section of the symbol, or '*ABS*' for an absolute symbol, or '*UND*' for an undefined symbol. The fifth column gives the size of the symbol in hexadecimal. The sixth column gives the name of the symbol.

The filename symbol greetings.c exists because the assembly code greetings.s had a directive .file greetings.c. The symbol main has a nonzero size because of the .size directive.

Each section of this ELF relocatable has a symbol that points to address 0x0 in the section. This suggests that every section of this relocatable begins at address 0x0. A view of the section headers in greetings.o confirms this.

$ objdump -h greetings.o

greetings.o:     file format elf32-powerpc

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000002c  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000060  2**0
                  ALLOC
  3 .rodata.str1.4 00000014  00000000  00000000  00000060  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      0000003c  00000000  00000000  00000074  2**0
                  CONTENTS, READONLY

The output of objdump -h shows the address of each section in the VMA and LMA columns. (A section in an ELF relocatable has only one address, so objdump shows the same value as both VMA and LMA.) All five sections in greetings.o begin at address 0x0. Thus, all four sections marked ALLOC would overlap in memory. The linker must relocate these four sections so that they do not overlap.

The value of each symbol is the address. The section of each symbol serves to disambiguate addresses where sections overlap. Thus symbol main points to address 0x0 in the section .text, not any other section. Because every section begins at address 0x0, each address is relative to the beginning of the section. Therefore, symbol main points to offset 0x0 into the section .text.

TODO: explain "relocation records"

$ objdump -r greetings.o

greetings.o:     file format elf32-powerpc

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE 
0000000a R_PPC_ADDR16_HA   .rodata.str1.4
0000000e R_PPC_ADDR16_LO   .rodata.str1.4
00000014 R_PPC_REL24       puts

Disassembly and machine code

Disassembly

GNU binutils provide both assembly and the reverse process, disassembly. While as does assembly, objdump -d does disassembly. Both programs use the same library of opcodes.

By default, objdump -d disassembles all executable sections. (The -j option can select a section to disassemble. Our example relocatable greetings.o has only one executable section, so -j .text is optional.) The disassembler works better with linked executable files. It can also disassemble relocatables like greetings.o, but the output may confuse the reader because of undefined symbols and symbols not yet relocated to their final values.

$ objdump -d greetings.o

greetings.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <main>:
   0:   7c 08 02 a6     mflr    r0
   4:   94 21 ff f0     stwu    r1,-16(r1)
   8:   3c 60 00 00     lis     r3,0
   c:   38 63 00 00     addi    r3,r3,0
  10:   90 01 00 14     stw     r0,20(r1)
  14:   48 00 00 01     bl      14 <main+0x14>
  18:   80 01 00 14     lwz     r0,20(r1)
  1c:   38 60 00 00     li      r3,0
  20:   38 21 00 10     addi    r1,r1,16
  24:   7c 08 03 a6     mtlr    r0
  28:   4e 80 00 20     blr

The disassembled code has a slightly different syntax. Every instruction is labeled with its hexadecimal address, and the hex after each label is the machine code for that instruction. The syntax is more ambiguous: register names do not begin with a '%' sign, so "r3" is a register rather than a symbol, and the "14" in "bl 14" is an address rather than an immediate value.

The size of every PowerPC instruction is four bytes. The PowerPC architecture requires this fixed width of four bytes, or 32 bits, for every instruction. It also requires an alignment of four bytes for every instruction, meaning that the address of every instruction is a multiple of four. The above code meets both requirements.

The disassembled code should resemble the assembly code in greetings.s. A comparison shows that every instruction is the same, except for three instructions.

The fate of the symbols .LC0 and puts explains these differences. The linker will inline the correct values of .LC0 and puts, so that these three instructions make sense. Because we have the source file greetings.s, we know about the .LC0 and puts symbols.

If the reader of objdump -d greetings.o did not know about these symbols, then the three instructions at 0x8, 0xc and 0x14 would seem strange, useless and wrong.

If the reader understands that the infinite loop is actually a branch to function puts, then the function call still seems wrong, because puts uses the argument in r3, and the code loads r3 with zero. Therefore the function call might be puts(NULL) or puts(&main). Zero is the null pointer, and also the address of the main function, but function puts wants the address of some string. Until the linker relocates some things away from zero, zero has two or three overlapping meanings. The reader cannot follow the code.

In the build of a large C program, there are many .c and .o files but no .s files. (NetBSD and pkgsrc both provide many examples of this.) If someone used objdump -d to read a .o file, then that reader would be unable to follow and understand the disassembly, because of undefined symbols, symbols not relocated to their final values, and overlapping addresses with redundant meanings.

A better understanding of how symbols fit into machine code would help.

Machine code in parts

The output of objdump -d has the machine code in hexadecimal. This allows the reader to identify individual bytes. This is good with architectures that organize opcodes and operands into bytes.

PowerPC packs the opcodes and operands more tightly into bits. Each instruction has 32 bits. The first 6 bits (at the big end) have the opcode. In a typical instruction, the next 5 bits pick the first register, the next 5 bits pick the second register, and the remaining 16 bits hold an immediate value. A filter program that takes the hexadecimal machine code from objdump -d and splits each instruction into these four parts would be helpful.

One can write the filter program using a scripting language that provides both regular expressions and bit-shifting operations. Perl (available in lang/perl5) is such a language. Here follows machine.pl, such a script.

#!/usr/bin/env perl

# usage: objdump -d ... | perl machine.pl
#
# The output of objdump -d shows the machine code in hexadecimal. This
# script converts the machine code to a format that shows the parts of a
# typical PowerPC instruction such as "addi".
#
# The format is (opcode|register-1|register-2|immediate-value),
# with digits in (decimal|binary|binary|hexadecimal).

use strict;
use warnings;

my $byte = "[0-9a-f][0-9a-f]";
my $word = "$byte $byte $byte $byte";

while (defined(my $line = <ARGV>)) {
    chomp $line;
    if ($line =~ m/^([^:]*:\s*)($word)(.*)$/) {
        my ($before, $code, $after) = ($1, $2, $3);
        $code =~ s/ //g;
        $code = hex($code);

        my $opcode = $code >> (32-6);       # first 6 bits
        my $reg1 = ($code >> (32-11)) & 0x1f;   # next 5 bits
        my $reg2 = ($code >> (32-16)) & 0x1f;   # next 5 bits
        my $imm = $code & 0xffff;       # last 16 bits

        $line = sprintf("%s(%2d|%05b|%05b|%04x)%s",
                $before, $opcode, $reg1, $reg2, $imm,
                $after);
    }
    print "$line\n";
}

Here follows the disassembly of greetings.o, with the machine code in parts.

$ objdump -d greetings.o | perl machine.pl

greetings.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <main>:
   0:   (31|00000|01000|02a6)   mflr    r0
   4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
   8:   (15|00011|00000|0000)   lis     r3,0
   c:   (14|00011|00011|0000)   addi    r3,r3,0
  10:   (36|00000|00001|0014)   stw     r0,20(r1)
  14:   (18|00000|00000|0001)   bl      14 <main+0x14>
  18:   (32|00000|00001|0014)   lwz     r0,20(r1)
  1c:   (14|00011|00000|0000)   li      r3,0
  20:   (14|00001|00001|0010)   addi    r1,r1,16
  24:   (31|00000|01000|03a6)   mtlr    r0
  28:   (19|10100|00000|0020)   blr

The disassembly now shows the machine code with the opcode in decimal, then the next 5 bits in binary, then another 5 bits in binary, then the remaining 16 bits in hexadecimal.

The "load word and zero" (lwz) instruction given lwz X,N(Y) would load register X with a value from memory. It indexes memory using register Y as a pointer and value N as an offset in bytes. Thus the memory location is N bytes after where register Y points. The mnemonic lwz uses opcode 32. The next 5 bits hold the register number for X. Another 5 bits hold the register number for Y. The remaining 16 bits hold the offset value N. Given lwz r0,20(r1) then r0 is 00000 in binary, r1 is 00001 in binary, 20 is 0x14 in hexadecimal, so the filter script would write (32|00000|00001|0014).

It becomes apparent that "store word" (stw) uses opcode 36, while "store word with update" (stwu) uses opcode 37. Given stw r0,20(r1) then the filter script would write (36|00000|00001|0014). Given stwu r1,-16(r1) then -16 is 0xfff0 in 16-bit two's complement hexadecimal, so the filter script would write (37|00001|00001|fff0).

Given addi X,Y,Z, the "add immediate" (addi) instruction loads register X with the sum of register Y and the immediate value Z. The mnemonic addi uses opcode 14. The next 5 bits hold the register number for X, another 5 bits hold the register number for Y, and the remaining 16 bits hold the immediate value Z. Given addi r3,r3,0, r3 is 00011 in binary, so the filter script would write (14|00011|00011|0000). Given addi r1,r1,16, r1 is 00001 in binary and 16 is 0x10 in hexadecimal, so the filter script would write (14|00001|00001|0010).

The addi instruction has one more quirk. The second operand Y is either the immediate value 0 or a register number 1 through 31. (Some other instructions have this same quirk and cannot read register 0.) Thus addi 4,0,5 actually does r4 = 0 + 5 instead of r4 = r0 + 5, as if register 0 always contained the value 0. This quirk allows the "load immediate" (li) mnemonic, which also uses opcode 14, to load an immediate value by adding it to zero. So li r3,0 is the same as addi 3,0,0, which becomes (14|00011|00000|0000).

Given la X,N(Y), the "load address" (la) instruction loads register X with the address N bytes after where register Y points. This is the same as adding the immediate value N to register Y, so la X,N(Y) and addi X,Y,N are the same instruction, and both use opcode 14.

When machine code contains opcode 14, the disassembler tries to be smart about choosing an instruction mnemonic. Here is a quick example.

$ cat quick-example.s
    .section    .text
    addi    4,0,5       # bad
    la  3,3(0)      # very bad
    la  3,0(3)
    la  5,2500(3)
$ as -o quick-example.o quick-example.s
$ objdump -d quick-example.o | perl machine.pl

quick-example.o:     file format elf32-powerpc

Disassembly of section .text:

00000000 <.text>:
   0:   (14|00100|00000|0005)   li      r4,5
   4:   (14|00011|00000|0003)   li      r3,3
   8:   (14|00011|00011|0000)   addi    r3,r3,0
   c:   (14|00101|00011|09c4)   addi    r5,r3,2500

If the second register operand to opcode 14 is 00000, then the machine code looks like an instruction "li", so the disassembler uses the mnemonic "li". Otherwise the disassembler prefers mnemonic "addi" to "la".

Opcodes more strange

The filter script shows the four parts of a typical instruction, but not all instructions have those four parts. The instructions that do branching or access special registers are not typical instructions.

Here again is the disassembly of the main function in greetings.o:

00000000 <main>:
   0:   (31|00000|01000|02a6)   mflr    r0
   4:   (37|00001|00001|fff0)   stwu    r1,-16(r1)
   8:   (15|00011|00000|0000)   lis     r3,0
   c:   (14|00011|00011|0000)   addi    r3,r3,0
  10:   (36|00000|00001|0014)   stw     r0,20(r1)
  14:   (18|00000|00000|0001)   bl      14 <main+0x14>
  18:   (32|00000|00001|0014)   lwz     r0,20(r1)
  1c:   (14|00011|00000|0000)   li      r3,0
  20:   (14|00001|00001|0010)   addi    r1,r1,16
  24:   (31|00000|01000|03a6)   mtlr    r0
  28:   (19|10100|00000|0020)   blr

Assembly code uses "branch and link" (bl) to call functions and "branch to link register" (blr) to return from functions.

Every processor has a program counter (pc) to hold the address of the current instruction. (PowerPC also has a register named ctr, but that is the count register, not the program counter.) The branch instructions change the program counter. PowerPC uses a link register (lr), instead of a general purpose register (any of r0 through r31) or a memory location, to hold the return address, seemingly to separate the branch processing unit (bpu) from the units that access general purpose registers or memory.

The instruction bl takes one operand, an immediate value for the address. There is no way to fit a 6-bit opcode and a 32-bit address into a 32-bit instruction, so bl has only 26 bits for an operand. Thus bl actually takes a 26-bit signed relative address and adds it to the program counter. This provides a range of about 32 MB in either direction. The assembler converts the operand to a relative address when it assembles the instruction bl.

The instruction bl puts would cause the assembler to convert the value of symbol puts to a relative address; but puts was an undefined symbol, so the assembler output a meaningless relative address of zero. An instruction bl with relative address zero would branch to itself for an infinite loop. (The address for PowerPC is relative to the branch instruction. The address for some other architectures would be relative to the next instruction after the branch.)

The address of every PowerPC instruction is a multiple of 4, thus the relative branch can also ignore the low 2 bits of the 26-bit relative address. Thus PowerPC uses those 2 bits to select the type of branch. Instruction bl uses opcode 18 and sets the low 2 bits to binary 01. Mnemonic bl shares opcode 18 with three other mnemonics (b, ba and bla) that set the low 2 bits in other ways. Given bl puts, the assembler began with opcode 18, ended with binary 01, and filled the 24 bits in between with zeros, so the filter script would write (18|00000|00000|0001). A better filter script would write opcode 18 in decimal, the 26-bit relative address in hexadecimal, and the low 2 bits in binary.

Opcode 19 is for many types of branches (and some condition register operations); the lower 26 bits somehow specify the type of branch. Mnemonic blr shares opcode 19 with many other mnemonics. Opcode 31 covers a large group of operations, including those with special purpose registers; for those, the lower 26 bits somehow pick a special register and an action. Mnemonics mflr and mtlr share opcode 31 with many other mnemonics, including the more general "move from special purpose register" (mfspr) and "move to special purpose register" (mtspr). The instructions blr, mflr and mtlr do not involve any symbols, so the lower 26 bits already have their final values.

The source file /usr/src/gnu/dist/binutils/opcodes/ppc-opc.c contains a table of powerpc_opcodes that lists the various mnemonics that use opcodes 18, 19 and 31.