Binary File

Compilation Process

The compilation process can be divided into five main stages:

  1. Lexical analysis
  2. Syntax analysis
  3. Semantic analysis
  4. Intermediate code generation
  5. Code optimization and generation

Compilation Procedure Using GCC

  1. Preprocessing: Use the -E option to preprocess the source code. This step expands macros, removes comments, and includes header files.
  2. Compilation: Use the -S option to compile the source code or preprocessed code to assembly language (.s file).
  3. Assembling: Use the -c option to compile or assemble without linking, turning source code or assembly code into object code (.o file).
  4. Linking: This step involves address and storage allocation, symbol binding, and relocation.
    By default, GCC performs dynamic linking. You can use the -static option to enable static linking.
    The linking itself is performed by the static linker, ld, which GCC invokes for you. Do not confuse it with ld.so, the dynamic linker that loads shared libraries when the program runs.

You can use the -save-temps option to preserve all intermediate files, and --verbose to display detailed information about GCC’s workflow.
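The four stages above can be written out as one explicit command per stage. The following is a minimal sketch in Python that only builds the command lines and does not run them. The helper name gcc_stage_commands and the file name hello.c are illustrative assumptions; the .i suffix is the conventional name for preprocessed C source.

```python
from pathlib import Path


def gcc_stage_commands(source):
    """Return (stage, argv) pairs that take `source` through the four GCC stages."""
    stem = Path(source).stem  # "hello.c" -> "hello"
    return [
        ("preprocess", ["gcc", "-E", source, "-o", f"{stem}.i"]),
        ("compile", ["gcc", "-S", f"{stem}.i", "-o", f"{stem}.s"]),
        ("assemble", ["gcc", "-c", f"{stem}.s", "-o", f"{stem}.o"]),
        ("link", ["gcc", f"{stem}.o", "-o", stem]),
    ]


if __name__ == "__main__":
    # Print one command per stage; nothing is executed here.
    for stage, argv in gcc_stage_commands("hello.c"):
        print(f"{stage:<10} {' '.join(argv)}")
```

On a machine with GCC installed, each argv list could be passed to subprocess.run to actually execute the stage; each stage consumes the file produced by the previous one.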

ELF File Format

ELF (Executable and Linkable Format) is a standard file format for executable files, object files, shared libraries, and core dumps. It is designed to be platform-independent and flexible, supporting a wide range of computer architectures and operating systems.

Types of ELF

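There are three main types of ELF files (core dumps, mentioned above, form a fourth):

  1. Executable Files: These contain code and data that the operating system can load and run directly.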
  2. Relocatable Files: These contain object code generated by the compiler but not yet linked into an executable file or a shared library.
  3. Shared Object Files: These are dynamically linked libraries that can be loaded at runtime.
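The file type is recorded in the e_type field of the ELF header, a 16-bit value stored right after the 16-byte identification block. The following is a minimal sketch, assuming only the standard header layout; the function names elf_type and elf_type_of_file are illustrative.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}


def elf_type(header):
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 of the identification block (EI_DATA) gives the byte order.
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if byte_order is None:
        raise ValueError("unknown data encoding")
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type})")


def elf_type_of_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Note that position-independent executables are also marked with the shared-object type, so a program built as a PIE reports "shared object" here.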

Views of ELF File

An ELF file can be viewed from two perspectives: the Linking View and the Execution View.

  • The Linking View divides the file into sections.
  • The Execution View divides the file into segments.
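Both views are reachable from the ELF header: it stores the offset and entry count of the section header table (Linking View) and of the program header table (Execution View). The following is a minimal sketch that handles only little-endian 64-bit headers and ignores the extended numbering used for very large section counts; the names ELF64_HEADER, Elf64Header and read_views are illustrative.

```python
import struct
from collections import namedtuple

# Little-endian ELF64 header layout (64 bytes), field names as in the ELF specification.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
Elf64Header = namedtuple(
    "Elf64Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def read_views(header):
    """Report where the section and program header tables live."""
    fields = Elf64Header._make(ELF64_HEADER.unpack_from(header))
    ident = fields.e_ident
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 header")
    return {
        "linking view": {"table offset": fields.e_shoff, "sections": fields.e_shnum},
        "execution view": {"table offset": fields.e_phoff, "segments": fields.e_phnum},
    }
```

A relocatable file typically reports zero segments, since it has not been linked into something loadable yet, while an executable carries both tables.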

Here are some important sections:

  1. .text: Executable machine code.
  2. .data: Initialized global and static variables.
  3. .bss: Uninitialized (and zero-initialized) global and static variables. It occupies no space in the file itself.
  4. .rodata: Read-only data, such as string literals and constants.
  5. .symtab: Symbol table, storing symbol definitions and references.
  6. .strtab: String table, storing symbol names and other strings.
  7. .dynamic: Dynamic linking information.
  8. .plt: Procedure Linkage Table, used for dynamic function calls.
  9. .got: Global Offset Table, used for resolving addresses in dynamic linking.

Useful Commands

The readelf and objdump commands display information about ELF files.

  • For Linking View:
    readelf

    1. readelf -h <file> – Display the ELF file header.
    2. readelf -S <file> – Display the sections.
    3. readelf -s <file> – Display the symbols.
    4. readelf -r <file> – Display the relocation entries.
    5. readelf -d <file> – Display the dynamic section.
    6. readelf -x <section> <file> – Display the contents of the specified section.

    objdump

    1. objdump -x <file> – Display all header information, including the symbol table and relocation entries.
    2. objdump -s <file> – Display the contents of the sections.
    3. objdump -d <file> – Display the disassembled code.
  • For Execution View:
    readelf

    1. readelf -l <file> – Display the program headers and the mapping of sections to segments.

Linking

Static Linking

A static linker combines multiple object files and libraries into a single executable file. Static linking happens at build time, right after compilation; linking at load time or run time is the job of the dynamic linker. There are two main tasks in static linking:

  1. Symbol resolution: Match each symbol reference (e.g., a function call or variable usage) with exactly one definition.
  2. Relocation: Assign virtual memory addresses to all sections and symbols, then patch every reference to use the final address.
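The two tasks can be illustrated with a toy model. The following is a minimal sketch, not how ld works internally: each "object file" is a plain dict, objects are simply placed back to back, and the base address 0x400000, the dict layout and the name toy_link are assumptions made for illustration.

```python
def toy_link(objects, base=0x400000):
    """Toy static link: returns (symbol addresses, patch site -> target address)."""
    symbols = {}   # symbol name -> final address
    layout = {}    # object name -> address where the object was placed
    address = base

    # Pass 1: place objects back to back and record where each symbol lands.
    for obj in objects:
        layout[obj["name"]] = address
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"multiple definition of '{name}'")
            symbols[name] = address + offset
        address += obj["size"]

    # Pass 2: resolve every reference and note which address must be patched.
    patches = {}
    for obj in objects:
        for site, name in obj["references"]:
            if name not in symbols:
                raise ValueError(f"undefined reference to '{name}'")
            patches[layout[obj["name"]] + site] = symbols[name]
    return symbols, patches


main_o = {"name": "main.o", "size": 0x20, "defines": {"main": 0},
          "references": [(0x8, "add")]}   # main calls add at offset 0x8
add_o = {"name": "add.o", "size": 0x10, "defines": {"add": 0}, "references": []}

symbols, patches = toy_link([main_o, add_o])
for name, addr in symbols.items():
    print(f"{name}: {addr:#x}")
```

Linking main_o without add_o raises an "undefined reference" error, which mirrors the message a real linker gives when symbol resolution fails.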

By the way, static libraries usually have the .a suffix and are created with the ar command.

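Some concepts:

  1. VMA (Virtual Memory Address): the address a section has when the program runs.
  2. LMA (Load Memory Address): the address at which a section is loaded. It usually equals the VMA; the two differ mainly on embedded systems, for example when initialized data is stored in ROM and copied to RAM at startup.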

Dynamic Linking

Dynamic linking allows multiple programs to share the same library code at runtime instead of embedding it into each executable. This reduces memory usage and makes updating libraries easier, since only the shared library needs to be replaced.

Some important concepts:

  1. PIC (Position-Independent Code)
    It allows the same machine code to run correctly regardless of where it is loaded in memory, a crucial property for shared libraries (.so files).
    You can generate PIC using the compiler flag:

    gcc -fPIC -c file.c

    Without PIC, the loader would have to patch absolute addresses inside the code at load time. The patched pages would then differ from process to process, so the library code could not be shared efficiently.

  2. GOT (Global Offset Table)
    The GOT stores the absolute addresses of global variables and functions used by a shared object.
    When a shared library is loaded, the dynamic linker fills in the GOT entries with the correct runtime addresses.
    The program then accesses data and functions indirectly through the GOT, allowing the code to remain position-independent.

  3. PLT (Procedure Linkage Table)
    The PLT works together with the GOT. It contains small pieces of code used to call external functions whose addresses are not known until runtime.
    When a function in another shared library is called for the first time, the PLT entry triggers the dynamic linker to resolve the function’s address and update the corresponding GOT entry.

  4. Lazy Binding
    Lazy binding (or lazy symbol resolution) means that external function addresses are not resolved until they are actually called for the first time.
    This speeds up program startup because the dynamic linker only resolves what is needed.
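The interplay of PLT, GOT and lazy binding can be mimicked with a toy model. The following is a minimal sketch, not how ld.so is implemented: a dict stands in for the GOT, a closure stands in for the resolver stub, and the names load_library, call and stats are illustrative assumptions.

```python
def load_library(library):
    """Toy loader: `library` maps symbol names to functions of a pretend .so."""
    got = {}                        # stands in for the GOT: name -> current jump target
    stats = {"resolver_calls": 0}   # how often the toy "dynamic linker" had to run

    def make_stub(name):
        def stub(*args):
            # First call only: resolve the symbol and patch the GOT entry.
            stats["resolver_calls"] += 1
            got[name] = library[name]
            return got[name](*args)
        return stub

    for name in library:
        got[name] = make_stub(name)  # every entry starts out pointing at its stub

    def call(name, *args):
        # Stands in for a PLT entry: always jump through the GOT.
        return got[name](*args)

    return call, got, stats


library = {"square": lambda x: x * x, "negate": lambda x: -x}
call, got, stats = load_library(library)
print(call("square", 3), stats["resolver_calls"])  # first call goes through the resolver
print(call("square", 4), stats["resolver_calls"])  # later calls jump straight to the function
print(got["negate"] is library["negate"])          # never called, so never resolved
```

The caller always goes through the same indirection; only the table entry changes after the first call. On glibc-based systems, setting the LD_BIND_NOW environment variable asks the dynamic linker to resolve all such entries at startup instead.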