C++ Build Process

Understanding how source code becomes an executable is essential for diagnosing compilation and link errors. C and C++ expose an unusually explicit translation pipeline: preprocessing, compilation, and linking are separate stages, and each can fail in its own way.

Interpreters vs. Compilers

| | Interpreters | Compilers |
|---|---|---|
| Translation | Line-by-line, immediate execution | Entire program → machine code |
| Speed | Slow (re-translate repeated code) | Fast (code generated once) |
| Error messages | More specific (source always available) | Reported after a compilation pass |
| Large programs | Severe limitations (interpreter must fit in memory) | Separate compilation enables large programs |
| Examples | BASIC (traditional), Python* | C, C++, Go |

*Python is a hybrid — compiles to bytecode, then interprets; supports separate compilation.

The Full Pipeline

flowchart LR
    A[Source .cpp] --> B[Preprocessor]
    B --> C[Pre-processed code]
    C --> D[Parser: builds parse tree]
    D --> E[Global Optimizer optional]
    E --> F[Code Generator]
    F --> G[Peephole Optimizer optional]
    G --> H[Object Module .o/.obj]
    H --> I[Linker]
    I --> J[Executable]
    K[Other .o files] --> I
    L[Libraries .a/.lib] --> I

1. Preprocessor

Runs before the compiler. Handles:

  • #include directives — pastes in header file contents
  • #define macros — text substitution
  • Conditional compilation (#ifdef, #ifndef)

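Its output is pre-processed source code, which is what the compiler proper actually sees. C++ style discourages heavy preprocessor use because macros are untyped text substitution and can introduce subtle bugs; language features such as const and inline replace most macros.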
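The kind of subtle bug meant here can be sketched with a classic precedence mistake. The names SQUARE_MACRO, square, kBufferSize, macro_result, and function_result are made up for this illustration.

```cpp
// Macro: pure text substitution, with no awareness of operator precedence.
#define SQUARE_MACRO(x) x * x

// Typed alternative: the argument is evaluated once, as an int value.
inline int square(int x) { return x * x; }

// Typed constant, in place of something like '#define BUFFER_SIZE 64'.
const int kBufferSize = 64;

// Expands to 1 + 2 * 1 + 2, which is 5 rather than the intended 9.
int macro_result() { return SQUARE_MACRO(1 + 2); }

// Evaluates 1 + 2 first, then calls square(3), which is 9.
int function_result() { return square(1 + 2); }
```

Wrapping the macro body and parameter in parentheses would fix this particular case, but the inline function also gets type checking and single evaluation of its argument for free.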

2. Parser (first pass)

Tokenizes the source and organizes it into a parse tree whose leaves are tokens (identifiers, operators, literals). Static type checking also happens in this pass: the compiler verifies argument types, return types, and so on.
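To make "tokens" concrete, here is a toy tokenizer. It is only a sketch of the first step: a real C++ front end handles multi-character operators, comments, string literals, and much more, and goes on to build the tree. TokenKind, Token, and tokenize are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Literal, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits a source string into tokens, skipping whitespace.
// Simplification: every non-alphanumeric character is a one-character operator.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier: a letter or underscore, then letters, digits, underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Integer literal: a run of digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Literal, src.substr(start, i - start)});
        } else {
            ++i;
            tokens.push_back({TokenKind::Operator, src.substr(start, 1)});
        }
    }
    return tokens;
}
```

For the input `total = count + 42;` this yields six tokens: two identifiers, one literal, and three single-character operators (`=`, `+`, `;`).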

3. Code Generator (second pass)

Walks the parse tree and produces:

  • Assembly language code → run through the assembler, or
  • Machine code directly

Output: an object module (file with .o or .obj extension).

The word “object” here predates OOP — it means “compiled chunk of machine code,” not a class instance.

4. Linker

Combines object modules into a final executable:

  • Resolves external references: if module A calls a function defined in module B, the linker patches in the correct address
  • Searches libraries by index (links only the needed object module, not the whole library)
  • Silently adds a startup module: initialization routines that set up the stack and initialize global and static variables before main() runs
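One visible effect of the startup module is that namespace-scope objects are already constructed when main() begins. The sketch below uses hypothetical names (event_log, Announcer, global_announcer) and assumes main() is defined in the same translation unit.

```cpp
#include <string>
#include <vector>

// Records what has run so far. A function-local static guarantees the log
// exists before anything tries to write to it.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor records that it ran.
struct Announcer {
    explicit Announcer(const std::string& name) {
        event_log().push_back("constructed " + name);
    }
};

// Namespace-scope object: its constructor is run by startup code,
// so the log entry is already present when main() starts executing.
Announcer global_announcer("global_announcer");
```

If main() reads event_log() on its first line, it should already find the entry for global_announcer, even though no code in main() created that object.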

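Library search order matters: many linkers scan their arguments left to right in a single pass and pull from a library only the symbols that are unresolved at that point. A library should therefore appear after the object files (or other libraries) that use it; if it appears before its user, the needed definitions may not be found.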

Static Type Checking

C++ checks types at compile time (static), not runtime (dynamic). Advantages:

  • Catches misuse of arguments and return values early
  • Maximizes runtime speed (no runtime type-checking overhead)
  • Compiler reports errors before the program ever runs

Java adds run-time type checks on top of static checking; C++ relies on static checking by default, and you opt in to run-time checks explicitly, for example with dynamic_cast on polymorphic types.
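A minimal sketch of the two kinds of checking side by side. Shape, Circle, Square, half, and is_circle are illustrative names, not from any library.

```cpp
// A virtual member makes Shape polymorphic, which dynamic_cast requires
// for downcasts.
struct Shape {
    virtual ~Shape() = default;
};
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape { double side = 2.0; };

// Static checking: the compiler verifies argument and return types here.
double half(double x) { return x / 2.0; }
// half("text");  // rejected at compile time: no conversion from const char* to double

// Dynamic checking: dynamic_cast inspects the object's actual type at run time
// and yields nullptr when the pointed-to object is not a Circle.
bool is_circle(const Shape* shape) {
    return dynamic_cast<const Circle*>(shape) != nullptr;
}
```

Calling is_circle with the address of a Circle returns true and with the address of a Square returns false; that decision is made while the program runs, whereas the commented-out call to half never gets past the compiler.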

Separate Compilation

A C/C++ program can be split into multiple files. Each file is compiled independently into an object module. The linker assembles them:

  • A function is the atomic unit of code: its definition cannot be split across files, though one file can contain many functions
  • Each file is compiled independently → faster rebuilds (only changed files recompiled)
  • Compiled pieces can be combined into libraries for reuse
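The split described above might look like the following. In a real project each marked section would be its own file, compiled to its own object module; they are shown in one listing here so the sketch compiles as is. The file names and functions are hypothetical.

```cpp
// ---- area.h (hypothetical header): declaration only ----
double rectangle_area(double width, double height);

// ---- area.cpp (hypothetical): the single definition ----
double rectangle_area(double width, double height) {
    return width * height;
}

// ---- report.cpp (hypothetical): a caller that needs only the declaration ----
// When compiled on its own, this file's object module contains an unresolved
// reference to rectangle_area, which the linker later patches.
double floor_area_of_two_rooms() {
    return rectangle_area(3.0, 4.0) + rectangle_area(2.0, 5.0);
}
```

With this layout, editing the body of rectangle_area would require recompiling only area.cpp and relinking; report.cpp depends only on the declaration in the header.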

Using a Library: 3 Steps

  1. #include the library’s header file
  2. Use the functions/classes in your code
  3. Link the library’s object modules into your executable

The standard library is always searched by the linker automatically. For add-on libraries, you must explicitly name them on the linker command line.
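The three steps, sketched with the standard library's math functions. The function name hypotenuse is made up for the example.

```cpp
#include <cmath>  // Step 1: include the library's header

// Step 2: use what the header declares.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}

// Step 3 happens at link time. For the standard library a C++ compiler driver
// normally supplies the library automatically; an add-on library would have
// to be named explicitly on the link command line.
```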

Key Points

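  • #include = textual paste; for C++ standard headers always use the standard no-.h form (<iostream>, not <iostream.h>)
  • The .h form (<iostream.h>) names the old pre-standard, non-template library and is absent from most modern toolchains; mixing it with the standard headers in one program causes errors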
  • Linker errors (“undefined reference”, “unresolved external symbol”) mean a name was declared and used, but no definition was found in any object module or library given to the linker
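  • One-Definition Rule (ODR): each non-inline function or variable must be defined exactly once in the whole program; inline functions, templates, and class types may be defined in several translation units as long as every definition is identical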
  • The startup module (linked silently) is why main() can assume the stack is initialized
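The declaration/definition distinction behind the linker-error and ODR points can be sketched as follows. The names next_id and doubled are hypothetical.

```cpp
// Declarations may be repeated freely; they introduce a name without defining it.
int next_id();
int next_id();

// Non-inline definition: must appear in exactly one translation unit.
// Omitting it everywhere produces an "undefined reference" at link time;
// repeating it in a second .cpp file produces a "multiple definition" error.
int next_id() {
    static int counter = 0;
    return ++counter;
}

// Inline definition: may live in a header included by many translation units,
// provided every copy is identical.
inline int doubled(int x) { return 2 * x; }
```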