Dynamic Linking

Shared objects

In a typical system, a number of programs will be running. Each program relies on a number of functions, some of which will be “standard” C library functions, like printf(), malloc(), write(), etc.

If every program uses the standard C library, it follows that each program would normally have a unique copy of this particular library present within it. Unfortunately, this results in wasted resources. Since the C library is common, it makes more sense to have each program reference the common instance of that library, instead of having each program contain a copy of the library. This approach yields several advantages, not the least of which is the savings in terms of total system memory required.

Statically linked

The term statically linked means that the program and the particular library that it's linked against are combined together by the linker at linktime. This means that the binding between the program and the particular library is fixed and known at linktime — well in advance of the program ever running. It also means that we can't change this binding, unless we relink the program with a new version of the library.

You might consider linking a program statically in cases where you weren't sure whether the correct version of a library will be available at runtime, or if you were testing a new version of a library that you don't yet want to install as shared.

Programs that are linked statically are linked against archives of objects (libraries) that typically have the extension of .a. An example of such a collection of objects is the standard C library, libc.a.

Dynamically linked

The term dynamically linked means that the program and the particular library it references are not combined together by the linker at linktime. Instead, the linker places information into the executable that tells the loader which shared object module the code is in and which runtime linker should be used to find and bind the references. This means that the binding between the program and the shared object is done at runtime — before the program starts, the appropriate shared objects are found and bound.

This type of program is called a partially bound executable, because it isn't fully resolved — the linker, at linktime, didn't cause all the referenced symbols in the program to be associated with specific code from the library. Instead, the linker simply said: “This program calls some functions within a particular shared object, so I'll just make a note of which shared object these functions are in, and continue on.” Effectively, this defers the binding until runtime.

Programs that are linked dynamically are linked against shared objects that have the extension .so. An example of such an object is the shared object version of the standard C library, libc.so.

You use a command-line option to the compiler driver qcc to tell the tool chain whether you're linking statically or dynamically. This command-line option then determines the extension used (either .a or .so).

Augmenting code at runtime

Taking this one step further, a program may not know which functions it needs to call until it's running. While this may seem a little strange initially (after all, how could a program not know what functions it's going to call?), it really can be a very powerful feature. Here's why.

Consider a “generic” disk driver. It starts, probes the hardware, and detects a hard disk. The driver would then dynamically load the io-blk code to handle the disk blocks, because it found a block-oriented device. Now that the driver has access to the disk at the block level, it finds two partitions present on the disk: a DOS partition and a QNX 4 partition. Rather than force the disk driver to contain filesystem drivers for all possible partition types it may encounter, we kept it simple: it doesn't have any filesystem drivers! At runtime, it detects the two partitions and then knows that it should load the fs-dos.so and fs-qnx4.so filesystem code to handle those partitions.

By deferring the decision of which functions to call, we've enhanced the flexibility of the disk driver (and also reduced its size).

How shared objects are used

To understand how a program makes use of shared objects, let's first see the format of an executable and then examine the steps that occur when the program starts.

ELF format

QNX Neutrino uses the ELF (Executable and Linking Format) binary format, which is currently used in SVR4 Unix systems. ELF not only simplifies the task of making shared libraries, but also enhances dynamic loading of modules at runtime.

In the following diagram, we show two views of an ELF file: the linking view and the execution view. The linking view, which is used when the program or library is linked, deals with sections within an object file. Sections contain the bulk of the object file information: data, instructions, relocation information, symbols, debugging information, etc. The execution view, which is used when the program runs, deals with segments.

At linktime, the program or library is built by merging together sections with similar attributes into segments. Typically, all the executable and read-only data sections are combined into a single text segment, while the data and BSSs are combined into the data segment. These segments are called load segments, because they need to be loaded in memory at process creation. Other sections such as symbol information and debugging sections are merged into other, nonload segments.


Picture showing linking view + execution view


Object file format: linking view and execution view.

ELF without COFF

Most implementations of ELF loaders are derived from COFF (Common Object File Format) loaders; they use the linking view of the ELF objects at load time. This is inefficient because the program loader must load the executable using sections. A typical program could contain a large number of sections, each of which would have to be located in the program and loaded into memory separately.

QNX Neutrino, however, doesn't rely at all on the COFF technique of loading sections. When developing our ELF implementation, we worked directly from the ELF spec and kept efficiency paramount. The ELF loader uses the “execution view” of the program. By using the execution view, the task of the loader is greatly simplified: all it has to do is copy to memory the load segments (usually two) of the program or library. As a result, process creation and library loading operations are much faster.

The process

The diagram below shows the memory layout of a typical process. The process load segments (corresponding to text and data in the diagram) are loaded at the process's base address. The main stack is located just below and grows downwards. Any additional threads that are created will have their own stacks, located below the main stack. Each of the stacks is separated by a guard page to detect stack overflows. The heap is located above the process and grows upwards.


Picture showing process memory layout


Process memory layout on an x86.

In the middle of the process's address space, a large region is reserved for shared objects. Shared libraries are located at the top of the address space and grow downwards.

When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code. The process manager will load this shared library in memory and will then pass control to the runtime linker code in this library.

Runtime linker

The runtime linker is invoked when a program that was linked against a shared object is started or when a program requests that a shared object be dynamically loaded. The runtime linker is contained within the C runtime library.

The runtime linker performs several tasks when loading a shared library (.so file):

  1. If the requested shared library isn't already loaded in memory, the runtime linker loads it:
  2. Once the requested shared library is found, it's loaded into memory. For ELF shared libraries, this is a very efficient operation: the runtime linker simply needs to use the mmap() call twice to map the two load segments into memory.
  3. The shared library is then added to the internal list of all libraries that the process has loaded. The runtime linker maintains this list.
  4. The runtime linker then decodes the dynamic section of the shared object.

This dynamic section provides information to the linker about other libraries that this library was linked against. It also gives information about the relocations that need to be applied and the external symbols that need to be resolved. The runtime linker will first load any other required shared libraries (which may themselves reference other shared libraries). It will then process the relocations for each library. Some of these relocations are local to the library, while others require the runtime linker to resolve a global symbol. In the latter case, the runtime linker will search through the list of libraries for this symbol. In ELF files, hash tables are used for the symbol lookup, so they're very fast. The order in which libraries are searched for symbols is very important, as we'll see in the section on “Symbol name resolution” below.

Once all relocations have been applied, any initialization functions that have been registered in the shared library's init section are called. This is used in some implementations of C++ to call global constructors.

Loading a shared library at runtime

A process can load a shared library at runtime by using the dlopen() call, which instructs the runtime linker to load this library. Once the library is loaded, the program can call any function within that library by using the dlsym() call to determine its address.


Note: Remember: shared libraries are available only to processes that are dynamically linked.

The program can also determine the symbol associated with a given address by using the dladdr() call. Finally, when the process no longer needs the shared library, it can call dlclose() to unload the library from memory.

Symbol name resolution

When the runtime linker loads a shared library, the symbols within that library have to be resolved. The order and the scope of the symbol resolution are important. If a shared library calls a function that happens to exist by the same name in several libraries that the program has loaded, the order in which these libraries are searched for this symbol is critical. This is why the OS defines several options that can be used when loading libraries.

All the objects (executables and libraries) that have global scope are stored on an internal list (the global list). Any global-scope object, by default, makes available all of its symbols to any shared library that gets loaded. The global list initially contains the executable and any libraries that are loaded at the program's startup.

By default, when a new shared library is loaded by using the dlopen() call, symbols within that library are resolved by searching in this order through:

  1. the shared library
  2. the list of libraries specified by the LD_PRELOAD environment variable. You can use this environment variable to add or change functionality when you run a program. For setuid or setgid ELF binaries, only libraries in the standard search directories that are also setuid will be loaded.
  3. the global list
  4. any dependent objects that the shared library references (i.e. any other libraries that the shared library was linked against)

The runtime linker's scoping behavior can be changed in two ways when dlopen()'ing a shared library: