What Is A Linker?
This is part of a larger series known as “How To Program Anything: Core Rulebook”
With low-level and middle-level languages like assembly and C, you’ll probably find yourself worrying about “the linker” at some point, most likely when a build fails because a symbol can’t be found. What is this mysterious linker, and what does it do? With higher-level languages, particularly interpreted languages like Python, you probably won’t encounter it as much unless you’re compiling extensions from source, such as pygame. PHP offers a sort of middle ground, as you’re able to dynamically load, or link, extension libraries into the PHP runtime during program execution with the dl() function. I only really learned what the linker does, and how it does it, when I started studying assembly on a Linux machine myself.
Where Do We Use A Linker?
To understand the role of the linker we must revisit our compilation process. I borrow the compilation process diagram from the article “Interpretation Versus Compilation” below:
Here you can see the step “links together into executable.” That’s where the linker comes in. The compiler outputs what are known as object code files. These files are not quite executable-ready machine code. They are instead pieces of a program, along with tags and information about which symbols correspond to which machine code. The linker, not exactly pictured in the diagram above, takes these object code files and “glues” together all the symbols, so that wherever a symbol is referenced in the object files it is resolved to its matching machine code. I shall explain further, but first below is a more detailed diagram of the linking process:
When you write a program, such as a C program, you will often write parts or modules of the program in separate files. These files will define functions, global variables, macros, etc. (for more on what a function or global variable is, see my Programming Crash Course post). When you send these program files to the compiler, it generates the appropriate machine code for the functions and variables defined in each file (macros are expanded before compilation and leave no symbol behind), and typically, for each source file, it produces an object file that maps each definition (known as a symbol at this point) to the generated machine code. However, the program is not executable at this point.
Imagine that you have two program files, FileOne and FileTwo. They each define their own functions as below:
a = 5
Now, when the compiler compiles each of these files it creates object code. Each object code file has a list of symbols (for FileOne that is main and myFunction; for FileTwo that is anotherFunction) and the corresponding machine code that the processor will run. However, notice something: FileOne calls anotherFunction() inside myFunction()… that means the object file the compiler generates for FileOne has a missing piece. Nowhere in FileOne’s object code is anotherFunction, or its corresponding machine code, defined. How is the processor going to know what to do?
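To make the “missing piece” concrete, here is a minimal sketch in Python that models the two object files as plain dictionaries. It is a toy, not a real object file format: the dictionary layout and the undefined_symbols helper are invented for illustration, and the call from main to myFunction is an assumption (the text only tells us that myFunction calls anotherFunction).

```python
# Toy model of the two object files (not a real object file format).
# Each "object file" maps the symbols it defines to the symbols that
# definition refers to.
FILE_ONE = {
    "name": "FileOne.o",
    "defines": {
        "main": ["myFunction"],             # assumption: main calls myFunction
        "myFunction": ["anotherFunction"],  # stated in the text
    },
}
FILE_TWO = {
    "name": "FileTwo.o",
    "defines": {
        "anotherFunction": [],              # refers to nothing else
    },
}


def undefined_symbols(obj):
    """Return the symbols an object file refers to but does not define."""
    defined = set(obj["defines"])
    referenced = {ref for refs in obj["defines"].values() for ref in refs}
    return sorted(referenced - defined)


for obj in (FILE_ONE, FILE_TWO):
    print(obj["name"], "defines", sorted(obj["defines"]),
          "| undefined:", undefined_symbols(obj))
```

FileOne’s entry reports anotherFunction as undefined, which is exactly the gap the linker has to fill.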
That’s where the linker comes in. It takes both files and identifies all the symbols: main, myFunction and anotherFunction. It then places all the corresponding machine code into one output file, typically the executable, and fills in the call to anotherFunction inside myFunction with the proper memory address. It can do this because it knows all the symbols from all the files and their corresponding machine code.
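The resolution step just described can be sketched as a tiny two-pass “linker” in Python. This is a toy under stated assumptions, not how a real linker is implemented: the instruction tuples, the function bodies, and the link function are all made up for illustration, and an “address” is simply an index into a list.

```python
# Toy static linker. "Machine code" is a list of made-up instruction tuples;
# ("call", name) holds a symbol name until the linker patches in an address.

def link(objects):
    """Lay out every symbol's code in one image and resolve call targets."""
    image = []       # the final "executable": a flat list of instructions
    addresses = {}   # symbol name -> index of its first instruction in image

    # Pass 1: place all code in one image and record where each symbol lands.
    for obj in objects:
        for symbol, code in obj.items():
            if symbol in addresses:
                raise ValueError(f"duplicate symbol: {symbol}")
            addresses[symbol] = len(image)
            image.extend(code)

    # Pass 2: replace each symbolic call with the address of its target.
    for i, (op, arg) in enumerate(image):
        if op == "call":
            if arg not in addresses:
                raise LookupError(f"undefined symbol: {arg}")
            image[i] = ("call", addresses[arg])
    return image, addresses


# Hypothetical contents of the two object files (the bodies are assumptions).
file_one = {
    "main": [("call", "myFunction"), ("ret", None)],
    "myFunction": [("call", "anotherFunction"), ("ret", None)],
}
file_two = {
    "anotherFunction": [("work", None), ("ret", None)],
}

image, addresses = link([file_one, file_two])
print(addresses)
print(image)
```

Linking file_one on its own raises the “undefined symbol” error, which is the toy equivalent of the linker error you see when a symbol can’t be found.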
Why Do We Use A Linker?
There are many reasons to use a linker; most of them have to do with organizing program code. The first reason is that we can break up the code we write into separate files. This not only lets the programmer organize the code in a user-friendly way, grouping associated functions into what are commonly referred to as modules, but also lets multiple programmers work together more easily. If different pieces of functionality live in different modules/files, then one programmer can work on one file without disturbing the others, whereas if the program were one giant file you’d have multiple people editing that file at once, and that would be awful. Each programmer in this scenario would have to wait until the previous programmer was done editing the file. You can see how that would be counter-productive. On top of this, version control systems like Git or Subversion (to be covered elsewhere) take advantage of the multiple-file approach when keeping track of changes.
Another reason to use a linker is that it lets us use what are commonly referred to as libraries. A program library is a collection of machine code that does something useful and exposes some kind of interface to it: typically a set of functions and entry points, or, if you are object-oriented (to be covered in Bootstrap Part 2 as of this writing), classes. An interface in this sense is a set of symbols that some other program can reference to gain that additional functionality. This is most apparent when we program in C and use the C Standard Library. Unlike many other languages, the core C language does not build in much functionality such as input/output operations; instead, it puts that type of functionality in a standard library that programs link against. When the C program is compiled and linked, the linker looks up the library symbols in the object code of the library and fills in the blanks. This is true for any library, such as one that deals with graphics, for instance.
Note: What we have described thus far is known as static linking, where the final executable includes all the necessary machine code from all the sources and can run independently. Many operating systems and linkers also support dynamic linking, where a dynamic library is loaded into memory when the program starts or while it runs and is then linked against the program at run time. Such executables therefore still contain unresolved symbols right up until they execute.
There is a pro to dynamic linking, and that is that the dynamic library only has to be stored once somewhere in the bowels of the computer, while multiple executables link to it when they run. There is a con to this method, however: if the dynamic library is upgraded or rewritten in a non-backwards-compatible way and replaced on the system, the existing programs that depend on it can break and refuse to run. This wouldn’t occur if they were statically linked and carried all their machine code with them.
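The trade-off above can be sketched with a toy model in Python. Everything here is hypothetical and only for illustration: the load function stands in for the operating system’s dynamic loader, and the graphics library with its draw_line and draw_segment functions is invented.

```python
# Toy model of dynamic linking: the "executable" stores only symbol names, and
# a loader resolves them against whatever library is installed at run time.

def load(program, installed_library):
    """Resolve the program's imported symbols against the installed library."""
    resolved = {}
    for symbol in program["imports"]:
        if symbol not in installed_library:
            raise LookupError(f"undefined symbol at run time: {symbol}")
        resolved[symbol] = installed_library[symbol]
    return resolved


# Hypothetical library, version 1: exposes draw_line.
graphics_v1 = {"draw_line": lambda: "line drawn"}
# Hypothetical version 2: the function was renamed, so the interface is no
# longer backwards compatible.
graphics_v2 = {"draw_segment": lambda: "segment drawn"}

# The program was built when version 1 was installed.
program = {"imports": ["draw_line"]}

symbols = load(program, graphics_v1)
print(symbols["draw_line"]())        # works against version 1

try:
    load(program, graphics_v2)       # same program, upgraded library
except LookupError as err:
    print("program fails to start:", err)
```

The program itself never changed between the two runs; only the library installed on the system did, which is why the breakage can come as a surprise.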
Another, more esoteric, function of a linker is what is known as relocation. Typically an object file defines symbols that may be used by other object files external to it, but it also defines symbols for internal usage. Because the compiler has no means of knowing where any of this code will end up in the final output, the object code is written against placeholder addresses; once the linker has decided on the final layout, it adjusts every address that depends on that layout, a process known as relocation. A related optimization is known as relaxation, and it goes something like this: since the compiler cannot know the final layout of the machine code, it can’t take advantage of instructions that would be more efficient given a certain layout, such as a shorter jump to a nearby target. So the compiler generates the most conservative instruction, which may be the largest or slowest choice for the final layout, and adds what are known as relaxation hints. Once all the input objects have been read and assigned temporary addresses, the linker can perform a relaxation pass, which substitutes more efficient instructions for the compiler’s conservative guesses and reassigns addresses accordingly. This, in turn, may allow further relaxations, and so on. In general, an average programmer who is not implementing their own linker or compiler can safely ignore these issues; the linker will take care of them.
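To show how one relaxation can unlock another, here is a toy relaxation pass in Python. It is a sketch under loud assumptions: the instruction sizes (5 bytes for the conservative jump, 2 for the short one), the reach of the short jump, and the item format are all invented and do not describe any particular processor or linker.

```python
# Toy relaxation pass over made-up instructions (not a real instruction set).
LONG_JUMP = 5                   # assumed size of the conservative encoding
SHORT_JUMP = 2                  # assumed size of the short encoding
SHORT_RANGE = range(-128, 128)  # assumed offsets the short encoding can reach


def layout(items):
    """Assign an address to every item; return (addresses, label addresses)."""
    addresses, labels, cursor = [], {}, 0
    for item in items:
        addresses.append(cursor)
        if item["kind"] == "label":
            labels[item["name"]] = cursor
        cursor += item["size"]
    return addresses, labels


def relax(items):
    """Shrink long jumps whose targets are in short range until nothing changes."""
    passes = 0
    while True:
        addresses, labels = layout(items)   # temporary addresses for this pass
        changed = False
        for item, address in zip(items, addresses):
            if item["kind"] == "jump" and item["size"] == LONG_JUMP:
                # The offset is measured from the end of the jump instruction.
                offset = labels[item["target"]] - (address + item["size"])
                if offset in SHORT_RANGE:
                    item["size"] = SHORT_JUMP
                    changed = True
        if not changed:
            return passes
        passes += 1


program = [
    {"kind": "jump", "target": "end", "size": LONG_JUMP},   # jump A
    {"kind": "code", "size": 120},
    {"kind": "jump", "target": "end", "size": LONG_JUMP},   # jump B
    {"kind": "code", "size": 4},
    {"kind": "label", "name": "end", "size": 0},
]

passes = relax(program)
print("passes that changed something:", passes)
print("jump sizes:", [i["size"] for i in program if i["kind"] == "jump"])
```

In the first pass only jump B is close enough to its target to shrink. Shrinking it pulls the end label three bytes closer, which brings jump A into range on the second pass.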
Linkers are essential pieces of the software development process and allow programmers to separate out their code in meaningful ways. They also allow the use of programming libraries, code that has been written to be used by other programs. The C Standard Library is an example of such a library, allowing standard C programs to link to its code to perform such operations as general input/output. If we did not have a linker in our compilation process we’d have to write all our programs as one giant monolithic file. If multiple programmers were working on one program, each would have to wait for the others to be done before pursuing their own goals on the same file. This is unnecessary with a linker that can tie together multiple files into one executable. I hope this article cleared up what a linker does for you, but if you have any questions feel free to leave them in the comments. Thanks for reading!