Say you've got a file you want to put into an executable. Some help text, a copyright notice. Putting these into the source code is painful:
#include <stdio.h>

static const char *copyright_notice[] = {
    "This program is free software; you can redistribute it and/or modify",
    "it under the terms of the GNU General Public License as published by",
    "the Free Software Foundation; either version 2 of the License, or (at",
    "your option) any later version.",
    NULL /* Marks end of text. */
};

int main(void)
{
    const char **line_p;
    /* Walk the array until the NULL sentinel, printing one line at a time. */
    for (line_p = copyright_notice; *line_p != NULL; line_p++) {
        puts(*line_p);
    }
    return 0;
}
If the file is binary, such as an image, then the pain rises exponentially. If you must take this approach then you'll want to know about xxd, the hexdump tool which ships with Vim:
$ xxd -i copyright.txt > copyright.i
which produces a file that can be #included in a C program:
unsigned char copyright_txt[] = {
0x54, 0x68, 0x69, 0x73, 0x20, 0x70, 0x72, 0x6f, 0x67, 0x72, 0x61, 0x6d,
0x20, 0x69, 0x73, 0x20, 0x66, 0x72, 0x65, 0x65, 0x20, 0x73, 0x6f, 0x66,
…
0x30, 0x31, 0x2c, 0x20, 0x55, 0x53, 0x41, 0x2e, 0x0a
};
unsigned int copyright_txt_len = 681;
That program looks like so:
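#include <stdio.h>
#include "copyright.i"
int main(void)
{
    const unsigned char *p; /* const, so it also works once the array is made const. */
    unsigned int len;
    for (p = copyright_txt, len = 0;
         len < copyright_txt_len;
         p++, len++) {
        putchar(*p);
    }
    return 0;
}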
If you are going to use this in anger then modify the generated .i file to declare a static const unsigned char …[]. A sed command can do that easily enough; that way the Makefile can re-create the .i file upon any change to the input binary file.
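As a rough sketch of what using the const-qualified array might look like, the fragment below substitutes a five-byte hand-written array for the real generated copyright.i. That array and the helper name emit_blob are illustrative assumptions, not xxd output. It writes the data with fwrite(), which takes an explicit length and so copes with binary data containing NUL bytes:

```c
#include <assert.h>
#include <stdio.h>

/* Hand-written stand-in for a generated, then const-qualified, .i file.
 * The bytes spell "This\n"; real xxd output would be much longer. */
static const unsigned char copyright_txt[] = {
    0x54, 0x68, 0x69, 0x73, 0x0a
};
static const unsigned int copyright_txt_len = sizeof copyright_txt;

/* Hypothetical helper: write len bytes of blob to out.
 * Returns 0 on success, -1 on a short write. */
static int emit_blob(FILE *out, const unsigned char *blob, size_t len)
{
    assert(out != NULL && blob != NULL);
    return fwrite(blob, 1, len, out) == len ? 0 : -1;
}
```

A caller would then do something like emit_blob(stdout, copyright_txt, copyright_txt_len) from main().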
It is much easier to insert a binary file using the linker, and the rest of this blog post explores how that is done. Again the example file will be copyright.txt, but the technique applies to any file, not just text.
Fortunately the GNU linker supports a binary object format, so using the typical linkage tools a binary file can be transformed into an object file simply with:
$ ld --relocatable --format=binary --output=copyright.o copyright.txt
$ cc -c helloworld.c
$ cc -o helloworld helloworld.o copyright.o
The GNU linker's --relocatable indicates that this object file is to be linked with other object files, and therefore addresses in this object file will need to be relocated at the final linkage.
The final cc in the example doesn't compile anything: it just runs ld with the startup files and libraries that C programs need on this particular architecture and operating system.
The linker defines some symbols in the object file marking the start, end and size of the copied copyright.txt:
$ nm copyright.o
000003bb D _binary_copyright_txt_end
000003bb A _binary_copyright_txt_size
00000000 D _binary_copyright_txt_start
Ignore the address of 00000000: this is a relocatable object file, and the final linkage will assign a final address and fix up references to it.
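The symbol names are derived from the input file name as it was given to ld: each character that is not a letter or digit becomes an underscore, so a path such as img/logo.png would show up in the name too. Here is a small sketch of that mapping, handy for predicting what to declare in C. The helper name binary_symbol_name is made up for illustration, and the sketch assumes plain ASCII file names:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the symbol name GNU ld is expected to generate
 * for input file `path` and suffix "start", "end" or "size".
 * Returns 0 on success, -1 if buf is too small. */
static int binary_symbol_name(const char *path, const char *suffix,
                              char *buf, size_t buflen)
{
    assert(path != NULL && suffix != NULL && buf != NULL);
    int n = snprintf(buf, buflen, "_binary_%s_%s", path, suffix);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    /* Replace every non-alphanumeric character of the path part with '_'. */
    size_t prefix = strlen("_binary_");
    size_t plen = strlen(path);
    for (size_t i = prefix; i < prefix + plen; i++) {
        if (!isalnum((unsigned char)buf[i]))
            buf[i] = '_';
    }
    return 0;
}
```

For copyright.txt and the suffix "start" this yields _binary_copyright_txt_start, matching the nm output above.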
A C program can access these symbols with:
extern const unsigned char _binary_copyright_txt_start[];
extern const unsigned char _binary_copyright_txt_end[];
extern const unsigned char _binary_copyright_txt_size[]; /* Absolute symbol: its address, not its contents, is the size. */
Don't rush ahead and puts() this variable. The copyright.txt file has no final ASCII NUL character which C uses to mark the end of strings. Perhaps use the old-fashioned UNIX write():
#include <stdio.h>
#include <unistd.h>
fflush(stdout); /* Synchronise C's stdio and UNIX's I/O. */
write(fileno(stdout),
_binary_copyright_txt_start,
(size_t)&_binary_copyright_txt_size);
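A single write() may write fewer bytes than asked for, so a more careful version loops until everything is out. The sketch below also takes the length as end minus start rather than from the _size symbol, whose absolute value is reportedly mishandled by some toolchains in position-independent executables. The helper name write_all is an assumption for illustration; it takes plain pointers so that it can be tried without the linker step:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the bytes in [start, end) to fd, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *start,
                     const unsigned char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    while (start < end) {
        ssize_t n = write(fd, start, (size_t)(end - start));
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* Interrupted before writing anything: retry. */
            return -1;
        }
        start += n;         /* Skip past the bytes that were written. */
    }
    return 0;
}
```

With the linker-defined symbols declared as above, the call would be write_all(fileno(stdout), _binary_copyright_txt_start, _binary_copyright_txt_end), still preceded by the fflush(stdout).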
Alternatively, add a final NUL to the copyright.txt file:
$ printf '\0' >> copyright.txt
and program:
#include <stdio.h>
extern const unsigned char _binary_copyright_txt_start[];
fputs((const char *)_binary_copyright_txt_start, stdout); /* Cast: fputs() takes a const char *. */
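fputs() will run off the end of the data if the NUL ever goes missing, say because someone regenerates copyright.txt and forgets the extra step. A defensive check before treating the data as a string might look like the sketch below. The helper name blob_is_c_string is an assumption for illustration, and it takes plain start and end pointers so it can be tried on any buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if the bytes in [start, end) form a C string
 * that occupies the whole blob, i.e. the only NUL is the final byte. */
static int blob_is_c_string(const unsigned char *start,
                            const unsigned char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    size_t len = (size_t)(end - start);
    /* Find the first NUL, if any, without reading past the end. */
    const unsigned char *nul = memchr(start, '\0', len);
    return nul != NULL && nul == end - 1;
}
```

With the linker symbols this would be called as blob_is_c_string(_binary_copyright_txt_start, _binary_copyright_txt_end), which needs the _end symbol declared as well.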
There's one small wrinkle:
$ objdump -s copyright.o
copyright.o: file format elf32-littlearm
Contents of section .data:
0000 54686973 2070726f 6772616d 20697320 This program is
0010 66726565 20736f66 74776172 653b2079 free software; y
0020 6f752063 616e2072 65646973 74726962 ou can redistrib
0030 75746520 69742061 6e642f6f 72206d6f ute it and/or mo
The .data section is writable, so it is not guaranteed to be shared between running instances of the executable. We really want the contents of the copyright.txt file to be in the read-only .rodata section so that there is only ever one copy in memory no matter how many instances are running.
objcopy could have copied an input ‘binary’ copyright.txt file to a particular section in an output object file, and that particular section could have been .rodata. But objcopy's options require us to state the architecture of the output object file. We really don't want a different command for compiling on x86, AMD64, ARM and so on.
So here's a hack: let ld set the architecture details when it generates its default output and then use objcopy to rename the section from .data to .rodata. Remember that .data contains only the contents of copyright.txt, so the _binary_… symbols defined in it are the only symbols which will move from .data to .rodata:
$ ld --relocatable --format=binary --output=copyright.tmp.o copyright.txt
$ objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents copyright.tmp.o copyright.o
$ objdump -s copyright.o
copyright.o: file format elf32-littlearm
Contents of section .rodata:
0000 54686973 2070726f 6772616d 20697320 This program is
0010 66726565 20736f66 74776172 653b2079 free software; y
0020 6f752063 616e2072 65646973 74726962 ou can redistrib
0030 75746520 69742061 6e642f6f 72206d6f ute it and/or mo
Link this copyright.o with the remainder of the program as before:
$ cc -c helloworld.c
$ cc -o helloworld helloworld.o copyright.o