There are several aspects that should be considered on the way to deterministic binary output:
- absolute paths which are compiled into the binary code (mainly for debugging)
- compilers sometimes make random choices, e.g. which optimization to use, which path to take, or how to mangle a specific function in an anonymous namespace. Of course, this has no influence on the functional properties of the code; the binaries are (or at least should be ;-) ) always functionally equivalent.
- timestamps, UUIDs in object files, libraries, etc.
- timestamps and dates generated by the __DATE__, __TIME__ and __TIMESTAMP__ macros
An example where the first point comes into play is the __FILE__ macro, which is often used for debugging purposes. How this macro is expanded differs from compiler to compiler. Microsoft's C++ compiler, for example, has the /FC flag, which controls whether the macro expands to an absolute or a relative path. Of course, the question of absolute versus relative paths only matters if you have multiple build machines with workspaces in different locations, or if you care about workspace information being delivered to your customer. You can easily check whether any such path information ended up in the binary by searching it for the workspace path:
strings binary.out | grep workspace
If the output contains full paths to your source code files, you have to take care of this problem. I have found two solutions:
- Using compiler switches to make sure that paths are relative
- Making sure that the build environment/workspaces are located at the same absolute path, independent of the actual build machine (Jenkins master or slave)
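The first solution and the path check can be sketched roughly as follows. This is only an illustration: the helper names contains_path and prefix_map_flag are made up for this example, and the -ffile-prefix-map option is assumed to be available (GCC 8 or later, Clang 10 or later). Older compilers may only offer -fdebug-prefix-map, which does not affect __FILE__.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file "$1" contains the literal path "$2".
contains_path() {
    # -a: treat the binary as text, -F: fixed string, -q: no output
    grep -a -F -q -- "$2" "$1"
}

# Hypothetical helper: print a compiler flag that rewrites the
# workspace prefix "$1" to "." in debug info and __FILE__.
prefix_map_flag() {
    printf '%s\n' "-ffile-prefix-map=$1=."
}

# Example usage (assumes gcc >= 8 and a source file main.c):
#   gcc -g $(prefix_map_flag "$PWD") -c "$PWD/main.c" -o main.o
#   contains_path main.o "$PWD" && echo "workspace path still embedded"
```

Note that the unquoted $(prefix_map_flag ...) in the usage line only works when the workspace path contains no whitespace.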
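To see whether two builds actually differ, compare the two resulting binaries (here b1 and b2) byte by byte:
cmp -b -l b1 b2
This lists every differing byte of b1 vs. b2 together with its offset and the two byte values.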
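For a quick overview of how far apart two builds are, the cmp output can be reduced to a single number. The following is a small sketch; the function name count_differing_bytes is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the number of byte positions at which
# the files "$1" and "$2" differ.
count_differing_bytes() {
    # cmp -l prints one line per differing byte; its non-zero exit
    # status for differing files is ignored here.
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example usage:
#   count_differing_bytes b1 b2
# Caveat: if one file is merely a prefix of the other, cmp -l reports
# no differing bytes, so also check the exit status of: cmp -s b1 b2
```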
Binaries can still differ for several reasons. One example is how gcc mangles functions in anonymous namespaces: part of the mangled name is generated with a random number generator. If you have taken care of the first point and your object files still differ, you can use a special gcc parameter: -frandom-seed=<string> lets you specify a string that is used to initialize the random number generator. The documentation for this option tells us...
-frandom-seed=string This option provides a seed that GCC uses when it would otherwise use random numbers. It is used to generate certain symbol names that have to be different in every compiled file. It is also used to place unique stamps in coverage data files and the object files that produce them. You can use the -frandom-seed option to produce reproducibly identical object files. The string should be different for every file you compile.
That means we have to provide a different, but reproducible, string for each file we compile. I found one solution to this problem in a blog post by Jörg Förstner. He suggests using the MD5 hash of the source file as the input to -frandom-seed. This is sufficient, as the hash differs between source files and, conversely, yields the same seed as long as the source has not changed. He suggests the following compile command...
$(CC) -frandom-seed=$(shell md5sum $< | sed 's/\(.*\) .*/\1/') $(CCFLAGS) -c $< -o $@
The seed is constructed by calculating the md5sum of the source code file (e.g. test.cpp).
md5sum $<
b61f78373a5b404a027c533b9ca6280f  test.cpp
This result is piped into sed (sed 's/\(.*\) .*/\1/') to cut off the file name that follows the actual MD5 sum.
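Outside of a Makefile, the same seed can be computed with a small shell function. This is a sketch that uses cut instead of sed; the name random_seed_for is made up for this example, and md5sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: print a seed for -frandom-seed derived from the
# content of the file "$1" (the first field of the md5sum output).
random_seed_for() {
    md5sum "$1" | cut -d' ' -f1
}

# Example usage (assumes g++ and a source file test.cpp):
#   g++ -frandom-seed="$(random_seed_for test.cpp)" -c test.cpp -o test.o
```

Because the seed depends only on the file content, two source files with identical content get the same seed. If that matters in your project, hashing the relative file name along with the content would be one way around it.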
The problem described by the third point (timestamps, UUIDs) is introduced by some tools in the archiving and linking steps. For example, when static libraries/archives are built from object files with the ar tool, ar also stores timestamps, user and group IDs and other metadata that change from build to build. You can easily try this out by running ar twice and comparing the generated output. For ar, however, there is a simple solution: it comes with the D option, which switches ar into deterministic mode. The documentation for D tells us...
D Operate in deterministic mode. When adding files and the archive index use zero for UIDs, GIDs, timestamps, and use consistent file modes for all files. When this option is used, if ar is used with identical options and identical input files, multiple runs will create identical output files regardless of the input files' owners, groups, file modes, or modification times.
My command line for building a static library looks like:
ar Drvs <output> <input>
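The effect of the D option can be checked with a small experiment: build the same archive twice from an input whose modification time changes in between, then compare the results. The sketch below assumes GNU ar from binutils; the function name archive_is_deterministic and the file names are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: build an archive twice with the ar modifiers in
# "$1" (e.g. "Drcs"), changing the member's mtime in between, and
# succeed only if both archives are byte-identical.
archive_is_deterministic() {
    dir=$(mktemp -d) || return 2
    printf 'dummy member\n' > "$dir/member.o"
    (cd "$dir" && ar "$1" first.a member.o) || return 2
    # Give the member a different modification time for the second run.
    touch -t 200001010000 "$dir/member.o"
    (cd "$dir" && ar "$1" second.a member.o) || return 2
    cmp -s "$dir/first.a" "$dir/second.a"
    status=$?
    rm -rf "$dir"
    return $status
}

# Example usage:
#   archive_is_deterministic Drcs && echo "archives are identical"
```

Some distributions configure binutils so that deterministic mode is the default, in which case plain "rcs" may also produce identical archives.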
In some cases, for example when using a cross-compiler toolchain, you cannot easily change the binutils version to get an ar that supports the deterministic option. This was the motivation for someone to write a tool that wipes out the timestamps in the generated archive files. You can find this tool on GitHub at the following URL: https://github.com/nh2/ar-timestamp-wiper/tree/master. If you use CMake as part of your build system, you can hook the tool into the finish step of the archive generation.
SET(CMAKE_C_ARCHIVE_FINISH "ar-timestamp-wiper")
SET(CMAKE_CXX_ARCHIVE_FINISH ${CMAKE_C_ARCHIVE_FINISH})
The last point (timestamps and dates introduced by macros like __DATE__, __TIME__, __TIMESTAMP__) can be addressed by specifying a deterministic/known value for the corresponding build. I know at least two ways to do this. Both work in general, but sometimes one approach is easier to use than the other.
- faketime/libfaketime
- overriding the macros by compiler defines
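The first approach runs a command under faketime so that it sees a fixed date and time:
apt-get install faketime
faketime '2014-01-09 00:00:00' /usr/bin/date
The second approach works by adding, for example, -D__DATE__='"Jan  9 2014"' -D__TIME__='"12:00:00"' to your build step (quoted here for a shell command line, so that the compiler receives string literals). Make sure that you specify a valid date and time in the formats that __DATE__ and __TIME__ normally expand to, i.e. "Mmm dd yyyy" and "hh:mm:ss".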
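To see the macro override in action, the following sketch compiles a tiny C program with pinned values and runs it. The function name build_stamp and the file names are made up for this example. It assumes that a C compiler is reachable as cc (or via $CC) and that it understands -Wno-builtin-macro-redefined, which GCC and Clang use to silence the warning about redefining built-in macros.

```shell
#!/bin/sh
# Hypothetical helper: compile and run a program that prints __DATE__
# and __TIME__, with both macros overridden on the command line.
build_stamp() {
    dir=$(mktemp -d) || return 2
    cat > "$dir/stamp.c" <<'EOF'
#include <stdio.h>
int main(void) { puts(__DATE__ " " __TIME__); return 0; }
EOF
    # The outer single quotes protect the inner double quotes from the
    # shell, so the compiler sees proper C string literals.
    "${CC:-cc}" -Wno-builtin-macro-redefined \
        -D__DATE__='"Jan  9 2014"' -D__TIME__='"12:00:00"' \
        "$dir/stamp.c" -o "$dir/stamp" && "$dir/stamp"
    status=$?
    rm -rf "$dir"
    return $status
}

# Example usage:
#   build_stamp
```

Running the build under faketime instead should give a similarly stable result without touching the compiler flags, at the cost of an extra tool on the build machine.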
References:
- http://cmake.3232098.n2.nabble.com/How-to-calculate-a-value-quot-on-the-fly-quot-for-use-with-gcc-compiler-option-td3277077.html
- http://stackoverflow.com/questions/14653874/deterministic-binary-output-with-g
- https://wiki.debian.org/ReproducibleBuilds