Tuesday, December 24, 2013

On the way to deterministic binary (gcc) output

In some projects (actually it should be the case in general) it is necessary to prove that two builds from the same input (source + configuration) generate the same output. This property is useful because it allows one to compare the binary output of a compilation/linking step. If there is no difference at all, one can be sure that there was no change in the source code either and that the behaviour of the software hasn't changed (as long as one trusts the compiler). It also allows one to prove that changes in the build infrastructure/system don't change the output of a build, that an archiving concept works, etc.

There are several aspects that should be considered on the way to deterministic binary output:
  1. absolute paths which are compiled into the binary code (mainly for debugging)
  2. compilers sometimes decide randomly, e.g., which optimization to use, which code path to choose, or how to mangle a specific function in an anonymous namespace. Of course, this has no influence on the functional properties of the code; the binaries are (or at least should be ;-) ) always functionally equivalent.
  3. timestamps, UIDs in object files, libraries, etc.
  4. timestamps, dates generated by __DATE__, __TIME__, __TIMESTAMP__ macros

An example where the first point comes into play is the __FILE__ macro, which is often used for debugging purposes. How this macro gets expanded varies from compiler to compiler. For example, Microsoft's C++ compiler provides the /FC flag, which controls whether the macro expands to an absolute or a relative path. Of course, the question of absolute vs. relative paths only matters if you have multiple build machines with different workspace locations, or if you care about workspace information that gets delivered to your customer. You can easily check whether any such path information ends up in the binary by searching for the workspace path in it:

strings binary.out | grep workspace

If any output contains full paths to your source code files, then you have to take care of this problem. I have found two solutions (see the sketch below for the underlying behaviour):
  • Using compiler switches to make sure that paths are relative
  • Making sure that the build environment/workspaces are on the same absolute paths independent of the actual build machine (Jenkins master, slave)
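
As a small sketch of the underlying behaviour (assuming gcc and a test.cpp that actually uses __FILE__; the paths are illustrative): gcc expands __FILE__ to the file name exactly as it was given on the command line, so the same file compiled via a relative and via an absolute path yields different strings in the object file.

    cd /home/user/workspace/project
    g++ -c test.cpp -o rel.o            # __FILE__ expands to "test.cpp"
    g++ -c $(pwd)/test.cpp -o abs.o     # __FILE__ expands to "/home/user/workspace/project/test.cpp"
    strings rel.o | grep workspace      # no output
    strings abs.o | grep workspace      # prints the absolute workspace path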
The next point is quite interesting. The average programmer would expect that, given a specific piece of code and a set of rules for the compiler and linker, the outcome would always be the same. Well, this is (normally) true for the functional behaviour of the piece of code. However, it is not true when comparing the two binaries at byte level. You can easily compare two binaries by using cmp:


cmp -b -l b1 b2
 
This will show you all binary differences, with location and differing bytes, for b1 vs. b2.
The binary difference has several causes: one example is how gcc mangles functions in anonymous namespaces. A part of this name mangling is randomized using a random generator. If you have taken care of the first point and your object files still differ, then you can use a special gcc parameter: -frandom-seed=<string> allows one to specify a string which will be used to initialize the random generator. The documentation for this option tells us...

       -frandom-seed=string
           This option provides a seed that GCC uses when it would otherwise
           use random numbers.  It is used to generate certain symbol names
           that have to be different in every compiled file.  It is also used
           to place unique stamps in coverage data files and the object files
           that produce them.  You can use the -frandom-seed option to produce
           reproducibly identical object files.

           The string should be different for every file you compile.

That means we have to provide a different seed string for each file we compile. I found one solution to this problem in a blog post by Jörg Förstner. He suggested using the md5 hash of the source file as input to -frandom-seed. This is sufficient, as the seed will change for different source files and, conversely, will stay the same if the source hasn't changed. He suggested the following compile parameters...

           $(CC) -frandom-seed=$(shell md5sum $< | sed 's/\(.*\) .*/\1/') $(CCFLAGS) -c $< -o $@

The seed is constructed by calculating the md5sum of the source code file (e.g. test.cpp).

          md5sum $<
          b61f78373a5b404a027c533b9ca6280f  test.cpp

This result is piped into sed (sed 's/\(.*\) .*/\1/') to cut off the filename part after the actual md5 sum.
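
Putting it together, here is a minimal sketch (assuming gcc; test.cpp and the object file names are illustrative): compiling the same unchanged source twice with the same content-derived seed should yield byte-identical object files.

    SEED=$(md5sum test.cpp | sed 's/\(.*\) .*/\1/')
    g++ -frandom-seed=$SEED -c test.cpp -o run1.o
    g++ -frandom-seed=$SEED -c test.cpp -o run2.o
    cmp -b -l run1.o run2.o && echo "objects are byte-identical"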

The problem described by the third point (timestamps, UIDs) is introduced by some tools in the archiving/linking step. For example, when building static libraries/archives from object files with the ar tool, ar will also insert timestamps, UIDs/GIDs and other metadata which change from build to build. You can easily try this out by executing ar twice and comparing the generated output. However, for the ar tool there is a simple solution to this problem: ar comes with the D option which turns on deterministic mode. The documentation for D tells us...

       D   Operate in deterministic mode.  When adding files and the archive
           index use zero for UIDs, GIDs, timestamps, and use consistent file
           modes for all files.  When this option is used, if ar is used with
           identical options and identical input files, multiple runs will
           create identical output files regardless of the input files'
           owners, groups, file modes, or modification times. 

My command line for building a static library looks like:
      ar Drvs <output> <input> 
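
As a quick sanity check (a sketch; foo.o, bar.o and the archive names are placeholders), two runs of deterministic ar over the same inputs should produce byte-identical archives:

    ar Drvs lib1.a foo.o bar.o
    ar Drvs lib2.a foo.o bar.o
    cmp lib1.a lib2.a && echo "archives are byte-identical"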

In some cases, for example when using a cross-compiler toolchain, you cannot easily change the binutils version to get an ar version that supports the deterministic option. This was the motivation for someone to write a tool that wipes the timestamps from the generated archive files. You can find this tool on github under the following url: https://github.com/nh2/ar-timestamp-wiper/tree/master. If you use cmake as part of your build system, you can hook the tool into the finish step of the archive generation.
          SET(CMAKE_C_ARCHIVE_FINISH "ar-timestamp-wiper <TARGET>")
          SET(CMAKE_CXX_ARCHIVE_FINISH ${CMAKE_C_ARCHIVE_FINISH})

The last point (timestamps and dates introduced by macros like __DATE__, __TIME__, __TIMESTAMP__) can be addressed by specifying a deterministic/known value for the corresponding build. I know at least two ways to do this; both work in general, but sometimes one approach is easier to use than the other.
  1. faketime/libfaketime
  2. overriding the macros by compiler defines  
The first approach works by calling the build step/executable through faketime. Faketime uses the LD_PRELOAD mechanism to intercept time-related library calls and report a pretended, fixed time.
          apt-get install faketime
          faketime '2014-01-09 00:00:00' /usr/bin/date
The second approach works by adding, for example, -D__DATE__="'Jan  9 2014'" -D__TIME__="'12:00:00'" to your build step. You have to take care that you specify a valid date and time according to the expected return values of __DATE__ and __TIME__ (note that __DATE__ pads the day of the month with a space for days below 10).
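
A minimal sketch of the second approach (the file stamp.c is just for illustration; gcc warns about redefining these built-in macros, which -Wno-builtin-macro-redefined silences):

    echo '#include <stdio.h>' > stamp.c
    echo 'int main(void) { printf("%s %s\n", __DATE__, __TIME__); return 0; }' >> stamp.c
    gcc -Wno-builtin-macro-redefined -D__DATE__='"Jan  9 2014"' -D__TIME__='"12:00:00"' stamp.c -o stamp
    ./stamp    # always prints: Jan  9 2014 12:00:00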

 References:

  • http://cmake.3232098.n2.nabble.com/How-to-calculate-a-value-quot-on-the-fly-quot-for-use-with-gcc-compiler-option-td3277077.html
  • http://stackoverflow.com/questions/14653874/deterministic-binary-output-with-g
  • https://wiki.debian.org/ReproducibleBuilds

Friday, December 13, 2013

KVM networking with guests having static public IP addresses

In this blogpost I will show you how to set up KVM networking in such a way that guest IPs are part of the local network. This makes it possible to have guests running on the host with external/public IP addresses. KVM itself should already be installed; additionally, we should check that bridge-utils is installed.
 
sudo apt-get install bridge-utils

Assuming the following scenario:
  • host system IP configuration:
    • IP: 10.10.10.5
    • netmask: 255.255.0.0
    • gateway: 10.10.0.1
    • DNS: 10.10.0.250, 10.10.0.251
  • guest IP configuration:
    • IP: 10.10.10.10
    • netmask: 255.255.0.0
    • gateway: 10.10.0.1
    • DNS: 10.10.0.250, 10.10.0.251
We have to edit /etc/network/interfaces to enable the bridge. Currently my interfaces file looks like this:

auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
       address 10.10.10.5
       netmask 255.255.0.0
       gateway 10.10.0.1
       dns-nameservers 10.10.0.250 10.10.0.251 


This has to be changed in the following way:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto br0
iface br0 inet static
       address 10.10.10.5
       netmask 255.255.0.0
       gateway 10.10.0.1
       dns-nameservers 10.10.0.250 10.10.0.251 
       bridge_ports eth0
       bridge_stp off
       bridge_fd 0
       bridge_maxwait 0
Now we have to restart networking to incorporate the changes.
/etc/init.d/networking restart
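
To verify that the bridge came up as expected (a quick sketch; brctl is part of bridge-utils, and the bridge id will differ on your system):

    brctl show
    # bridge name     bridge id               STP enabled     interfaces
    # br0             8000.xxxxxxxxxxxx       no              eth0
    ip addr show br0    # br0 should now carry the host IP 10.10.10.5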

After that we can run vmbuilder to create a new machine with the corresponding guest IP configuration.
vmbuilder kvm ubuntu --suite=precise --flavour=virtual --arch=amd64 \
--install-mirror=http://apt-cacher-ng:3142/ubuntu -o --libvirt=qemu:///system \
--ip=10.10.10.10 --gw=10.10.0.1 --part=vmbuilder.partition --templates=templates/ \
--user=admin --name=admin --pass=pass --addpkg=acpid \
--firstboot=/opt/kvm/images/vslave-001/vmbuilder.boot.sh \
--mem=4096 --hostname=build-vslave-001 --bridge=br0
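
Once vmbuilder has finished, the guest can be started and checked (a sketch; it assumes the domain was registered via --libvirt=qemu:///system and named after the hostname):

    virsh list --all              # the new machine should show up here
    virsh start build-vslave-001
    ping -c 3 10.10.10.10         # the guest should be reachable on the bridged LAN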


Resources:

  • https://help.ubuntu.com/community/KVM/Networking
  • http://docwiki.cisco.com/wiki/OpenStack:VM_Build
  • http://foswiki.org/Support/UbuntuVmBuilder
  • http://www.fak-online.net/?p=22
  • http://www.linux-kvm.org/page/Networking
  • http://blog.braastad.org/?p=128
  • http://www.howtoforge.com/virtualization-with-kvm-on-ubuntu-12.04-lts

Thursday, December 12, 2013

changing ulimits for jenkins (daemons started by start-stop-daemon)

Our build infrastructure consists of several build systems running jenkins as a CI platform. The hardware that hosts these servers is very powerful: large and fast disks, up to 48 cores and 128GB of RAM. Such hardware allows us to run a large number of parallel build tasks, which is why we have hit the 4k limit for open files (Ubuntu 12.04 LTS with default settings). Using ulimit -a will show you all limit settings for the current user.

root@system:# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1031036
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1031036
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

This output shows that the maximum number of open files (soft limit) is 1024; the hard limit can be shown with ulimit -a -H, giving the following output.

root@system:# ulimit -a -H
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1031036
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1031036
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

A user can raise their ulimits with the ulimit command, but only up to the hard limit. Ulimits can be specified system-wide using the /etc/security/limits.conf configuration file. However, these limits are only applied to sessions that go through PAM. We will come to this later...
In the following we will change the ulimits for a jenkins server in 2 steps:
  1. Setting limits for user jenkins in /etc/security/limits.conf by adding the following two lines to the end of this file:

        jenkins soft nofile 64000
        jenkins hard nofile 64000

  2. Setting limits for the jenkins process started by start-stop-daemon:

        RUN_AS=jenkins
        COMMAND=start_jenkins.sh
        d_start() {
                ulimit -n 64000
                start-stop-daemon --start --quiet --background --chuid $RUN_AS --exec $COMMAND
        }

The important part is that you have to specify the ulimits, e.g., for the number of open files, before start-stop-daemon is called. The reason is that start-stop-daemon doesn't go through PAM and hence will not pick up the limits specified in /etc/security/limits.conf. Once the limits have been changed, one can verify them by running a test job in jenkins that uses a shell build step in which ulimit -a is executed...
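
Alternatively, the limits of the already running daemon can be inspected directly (a sketch; the pgrep filter is an assumption and may need adjusting to how jenkins is started on your system):

    PID=$(pgrep -u jenkins -o java)       # oldest java process owned by user jenkins
    grep 'open files' /proc/$PID/limits   # shows the soft and hard limit in effect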

Resources:

  • http://superuser.com/questions/454465/make-ulimits-work-with-start-stop-daemon
  • http://www.windowslinuxosx.com/q/answers-values-set-in-limits-conf-are-not-working-678783.html
  • http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/
  • http://serverfault.com/questions/472904/how-to-diagnose-ulimit-enforcement
  • http://www.ovirt.org/Jenkins#change_open_files
  • http://nishal-tech.blogspot.de/2013/07/how-to-set-ulimit-in-ubuntudebian-linux.html
  • http://www.altj.com/tag/ubuntu/

nagios: check number of threads, open files and http for a service using check_by_ssh and a simple bash script

Recently, I had the problem that some of the jenkins servers in our build farm sometimes spawned up to 16k threads. In order to diagnose the problem and to monitor the services for availability issues, I created some monitoring scripts for nagios. This tutorial will show you how to monitor jenkins using nagios, a nice tool for monitoring infrastructure and services. We will use the check_by_ssh plugin to execute the checks on the target system.

We will cover the following steps:
  1. Install/configure check_by_ssh plugin
  2. Configure service check_http to check jenkins web interface availability
  3. Configure service to check for jenkins number of open files (ulimit issue)
  4. Configure service to check for jenkins number of threads
1. Install/configure check_by_ssh
  •  client side
    • install client package
      • apt-get install nagios-nrpe-server
    • enable nagios shell
      • usermod -s /bin/bash nagios
  • server side
    • install server package
      • apt-get install nagios3
    • enable nagios shell
      • usermod -s /bin/bash nagios
    • create ssh keypair
      • su - nagios
      • ssh-keygen -N ""
    • copy public key to clients using scp or other methods e.g. salt, chef
      •  add public key to ~nagios/.ssh/authorized_keys
  • check that the ssh connection is working
    • su - nagios
    • ssh <client-ip-address>
  • check that check_by_ssh is working
    • /usr/lib/nagios/plugins/check_by_ssh -l nagios -H <client-dns-name> -C "hostname"
      • output: <client-dns-name>
 2. Configure service check_http to check jenkins web interface availability
For the http check I was using the check_http command which is already provided as a nagios plugin (/usr/lib/nagios/plugins/). The corresponding nagios service section (services_nagios2.cfg) looks like this:

define service {
        hostgroup_name                  jenkins-servers
        service_description             jenkins nginx http redirect
        check_command                   check_http!-p 8080
        use                             generic-service
        notification_interval           0;
}
Actually, in my case I have nginx proxies in front of the jenkins servers to do the https termination and to redirect http requests to https. This check verifies that the http redirect listening on port 8080 is available. However, it can also be used to check a plain jenkins instance.
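
It is a good idea to run the same check manually from the nagios server first (a sketch; the plugin path is the Debian/Ubuntu default):

    /usr/lib/nagios/plugins/check_http -H <client-dns-name> -p 8080
    # e.g.: HTTP OK: HTTP/1.1 200 OK - 467 bytes in 0.004 second response time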

3. Configure service to check for jenkins number of open files (ulimit issue)

Checking the number of open files for a specific user is a little bit trickier. I used a perl script found at http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/check-open-files/details which I placed in a plugin subfolder in the nagios home directory (on the clients). Then I added a check_by_ssh_open_files custom command to the custom_commands.cfg nagios configuration file. The command uses check_by_ssh to call the plugin which has been installed on the client side.
define command {
        command_name    check_by_ssh_open_files
        command_line    $USER1$/check_by_ssh -o StrictHostKeyChecking=no -l nagios -H $HOSTADDRESS$ -C "/var/lib/nagios/plugins/check_unix_open_files.pl -a $ARG1$ -w $ARG2$,$ARG2$ -c $ARG3$,$ARG3$"
        }
After that I defined a custom service that uses the check_by_ssh_open_files command with the username jenkins, the warning level set to 2048 and the critical level set to 4096 open files. Note that on standard installations the hard limit is typically set to 4096 open files.
define service {
        hostgroup_name                  jenkins-servers
        service_description             check open files jenkins
        check_command                   check_by_ssh_open_files!jenkins!2048!4096
        use                             generic-service
        notification_interval           0;
}
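
For reference, when nagios expands this service definition, the resulting call looks roughly like this (a sketch; $USER1$ typically points to /usr/lib/nagios/plugins on Debian/Ubuntu):

    /usr/lib/nagios/plugins/check_by_ssh -o StrictHostKeyChecking=no -l nagios -H <client-dns-name> \
        -C "/var/lib/nagios/plugins/check_unix_open_files.pl -a jenkins -w 2048,2048 -c 4096,4096"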

4. Configure service to check for jenkins number of threads
Checking the number of threads for a specific process is even more complex. I found two solutions: the first is to use the check_procs plugin which is part of nagios. The problem is that you have to recompile that plugin with a special ps command syntax so that it displays all threads of a process instead of only the processes. Although I figured out how to download, configure and compile the plugin code, I wasn't able to figure out the specific ps options. You somehow have to define the command, parse parameters and so on... I have found the following configure parameters on the web:

    --with-ps-command="/bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'" \
    --with-ps-format='%s %d %d %d %d %d %f %s %s %n' \
    --with-ps-cols=10 \
    --with-ps-varlist='procstat,&;procuid,&;procpid,&;procppid,&;procvsz,&;procrss,&;procpcpu,procetime,procprog,&pos'

After wasting more than an hour on that, I decided to write a simple bash script which suffices for my and nagios' requirements. Here it is, my first nagios checker script...


#!/bin/bash

RET_OK=0
RET_WARN=1
RET_CRIT=2
RET_UNKNOWN=3

user="$1"
warn="$2"
crit="$3"

# resolve the numeric uid; bail out if no/invalid user was given
id=$(id -u "$user" 2>/dev/null)
if [ $? -ne 0 ]
then
    echo "UNKNOWN: usage: ./check_threads.sh <user> <warn> <crit>"
    exit $RET_UNKNOWN
fi
# H makes ps list one line per thread; count the lines belonging to the user
count=$(ps auxH | grep -c "^$user")
if [ "$count" -lt "$warn" ]
then
    echo "THREADS OK: $count processes/threads with UID = $id ($user)"
    exit $RET_OK
elif [ "$count" -lt "$crit" ]
then
    echo "WARNING - $count processes/threads with UID = $id ($user)"
    exit $RET_WARN
else
    echo "CRITICAL - $count processes/threads with UID = $id ($user)"
    exit $RET_CRIT
fi
This is the corresponding custom command...
define command {
        command_name    check_by_ssh_threads
        command_line    $USER1$/check_by_ssh -o StrictHostKeyChecking=no -l nagios -H $HOSTADDRESS$ -C "/var/lib/nagios/plugins/check_threads.sh $ARG1$ $ARG2$ $ARG3$"
        }

and this is the corresponding service that will check whether the number of jenkins threads exceeds 512 (warning) or 1024 (critical).
 
define service {
        hostgroup_name                  jenkins-servers
        service_description             check number threads jenkins
        check_command                   check_by_ssh_threads!jenkins!512!1024
        use                             generic-service
        flap_detection_enabled          0
        notification_interval           0;
}
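
To try the checker manually on a client before wiring it into nagios (a sketch; the count and uid in the sample output are illustrative):

    /var/lib/nagios/plugins/check_threads.sh jenkins 512 1024
    # e.g.: THREADS OK: 143 processes/threads with UID = 105 (jenkins)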

Resources: 

  • http://www.nagios-wiki.de/nagios/plugins/check_by_ssh
  • http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/check-open-files/details
  • http://esisteinfehleraufgetreten.wordpress.com/2009/09/25/installing-nagios-or-icinga/
  • http://www.nagios-wiki.de/nagios/plugins/check_http
  • http://www.nagios.org/documentation