March 2006

Aggregate Optimization and Tuning I

A question: if key portions of a system are recompiled with full optimization and the system is tuned, will the work yield any measurable results on an aggregate scale? There are other questions to consider as well: is the extra memory consumption worth the payoff? What about the extended compile times that aggressive optimization requires? And finally, on an aggregate scale, is it even worth it?


To keep things simple, a small system and a limited set of software were used.

System Specs

The system is a small whitebox purchased from a fly-by-night company (arguably a good deal in 2000: only 200 USD):

mui@pyxis:~$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Celeron(TM) CPU                1000MHz
stepping        : 1
cpu MHz         : 993.499
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge \
                                        mca cmov pat pse36 mmx fxsr sse
bogomips        : 1989.45

and the memory:

mui@pyxis:~$ cat /proc/meminfo 
MemTotal:       127148 kB
MemFree:          2524 kB
Buffers:          8088 kB
Cached:          99332 kB
SwapCached:        540 kB
Active:           9608 kB
Inactive:        99984 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       127148 kB
LowFree:          2524 kB
SwapTotal:      369452 kB
SwapFree:       366860 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:           5104 kB
Slab:            11896 kB
CommitLimit:    433024 kB
Committed_AS:    12204 kB
PageTables:        232 kB
VmallocTotal:   909008 kB
VmallocUsed:      2468 kB
VmallocChunk:   906128 kB

Last but not least:

mui@pyxis:~$ uname -r
mui@pyxis:~$ cat /etc/issue
Debian GNU/Linux testing/unstable


To narrow the scope, the optimization and tuning effort was limited to the following components:

  • Linux kernel, preferably a recent Debian source packaged one.
  • perl interpreter
  • gcc
  • bash

The Steps

The procedure is somewhat chicken-and-egg. Initially it consisted of two questions: can it even be done, that is, can the required components be recompiled at all? And next, does the optimization actually work? Once those two questions were answered, the rest of the exercise became procedural. The one common element is that full optimization with the GNU C Compiler is simply a matter of setting CFLAGS to -O3; in that respect the task became mostly time consuming (waiting for builds to finish) and differed only slightly from component to component.


Building gcc is pretty easy; once the source is downloaded, do the following:

mkdir gccbuild
mv gcc-x.x.x.tar.gz gccbuild
cd gccbuild
tar xzvf gcc-x.x.x.tar.gz
mkdir build
cd build
export CFLAGS="-O3"
../gcc-x.x.x/configure
make
sudo make install

Just to be cautious, the CFLAGS line in the generated Makefile was also changed from -O2 to -O3.
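For the impatient, that edit can be scripted instead of done in an editor; a sketch using GNU sed's in-place option (the throwaway Makefile is just for demonstration):

```shell
# Demonstrate the substitution on a throwaway Makefile:
printf 'CFLAGS = -g -O2\n' > Makefile
# Replace every -O2 with -O3 in place (GNU sed's -i option):
sed -i 's/-O2/-O3/g' Makefile
cat Makefile    # CFLAGS = -g -O3
```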


Bash follows the usual configure-and-make routine:

tar xzvf bash-x.x.x.tar.gz
cd bash-x.x.x
./configure
vi Makefile    # :%s/-O2/-O3/g then :wq!
make && sudo make install


Perl's Configure script prompts for its settings interactively:

tar xzvf perl-x.x.x.tar.gz
cd perl-x.x.x
sh Configure

When it asks for optimization flags, change -O2 to -O3.
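Alternatively, the prompts can be skipped entirely: Configure accepts the optimization level on the command line, with -des taking the defaults for everything else. From inside the unpacked perl source tree, a non-interactive build looks roughly like this:

```shell
# -Doptimize overrides the default -O2; -des accepts all other defaults
sh Configure -des -Doptimize='-O3'
make
sudo make install
```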

Debian Packaged Linux Kernel

apt-get install linux-source-x.x.x-foo
tar xjvf /usr/src/linux-source-x.x.x-foo.tar.bz2
cd linux-source-x.x.x-foo
make-kpkg clean
fakeroot make-kpkg --initrd --revision=custom kernel_image
dpkg -i ../linux-image-x.x.x-foo.deb

Please note a few things:

  • An initrd is used because it is the default behavior of the Debian GNU/Linux system.
  • Several other packages are required: kernel headers (which for the testing branch generally match the latest source package), the Debian developer essentials, and fakeroot.
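On a stock Debian testing install of this vintage, those prerequisites amount to roughly the following (package names from memory; they may vary by release):

```shell
# Toolchain, kernel-package (provides make-kpkg), fakeroot, and
# ncurses headers for make menuconfig:
apt-get install build-essential kernel-package fakeroot libncurses5-dev
```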

The Next Steps

All of the optimized software (outside of the kernel) was installed into /usr/local. The remaining documents will detail the following steps:

  • Tuning the kernel and creating a new custom package for it.
  • Creating test cases to compare the stock software and kernel against the optimized/tuned versions.
  • Discussion and summary of results.