# PAE and Xen balloon benchmarks #
## Protocol ##
Three tests were performed to benchmark the kernel:
1. build.sh runs. The results are those returned by [[!template id=man name="time" section="1"]].
1. hackbench, a popular tool used by Linux to benchmarks thread/process creation time.
1. sysbench, which can benchmark mulitple aspect of a system. Presently, the memory bandwidth, thread creation, and OLTP (online transaction processing) tests were used.
All were done three times, with a reboot between each of these tests.
The machine used:
[[!template id=programlisting text="""
# cpuctl list
Num HwId Unbound LWPs Interrupts Last change
---- ---- ------------ -------------- ----------------------------
0 0 online intr Sun Jul 11 00:25:31 2010
1 1 online intr Sun Jul 11 00:25:31 2010
# cpuctl identify 0
cpu0: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu0: L2 cache 512KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 0f model 02 extfamily 00 extmodel 00
# cpuctl identify 1
cpu1: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu1: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2 0x4400<CID,xTPR>
cpu1: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu1: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu1: L2 cache 512KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
cpu1: Initial APIC ID 0
cpu1: Cluster/Package ID 0
cpu1: SMT ID 0
cpu1: family 0f model 02 extfamily 00 extmodel 00
This machine uses HT - so technically speaking, it is not a true bi-CPU host.
## PAE ##
Overall, PAE affects memory performance by a 15-20% ratio; this is particularly noticeable with sysbench and hackbench, where bandwidth and thread/process creation time are all slower.
Userland remains rather unaffected, with differences in the 5% range; build.sh -j4 runs approximately 5% slower under PAE, both for native and Xen case.
Do not be surprised by the important "user" result for build.sh benchmark in the native vs Xen case. Build being performed with -j4 (4 make sub-jobs in parallel), many processes may run concurrently under i386 native, crediting more time for userland, while under Xen, the kernel is not SMP capable.
Notice that, in a MP context, Xen stays behind by a 40% margin for parallel build. Given that Xen overhead is considered negligible, it shows that NetBSD build system gets an important boost when parallelized, at least for bi-CPU setups. Just to show that the concurrent build is not purely rhetorical :)
## Xen ballooning ##
In essence, there is not much to say. Results are all below the 5% margin, adding the balloon thread did not affect performance or process creation/scheduling drastically. It is all noise. The timeout delay added by cherry@ seems to be reasonable (can be revisited later, but does not seem to be critical).
CVSweb for NetBSD wikisrc <wikimaster@NetBSD.org> software: FreeBSD-CVSweb