File:  [NetBSD Developer Wiki] / wikisrc / users / jym / benchmarks.mdwn
Revision 1.2, Sat Jul 10 23:25:33 2010 UTC
web commit by jym: Add the images.

# PAE and Xen balloon benchmarks #

## Protocol ##

Three tests were performed to benchmark the kernel:

1. Build runs, performed with -j4. The results are those returned by [[!template  id=man name="time" section="1"]].
1. hackbench, a popular tool used on Linux to benchmark thread/process creation time.
1. sysbench, which can benchmark multiple aspects of a system. Presently, the memory bandwidth, thread creation, and OLTP (online transaction processing) tests were used.

All were done three times, with a reboot between each of these tests.
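As a rough illustration of the protocol above, the following Python sketch times a command the way [[!template  id=man name="time" section="1"]] reports it (real, user, sys) and averages several runs. The helper names `time_command` and `summarize` are hypothetical and were not part of the original benchmark setup. The reboot between tests is not modelled here.

```python
import os
import subprocess
import time


def time_command(argv):
    """Run argv once and return (real, user, sys) in seconds, like time(1)."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True)
    real = time.monotonic() - start
    after = os.times()
    # children_user/children_system accumulate the CPU time of
    # child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


def summarize(samples):
    """Return the per-column mean of a list of (real, user, sys) tuples."""
    count = len(samples)
    return tuple(sum(column) / count for column in zip(*samples))
```

Calling `time_command` three times on the same command and passing the resulting list to `summarize` gives one averaged (real, user, sys) triple per test.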

The machine used:

[[!template  id=programlisting text="""
# cpuctl list                                                      
Num  HwId Unbound LWPs Interrupts     Last change
---- ---- ------------ -------------- ----------------------------
0    0    online       intr           Sun Jul 11 00:25:31 2010
1    1    online       intr           Sun Jul 11 00:25:31 2010
# cpuctl identify 0                                                
cpu0: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu0: L2 cache 512KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 0f model 02 extfamily 00 extmodel 00
# cpuctl identify 1 
cpu1: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu1: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2 0x4400<CID,xTPR>
cpu1: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu1: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu1: L2 cache 512KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
cpu1: Initial APIC ID 0
cpu1: Cluster/Package ID 0
cpu1: SMT ID 0
cpu1: family 0f model 02 extfamily 00 extmodel 00
"""]]

This machine uses Hyper-Threading (HT), so technically speaking it is not a true dual-CPU host.

## PAE ##


Overall, PAE degrades memory performance by 15-20%; this is particularly noticeable with sysbench and hackbench, where memory bandwidth and thread/process creation are both slower.

Userland remains largely unaffected, with differences in the 5% range; the -j4 build runs approximately 5% slower under PAE, in both the native and Xen cases.
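For reference, percentages of this kind can be derived as in the sketch below. It assumes each figure is a relative difference against the non-PAE baseline. The function names and the sample values in the comments are hypothetical and are not measurements from these benchmarks.

```python
def percent_slower(baseline_seconds, candidate_seconds):
    """Relative increase in elapsed time, in percent (higher is worse)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (candidate_seconds - baseline_seconds) / baseline_seconds


def percent_drop(baseline_rate, candidate_rate):
    """Relative loss of throughput (e.g. bandwidth), in percent."""
    if baseline_rate <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_rate - candidate_rate) / baseline_rate


# Hypothetical values, for illustration only:
# a build taking 1000 s without PAE and 1050 s with PAE is 5% slower;
# a bandwidth going from 2000 to 1600 units is a 20% drop.
```

Time-based results (build, hackbench) and throughput-based results (sysbench memory bandwidth) move in opposite directions, hence the two helpers.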

Do not be surprised by the large "user" time for the build benchmark in the native case compared with Xen. Because the build is performed with -j4 (4 make sub-jobs in parallel), many processes may run concurrently under native i386, accumulating more userland time, while under Xen the kernel is not SMP capable.

Note that, in an MP context, Xen lags behind by a 40% margin for the parallel build. Given that Xen overhead is considered negligible, this shows that the NetBSD build system gets a significant boost when parallelized, at least on dual-CPU setups. Just to show that the concurrent build is not purely rhetorical :)

## Xen ballooning ##


In essence, there is not much to say. Results are all below the 5% margin: adding the balloon thread did not drastically affect performance or process creation/scheduling. It is all noise. The timeout delay added by cherry@ seems reasonable (it can be revisited later, but does not seem to be critical).
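The "below the 5% margin" check can be expressed as the small sketch below. It assumes each test is summarized by a pair of mean values (without and with the balloon thread). The function name, the test names and the numbers in the comments are hypothetical.

```python
def exceeds_margin(results, margin_percent=5.0):
    """Return the names of tests whose relative change exceeds the margin.

    results maps a test name to a (baseline, candidate) pair of means.
    """
    flagged = []
    for name, (baseline, candidate) in results.items():
        change = abs(candidate - baseline) / baseline * 100.0
        if change > margin_percent:
            flagged.append(name)
    return flagged


# Hypothetical input, for illustration only:
# {"hackbench": (100.0, 103.0), "build": (200.0, 220.0)}
# would flag only "build", whose change is 10%.
```

An empty list corresponds to the situation described above, where every difference stays within the noise margin.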
