Annotation of wikisrc/users/jym/benchmarks.mdwn, revision 1.3

1.1       wiki        1: # PAE and Xen balloon benchmarks #
                      2: 
                      3: ## Protocol ##
                      4: 
                      5: Three tests were performed to benchmark the kernel:
                      6: 
                      7: 1. build.sh runs. The results are those returned by [[!template  id=man name="time" section="1"]].
                      8: 1. hackbench, a popular tool used on Linux to benchmark thread/process creation time.
                      9: 1. sysbench, which can benchmark multiple aspects of a system. Here, the memory bandwidth, thread creation, and OLTP (online transaction processing) tests were used.
                     10: 
                     11: All three were run three times, with a reboot between each test.
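As a minimal sketch of how three runs could be summarised, the mean and sample standard deviation of each time(1) field can be computed as below. The helper name `summarize` is hypothetical and the timing values are placeholders, not measurements from these benchmarks.

```python
from statistics import mean, stdev

def summarize(runs):
    """Return {field: (mean, sample standard deviation)} for a list of runs.

    Each run is a dict with the three fields reported by time(1):
    "real", "user" and "sys", all in seconds.
    """
    summary = {}
    for field in ("real", "user", "sys"):
        values = [run[field] for run in runs]
        summary[field] = (mean(values), stdev(values))
    return summary

# Placeholder values for illustration only; not results from this page.
example_runs = [
    {"real": 100.0, "user": 80.0, "sys": 15.0},
    {"real": 102.0, "user": 81.0, "sys": 16.0},
    {"real": 98.0, "user": 79.0, "sys": 14.0},
]

for field, (avg, spread) in summarize(example_runs).items():
    print(f"{field}: mean={avg:.1f}s stdev={spread:.1f}s")
```

With only three samples the standard deviation is a rough indicator at best, but it gives a first idea of how much run-to-run variation to expect.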
                     12: 
                     13: The machine used:
                     14: 
                     15: [[!template  id=programlisting text="""
                     16: # cpuctl list                                                      
                     17: Num  HwId Unbound LWPs Interrupts     Last change
                     18: ---- ---- ------------ -------------- ----------------------------
                     19: 0    0    online       intr           Sun Jul 11 00:25:31 2010
                     20: 1    1    online       intr           Sun Jul 11 00:25:31 2010
                     21: # cpuctl identify 0                                                
                     22: cpu0: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
                     23: cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
                     24: cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
                     25: cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
                     26: cpu0: features2 0x4400<CID,xTPR>
                     27: cpu0: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
                     28: cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
                     29: cpu0: L2 cache 512KB 64B/line 8-way
                     30: cpu0: ITLB 4K/4M: 64 entries
                     31: cpu0: DTLB 4K/4M: 64 entries
                     32: cpu0: Initial APIC ID 0
                     33: cpu0: Cluster/Package ID 0
                     34: cpu0: SMT ID 0
                     35: cpu0: family 0f model 02 extfamily 00 extmodel 00
                     36: # cpuctl identify 1 
                     37: cpu1: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
                     38: cpu1: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
                     39: cpu1: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
                     40: cpu1: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
                     41: cpu1: features2 0x4400<CID,xTPR>
                     42: cpu1: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
                     43: cpu1: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
                     44: cpu1: L2 cache 512KB 64B/line 8-way
                     45: cpu1: ITLB 4K/4M: 64 entries
                     46: cpu1: DTLB 4K/4M: 64 entries
                     47: cpu1: Initial APIC ID 0
                     48: cpu1: Cluster/Package ID 0
                     49: cpu1: SMT ID 0
                     50: cpu1: family 0f model 02 extfamily 00 extmodel 00
                     51: """]]
                     52: 
                     53: This machine uses Hyper-Threading (HT), so technically speaking it is not a true dual-CPU host.
                     54: 
                     55: ## PAE ##
                     56: 
1.2       wiki       57: [[build-pae.png]]
                     58: [[hackbench-pae.png]]
                     59: [[sysbench-pae.png]]
                     60: 
1.1       wiki       61: Overall, PAE degrades memory performance by 15 to 20%; this is particularly noticeable with sysbench and hackbench, where memory bandwidth and thread/process creation are both slower.
                     62: 
                     63: Userland remains largely unaffected, with differences in the 5% range; build.sh -j4 runs approximately 5% slower under PAE, in both the native and Xen cases.
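The percentages quoted here are relative slowdowns against a baseline. A minimal sketch of that arithmetic follows; `slowdown_pct` is a hypothetical helper and the timings are placeholders chosen only to illustrate the calculation.

```python
def slowdown_pct(baseline, candidate):
    """Percentage by which `candidate` is slower than `baseline`.

    Both arguments are elapsed times in the same unit; a positive result
    means the candidate took longer.
    """
    if baseline <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate - baseline) / baseline * 100.0

# Placeholder timings (seconds), not measured values.
non_pae_build = 1000.0
pae_build = 1050.0
print(f"{slowdown_pct(non_pae_build, pae_build):.1f}% slower")
```

For throughput-style metrics such as memory bandwidth, where higher is better, the sign of the result has to be read the other way round.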
                     64: 
                     65: Do not be surprised by the large "user" figure for the build.sh benchmark in the native vs Xen comparison. Because the build is performed with -j4 (4 make sub-jobs in parallel), many processes may run concurrently under native i386, which credits more time to userland, whereas under Xen the kernel is not SMP capable.
                     66: 
1.3     ! wiki       67: When comparing Xen with a native kernel with all CPUs but one taken offline, we observe an overhead of 15 to 20% that mostly impacts performance at the "sys" (kernel) level, which directly affects the total time of a full build.sh -j4 release. Contrary to the original belief, Xen does add overhead. The one exception is the memory bandwidth benchmark, where Xen (PAE and non-PAE) outperforms the native kernels in a UP context.
        !            68: 
        !            69: Notice that, in an MP context, the total build time improves by approximately 15% between the system with just one CPU running and the full-MP system, with "sys" nearly doubling its time credit when both CPUs are running. As the *src/* directory remained the same between the two tests, we can assume that the kernel was **concurrently** solicited twice as much in the dual-CPU case as in the single-CPU case.
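The "user" and "sys" figures reported by time(1) are CPU time summed over all CPUs, which is why they can grow while the elapsed ("real") time shrinks. A rough way to see this is the ratio (user + sys) / real, sketched below. The helper `cpu_utilisation` is hypothetical and the figures are placeholders, not the measured results.

```python
def cpu_utilisation(real, user, sys_time):
    """Average number of CPUs kept busy: (user + sys) / real."""
    if real <= 0:
        raise ValueError("real time must be positive")
    return (user + sys_time) / real

# Placeholder figures (seconds), for illustration only.
one_cpu = cpu_utilisation(real=1000.0, user=800.0, sys_time=150.0)
two_cpus = cpu_utilisation(real=850.0, user=1000.0, sys_time=300.0)
print(f"one CPU online:  {one_cpu:.2f} CPUs busy on average")
print(f"two CPUs online: {two_cpus:.2f} CPUs busy on average")
```

A ratio above 1 can only occur when more than one CPU is doing work at the same time, so it is a quick sanity check on whether a run really executed in parallel.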
1.1       wiki       70: 
                     71: ## Xen ballooning ##
                     72: 
1.2       wiki       73: [[build-balloon.png]]
                     74: [[hackbench-balloon.png]]
                     75: [[sysbench-balloon.png]]
                     76: 
                     77: 
1.1       wiki       78: In essence, there is not much to say. Results are all below the 5% margin: adding the balloon thread did not drastically affect performance or process creation/scheduling. It is all noise. The timeout delay added by cherry@ seems reasonable (it can be revisited later, but does not seem critical).
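The "below the 5% margin" criterion can be expressed as a simple check, sketched here under the assumption that a difference smaller than the margin is treated as noise. `within_margin` is a hypothetical helper and the sample pairs are placeholders, not measured values.

```python
def within_margin(baseline, candidate, margin_pct=5.0):
    """True if the two values differ by less than `margin_pct` percent of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return abs(candidate - baseline) / abs(baseline) * 100.0 < margin_pct

# Placeholder pairs (without balloon, with balloon); not results from this page.
samples = {
    "build.sh real (s)": (1000.0, 1020.0),
    "hackbench (s)": (10.0, 10.3),
}
for name, (before, after) in samples.items():
    verdict = "noise" if within_margin(before, after) else "significant"
    print(f"{name}: {verdict}")
```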
