Annotation of wikisrc/users/jym/benchmarks.mdwn, revision 1.1

# PAE and Xen balloon benchmarks #

## Protocol ##

Three tests were performed to benchmark the kernel:

1. build.sh runs. The results are those returned by [[!template  id=man name="time" section="1"]].
1. hackbench, a popular tool used on Linux to benchmark thread/process creation time.
1. sysbench, which can benchmark multiple aspects of a system. For these runs, the memory bandwidth, thread creation, and OLTP (online transaction processing) tests were used.

Each test was run three times, with a reboot between each test.
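
Since each test is repeated three times, the runs have to be reduced to a single figure before they can be compared. A minimal sketch of one way to do that is given here, assuming the timings have already been collected as seconds; the helper name `summarize` and the sample values are hypothetical and are not part of the benchmark data.

```python
from statistics import mean


def summarize(runs):
    """Return (mean, spread) for a list of timings in seconds.

    spread is (max - min) / mean, a crude measure of run-to-run variation.
    """
    if not runs:
        raise ValueError("need at least one run")
    avg = mean(runs)
    return avg, (max(runs) - min(runs)) / avg


# Hypothetical wall-clock times for three runs; not measured data.
example_runs = [100.0, 102.0, 98.0]
avg, spread = summarize(example_runs)
print(f"mean={avg:.1f}s spread={spread:.1%}")  # mean=100.0s spread=4.0%
```

With only three samples, the spread is more informative than a standard deviation would be; it simply shows how far apart the best and worst runs are.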

The machine used:

[[!template  id=programlisting text="""
# cpuctl list
Num  HwId Unbound LWPs Interrupts     Last change
---- ---- ------------ -------------- ----------------------------
0    0    online       intr           Sun Jul 11 00:25:31 2010
1    1    online       intr           Sun Jul 11 00:25:31 2010
# cpuctl identify 0
cpu0: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu0: L2 cache 512KB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 0f model 02 extfamily 00 extmodel 00
# cpuctl identify 1
cpu1: Intel Pentium 4 (686-class), 2798.78 MHz, id 0xf29
cpu1: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu1: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu1: features2 0x4400<CID,xTPR>
cpu1: "Intel(R) Pentium(R) 4 CPU 2.80GHz"
cpu1: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu1: L2 cache 512KB 64B/line 8-way
cpu1: ITLB 4K/4M: 64 entries
cpu1: DTLB 4K/4M: 64 entries
cpu1: Initial APIC ID 0
cpu1: Cluster/Package ID 0
cpu1: SMT ID 0
cpu1: family 0f model 02 extfamily 00 extmodel 00
"""]]

This machine uses HT (Hyper-Threading), so technically speaking it is not a true dual-CPU host.

## PAE ##

Overall, PAE degrades memory performance by 15-20%; this is particularly noticeable with sysbench and hackbench, where memory bandwidth and thread/process creation are all slower.

Userland remains largely unaffected, with differences in the 5% range: build.sh -j4 runs approximately 5% slower under PAE, in both the native and Xen cases.
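
The percentages quoted above are relative differences between two configurations. The arithmetic can be sketched as follows for elapsed times; the function name `relative_slowdown` and the sample figures are hypothetical, chosen only to illustrate the calculation.

```python
def relative_slowdown(baseline, candidate):
    """Fractional slowdown of candidate relative to baseline.

    Both arguments are elapsed times in seconds, so a positive result
    means the candidate configuration is slower.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


# Hypothetical build times in seconds; not taken from the benchmark.
non_pae = 1000.0
pae = 1050.0
print(f"PAE slowdown: {relative_slowdown(non_pae, pae):.1%}")  # PAE slowdown: 5.0%
```

For metrics where higher is better, such as memory bandwidth, the sign of the result has the opposite meaning: a negative value is then the slowdown.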

Do not be surprised by the large "user" result for the build.sh benchmark in the native vs Xen case. Because the build is performed with -j4 (4 make sub-jobs in parallel), many processes may run concurrently under native i386, which credits more time to userland, while under Xen the kernel is not SMP capable.
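
The "user" figure reported by time(1) is summed over all CPUs, which is why it can exceed the elapsed time on an SMP kernel. One way to make this visible is to divide CPU time by wall-clock time, as in this sketch; the helper `cpu_utilization` and the numbers are hypothetical and are not the measured results.

```python
def cpu_utilization(real, user, system):
    """Average number of CPUs kept busy: (user + sys) / real."""
    if real <= 0:
        raise ValueError("real time must be positive")
    return (user + system) / real


# Hypothetical time(1) figures in seconds; not taken from the benchmark.
smp = cpu_utilization(real=600.0, user=900.0, system=120.0)
up = cpu_utilization(real=1000.0, user=850.0, system=100.0)
print(f"SMP: {smp:.2f} CPUs busy, UP: {up:.2f} CPUs busy")
# SMP: 1.70 CPUs busy, UP: 0.95 CPUs busy
```

A ratio above 1 can only occur when several CPUs run the build at once; a uniprocessor kernel stays at or below 1 regardless of the -j value.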

Notice that, in an MP context, Xen lags behind by a 40% margin for the parallel build. Given that Xen overhead is considered negligible, this shows that the NetBSD build system gets a significant boost when parallelized, at least on dual-CPU setups. Just to show that the concurrent build is not purely rhetorical :)

## Xen ballooning ##

In essence, there is not much to say. Results are all below the 5% margin: adding the balloon thread did not drastically affect performance or process creation/scheduling. It is all noise. The timeout delay added by cherry@ seems reasonable (it can be revisited later, but does not seem critical).
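
A rough way to check the "it is all noise" reading is to compare the gap between the two configurations with the 5% margin and to see whether the individual runs overlap. The sketch below assumes three timings per configuration; `compare_runs` and the sample values are hypothetical and do not come from the benchmark.

```python
from statistics import mean


def compare_runs(runs_without, runs_with, margin=0.05):
    """Compare two sets of timings (seconds).

    Returns (gap, below_margin, ranges_overlap) where gap is the relative
    difference between the two means.
    """
    base = mean(runs_without)
    gap = abs(mean(runs_with) - base) / base
    ranges_overlap = (min(runs_with) <= max(runs_without)
                      and min(runs_without) <= max(runs_with))
    return gap, gap < margin, ranges_overlap


# Hypothetical timings without and with the balloon thread; not measured data.
gap, below_margin, overlap = compare_runs([200.0, 204.0, 198.0],
                                          [201.0, 205.0, 200.0])
print(f"gap={gap:.2%} below_margin={below_margin} overlap={overlap}")
```

When the gap is under the margin and the ranges overlap, three runs per configuration give no grounds to claim a real difference.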
