Hyper-threading: measure its impact during SLOB runs thanks to numactl and turbostat

Introduction

As you probably know, a single physical CPU core with hyper-threading appears as two logical CPUs to an operating system. 

While the operating system sees two CPUs for each core, the actual CPU hardware only has a single set of execution resources for each core.

Hyper-threading can help speed your system up, but it’s nowhere near as good as having additional cores.

Out of curiosity, I checked the impact of hyper-threading on the number of logical IO per second (LIOPS) during SLOB runs. The main idea is to compare:

  • 2 FULL CACHED SLOB sessions running on the same core
  • 2 FULL CACHED SLOB sessions running on two distincts cores

Into this blog post, I just want to share how I did the measure and my results (of course your results could differ depending of the cpus, the slob configuration and so on..).

Tests environment and setup

  • Oracle database 11.2.0.4.
  • SLOB with the following slob.conf in place (the rest of the file contains the default slob.conf values):
UPDATE_PCT=0
RUN_TIME=60
WORK_LOOP=0
SCALE=80M
WORK_UNIT=64
  • The cpu_topology is the following (the cpu_topology script comes from Karl Arao’s cputoolkit here):
$ sh ./cpu_topology
	
model name	: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
processors  (OS CPU count)          0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
physical id (processor socket)      0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
siblings    (logical CPUs/socket)   16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
core id     (# assigned to a core)  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
cpu cores   (physical cores/socket) 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2893.187
BogoMIPS: 5785.23
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K

As you can see the box is made of 2 sockets, 16 cores and 32 threads (so that the database sees 32 cpus).

  • Turbo boost does not come into play (you’ll see the proof during the tests thanks to the turbostat command).
  • Obviously the same SLOB configuration has been used for the tests.

Launch the tests

Test 1: launch 2 SLOB sessions on 2 distincts sockets (so on 2 distincts cores) that way:

$ cd SLOB
$ numactl --physcpubind=4,9 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 9 are on 2 distincts sockets (see cpu_topology output).

The turbsostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     241    8.34    2893    2893       0   91.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  116.66   88.59    0.00    0.00    0.00
       0       0       0      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      46      48    0.00    0.00    0.00    0.00   50.44   36.66    0.00    0.00    0.00
       0       0      16      70    2.41    2893    2893       0   97.59
       0       1       1      71    2.45    2893    2893       0   97.55    0.00    0.00    0.00      43
       0       1      17     131    4.54    2893    2893       0   95.46
       0       2       2      81    2.81    2893    2893       0   97.19    0.00    0.00    0.00      44
       0       2      18      73    2.53    2893    2893       0   97.47
       0       3       3      29    1.00    2893    2893       0   99.00    0.00    0.00    0.00      44
       0       3      19      21    0.72    2893    2893       0   99.28
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.39    2893    2893       0   98.61
       0       5       5      58    2.02    2893    2893       0   97.98    0.00    0.00    0.00      46
       0       5      21      82    2.82    2893    2893       0   97.18
       0       6       6     106    3.65    2893    2893       0   96.35    0.00    0.00    0.00      44
       0       6      22      42    1.46    2893    2893       0   98.54
       0       7       7      28    0.97    2893    2893       0   99.03    0.00    0.00    0.00      48
       0       7      23      15    0.52    2893    2893       0   99.48
       1       0       8     222    7.67    2893    2893       0   92.33    0.00    0.00    0.00      72      76    0.00    0.00    0.00    0.00   66.22   51.92    0.00    0.00    0.00
       1       0      24      63    2.17    2893    2893       0   97.83
       1       1       9    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      76
       1       1      25     108    3.73    2893    2893       0   96.27
       1       2      10      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      71
       1       2      26      78    2.71    2893    2893       0   97.29
       1       3      11      57    1.96    2893    2893       0   98.04    0.00    0.00    0.00      69
       1       3      27      41    1.40    2893    2893       0   98.60
       1       4      12      56    1.93    2893    2893       0   98.07    0.00    0.00    0.00      71
       1       4      28      30    1.03    2893    2893       0   98.97
       1       5      13      32    1.12    2893    2893       0   98.88    0.00    0.00    0.00      71
       1       5      29      23    0.79    2893    2893       0   99.21
       1       6      14      49    1.68    2893    2893       0   98.32    0.00    0.00    0.00      71
       1       6      30      93    3.20    2893    2893       0   96.80
       1       7      15      41    1.41    2893    2893       0   98.59    0.00    0.00    0.00      72
       1       7      31      21    0.72    2893    2893       0   99.28

Turbo boost did not play  (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 955 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.95
              DB CPU(s):               1.9               5.6      0.00      2.89
      Redo size (bytes):          16,541.5          48,682.3
  Logical read (blocks):         955,750.1       2,812,818.1

Test 2: launch 2 SLOB sessions on 2 distincts cores on the same socket that way:

$ cd SLOB
$ numactl --physcpubind=4,5 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 5 are on 2 distincts cores on the same socket (see cpu_topology output).

The turbsostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  119.34   91.17    0.00    0.00    0.00
       0       0       0     152    5.26    2893    2893       0   94.74    0.00    0.00    0.00      47      49    0.00    0.00    0.00    0.00   55.29   41.51    0.00    0.00    0.00
       0       0      16      59    2.04    2893    2893       0   97.96
       0       1       1      62    2.13    2893    2893       0   97.87    0.00    0.00    0.00      42
       0       1      17      49    1.70    2893    2893       0   98.30
       0       2       2      59    2.03    2893    2893       0   97.97    0.00    0.00    0.00      43
       0       2      18      55    1.90    2893    2893       0   98.10
       0       3       3      10    0.33    2893    2893       0   99.67    0.00    0.00    0.00      44
       0       3      19      16    0.56    2893    2893       0   99.44
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.40    2893    2893       0   98.60
       0       5       5    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      49
       0       5      21     133    4.59    2893    2893       0   95.41
       0       6       6      71    2.47    2893    2893       0   97.53    0.00    0.00    0.00      45
       0       6      22      57    1.97    2893    2893       0   98.03
       0       7       7      20    0.70    2893    2893       0   99.30    0.00    0.00    0.00      47
       0       7      23      14    0.47    2893    2893       0   99.53
       1       0       8     364   12.58    2893    2893       0   87.42    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   64.05   49.66    0.00    0.00    0.00
       1       0      24      93    3.20    2893    2893       0   96.80
       1       1       9     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76
       1       1      25     105    3.62    2893    2893       0   96.38
       1       2      10     267    9.23    2893    2893       0   90.77    0.00    0.00    0.00      75
       1       2      26     283    9.77    2893    2893       0   90.23
       1       3      11      73    2.52    2893    2893       0   97.48    0.00    0.00    0.00      73
       1       3      27      92    3.19    2893    2893       0   96.81
       1       4      12      43    1.50    2893    2893       0   98.50    0.00    0.00    0.00      75
       1       4      28      44    1.53    2893    2893       0   98.47
       1       5      13      46    1.60    2893    2893       0   98.40    0.00    0.00    0.00      75
       1       5      29      48    1.67    2893    2893       0   98.33
       1       6      14      76    2.63    2893    2893       0   97.37    0.00    0.00    0.00      75
       1       6      30     105    3.64    2893    2893       0   96.36
       1       7      15     106    3.67    2893    2893       0   96.33    0.00    0.00    0.00      75

Turbo boost did not play  (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 920 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.7      0.00      2.97
      Redo size (bytes):          16,014.6          47,163.1
  Logical read (blocks):         921,446.7       2,713,660.6

So, let’s say this is more or less the same as with 2 sessions on 2 distincts sockets.

Test 3: launch 2 SLOB sessions on the same core that way:

$ cd SLOB
$ numactl --physcpubind=4,20 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 20 are sharing the same core (see cpu_topology output).

The turbsostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     250    8.65    2893    2893       0   91.35    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  114.06   85.95    0.00    0.00    0.00
       0       0       0     100    3.45    2893    2893       0   96.55    0.00    0.00    0.00      48      48    0.00    0.00    0.00    0.00   51.21   37.45    0.00    0.00    0.00
       0       0      16      41    1.41    2893    2893       0   98.59
       0       1       1      75    2.61    2893    2893       0   97.39    0.00    0.00    0.00      42
       0       1      17      58    2.01    2893    2893       0   97.99
       0       2       2      51    1.75    2893    2893       0   98.25    0.00    0.00    0.00      44
       0       2      18      40    1.38    2893    2893       0   98.62
       0       3       3      25    0.86    2893    2893       0   99.14    0.00    0.00    0.00      45
       0       3      19       9    0.30    2893    2893       0   99.70
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      47
       0       4      20    2893  100.00    2893    2893       0    0.00
       0       5       5     120    4.14    2893    2893       0   95.86    0.00    0.00    0.00      44
       0       5      21      73    2.54    2893    2893       0   97.46
       0       6       6      65    2.25    2893    2893       0   97.75    0.00    0.00    0.00      45
       0       6      22      51    1.76    2893    2893       0   98.24
       0       7       7      18    0.64    2893    2893       0   99.36    0.00    0.00    0.00      48
       0       7      23       7    0.25    2893    2893       0   99.75
       1       0       8     491   16.97    2893    2893       0   83.03    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   62.85   48.50    0.00    0.00    0.00
       1       0      24      88    3.04    2893    2893       0   96.96
       1       1       9      91    3.16    2893    2893       0   96.84    0.00    0.00    0.00      75
       1       1      25      80    2.75    2893    2893       0   97.25
       1       2      10      87    3.00    2893    2893       0   97.00    0.00    0.00    0.00      74
       1       2      26     104    3.61    2893    2893       0   96.39
       1       3      11      62    2.16    2893    2893       0   97.84    0.00    0.00    0.00      72
       1       3      27      80    2.77    2893    2893       0   97.23
       1       4      12      36    1.24    2893    2893       0   98.76    0.00    0.00    0.00      74
       1       4      28      36    1.25    2893    2893       0   98.75
       1       5      13      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      75
       1       5      29      75    2.61    2893    2893       0   97.39
       1       6      14      52    1.79    2893    2893       0   98.21    0.00    0.00    0.00      75
       1       6      30      92    3.17    2893    2893       0   96.83
       1       7      15      39    1.35    2893    2893       0   98.65    0.00    0.00    0.00      76
       1       7      31      32    1.11    2893    2893       0   98.89

Turbo boost did not play  (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 600 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.6      0.00      2.96
      Redo size (bytes):          16,204.0          47,101.1
  Logical read (blocks):         603,430.3       1,754,028.2

So, as you can see, the SLOB runs (with the same SLOB configuration) produced about:

  • 955 000 LIOPS with the 2 sessions running on 2 distincts sockets (so on 2 distincts cores).
  • 920 000 LIOPS with the 2 sessions running on 2 distincts cores on the same socket.
  • 600 000 LIOPS with the 2 sessions running on the same core.

Out of curiosity, what if we disable the hyper-threading on one core and launch the 2 sessions on it?

Let’s put the cpu 20 offline (to be sure there is no noise on it during the test)

echo 0 > /sys/devices/system/cpu/cpu20/online

launch the 2 sessions on the same cpu (the remaining one online for this core as the second one has just been put offline) that way:

$ cd SLOB
$ numactl --physcpubind=4 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

The turbsostat command shows that cpu 4 is 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     153    5.30    2893    2893       0   94.70    0.00    0.00    0.00      77      77    0.00    0.00    0.00    0.00  112.76   84.64    0.00    0.00    0.00
       0       0       0      56    1.94    2893    2893       0   98.06    0.00    0.00    0.00      45      46    0.00    0.00    0.00    0.00   49.86   36.07    0.00    0.00    0.00
       0       0      16      42    1.45    2893    2893       0   98.55
       0       1       1      97    3.35    2893    2893       0   96.65    0.00    0.00    0.00      42
       0       1      17      61    2.11    2893    2893       0   97.89
       0       2       2      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      44
       0       2      18      45    1.54    2893    2893       0   98.46
       0       3       3      19    0.64    2893    2893       0   99.36    0.00    0.00    0.00      44
       0       3      19      25    0.85    2893    2893       0   99.15
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       5       5       9    0.31    2893    2893       0   99.69    0.00    0.00    0.00      44
       0       5      21      14    0.49    2893    2893       0   99.51
       0       6       6      14    0.48    2893    2893       0   99.52    0.00    0.00    0.00      44
       0       6      22      12    0.41    2893    2893       0   99.59
       0       7       7      25    0.88    2893    2893       0   99.12    0.00    0.00    0.00      46
       0       7      23       9    0.33    2893    2893       0   99.67
       1       0       8     133    4.59    2893    2893       0   95.41    0.00    0.00    0.00      76      77    0.00    0.00    0.00    0.00   62.90   48.56    0.00    0.00    0.00
       1       0      24      69    2.37    2893    2893       0   97.63
       1       1       9      89    3.07    2893    2893       0   96.93    0.00    0.00    0.00      76
       1       1      25      73    2.52    2893    2893       0   97.48
       1       2      10      98    3.38    2893    2893       0   96.62    0.00    0.00    0.00      76
       1       2      26     113    3.92    2893    2893       0   96.08
       1       3      11      76    2.62    2893    2893       0   97.38    0.00    0.00    0.00      74
       1       3      27      78    2.71    2893    2893       0   97.29
       1       4      12     194    6.71    2893    2893       0   93.29    0.00    0.00    0.00      75
       1       4      28      27    0.94    2893    2893       0   99.06
       1       5      13      62    2.14    2893    2893       0   97.86    0.00    0.00    0.00      76
       1       5      29      95    3.29    2893    2893       0   96.71
       1       6      14     101    3.48    2893    2893       0   96.52    0.00    0.00    0.00      75
       1       6      30      92    3.19    2893    2893       0   96.81
       1       7      15      64    2.20    2893    2893       0   97.80    0.00    0.00    0.00      77
       1       7      31      20    0.68    2893    2893       0   99.32

and as you can see it is the only cpu available on that core.

Turbo boost did not play  (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 460 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.69
              DB CPU(s):               1.0               2.8      0.00      1.32
      Redo size (bytes):          18,978.1          55,154.9
  Logical read (blocks):         458,451.7       1,332,369.7

which makes fully sense as 2 sessions on 2 distincts cores on the same sockets produced about 920 000 LIOPS (Test 2).

So what?

Imagine that you launched multiple full cached SLOB runs on your new box to measure its behaviour/scalability (from one session up to the cpu_count value).

Example: I launched 11 full cached SLOB runs with 2,4,6,8,12,16,18,20,24,28 and 32 sessions running. The LIOPS graph is the following:

Screen Shot 2015-09-22 at 09.29.08

Just check the curve of the graph (not the values). As you can see, once the number of sessions reached the number of cores (16) the LIOPS is still increasing but its increase rate is slow down. One of the reason is the one we observed during our previous tests: 2 sessions running on the same core don’t perform as fast as 2 sessions running on 2 distincts cores.

Remarks

Worth watching presentation around oracle and cpu stuff:

The absolute LIOPS numbers are relatively low. It would have been possible to increase those numbers during the tests by increasing the WORK_UNIT slob parameter (see this blog post).

To get the turbostat utility, copy/paste the turbostat.c and msr-index.h contents into a directory and compile turbostat.c that way:

$ gcc -Wall -DMSRHEADER='"./msr-index.h"' turbostat.c -o turbostat

Conclusion

During the test I observed the following (of course your results could differ depending of the cpus, the slob configuration and so on..):

  • 955 000 LIOPS with the 2 sessions running on 2 distincts sockets (so on 2 distincts cores)
  • 920 000 LIOPS with the 2 sessions running on 2 distincts cores on the same socket
  • 600 000 LIOPS with the 2 sessions running on the same core
  • 460 000 LIOPS with the 2 sessions running on the same core with hyper-threading disabled

Hyper-threading helps to increase the overall LIOPS throughput of the system, but once the numbers of sessions is > number of cores then the average LIOPS per session decrease.

Advertisements