Monitor dNFS activity as of 11.2.0.4

As you may know, there are several views available since 11g to monitor dNFS activity:

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

One interesting thing is that the v$dnfs_stats view provides two new columns as of 11.2.0.4:

  • NFS_READBYTES: Number of bytes read from NFS server
  • NFS_WRITEBYTES: Number of bytes written to NFS server

See the Oracle documentation here.

Then, by sampling the view, we can compute the following metrics:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second (as of 11.2.0.4).
  • AvgBy/Read: Average bytes per read (as of 11.2.0.4).
  • Writes/s: Number of writes per second.
  • KbyWrite/s: Kbytes written per second (as of 11.2.0.4).
  • AvgBy/Write: Average bytes per write (as of 11.2.0.4).

To do so, I created a very simple utility: db_io_dnfs_metrics.pl. It basically takes a snapshot of the gv$dnfs_stats view each second (the default interval) and computes the delta with the previous snapshot. The utility is RAC aware.
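The core of the utility is a plain rate computation between two cumulative snapshots. Here is a minimal Python sketch of that delta logic (the actual utility is written in Perl; the dict keys mirror the NFS_READ, NFS_WRITE, NFS_READBYTES and NFS_WRITEBYTES counters, and the sample numbers are made up to match the second snapshot shown further down):

```python
# Sketch of the snapshot/delta computation, NOT the actual Perl code.
# prev/curr hold cumulative counters sampled from (g)v$dnfs_stats.

def dnfs_rates(prev, curr, interval_s):
    """Compute per-second dNFS metrics from two cumulative snapshots."""
    reads = (curr["nfs_read"] - prev["nfs_read"]) / interval_s
    writes = (curr["nfs_write"] - prev["nfs_write"]) / interval_s
    kby_read = (curr["nfs_readbytes"] - prev["nfs_readbytes"]) / 1024 / interval_s
    kby_write = (curr["nfs_writebytes"] - prev["nfs_writebytes"]) / 1024 / interval_s
    return {
        "Reads/s": reads,
        "KbyRead/s": kby_read,
        # average bytes per read = delta bytes / delta reads (guard against 0 reads)
        "AvgBy/Read": (kby_read * 1024 / reads) if reads else 0,
        "Writes/s": writes,
        "KbyWrite/s": kby_write,
        "AvgBy/Write": (kby_write * 1024 / writes) if writes else 0,
    }

# Made-up counters: 320 reads of 8K blocks and one 16K write in one second.
prev = {"nfs_read": 1000, "nfs_write": 10,
        "nfs_readbytes": 8192000, "nfs_writebytes": 163840}
curr = {"nfs_read": 1320, "nfs_write": 11,
        "nfs_readbytes": 8192000 + 320 * 8192, "nfs_writebytes": 163840 + 16384}
print(dnfs_rates(prev, curr, 1))
```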

Let’s see the help:

$>./db_io_dnfs_metrics.pl -help

Usage: ./db_io_dnfs_metrics.pl [-interval] [-count] [-inst] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_dnfs_metrics.pl
Example: ./db_io_dnfs_metrics.pl  -inst=BDT_1
Example: ./db_io_dnfs_metrics.pl  -sort_field=reads

This is a very simple utility; the options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. In case of RAC, you can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can display the metrics per snapshot, the average metric values since the collection began (that is to say, since the script was launched), or both, thanks to the -DISPLAY= parameter.
  4. In case of RAC, you can sort the instances based on the number of reads, the number of writes, or the number of IOPS (reads+writes) thanks to the -SORT_FIELD= parameter.

Examples:

Collecting on a single Instance:

$>./db_io_dnfs_metrics.pl
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:18                          Kby       AvgBy/               Kby       AvgBy/
14:26:18   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:18   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:18   VSBDT        321       2568      8192      0          0         0          321        2.5
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:19                          Kby       AvgBy/               Kby       AvgBy/
14:26:19   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:19   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:19   VSBDT        320       2560      8192      1          16        16384      321        2.5

Collecting on a RAC database, sorting the instances by the number of reads:

$>./db_io_dnfs_metrics.pl -sort_field=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:21                          Kby       AvgBy/               Kby       AvgBy/
17:21:21   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:21   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:21   VBDTO_1      175       44536     260599    0          0         0          175        43.5
17:21:21   VBDTO_2      69        2272      33718     0          0         0          69         2.2
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:22                          Kby       AvgBy/               Kby       AvgBy/
17:21:22   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:22   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:22   VBDTO_2      151       36976     250751    0          0         0          151        36.1
17:21:22   VBDTO_1      131       33408     261143    0          0         0          131        32.6
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:23                          Kby       AvgBy/               Kby       AvgBy/
17:21:23   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:23   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:23   VBDTO_2      133       33592     258633    0          0         0          133        32.8
17:21:23   VBDTO_1      121       31360     265394    0          0         0          121        30.6

Remarks:


Hyper-threading: measure its impact during SLOB runs thanks to numactl and turbostat

Introduction

As you probably know, a single physical CPU core with hyper-threading appears as two logical CPUs to an operating system. 

While the operating system sees two CPUs for each core, the actual CPU hardware only has a single set of execution resources for each core.

Hyper-threading can help speed your system up, but it’s nowhere near as good as having additional cores.

Out of curiosity, I checked the impact of hyper-threading on the number of logical IO per second (LIOPS) during SLOB runs. The main idea is to compare:

  • 2 FULL CACHED SLOB sessions running on the same core
  • 2 FULL CACHED SLOB sessions running on two distinct cores

In this blog post, I just want to share how I did the measurements and my results (of course, your results could differ depending on the CPUs, the SLOB configuration and so on).

Tests environment and setup

  • Oracle database 11.2.0.4.
  • SLOB with the following slob.conf in place (the rest of the file contains the default slob.conf values):
UPDATE_PCT=0
RUN_TIME=60
WORK_LOOP=0
SCALE=80M
WORK_UNIT=64
  • The cpu_topology is the following (the cpu_topology script comes from Karl Arao’s cputoolkit here):
$ sh ./cpu_topology
	
model name	: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
processors  (OS CPU count)          0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
physical id (processor socket)      0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
siblings    (logical CPUs/socket)   16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
core id     (# assigned to a core)  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
cpu cores   (physical cores/socket) 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2893.187
BogoMIPS: 5785.23
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K

As you can see, the box is made of 2 sockets, 16 cores and 32 threads (so the database sees 32 CPUs).

  • Turbo boost does not come into play (you’ll see the proof during the tests thanks to the turbostat command).
  • Obviously the same SLOB configuration has been used for the tests.
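The sibling mapping that cpu_topology prints can also be recovered programmatically: two OS CPUs are hyper-threaded siblings when they share the same (physical id, core id) pair. A small Python sketch, fed with the values copied from the cpu_topology output above:

```python
# Group OS CPUs by (socket, core id): siblings of a core end up together.
# The two input lists are transcribed from the cpu_topology output above.
from collections import defaultdict

def siblings(physical_ids, core_ids):
    cores = defaultdict(list)
    for cpu, (socket, core) in enumerate(zip(physical_ids, core_ids)):
        cores[(socket, core)].append(cpu)
    return dict(cores)

physical_ids = [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8   # "physical id" row
core_ids = list(range(8)) * 4                          # "core id" row

topo = siblings(physical_ids, core_ids)
print(topo[(0, 4)])   # OS CPUs 4 and 20 share socket 0, core 4
```

This confirms the pairs used in the tests below: CPUs 4 and 20 share a core, while CPUs 4 and 5 (same socket) and 4 and 9 (different sockets) do not.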

Launch the tests

Test 1: launch 2 SLOB sessions on 2 distinct sockets (so on 2 distinct cores) that way:

$ cd SLOB
$ numactl --physcpubind=4,9 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 9 are on 2 distinct sockets (see the cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     241    8.34    2893    2893       0   91.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  116.66   88.59    0.00    0.00    0.00
       0       0       0      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      46      48    0.00    0.00    0.00    0.00   50.44   36.66    0.00    0.00    0.00
       0       0      16      70    2.41    2893    2893       0   97.59
       0       1       1      71    2.45    2893    2893       0   97.55    0.00    0.00    0.00      43
       0       1      17     131    4.54    2893    2893       0   95.46
       0       2       2      81    2.81    2893    2893       0   97.19    0.00    0.00    0.00      44
       0       2      18      73    2.53    2893    2893       0   97.47
       0       3       3      29    1.00    2893    2893       0   99.00    0.00    0.00    0.00      44
       0       3      19      21    0.72    2893    2893       0   99.28
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.39    2893    2893       0   98.61
       0       5       5      58    2.02    2893    2893       0   97.98    0.00    0.00    0.00      46
       0       5      21      82    2.82    2893    2893       0   97.18
       0       6       6     106    3.65    2893    2893       0   96.35    0.00    0.00    0.00      44
       0       6      22      42    1.46    2893    2893       0   98.54
       0       7       7      28    0.97    2893    2893       0   99.03    0.00    0.00    0.00      48
       0       7      23      15    0.52    2893    2893       0   99.48
       1       0       8     222    7.67    2893    2893       0   92.33    0.00    0.00    0.00      72      76    0.00    0.00    0.00    0.00   66.22   51.92    0.00    0.00    0.00
       1       0      24      63    2.17    2893    2893       0   97.83
       1       1       9    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      76
       1       1      25     108    3.73    2893    2893       0   96.27
       1       2      10      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      71
       1       2      26      78    2.71    2893    2893       0   97.29
       1       3      11      57    1.96    2893    2893       0   98.04    0.00    0.00    0.00      69
       1       3      27      41    1.40    2893    2893       0   98.60
       1       4      12      56    1.93    2893    2893       0   98.07    0.00    0.00    0.00      71
       1       4      28      30    1.03    2893    2893       0   98.97
       1       5      13      32    1.12    2893    2893       0   98.88    0.00    0.00    0.00      71
       1       5      29      23    0.79    2893    2893       0   99.21
       1       6      14      49    1.68    2893    2893       0   98.32    0.00    0.00    0.00      71
       1       6      30      93    3.20    2893    2893       0   96.80
       1       7      15      41    1.41    2893    2893       0   98.59    0.00    0.00    0.00      72
       1       7      31      21    0.72    2893    2893       0   99.28

Turbo boost did not kick in (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 955 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.95
              DB CPU(s):               1.9               5.6      0.00      2.89
      Redo size (bytes):          16,541.5          48,682.3
  Logical read (blocks):         955,750.1       2,812,818.1

Test 2: launch 2 SLOB sessions on 2 distinct cores on the same socket that way:

$ cd SLOB
$ numactl --physcpubind=4,5 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 5 are on 2 distinct cores on the same socket (see the cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  119.34   91.17    0.00    0.00    0.00
       0       0       0     152    5.26    2893    2893       0   94.74    0.00    0.00    0.00      47      49    0.00    0.00    0.00    0.00   55.29   41.51    0.00    0.00    0.00
       0       0      16      59    2.04    2893    2893       0   97.96
       0       1       1      62    2.13    2893    2893       0   97.87    0.00    0.00    0.00      42
       0       1      17      49    1.70    2893    2893       0   98.30
       0       2       2      59    2.03    2893    2893       0   97.97    0.00    0.00    0.00      43
       0       2      18      55    1.90    2893    2893       0   98.10
       0       3       3      10    0.33    2893    2893       0   99.67    0.00    0.00    0.00      44
       0       3      19      16    0.56    2893    2893       0   99.44
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.40    2893    2893       0   98.60
       0       5       5    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      49
       0       5      21     133    4.59    2893    2893       0   95.41
       0       6       6      71    2.47    2893    2893       0   97.53    0.00    0.00    0.00      45
       0       6      22      57    1.97    2893    2893       0   98.03
       0       7       7      20    0.70    2893    2893       0   99.30    0.00    0.00    0.00      47
       0       7      23      14    0.47    2893    2893       0   99.53
       1       0       8     364   12.58    2893    2893       0   87.42    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   64.05   49.66    0.00    0.00    0.00
       1       0      24      93    3.20    2893    2893       0   96.80
       1       1       9     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76
       1       1      25     105    3.62    2893    2893       0   96.38
       1       2      10     267    9.23    2893    2893       0   90.77    0.00    0.00    0.00      75
       1       2      26     283    9.77    2893    2893       0   90.23
       1       3      11      73    2.52    2893    2893       0   97.48    0.00    0.00    0.00      73
       1       3      27      92    3.19    2893    2893       0   96.81
       1       4      12      43    1.50    2893    2893       0   98.50    0.00    0.00    0.00      75
       1       4      28      44    1.53    2893    2893       0   98.47
       1       5      13      46    1.60    2893    2893       0   98.40    0.00    0.00    0.00      75
       1       5      29      48    1.67    2893    2893       0   98.33
       1       6      14      76    2.63    2893    2893       0   97.37    0.00    0.00    0.00      75
       1       6      30     105    3.64    2893    2893       0   96.36
       1       7      15     106    3.67    2893    2893       0   96.33    0.00    0.00    0.00      75

Turbo boost did not kick in (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 920 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.7      0.00      2.97
      Redo size (bytes):          16,014.6          47,163.1
  Logical read (blocks):         921,446.7       2,713,660.6

So, let’s say this is more or less the same as with 2 sessions on 2 distinct sockets.

Test 3: launch 2 SLOB sessions on the same core that way:

$ cd SLOB
$ numactl --physcpubind=4,20 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

Cpus 4 and 20 are sharing the same core (see cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     250    8.65    2893    2893       0   91.35    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  114.06   85.95    0.00    0.00    0.00
       0       0       0     100    3.45    2893    2893       0   96.55    0.00    0.00    0.00      48      48    0.00    0.00    0.00    0.00   51.21   37.45    0.00    0.00    0.00
       0       0      16      41    1.41    2893    2893       0   98.59
       0       1       1      75    2.61    2893    2893       0   97.39    0.00    0.00    0.00      42
       0       1      17      58    2.01    2893    2893       0   97.99
       0       2       2      51    1.75    2893    2893       0   98.25    0.00    0.00    0.00      44
       0       2      18      40    1.38    2893    2893       0   98.62
       0       3       3      25    0.86    2893    2893       0   99.14    0.00    0.00    0.00      45
       0       3      19       9    0.30    2893    2893       0   99.70
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      47
       0       4      20    2893  100.00    2893    2893       0    0.00
       0       5       5     120    4.14    2893    2893       0   95.86    0.00    0.00    0.00      44
       0       5      21      73    2.54    2893    2893       0   97.46
       0       6       6      65    2.25    2893    2893       0   97.75    0.00    0.00    0.00      45
       0       6      22      51    1.76    2893    2893       0   98.24
       0       7       7      18    0.64    2893    2893       0   99.36    0.00    0.00    0.00      48
       0       7      23       7    0.25    2893    2893       0   99.75
       1       0       8     491   16.97    2893    2893       0   83.03    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   62.85   48.50    0.00    0.00    0.00
       1       0      24      88    3.04    2893    2893       0   96.96
       1       1       9      91    3.16    2893    2893       0   96.84    0.00    0.00    0.00      75
       1       1      25      80    2.75    2893    2893       0   97.25
       1       2      10      87    3.00    2893    2893       0   97.00    0.00    0.00    0.00      74
       1       2      26     104    3.61    2893    2893       0   96.39
       1       3      11      62    2.16    2893    2893       0   97.84    0.00    0.00    0.00      72
       1       3      27      80    2.77    2893    2893       0   97.23
       1       4      12      36    1.24    2893    2893       0   98.76    0.00    0.00    0.00      74
       1       4      28      36    1.25    2893    2893       0   98.75
       1       5      13      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      75
       1       5      29      75    2.61    2893    2893       0   97.39
       1       6      14      52    1.79    2893    2893       0   98.21    0.00    0.00    0.00      75
       1       6      30      92    3.17    2893    2893       0   96.83
       1       7      15      39    1.35    2893    2893       0   98.65    0.00    0.00    0.00      76
       1       7      31      32    1.11    2893    2893       0   98.89

Turbo boost did not kick in (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 600 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.6      0.00      2.96
      Redo size (bytes):          16,204.0          47,101.1
  Logical read (blocks):         603,430.3       1,754,028.2

So, as you can see, the SLOB runs (with the same SLOB configuration) produced about:

  • 955 000 LIOPS with the 2 sessions running on 2 distinct sockets (so on 2 distinct cores).
  • 920 000 LIOPS with the 2 sessions running on 2 distinct cores on the same socket.
  • 600 000 LIOPS with the 2 sessions running on the same core.

Out of curiosity, what if we disable hyper-threading on one core and launch the 2 sessions on it?

Let’s put cpu 20 offline (to be sure there is no noise on it during the test):

echo 0 > /sys/devices/system/cpu/cpu20/online

Launch the 2 sessions on the same cpu (the only one left online for this core, as the second one has just been put offline) that way:

$ cd SLOB
$ numactl --physcpubind=4 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

The turbostat command shows that cpu 4 is 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     153    5.30    2893    2893       0   94.70    0.00    0.00    0.00      77      77    0.00    0.00    0.00    0.00  112.76   84.64    0.00    0.00    0.00
       0       0       0      56    1.94    2893    2893       0   98.06    0.00    0.00    0.00      45      46    0.00    0.00    0.00    0.00   49.86   36.07    0.00    0.00    0.00
       0       0      16      42    1.45    2893    2893       0   98.55
       0       1       1      97    3.35    2893    2893       0   96.65    0.00    0.00    0.00      42
       0       1      17      61    2.11    2893    2893       0   97.89
       0       2       2      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      44
       0       2      18      45    1.54    2893    2893       0   98.46
       0       3       3      19    0.64    2893    2893       0   99.36    0.00    0.00    0.00      44
       0       3      19      25    0.85    2893    2893       0   99.15
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       5       5       9    0.31    2893    2893       0   99.69    0.00    0.00    0.00      44
       0       5      21      14    0.49    2893    2893       0   99.51
       0       6       6      14    0.48    2893    2893       0   99.52    0.00    0.00    0.00      44
       0       6      22      12    0.41    2893    2893       0   99.59
       0       7       7      25    0.88    2893    2893       0   99.12    0.00    0.00    0.00      46
       0       7      23       9    0.33    2893    2893       0   99.67
       1       0       8     133    4.59    2893    2893       0   95.41    0.00    0.00    0.00      76      77    0.00    0.00    0.00    0.00   62.90   48.56    0.00    0.00    0.00
       1       0      24      69    2.37    2893    2893       0   97.63
       1       1       9      89    3.07    2893    2893       0   96.93    0.00    0.00    0.00      76
       1       1      25      73    2.52    2893    2893       0   97.48
       1       2      10      98    3.38    2893    2893       0   96.62    0.00    0.00    0.00      76
       1       2      26     113    3.92    2893    2893       0   96.08
       1       3      11      76    2.62    2893    2893       0   97.38    0.00    0.00    0.00      74
       1       3      27      78    2.71    2893    2893       0   97.29
       1       4      12     194    6.71    2893    2893       0   93.29    0.00    0.00    0.00      75
       1       4      28      27    0.94    2893    2893       0   99.06
       1       5      13      62    2.14    2893    2893       0   97.86    0.00    0.00    0.00      76
       1       5      29      95    3.29    2893    2893       0   96.71
       1       6      14     101    3.48    2893    2893       0   96.52    0.00    0.00    0.00      75
       1       6      30      92    3.19    2893    2893       0   96.81
       1       7      15      64    2.20    2893    2893       0   97.80    0.00    0.00    0.00      77
       1       7      31      20    0.68    2893    2893       0   99.32

and as you can see it is the only cpu available on that core.

Turbo boost did not kick in (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 460 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.69
              DB CPU(s):               1.0               2.8      0.00      1.32
      Redo size (bytes):          18,978.1          55,154.9
  Logical read (blocks):         458,451.7       1,332,369.7

which makes full sense, as 2 sessions on 2 distinct cores on the same socket produced about 920 000 LIOPS (Test 2), that is, about twice as much.

So what?

Imagine that you launched multiple full cached SLOB runs on your new box to measure its behaviour/scalability (from one session up to the cpu_count value).

Example: I launched 11 full cached SLOB runs with 2,4,6,8,12,16,18,20,24,28 and 32 sessions running. The LIOPS graph is the following:

[Graph: LIOPS as a function of the number of SLOB sessions]

Just check the shape of the curve (not the values). As you can see, once the number of sessions reaches the number of cores (16), the LIOPS still increases but at a slower rate. One of the reasons is the one we observed during the previous tests: 2 sessions running on the same core don’t perform as fast as 2 sessions running on 2 distinct cores.

Remarks

A presentation worth watching around Oracle and CPU stuff:

The absolute LIOPS numbers are relatively low. It would have been possible to increase those numbers during the tests by increasing the WORK_UNIT slob parameter (see this blog post).

To get the turbostat utility, copy/paste the turbostat.c and msr-index.h contents into a directory and compile turbostat.c that way:

$ gcc -Wall -DMSRHEADER='"./msr-index.h"' turbostat.c -o turbostat

Conclusion

During the tests I observed the following (of course, your results could differ depending on the CPUs, the SLOB configuration and so on):

  • 955 000 LIOPS with the 2 sessions running on 2 distinct sockets (so on 2 distinct cores)
  • 920 000 LIOPS with the 2 sessions running on 2 distinct cores on the same socket
  • 600 000 LIOPS with the 2 sessions running on the same core
  • 460 000 LIOPS with the 2 sessions running on the same core with hyper-threading disabled

Hyper-threading helps to increase the overall LIOPS throughput of the system, but once the number of sessions is greater than the number of cores, the average LIOPS per session decreases.
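Putting the four measurements above side by side gives a rough feel for what hyper-threading bought on this box (the numbers are the ones observed during these runs, not general figures):

```python
# Ratios derived from the LIOPS measured in the four tests above.
liops = {
    "two sockets": 955_000,
    "same socket, two cores": 920_000,
    "same core (HT)": 600_000,
    "same core, HT disabled": 460_000,
}

ht_gain = liops["same core (HT)"] / liops["same core, HT disabled"]
vs_two_cores = liops["same core (HT)"] / liops["same socket, two cores"]

print(f"HT gain on one core: {ht_gain:.0%}")          # ~130% of the single-thread throughput
print(f"vs two dedicated cores: {vs_two_cores:.0%}")  # ~65%: far from a second real core
```

In other words, on this workload the second hardware thread added roughly 30% throughput, nowhere near the ~100% a second core delivers.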


Exadata cell metrics: Join the metrics and their descriptions on the fly

Introduction

Cell metrics are very useful but their names are not so friendly. A name is a concatenation of abbreviations for the type of component, delimited by the underscore character. So you have to understand the naming convention to understand the meaning of a metric name.

For example, knowing that:

  • CD stands for “Cell Disks metrics”
  • IO_BY stands for “Number of megabytes”
  • R stands for Read
  • LG stands for Large

You can conclude that the CD_IO_BY_R_LG metric is linked to the “Number of megabytes read in large blocks from a cell disk”.
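For illustration, this kind of decoding can be mechanized with a small lookup table. A Python sketch follows; the table only contains the four abbreviations listed above (the full list is in the Oracle documentation), and the greedy two-token match is just one simple way to handle multi-part abbreviations like IO_BY:

```python
# Decode a cell metric name from its abbreviations.
# ABBREV is deliberately partial: only the abbreviations mentioned in this post.
ABBREV = {
    "CD": "Cell Disks metrics",
    "IO_BY": "Number of megabytes",
    "R": "Read",
    "LG": "Large",
}

def decode(metric_name):
    """Greedily match the longest known abbreviation at each position."""
    tokens = metric_name.split("_")
    out, i = [], 0
    while i < len(tokens):
        # try the two-token abbreviation (e.g. IO_BY) before the single token
        two = "_".join(tokens[i:i + 2])
        if two in ABBREV:
            out.append(ABBREV[two]); i += 2
        elif tokens[i] in ABBREV:
            out.append(ABBREV[tokens[i]]); i += 1
        else:
            out.append(tokens[i]); i += 1
    return " / ".join(out)

print(decode("CD_IO_BY_R_LG"))
# Cell Disks metrics / Number of megabytes / Read / Large
```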

Fortunately, the metrics are explained in the Oracle documentation, and you can also retrieve their descriptions from cellcli:

$ cellcli -e "list metricdefinition attributes name,description where name='CD_IO_BY_R_LG'"

CD_IO_BY_R_LG	 "Number of megabytes read in large blocks from a cell disk"

Lack of description

That said, as an example, let’s query the current metric values for a particular database that way:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'"

	 BDT	 DB_FC_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FC_IO_RQ       	 47,638 IO requests
	 BDT	 DB_FC_IO_RQ_SEC   	 2.1 IO/sec
	 BDT	 DB_FD_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FD_IO_LOAD     	 19,885
	 BDT	 DB_FD_IO_RQ_LG    	 36 IO requests
	 BDT	 DB_FD_IO_RQ_LG_SEC	 0.0 IO/sec
	 BDT	 DB_FD_IO_RQ_SM    	 47,602 IO requests
	 BDT	 DB_FD_IO_RQ_SM_SEC	 2.1 IO/sec
	 BDT	 DB_FL_IO_BY       	 0.000 MB
	 BDT	 DB_FL_IO_BY_SEC   	 0.000 MB/sec
	 BDT	 DB_FL_IO_RQ       	 0 IO requests
	 BDT	 DB_FL_IO_RQ_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_BY_SEC      	 0 MB/sec
	 BDT	 DB_IO_LOAD        	 0.0
	 BDT	 DB_IO_RQ_LG       	 0 IO requests
	 BDT	 DB_IO_RQ_LG_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_RQ_SM       	 19 IO requests
	 BDT	 DB_IO_RQ_SM_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_UTIL_LG     	 0 %
	 BDT	 DB_IO_UTIL_SM     	 0 %
	 BDT	 DB_IO_WT_LG       	 0 ms
	 BDT	 DB_IO_WT_LG_RQ    	 0.0 ms/request
	 BDT	 DB_IO_WT_SM       	 0 ms
	 BDT	 DB_IO_WT_SM_RQ    	 0.0 ms/request

As you can see, the metric description is not there, and there is no way to retrieve it from metriccurrent (or metrichistory) because it is not an attribute:

$ cellcli -e "describe metriccurrent"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	objectType

$ cellcli -e "describe metrichistory"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	metricValueAvg
	metricValueMax
	metricValueMin
	objectType

But if you send the result of our example to someone who doesn’t know (or doesn’t remember) the naming convention (or if you are not 100% sure of the definition of a particular metric), then they (or you) will have to:

  • go back to the Oracle documentation
  • query the metricdefinition with cellcli

New script: exadata_metrics_desc.pl

Thanks to the exadata_metrics_desc.pl script, you can add (to the cellcli output) the description of the metric on the fly.
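The idea behind such a filter can be sketched in a few lines. This is not the actual exadata_metrics_desc.pl code, just a Python illustration of the join: build a name-to-description map (in real life fetched once via cellcli from the metricdefinition objects), then annotate every metric name found on each input line:

```python
# Sketch of an on-the-fly description join, NOT the real exadata_metrics_desc.pl.
# DESCRIPTIONS holds one sample entry here; the real map would be built from
# the output of: cellcli -e "list metricdefinition attributes name,description"
import re

DESCRIPTIONS = {
    "DB_FC_IO_BY_SEC":
        "Number of megabytes of I/O per second for this database to flash cache",
}

def annotate(line):
    """Append '(description)' to every known metric name found in the line."""
    def repl(m):
        desc = DESCRIPTIONS.get(m.group(0))
        return f"{m.group(0)} ({desc})" if desc else m.group(0)
    # metric names are uppercase tokens containing at least one underscore
    return re.sub(r"\b[A-Z]+(?:_[A-Z]+)+\b", repl, line)

print(annotate("BDT   DB_FC_IO_BY_SEC   0 MB/sec"))
```

Used as a pipe target (reading stdin line by line), this reproduces the annotated output shown below.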

Let’s launch the same query (as the one used in the previous example) and add a call to exadata_metrics_desc.pl that way:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'" | ./exadata_metrics_desc.pl

  BDT   DB_FC_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash cache)          0 MB/sec
  BDT   DB_FC_IO_RQ (Number of IO requests issued by a database to flash cache)                           48,123 IO requests
  BDT   DB_FC_IO_RQ_SEC (Number of IO requests issued by a database to flash cache per second)            2.1 IO/sec
  BDT   DB_FD_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash disks)          0 MB/sec
  BDT   DB_FD_IO_LOAD (Average I/O load from this database for flash disks)                               4,419
  BDT   DB_FD_IO_RQ_LG (Number of large IO requests issued by a database to flash disks)                  36 IO requests
  BDT   DB_FD_IO_RQ_LG_SEC (Number of large IO requests issued by a database to flash disks per second)   0.0 IO/sec
  BDT   DB_FD_IO_RQ_SM (Number of small IO requests issued by a database to flash disks)                  48,087 IO requests
  BDT   DB_FD_IO_RQ_SM_SEC (Number of small IO requests issued by a database to flash disks per second)   2.1 IO/sec
  BDT   DB_FL_IO_BY (The number of MB written to the Flash Log)                                           0.000 MB
  BDT   DB_FL_IO_BY_SEC (The number of MB written per second to the Flash Log)                            0.000 MB/sec
  BDT   DB_FL_IO_RQ (The number of I/O requests issued to the Flash Log)                                  0 IO requests
  BDT   DB_FL_IO_RQ_SEC (The number of I/O requests per second issued to the Flash Log)                   0.0 IO/sec
  BDT   DB_IO_BY_SEC (Number of megabytes of I/O per second for this database to hard disks)              0 MB/sec
  BDT   DB_IO_LOAD (Average I/O load from this database for hard disks)                                   0.0
  BDT   DB_IO_RQ_LG (Number of large IO requests issued by a database to hard disks)                      0 IO requests
  BDT   DB_IO_RQ_LG_SEC (Number of large IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_RQ_SM (Number of small IO requests issued by a database to hard disks)                      19 IO requests
  BDT   DB_IO_RQ_SM_SEC (Number of small IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_UTIL_LG (Percentage of disk resources utilized by large requests from this database)        0 %
  BDT   DB_IO_UTIL_SM (Percentage of disk resources utilized by small requests from this database)        0 %
  BDT   DB_IO_WT_LG (IORM wait time for large IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_LG_RQ (Average IORM wait time per request for large IO requests issued by a database)    0.0 ms/request
  BDT   DB_IO_WT_SM (IORM wait time for small IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_SM_RQ (Average IORM wait time per request for small IO requests issued by a database)    0.0 ms/request

As you can see, the description of each metric in the initial output has been added.

Remarks

  • You can download the script from this repository or from GitHub.
  • Feel free to build the query you want on the metrics. You just need to add a call to exadata_metrics_desc.pl to see the metric description being added on the fly (as long as the metric name appears in the output of your initial query).
  • Full credit for the idea of this script goes to Martin Bach.
  • This script works with any input (a text file, for example).

Conclusion

The exadata_metrics_desc.pl script can be used to join, on the fly, the metric name, its value (and whatever other attribute you’d like to see) with its associated description.

Posted in Exadata | Leave a comment

Extract Exadata cells historical metrics in CSV format

Exadata provides a lot of useful metrics to monitor the Cells and you may want to retrieve historical values for some metrics. To do so, you can use the “LIST METRICHISTORY” command through CellCLI on the cell.

But as usual, visualising the metrics is even better. For this purpose, you can use a perl script (see the download link in the remarks section) that extracts the historical metrics in CSV format, so that you can graph them with the visualisation tool of your choice.

Let’s see the help:

Usage: ./csv_exadata_metrics_history.pl [cell=|groupfile=] [serial] [type=] [name=] [objectname=] [name!=] [objectname!=] [ago_unit=] [ago_value=]

 Parameter                 Comment                                                      Default
 ---------                 -------                                                      -------
 CELL=                     comma-separated list of cells
 GROUPFILE=                file containing list of cells
 SERIAL                    serialize execution over the cells (default is no)
 TYPE=                     Metrics type to extract: Cumulative|Rate|Instantaneous       ALL
 NAME=                     Metrics to extract (wildcard allowed)                        ALL
 OBJECTNAME=               Objects to extract (wildcard allowed)                        ALL
 NAME!=                    Exclude metrics (wildcard allowed)                           EMPTY
 OBJECTNAME!=              Exclude objects (wildcard allowed)                           EMPTY
 AGO_UNIT=                 Unit to retrieve historical metrics back: day|hour|minute    HOUR
 AGO_VALUE=                Value associated to Unit to retrieve historical metrics back 1

utility assumes passwordless SSH from this cell node to the other cell nodes
utility assumes ORACLE_HOME has been set (with celladmin user for example)

Example : ./csv_exadata_metrics_history.pl cell=cell
Example : ./csv_exadata_metrics_history.pl groupfile=./cell_group
Example : ./csv_exadata_metrics_history.pl cell=cell objectname='CD_disk03_cell'
Example : ./csv_exadata_metrics_history.pl cell=cell name='.*BY.*' objectname='.*disk.*'
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=hour ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel01,enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4 serial

You have to set up passwordless SSH from one cell to the other cells (then you can launch the script from that cell and retrieve data from the other cells).

The main options/features are:

  1. You can specify the cells on which you want to collect the metrics thanks to the cell or groupfile parameter.
  2. You can choose to serialize the execution over the cells thanks to the serial parameter.
  3. You can choose the type of metrics you want to retrieve (Cumulative, Rate or Instantaneous) thanks to the type parameter.
  4. You can focus on some metrics thanks to the name parameter (wildcard allowed).
  5. You can exclude some metrics thanks to the name! parameter (wildcard allowed).
  6. You can focus on some metricobjectname thanks to the objectname parameter (wildcard allowed).
  7. You can exclude some metricobjectname thanks to the objectname! parameter (wildcard allowed).
  8. You can choose the unit to retrieve metrics back (day, hour, minute) thanks to the ago_unit parameter.
  9. You can choose the value associated to the unit to retrieve metrics back thanks to the ago_value parameter.
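For reference, those parameters ultimately boil down to a CellCLI “LIST METRICHISTORY” command with a WHERE clause on attributes such as collectionTime, name, metricObjectName and metricType (shown in the describe output earlier). Here is a hypothetical Python sketch of how such a command string could be assembled; the actual Perl script may build its filters differently:

```python
from datetime import datetime, timedelta

def build_list_metrichistory(metric_type=None, name=None, objectname=None,
                             ago_unit="hour", ago_value=1, now=None):
    # Assemble a CellCLI command retrieving metric history going back
    # ago_value ago_units from now (now is injectable for testing).
    now = now or datetime.now()
    delta = {"day": timedelta(days=ago_value),
             "hour": timedelta(hours=ago_value),
             "minute": timedelta(minutes=ago_value)}[ago_unit.lower()]
    since = (now - delta).strftime("%Y-%m-%dT%H:%M:%S")
    where = ["collectionTime > '" + since + "'"]
    if metric_type:
        where.append("metricType = '" + metric_type + "'")
    if name:
        where.append("name like '" + name + "'")
    if objectname:
        where.append("metricObjectName like '" + objectname + "'")
    return ("list metrichistory attributes name,metricObjectName,"
            "collectionTime,metricType,metricValue where "
            + " and ".join(where))
```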

Let’s see an example:

I want to retrieve in csv format the metrics from 2 cells related to databases for the last 20 minutes:

$> ./csv_exadata_metrics_history.pl cell=enkx3cel01,enkx3cel02 name='DB_.*' ago_unit=minute ago_value=20
Cell;metricType;DateTime;name;objectname;value;unit
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ACSTBY;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ASM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BDT;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BIGDATA;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBFS;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMO;794,329;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMOX3;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;EXDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;WZSDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;_OTHER_DATABASE_;48,764;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ACSTBY;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ASM;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BDT;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BIGDATA;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBFS;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBM;15;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMO;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMOX3;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;EXDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;WZSDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;_OTHER_DATABASE_;0;MB/sec
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ACSTBY;2,318;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ASM;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BDT;2,966;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BIGDATA;25,415;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBFS;3,489;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBM;1,627,066;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMO;4,506;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMOX3;4,172;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;EXDB;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;WZSDB;4,378;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;_OTHER_DATABASE_;6,227;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ_LG;ACSTBY;0;IO requests
.
.
.
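Before graphing, the semicolon-separated output needs a small amount of parsing; note that the values may embed thousands separators (e.g. 15,779). A minimal Python sketch:

```python
import csv
from io import StringIO

def parse_metrics_csv(text):
    # The script's output is semicolon-separated with a header line,
    # and numeric values may carry thousands separators (e.g. 15,779).
    rows = []
    for row in csv.DictReader(StringIO(text), delimiter=";"):
        row["value"] = float(row["value"].replace(",", ""))
        rows.append(row)
    return rows
```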

That way, you can visualize your data the way you feel comfortable with. For example, I used Tableau to create a “Database metrics dashboard” on the DB_* rate metrics.

Remarks:

  • If you retrieve too much data, you could receive something like:
Error: enkx3cel02 is returning over 100000 lines; output is truncated !!!
 Command could be retried with the serialize option: --serial
 Killing child pid 15720 to enkx3cel02...

Then, you can launch the script with the serial option (see the help).
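Running the cells in parallel with a serial fallback can be sketched like this (a hypothetical helper, not the script’s actual code; run_one would wrap the ssh + cellcli call):

```python
from concurrent.futures import ThreadPoolExecutor

def collect_from_cells(cells, run_one, serial=False):
    # Apply run_one (e.g. an ssh + cellcli invocation) to each cell,
    # in parallel by default, or one cell at a time with serial=True.
    if serial:
        return [run_one(cell) for cell in cells]
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        return list(pool.map(run_one, cells))
```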

Conclusion:

You probably already have a way to build your own graph of the historical metrics. But if you don’t, feel free to use this script and the visualisation tool of your choice.

Posted in Exadata | Leave a comment

12c Application Continuity with Data Guard in action

Introduction

Application Continuity enables replay, in a non-disruptive and rapid manner, of a database request when a recoverable error makes the database session unavailable. Application Continuity uses Transaction Guard. You can find more details on those features here.

With RAC, Application Continuity can be used to launch the replay on another Instance. It can also be used with Data Guard under some circumstances and conditions (see the extract from the documentation here).

Well, I just want to see this feature in action during a Data Guard switchover. Let’s give it a try.

Test preparation

The test environment is made of:

  • NBDT database (12.1.0.2 and initially primary).
  • SNBDT database (12.1.0.2 and initially standby).

On the primary database, connected as user bdt, let’s create the following tables:

SQL> create table BDT_CONT as select rownum num,rpad('U',20) val from dual connect by level <= 500000;

Table created.

SQL> create table BDT_SPY (d date, instance_name varchar2(16), num number);

Table created.

And the following trigger:

create or replace trigger BDT_CONT_SPY_T
before update
on BDT_CONT
for each row
begin
insert into BDT_SPY values(sysdate,sys_context('USERENV', 'INSTANCE_NAME'), :old.num);
end;
/

Trigger created.

The test application will update the 500000 rows of the BDT_CONT table in one transaction. The aim of the trigger is to track on which Instance the update has been successfully executed.

Now let’s create a service on the primary and standby databases that way:

# Primary
$ srvctl add service -d NBDT -s appco -preferred NBDT1,NBDT2 -l primary -policy automatic -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE

# Standby
$ srvctl add service -d SNBDT -s appco -preferred SNBDT1,SNBDT2  -l primary -policy automatic -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE

So that the appco service:

  • Is automatically started on the standby if it becomes primary.
  • Is defined for Application Continuity (see more details here on how to configure a service for Application Continuity).

Now let’s create a JAVA application that will be used to test the Application Continuity:

import java.sql.*;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
import oracle.jdbc.replay.*;

public class AppCont {
    public String getInstanceName(Connection conn) throws SQLException {
        PreparedStatement preStatement = conn.prepareStatement("select instance_name from v$instance");
        String r=new String();
        ResultSet result = preStatement.executeQuery();

        while (result.next()){
            r=result.getString("instance_name");
        }
        return r;
    }

    public static void main(String args[]) throws SQLException {
         Connection conn = null;
         Statement stmt = null;
         try {
              // OracleDataSourceImpl instead of OracleDataSource.
              OracleDataSourceImpl ocpds = new OracleDataSourceImpl();
              // Create the database URL
              String dbURL =
              "jdbc:oracle:thin:@(DESCRIPTION_LIST="+
              "(LOAD_BALANCE=off)"+
              "(FAILOVER=on)"+
              "(DESCRIPTION=(CONNECT_TIMEOUT=90) (RETRY_COUNT=10)(RETRY_DELAY=3)"+
              "(ADDRESS_LIST="+
              "(LOAD_BALANCE=on)"+
              "(ADDRESS=(PROTOCOL=TCP) (HOST=rac-cluster-scan)(PORT=1521)))"+
              "(CONNECT_DATA=(SERVICE_NAME=appco)))"+
              "(DESCRIPTION=(CONNECT_TIMEOUT=90) (RETRY_COUNT=10)(RETRY_DELAY=10)"+
              "(ADDRESS_LIST="+
              "(LOAD_BALANCE=on)"+
	      "(ADDRESS=(PROTOCOL=TCP)(HOST=srac-cluster-scan)(PORT=1521)))"+
              "(CONNECT_DATA=(SERVICE_NAME=appco))))";

              ocpds.setURL(dbURL);
              ocpds.setUser("bdt");
              ocpds.setPassword("bdt");

              // Get a connection
              conn = ocpds.getConnection();

              // Instantiation
              AppCont ex = new AppCont();

              // On which Instance are we connected?
              System.out.println("Instance Name = "+ex.getInstanceName(conn));

              System.out.println("Performing transaction");
              /* Don't forget to disable autocommit
	       * And launch the transaction
	       */
              ((oracle.jdbc.replay.ReplayableConnection)conn).beginRequest();
              conn.setAutoCommit(false);
              stmt = conn.createStatement();
              String updsql = "UPDATE BDT_CONT " +
                              "SET val=UPPER(val)";
              stmt.executeUpdate(updsql);
              conn.commit();
              // End
              ((oracle.jdbc.replay.ReplayableConnection)conn).endRequest();

	      stmt.close();

              // On which Instance are we connected?
              System.out.println("Instance Name = "+ex.getInstanceName(conn));

              conn.close();
         }
         catch (Exception e) {
              e.printStackTrace();
        }
    }
}

The important parts are:

  • The use of OracleDataSourceImpl instead of OracleDataSource.
  • The database URL definition that includes the access to the appco service for both the primary and standby databases (NBDT is hosted on “rac-cluster” and SNBDT is hosted on “srac-cluster”).
  • Autocommit is disabled (Read this if you need more details).
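As a side note, a connect string with that shape can be generated programmatically. Here is a hypothetical Python helper using the host and service names from the test (simplified to a single retry_delay, whereas the URL above uses 3 for the primary cluster and 10 for the standby):

```python
def dg_url(service, scan_hosts, port=1521,
           connect_timeout=90, retry_count=10, retry_delay=3):
    # One DESCRIPTION per cluster, tried in order (FAILOVER=on,
    # LOAD_BALANCE=off at the DESCRIPTION_LIST level).
    descs = "".join(
        f"(DESCRIPTION=(CONNECT_TIMEOUT={connect_timeout})"
        f"(RETRY_COUNT={retry_count})(RETRY_DELAY={retry_delay})"
        f"(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
        for host in scan_hosts)
    return ("jdbc:oracle:thin:@(DESCRIPTION_LIST="
            "(LOAD_BALANCE=off)(FAILOVER=on)" + descs + ")")
```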

We are ready to launch the application and test the Application Continuity in a Data Guard context.

Run it

First let’s check the status of the data guard configuration and that it is ready for the switchover:

DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/toto@NBDT
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - nbdt_dr

  Protection Mode: MaxPerformance
  Members:
  nbdt  - Primary database
    snbdt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 24 seconds ago)

DGMGRL> validate database SNBDT;

  Database Role:     Physical standby database
  Primary Database:  nbdt

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    nbdt:   Off
    snbdt:  Off

Launch the application

$ cat launch.ksh
export CLASSPATH=/u01/app/oracle/product/12.1.0.2/jdbc/lib/ojdbc7.jar:.
time java -cp $CLASSPATH:. AppCont
$ ksh ./launch.ksh
Instance Name = NBDT1
Performing transaction

At this stage, launch the switchover (from another terminal):

DGMGRL> switchover to snbdt;
Performing switchover NOW, please wait...
Operation requires a connection to instance "SNBDT1" on database "snbdt"
Connecting to instance "SNBDT1"...
Connected as SYSDBA.
New primary database "snbdt" is opening...
Oracle Clusterware is restarting database "nbdt" ...
Switchover succeeded, new primary is "snbdt"

And then wait until the application ends:

$ ksh ./launch.ksh
Instance Name = NBDT1
Performing transaction
.
.   <= Switchover occurred here
.
Instance Name = SNBDT1

real	1m46.19s
user	0m1.33s
sys	0m0.13s

As we can see:

  • No errors have been reported.
  • The application terminated on the SNBDT1 instance (Then on the new primary database).

Let’s check (thanks to the spy table) on which Instance did the transaction commit:

$ sqlplus bdt/bdt@SNBDT

SQL*Plus: Release 12.1.0.2.0 Production on Thu May 7 10:04:54 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu May 07 2015 09:59:05 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select distinct(instance_name) from bdt_spy;

INSTANCE_NAME
----------------
SNBDT1

As we can see the transaction has been replayed and did commit on the SNBDT database (The new primary database).

Remarks

  1. Application Continuity is supported for Oracle Data Guard switchovers to physical standby databases. It is also supported for fast-start failover to physical standbys in maximum availability data protection mode.
  2. Don’t forget that the primary and standby databases must be licensed for Oracle RAC or Oracle Active Data Guard in order to use Application Continuity.
  3. The bdt oracle user that has been used during the test, has grant execute on DBMS_APP_CONT.
  4. This Java Application has been created and shared by Laurent Leturgez during the Paris Oracle Meetup. User groups are always good, as you can share your work with others. Thanks again Laurent!

Conclusion

We have seen the Application Continuity feature in Action in a Data Guard context.

Posted in Availability | 2 Comments

Direct Path Read and “enq: KO – fast object checkpoint”

Introduction

When using direct path read, oracle reads the blocks from the disk. The buffer cache is not being used.

Then, when oracle begins to read a segment using direct path read, it flushes the dirty blocks of that segment to disk (a dirty block is a block that does not have the same image in memory and on disk). During that time, the session that triggered the direct path read waits on the “enq: KO – fast object checkpoint” wait event.

In this post, I just want to see with my own eyes that the dirty blocks being flushed to disk belong only to the segment being read.

Tests

For the test, I used two tables on a 11.2.0.4 database. One table is made of 2 partitions and the other is not partitioned (hence 3 segments):

BDT:BDT1> select SEGMENT_NAME,PARTITION_NAME from dba_segments where segment_name in ('BDT1','BDT2');

SEGMENT_NAME PARTITION_NAME
------------ --------------
BDT2
BDT1         P2
BDT1         P1

The rows distribution across segments, blocks and file relative numbers is the following:

BDT:BDT1> l
  1  select u.object_name,u.subobject_name,dbms_rowid.rowid_block_number(BDT1.rowid) BLOCK_NUMBER, dbms_rowid.rowid_relative_fno(BDT1.rowid) FNO
  2	 from BDT1, user_objects u
  3	 where dbms_rowid.rowid_object(BDT1.rowid) = u.object_id
  4	 union all
  5	 select u.object_name,u.subobject_name,dbms_rowid.rowid_block_number(BDT2.rowid) BLOCK_NUMBER, dbms_rowid.rowid_relative_fno(BDT2.rowid) FNO
  6	 from BDT2, user_objects u
  7	 where dbms_rowid.rowid_object(BDT2.rowid) = u.object_id
  8	 order by 1,2
  9*
BDT:BDT1> /

OBJECT_NAME  SUBOBJECT_NAME  BLOCK_NUMBER FNO
------------ --------------- ------------ -----------
BDT1	     P1              2509842	  5
BDT1	     P1              2509842      5
BDT1	     P1              2509842      5
BDT1	     P1              2509842      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5

18 rows selected.

so that all the rows of the segment:

  • BDT1:P1 are in datafile 5, block 2509842
  • BDT1:P2 are in datafile 5, block 2510866
  • BDT2 are in datafile 5, block 1359091

Remark:

BDT1:P<N> stands for table BDT1 partition <N>

Now, let’s update both tables (without committing):

BDT:BDT1> update BDT1 set ID2=100;

9 rows updated.

BDT:BDT1> update BDT2 set ID2=100;

9 rows updated.

As I do not commit (nor rollback), then those 3 blocks in memory contain an active transaction.

In another session, let’s force serial direct path read and read partition P1 from BDT1 only:

BDT:BDT1> alter session set "_serial_direct_read"=always;

Session altered.

BDT:BDT1> select count(*) from BDT1 partition (P1);

COUNT(*)
-----------
4

Now, check its wait events and related statistics with snapper. A cool thing with snapper is that I can get wait events and statistics from one tool.

SYS:BDT1> @snapper all,end 5 1 20
Sampling SID 20 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.22 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    20  @1, BDT       , STAT, physical reads direct                                     ,             1,        .02,         ,             ,          ,           ,         .5 per execution
    20  @1, BDT       , WAIT, enq: KO - fast object checkpoint                          ,          1117,    18.12us,      .0%, [          ],         1,        .02,     1.12ms average wait

--  End of Stats snap 1, end=2015-04-29 08:25:20, seconds=61.7

--  End of ASH snap 1, end=, seconds=, samples_taken=0, AAS=(No ASH sampling in begin/end snapshot mode)

PL/SQL procedure successfully completed.

For brevity and clarity, I keep only the wait events and statistics of interest.

As we can see, the direct path read has been used and the “fast object checkpoint” occurred (as the session waited for it).

Now let’s check which blocks have been flushed to disk: to do so, I’ll dump those 3 blocks from disk and check whether they contain an active transaction (like the ones in memory).

BDT:BDT1> !cat dump_blocks.sql
-- BDT2
alter system dump datafile 5 block 1359091;
-- BDT1:P2
alter system dump datafile 5 block 2510866;
-- BDT1:P1
alter system dump datafile 5 block 2509842;
select value from v$diag_info where name like 'Default%';

BDT:BDT1> @dump_blocks.sql

System altered.

System altered.

System altered.

VALUE
------------------------------
/u01/app/oracle/diag/rdbms/bdt/BDT1/trace/BDT1_ora_91815.trc

BDT:BDT1> !view /u01/app/oracle/diag/rdbms/bdt/BDT1/trace/BDT1_ora_91815.trc

The Interested Transaction List (ITL) for the first block that has been dumped from disk (BDT2 Table) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f9c7
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does not contain an active transaction (no rows are locked, as Lck=0). We can conclude that the block that belongs to the BDT2 table has not been flushed to disk (the block is still dirty).

The Interested Transaction List (ITL) for the second block that has been dumped from disk (BDT1:P2) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f99f
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does not contain an active transaction (no rows are locked, as Lck=0). We can conclude that the block that belongs to partition P2 of the BDT1 table has not been flushed to disk (the block is still dirty).

The Interested Transaction List (ITL) for the third block that has been dumped from disk (BDT1:P1) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f998
0x02   0x000a.005.00001025  0x00c00211.038e.23  ----    4  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does contain an active transaction (4 rows are locked, as Lck=4). This is the block that has been read via direct path read during the select on BDT1:P1.

We can conclude that the block that belongs to BDT1:P1 has been flushed to disk (the block is not dirty anymore).
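Spotting this in a trace file can be scripted: sum the Lck column over the ITL entries; a nonzero total means the on-disk image of the block still holds locked rows. A minimal sketch, assuming the dump layout shown above:

```python
import re

# One ITL entry per line: slot, Xid, Uba, Flag, then the Lck count.
ITL_LINE = re.compile(
    r'\s*0x\w{2}\s+0x[0-9a-f.]+\s+0x[0-9a-f.]+\s+[C-]\S*\s+(\d+)')

def block_has_active_transaction(itl_dump):
    # Sum the Lck column over the ITL entries of a block dump;
    # a nonzero total means rows in the on-disk image are still locked.
    locked = sum(int(m.group(1))
                 for m in (ITL_LINE.match(line)
                           for line in itl_dump.splitlines()) if m)
    return locked > 0
```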

Conclusion

When oracle begins to read a segment using direct path read:

  • It does not flush to disk the dirty blocks from other segments.
  • Only the dirty blocks of the segment being read are flushed to disk.
Posted in Performance | 8 Comments

ASM Bytes Read and Smart Scan

Introduction

I like to query the v$asm_disk_iostat cumulative view (at the ASM instance level) as this is a centralized location where you can find metrics for all the databases the ASM instance is servicing. One of its metrics is:

BYTES_READ: Total number of bytes read from the disk (see the oracle documentation).

I would like to see what value is recorded in this metric in the case of smart scan. To do so, I’ll launch some tests and check the output of the asm_metrics utility: it basically takes a snapshot each second (default interval) from the gv$asm_disk_iostat cumulative view and computes the delta with the previous snapshot.

Environment

ASM and database versions are 11.2.0.4. A segment of about 19.5 gb has been created without any indexes, so that a full table scan is triggered during the tests.

Tests

During the tests:

  • This simple query is launched:
select id from bdt where id=1;
  • Then, the asm metrics will be recorded that way:
./asm_metrics.pl -show=dbinst,fg,inst -dbinst=BDT2 -inst=+ASM2 -display=avg
  • We’ll look at the output field “Kby Read/s” which is based on the BYTES_READ column coming from the v$asm_disk_iostat cumulative view.
  • The sql elapsed time and the percentage of IO saved by the offload (if any) will be checked with a query that looks like fsx.sql. By “percentage of IO saved” I mean the ratio of data received from the storage cells to the actual amount of data that would have had to be received on non-Exadata storage.
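That last ratio can be written down explicitly: the share of the offload-eligible volume that never crossed the interconnect. This is my reading of the definition, not necessarily fsx.sql’s exact formula:

```python
def io_saved_pct(eligible_mb, interconnect_mb):
    # Fraction of the offload-eligible volume that never crossed
    # the interconnect, expressed as a percentage.
    if eligible_mb == 0:
        return 0.0
    return 100.0 * (eligible_mb - interconnect_mb) / eligible_mb
```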

Test 1: Launch the query without offload

BDT:BDT2> alter session set cell_offload_processing=false;

Session altered.

BDT:BDT2> select /* NO_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
7y7xa34jjab2q	   0   627556429      1      23.11	0 No		    0 19413.42969	    0	     .00 select /* NO_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1

The elapsed time of the sql is 23.11 seconds and obviously no IO has been saved by offload. As you can see, about 19.5 gb has been exchanged between the Oracle Database and the storage system (INTERCO_MB is based on the IO_INTERCONNECT_BYTES column from v$sql).

The asm_metrics output is the following:

06:39:47                                                                            Kby       Avg       AvgBy/               Kby       Avg        AvgBy/
06:39:47   INST     DBINST        DG            FG           DSK          Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:39:47   ------   -----------   -----------   ----------   ----------   -------   -------   -------   ------    ------     -------   --------   ------
06:39:47   +ASM2                                                          869       885423    3.7       1042916   2          24        0.9        15477
06:39:47   +ASM2    BDT2                                                  869       885423    3.7       1042916   2          24        0.9        15477
06:39:47   +ASM2    BDT2                        ENKCEL01                  300       305865    3.7       1044812   1          8         1.2        15061
06:39:47   +ASM2    BDT2                        ENKCEL02                  330       335901    3.7       1041451   1          8         0.8        15061
06:39:47   +ASM2    BDT2                        ENKCEL03                  239       243657    3.6       1042564   0          8         0.6        16384

So the average Kby Read per second is 885423.

Then we can conclude that the BYTES_READ column records that about 885423 * 23.11 = 19.5 gb has been read from disk.

Does it make sense? Yes.
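The arithmetic behind that check is simply the average rate multiplied by the elapsed time, converted from KB to GB:

```python
def gb_read(avg_kbytes_per_sec, elapsed_sec):
    # "Kby Read/s" is a KB/s rate; rate * elapsed time,
    # divided by 1024^2, gives gigabytes read from disk.
    return avg_kbytes_per_sec * elapsed_sec / (1024 * 1024)
```

With the numbers above, gb_read(885423, 23.11) comes out at about 19.5.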

Test 2: Offload and no storage indexes

BDT:BDT2> alter session set cell_offload_processing=true;

Session altered.

BDT:BDT2> alter session set "_kcfis_storageidx_disabled"=true;

Session altered.

BDT:BDT2> select /* WITH_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
5zzvpyn94b05g	   0   627556429      1       4.11	0 Yes	  19413.42969 2.860450745 2.860450745	   99.99 select /* WITH_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1

The elapsed time of the SQL is 4.11 seconds and 99.99% of the IO has been saved by offload. About 2.8 MB have been exchanged between the Oracle Database and the storage system.
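The IO_SAVED_% figure can be derived from the ELIGIBLE_MB and INTERCO_MB columns shown above. A minimal sketch, assuming the usual definition (the fraction of offload-eligible bytes that never crossed the storage interconnect, as computed for instance by the well-known fsx.sql script):

```python
# IO_SAVED_% derived from the v$sql figures shown above, assuming the
# usual formula: 100 * (eligible - interconnect) / eligible.
eligible_mb = 19413.42969      # ELIGIBLE_MB (offload-eligible bytes)
interco_mb = 2.860450745       # INTERCO_MB (bytes over the interconnect)

io_saved_pct = 100 * (eligible_mb - interco_mb) / eligible_mb
print(round(io_saved_pct, 2))  # -> 99.99
```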

The asm_metrics output is the following:

06:41:54                                                                            Kby       Avg       AvgBy/               Kby       Avg        AvgBy/
06:41:54   INST     DBINST        DG            FG           DSK          Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:41:54   ------   -----------   -----------   ----------   ----------   -------   -------   -------   ------    ------     -------   --------   ------
06:41:54   +ASM2                                                          4862      4969898   0.0       1046671   1          12        6.4        16384
06:41:54   +ASM2    BDT2                                                  4862      4969898   0.0       1046671   1          12        6.4        16384
06:41:54   +ASM2    BDT2                        ENKCEL01                  1678      1715380   0.0       1047123   0          4         17.2       16384
06:41:54   +ASM2    BDT2                        ENKCEL02                  1844      1883738   0.0       1046067   0          4         1.0        16384
06:41:54   +ASM2    BDT2                        ENKCEL03                  1341      1370780   0.0       1046935   0          4         1.0        16384

So the average Kby Read per second is 4969898.

Then we can conclude that the BYTES_READ column records that about 4969898 KB/s * 4.11 s ≈ 19.5 GB have been read from disk.

Does it make sense? I would say yes, because the storage indexes haven’t been used. During the smart scan, all the data blocks have been opened in the cells in order to extract, and send back to the database layer, the requested column (column projection) on the selected rows (row filtering).

Test 3: Offload and storage indexes

BDT:BDT2> alter session set "_kcfis_storageidx_disabled"=false;

Session altered.

BDT:BDT2> select /* WITH_SMART_SCAN_AND_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
3jdpqa2s4bb0v	   0   627556429      1        .09	0 Yes	  19413.42969  .062171936  .062171936	  100.00 select /* WITH_SMART_SCAN_AND_STORAGEIDX */ id from bdt where id=1

The elapsed time of the SQL is 0.09 seconds and about 100% of the IO has been saved by offload. About 0.06 MB have been exchanged between the Oracle Database and the storage system.

The storage indexes saved a lot of reads (almost the whole table):

BDT:BDT2> l
  1* select n.name, s.value from v$statname n, v$mystat s where n.name='cell physical IO bytes saved by storage index' and n.STATISTIC#=s.STATISTIC#
BDT:BDT2> /
NAME							     VALUE
-------------------------------------------------- ---------------
cell physical IO bytes saved by storage index	       20215947264
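The savings reported by that statistic can be compared with the offload-eligible size of the table to confirm that almost the whole table was skipped (figures taken from the outputs above):

```python
# Compare the bytes saved by the storage indexes (v$mystat above) with the
# offload-eligible size of the table (ELIGIBLE_MB from v$sql).
saved_bytes = 20215947264    # cell physical IO bytes saved by storage index
eligible_mb = 19413.42969    # ELIGIBLE_MB reported for the sql

saved_gb = saved_bytes / 1024**3
eligible_gb = eligible_mb / 1024
print(round(saved_gb, 1), round(eligible_gb, 1))     # -> 18.8 19.0

# Percentage of the eligible IO skipped thanks to the storage indexes
pct = 100 * saved_bytes / (eligible_mb * 1024**2)
print(round(pct))                                    # -> 99
```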

The asm_metrics output is the following:

06:43:58                                                                            Kby        Avg       AvgBy/               Kby       Avg        AvgBy/
06:43:58   INST     DBINST        DG            FG           DSK          Reads/s   Read/s     ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:43:58   ------   -----------   -----------   ----------   ----------   -------   -------    -------   ------    ------     -------   --------   ------
06:43:58   +ASM2                                                          19436     19879384   0.0       1047360   0          0         0.0        0
06:43:58   +ASM2    BDT2                                                  19436     19879384   0.0       1047360   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL01                  6708      6861488    0.0       1047430   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL02                  7369      7534840    0.0       1047045   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL03                  5359      5483056    0.0       1047705   0          0         0.0        0

So the average Kby Read per second is 19879384.

As the asm_metrics utility collects its data at a one-second granularity and the elapsed time is < 1 second, the whole execution fits into a single snapshot: we can conclude that the BYTES_READ column records that about 19.5 GB have been read from disk.

Does it make sense? I would say no, because during the smart scan, thanks to the storage indexes, not all the data blocks have been opened in the cells in order to extract the requested column (column projection) on the selected rows (row filtering).

Remark

You could also query the v$asm_disk_iostat view and measure the delta for the BYTES_READ column on your own (that is, without the asm_metrics utility). The results would be the same.

Conclusion

The BYTES_READ column displays:

  • The total number of bytes read from disk (and also transferred to the database) without smart scan.
  • The total number of bytes read from disk (but not the bytes transferred to the database) with smart scan and no storage indexes being used.
  • Neither the total number of bytes read from disk nor the bytes transferred to the database with smart scan and storage indexes being used.
