Push the Oracle alert.log and listener.log into Elasticsearch and analyze/visualize their content with Kibana

Introduction

The Oracle alert.log and listener.log contain useful information to answer questions like:

  • When did the Instance start?
  • When was the Instance shut down?
  • When did ORA- errors occur? With which codes?
  • Which client IP addresses connected to the Instance? With which users?
  • How did they connect? Through a service? Through the SID?
  • Which programs were used to connect?
  • A connection storm occurred: what was its source?

What about having all this information centralized? What about having the possibility to gather, format, search, analyze and visualize this information in real time?

To achieve this, let’s use the ELK stack (Elasticsearch, Logstash and Kibana).

Installation

The installation is very simple.

  • Install elasticsearch
[root@elk ~]# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.2.1/elasticsearch-2.2.1.rpm
[root@elk ~]# yum localinstall elasticsearch-2.2.1.rpm
  • Edit the configuration file to set the host it has been installed on (namely elk in my case):
[root@elk ~]# grep network.host /etc/elasticsearch/elasticsearch.yml
network.host: elk
  • Start elasticsearch:
[root@elk ~]# /etc/init.d/elasticsearch start
Starting elasticsearch: [ OK ]
  • Install Kibana:
[root@elk ~]# wget https://download.elastic.co/kibana/kibana/kibana-4.4.2-linux-x64.tar.gz
[root@elk ~]# tar -xf kibana-4.4.2-linux-x64.tar.gz --directory /opt
[root@elk ~]# mv /opt/kibana-4.4.2-linux-x64 /opt/kibana
  • Edit the configuration file so that the elasticsearch URL points to that host:
[root@elk ~]# grep elasticsearch.url /opt/kibana/config/kibana.yml
elasticsearch.url: "http://elk:9200"
  • Start Kibana:
[root@elk ~]# /opt/kibana/bin/kibana
  • Install logstash on the oracle host (namely dprima in my case):
[root@dprima ~]# wget https://download.elastic.co/logstash/logstash/packages/centos/logstash-2.2.2-1.noarch.rpm
[root@dprima ~]# yum localinstall logstash-2.2.2-1.noarch.rpm

Configure logstash to push and format the alert.log to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the time at which the log entry was created (rather than the time at which logstash read it).
  • It traps ORA- entries and creates an ORA- field when one occurs.
  • It traps the start of the Instance (and fills the oradb_status field accordingly).
  • It traps the shutdown of the Instance (and fills the oradb_status field accordingly).
  • It traps the fact that the Instance is running (and fills the oradb_status field accordingly).

New fields are being created so that we can analyze/visualize them later on with Kibana.

  • To trap and format this information, let’s create an alert_log.conf configuration file that looks like (the filter part contains the important stuff):
input {
  file {
      path => "/u01/app/oracle/diag/rdbms/pbdt/PBDT/trace/alert_PBDT.log"
  }
}

filter {

# Join lines based on the time
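# (with negate => true and what => "previous", any line that does not start with a timestamp is merged into the previous event)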
  multiline {
    pattern => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
    negate => true
    what => "previous"
  }

# Create new field: oradb_status: starting,running,shutdown
 if [message] =~ /Starting ORACLE instance/ {
    mutate {
        add_field => [ "oradb_status", "starting" ]
    }
 } else if [message] =~ /Instance shutdown complete/ {
    mutate {
        add_field => [ "oradb_status", "shutdown" ]
    }
 } else {
      mutate {
        add_field => [ "oradb_status", "running" ]
    }
 }

# Search for ORA- and create field if match

if [message] =~ /ORA-/ {
 grok {
   match => [ "message","(?<ORA->ORA-[0-9]*)" ]
 }
}

# Extract the date and the rest from the message
  grok {
    match => [ "message","%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} %{YEAR:year}(?<log_message>.*$)" ]
  }

  mutate {
       add_field => {
        "timestamp" => "%{year} %{month} %{monthday} %{time}"
       }
  }
# replace the timestamp by the one coming from the alert.log
  date {
      locale => "en"
      match => [ "timestamp" , "yyyy MMM dd HH:mm:ss" ]
  }

# replace the message (remove the date)
  mutate { replace => [ "message", "%{log_message}" ]  }

  mutate {
      remove_field => [ "time" ,"month","monthday","year","timestamp","day","log_message"]
  }

}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "oracle-%{+YYYY.MM.dd}"
}
}
  • Start logstash with this configuration file:
[root@dprima ~]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/alert_log.conf
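  • Optionally, you can check the configuration syntax first (a quick sanity check, using the --configtest flag of logstash 2.x):
[root@dprima ~]# /opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/alert_log.conf
Configuration OK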
  • So that for example an entry in the alert.log file like:
Sat Mar 26 08:30:26 2016
ORA-1653: unable to extend table SYS.BDT by 8 in                 tablespace BDT

will be formatted and sent to elasticsearch this way:

{
         "message" => "\nORA-1653: unable to extend table SYS.BDT by 8 in                 tablespace BDT ",
        "@version" => "1",
      "@timestamp" => "2016-03-26T08:30:26.000Z",
            "path" => "/u01/app/oracle/diag/rdbms/pbdt/PBDT/trace/alert_PBDT.log",
            "host" => "Dprima",
            "tags" => [
        [0] "multiline"
    ],
    "oradb_status" => "running",
            "ORA-" => "ORA-1653"
}
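
  • To double-check that the documents reached elasticsearch, you can query the index directly (a sketch based on the index naming defined above; adjust the date to your own):
[root@elk ~]# curl 'http://elk:9200/oracle-2016.03.26/_search?q=oradb_status:running&pretty'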

Configure logstash to push and format the listener.log to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the time at which the log entry was created (rather than the time at which logstash read it).
  • It traps the connections and records the program into a dedicated field program.
  • It traps the connections and records the user into a dedicated field user.
  • It traps the connections and records the client IP address into a dedicated field ip_client.
  • It traps the connections and records the destination into a dedicated field dest.
  • It traps the connections and records the destination type (SID or service_name) into a dedicated field dest_type.
  • It traps the commands (stop, status, reload) and records them into a dedicated field command.

New fields are being created so that we can analyze/visualize them later on with Kibana.

  • To trap and format this information, let’s create a lsnr_log.conf configuration file that looks like (the filter part contains the important stuff):
input {
 file {
   path => "/u01/app/oracle/diag/tnslsnr/Dprima/listener/trace/listener.log"
  }
}

filter {
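 # Only lines containing CONNECT_DATA are processed; everything else is dropped in the else branch below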

 if [message] =~ /(?i)CONNECT_DATA/ {

  # Extract the date and the rest from the message
  grok {
    match => [ "message","(?<the_date>.*%{TIME})(?<lsnr_message>.*$)" ]
  }

  # Extract COMMAND (like status,reload,stop) and add a field
  if [message] =~ /(?i)COMMAND=/ {

   grok {
   match => [ "lsnr_message","^.*(?i)COMMAND=(?<command>.*?)\).*$" ]
   }

  } else {

  # Extract useful Info (USER,PROGRAM,IPCLIENT) and add fields
   grok {
   match => [ "lsnr_message","^.*PROGRAM=(?<program>.*?)\).*USER=(?<user>.*?)\).*ADDRESS.*HOST=(?<ip_client>%{IP}).*$" ]
   }
  }

  # replace the timestamp by the one coming from the listener.log
  date {
      locale => "en"
      match => [ "the_date" , "dd-MMM-yyyy HH:mm:ss" ]
  }

  # replace the message (remove the date)
  mutate { replace => [ "message", "%{lsnr_message}" ]  }

  # remove temporary fields
  mutate { remove_field => [ "the_date","lsnr_message"] }

  # search for SID or SERVICE_NAME, collect dest and add dest type
  if [message] =~ /(?i)SID=/ {
  grok { match => [ "message","^.*(?i)SID=(?<dest>.*?)\).*$" ] }
  mutate { add_field => [ "dest_type", "SID" ] }
  }

  if [message] =~ /(?i)SERVICE_NAME=/ {
  grok { match => [ "message","^.*(?i)SERVICE_NAME=(?<dest>.*?)\).*$" ] }
  mutate { add_field => [ "dest_type", "SERVICE" ] }
  }

  } else {
   drop {}
  }
}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "oracle-%{+YYYY.MM.dd}"
}
}
  • Start logstash with this configuration file:
[root@Dprima conf.d]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/lsnr_log.conf
  • So that for example an entry in the listener.log file like:
26-MAR-2016 08:34:57 * (CONNECT_DATA=(SID=PBDT)(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=bdt))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=50379)) * establish * PBDT * 0

will be formatted and sent to elasticsearch this way:

{
       "message" => " * (CONNECT_DATA=(SID=PBDT)(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=bdt))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=50380)) * establish * PBDT * 0",
      "@version" => "1",
    "@timestamp" => "2016-03-26T08:34:57.000Z",
          "path" => "/u01/app/oracle/diag/tnslsnr/Dprima/listener/trace/listener.log",
          "host" => "Dprima",
       "program" => "SQL Developer",
          "user" => "bdt",
     "ip_client" => "192.168.56.1",
          "dest" => "PBDT",
     "dest_type" => "SID"
}
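
  • As an illustration of what Kibana will do for us graphically, the same kind of breakdown (number of connections per program) could be obtained directly from elasticsearch with a terms aggregation (a sketch, assuming the fields created above):
curl -s 'http://elk:9200/oracle-*/_search?pretty' -d '{
 "size": 0,
 "aggs": { "by_program": { "terms": { "field": "program" } } }
}'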

Analyze and Visualize the data with Kibana

  • Connect to Kibana on the elk host (http://elk:5601) and create an index pattern (Pic 1):

[Pic 1: elk index pattern]

  • Check that all our custom fields have been indexed (this is the default behaviour) (Pic 2):

[Pic 2: all indices]

so that we can now visualize them.

  • Example 1: thanks to the listener.log data, let’s graph the distribution of the connections to our databases by program and by dest_type (Pic 3):

[Pic 3: kibana example]

  • Example 2: thanks to the listener.log data, visualize when a connection “storm” occurred and where it came from (ip_client field):

[Pic: connection storm]

Remarks

  • As you can see (in Pic 2) the program field has not been analyzed. This way, a connection to the database with “SQL Developer” is stored in the index as “SQL Developer”, which is what we want, while an analyzed field would have stored 2 distinct values (“SQL” and “Developer”). The same applies to the ORA- field: if analyzed, ORA-1653 would be stored as “ORA” and “1653” (this is why it is specified as not_analyzed as well). You can find more details here.
  • To get the program and ORA- fields not analyzed, a template has been created this way:
curl -XDELETE elk:9200/oracle*

curl -XPUT elk:9200/_template/oracle_template -d ' {
 "template" : "oracle*",
 "settings" : {
   "number_of_shards" : 1
   },
 "mappings" : {
  "oracle" : {
   "properties" : {
    "program" : {"type" : "string", "index": "not_analyzed" },
    "ORA-" : {"type" : "string", "index": "not_analyzed" }
    }
   }
  }
}'
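  • You can verify that the template has been registered that way:
curl 'elk:9200/_template/oracle_template?pretty'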
  • The configuration files are using grok. You can find more information about it here.
  • You can find much more information about the ELK stack (and another way to use it) in this blog post from Robin Moffatt.
  • All you need to do to visualize the data is to extract the fields of interest from the log files and be sure an index is created on each field you want to visualize.
  • Having all this information, coming from all the machines of a datacenter, centralized into a single place is a gold mine from my point of view.

Conclusion

Thanks to the ELK stack you can gather, centralize, analyze and visualize the content of the alert.log and listener.log files for your whole datacenter the way you want to. This information is a gold mine and the imagination is the only limit.


Graphing Oracle performance metrics with Telegraf, InfluxDB and Grafana

Introduction

As a picture is worth a thousand words, we may want to visualise the oracle performance metrics. There are already several tools to do so, but what if you could do it the way you want, and build your own graphs? Let’s try to achieve this with 3 layers:

  • telegraf: to collect the oracle metrics
  • InfluxDB: to store the time-series oracle metrics
  • grafana: to visualise the oracle time-series metrics

Installation

Let’s install InfluxDB and grafana that way (for example on the same host, namely influxgraf):

  • Install InfluxDB (detail here):
[root@influxgraf ~]# wget https://s3.amazonaws.com/influxdb/influxdb-0.10.2-1.x86_64.rpm

[root@influxgraf ~]# yum localinstall influxdb-0.10.2-1.x86_64.rpm
  • Install grafana (detail here):
[root@influxgraf ~]# yum install https://grafanarel.s3.amazonaws.com/builds/grafana-2.6.0-1.x86_64.rpm

Let’s install telegraf on the oracle host (namely Dprima) that way (detail here):

[root@Dprima ~]# wget http://get.influxdb.org/telegraf/telegraf-0.10.4.1-1.x86_64.rpm

[root@Dprima ~]# yum localinstall telegraf-0.10.4.1-1.x86_64.rpm

Setup

  • Create a telegraf database into InfluxDB (using the web interface):

[screenshot]

  • Create a root user into InfluxDB (using the web interface):

[screenshot]

  • Write a script to collect the oracle metrics. I am using python but this is not mandatory at all: only the output of the script matters, as it has to be in the InfluxDB line protocol. The script queries the v$sysmetric and v$eventmetric views to get the wait class and the wait event metrics during the last minute. I am not reinventing the wheel, I am using Kyle Hailey‘s queries. The python code is:
[oracle@Dprima scripts]$ cat oracle_metrics.py
import os
import sys
import cx_Oracle
import argparse
import re

class OraMetrics():
    def __init__(self, user, passwd, sid):
        self.user = user
        self.passwd = passwd
        self.sid = sid
        self.connection = cx_Oracle.connect( self.user , self.passwd , self.sid )
        cursor = self.connection.cursor()
        cursor.execute("select HOST_NAME from v$instance")
        for hostname in cursor:
            self.hostname = hostname[0]

    def waitclassstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
        select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
        from   v$waitclassmetric  m, v$system_wait_class n
        where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle'
        union
        select  'CPU', round(value/100,3) AAS
        from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2
        union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
        from
            ( select value busy
                from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                and group_id=2 ) prcnt,
                ( select value cpu_count from v$parameter where name='cpu_count' )  parameter,
                ( select  'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas
        """)
        for wait in cursor:
            wait_name = wait[0]
            wait_value = wait[1]
            print "oracle_wait_class,host=%s,db=%s,wait_class=%s wait_value=%s" % (self.hostname, sid,re.sub(' ', '_', wait_name), wait_value )

    def waitstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
	select
	n.wait_class wait_class,
       	n.name wait_name,
       	m.wait_count cnt,
       	round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
	from v$eventmetric m,
     	v$event_name n
	where m.event_id=n.event_id
  	and n.wait_class <> 'Idle' and m.wait_count > 0 order by 1 """)
        for wait in cursor:
            wait_class = wait[0]
            wait_name = wait[1]
            wait_cnt = wait[2]
            wait_avgms = wait[3]
            print "oracle_wait_event,host=%s,db=%s,wait_class=%s,wait_event=%s count=%s,latency=%s" % (self.hostname, sid, re.sub(' ', '_', wait_class), re.sub(' ', '_', wait_name), wait_cnt, wait_avgms)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', help="Username", required=True)
    parser.add_argument('-p', '--passwd', required=True)
    parser.add_argument('-s', '--sid', help="SID", required=True)

    args = parser.parse_args()

    stats = OraMetrics(args.user, args.passwd, args.sid)
    stats.waitclassstats(args.user, args.passwd, args.sid)
    stats.waitstats(args.user, args.passwd, args.sid)

The output looks like (the output format is the InfluxDB line-protocol):

[oracle@Dprima scripts]$ python "/home/oracle/scripts/oracle_metrics.py" "-u" "system" "-p" "bdtbdt" "-s" PBDT
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Administrative wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=CPU wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=CPU_OS wait_value=0.035
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Commit wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Concurrency wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Configuration wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Network wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Other wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Scheduler wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=System_I/O wait_value=0.005
oracle_wait_class,host=Dprima,db=PBDT,wait_class=User_I/O wait_value=0
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=control_file_sequential_read count=163,latency=0.009
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=control_file_parallel_write count=60,latency=3.933
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=log_file_parallel_write count=60,latency=1.35
oracle_wait_event,host=Dprima,db=PBDT,wait_class=User_I/O,wait_event=Disk_file_operations_I/O count=16,latency=0.037
oracle_wait_event,host=Dprima,db=PBDT,wait_class=User_I/O,wait_event=Parameter_File_I/O count=16,latency=0.004
  • On the oracle host, configure telegraf to execute the python script at a 60-second interval and send the output to InfluxDB. Edit the /etc/telegraf/telegraf.conf file so that it contains:
###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["http://influxgraf:8086"] # required
  database = "telegraf" # required
  precision = "s"
  timeout = "5s"


###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Oracle metrics
[[inputs.exec]]
  # Shell/commands array
  commands = ["/home/oracle/scripts/oracle_metrics.sh"]
  # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"
  interval = "60s"

The oracle_metrics.sh script contains the call to the python script:

#!/usr/bin/env bash

export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/dbhome_1/lib
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1/

python "/home/oracle/scripts/oracle_metrics.py" "-u" "system" "-p" "bdtbdt" "-s" PBDT
  • Launch telegraf:
[root@Dprima ~]# telegraf -config /etc/telegraf/telegraf.conf
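
You can also run telegraf with the -test flag first: it samples the inputs once and prints the metrics to stdout without sending them to InfluxDB, which is handy to check the exec input in isolation:

[root@Dprima ~]# telegraf -config /etc/telegraf/telegraf.conf -test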

Graph the metrics

  • First check that the metrics are stored the way we want in InfluxDB:

[screenshot]
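
This can also be done from the command line through the HTTP query API (a sketch, assuming the measurement names produced by the script above):

curl -G 'http://influxgraf:8086/query' --data-urlencode "db=telegraf" --data-urlencode "q=SELECT * FROM oracle_wait_class WHERE db='PBDT' LIMIT 5"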

 

  • Configure InfluxDB as a datasource in grafana:

[screenshot]

  • In grafana, create a dashboard and create some variables (hosts, db and wait_class):

[screenshot]

[screenshot]
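
For the record, grafana template variables on top of InfluxDB are typically fed by queries like these (a sketch, assuming the tag names produced by the script):

SHOW TAG VALUES FROM oracle_wait_class WITH KEY = "host"
SHOW TAG VALUES FROM oracle_wait_class WITH KEY = "db"
SHOW TAG VALUES FROM oracle_wait_class WITH KEY = "wait_class"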

  • Now let’s create a graph:

[screenshot]

  • Get the metrics:

[screenshot]
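
The metrics selection behind the graph boils down to an InfluxQL query of this shape (a sketch, not the exact query grafana generates; $timeFilter is grafana's built-in time range macro):

SELECT mean("wait_value") FROM "oracle_wait_class"
WHERE "host" =~ /^$hosts$/ AND "db" =~ /^$db$/ AND "wait_class" =~ /^$wait_class$/ AND $timeFilter
GROUP BY time(60s), "wait_class"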

  • So that the graph looks like:

[screenshot]

Remarks

  • Thanks to the variables defined in grafana, we can filter on the host, database and wait class (instead of visualising all of them).
  • We can also graph the wait events we collected, or whatever you want to collect.

Conclusion

Thanks to:

  • telegraf and InfluxDB, we are able to collect and store the metrics we want to.
  • grafana, we are able to visualise the metrics the way we want to.

Update 04/21/2016:

As another example, with more information collected:

[dashboard screenshots: dash1 to dash5]

 


Monitor the dNFS activity as of 11.2.0.4

As you may know, there are some views available since 11g to monitor the dNFS activity:

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

One interesting thing is that the v$dnfs_stats view provides two new columns as of 11.2.0.4:

  • NFS_READBYTES: Number of bytes read from NFS server
  • NFS_WRITEBYTES: Number of bytes written to NFS server

See the oracle documentation here.

Then, by sampling the view we can compute those metrics:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second (as of 11.2.0.4).
  • AvgBy/Read: Average bytes per read (as of 11.2.0.4).
  • Writes/s: Number of writes per second.
  • KbyWrite/s: Kbytes written per second (as of 11.2.0.4).
  • AvgBy/Write: Average bytes per write (as of 11.2.0.4).

To do so, I just created a very simple db_io_dnfs_metrics.pl utility. It basically takes a snapshot each second (default interval) from the gv$dnfs_stats view and computes the delta with the previous snapshot. The utility is RAC aware.
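
In spirit, each snapshot boils down to a query of this shape (a sketch based on the documented columns; the gv$instance join is shown for illustration and is an assumption, not the script's actual code):

select i.instance_name,
       sum(s.nfs_read)       reads,
       sum(s.nfs_write)      writes,
       sum(s.nfs_readbytes)  read_bytes,
       sum(s.nfs_writebytes) write_bytes
from   gv$dnfs_stats s, gv$instance i
where  s.inst_id = i.inst_id
group by i.instance_name;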

Let’s see the help:

$>./db_io_dnfs_metrics.pl -help

Usage: ./db_io_dnfs_metrics.pl [-interval] [-count] [-inst] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_dnfs_metrics.pl
Example: ./db_io_dnfs_metrics.pl  -inst=BDT_1
Example: ./db_io_dnfs_metrics.pl  -sort_field=reads

This is a very simple utility, the options/features are:

  1. You can choose the number of snapshots to display and the time to wait between the snapshots.
  2. In case of RAC, you can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched), or both, thanks to the -DISPLAY= parameter.
  4. In case of RAC, you can sort the instances based on the number of reads, writes or IOPS (reads+writes) thanks to the -SORT_FIELD= parameter.

Examples:

Collecting on a single Instance:

$>./db_io_dnfs_metrics.pl
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:18                          Kby       AvgBy/               Kby       AvgBy/
14:26:18   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:18   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:18   VSBDT        321       2568      8192      0          0         0          321        2.5
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:19                          Kby       AvgBy/               Kby       AvgBy/
14:26:19   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:19   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:19   VSBDT        320       2560      8192      1          16        16384      321        2.5

Collecting on a RAC database, sorting the Instances by the number of reads:

$>./db_io_dnfs_metrics.pl -sort_field=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:21                          Kby       AvgBy/               Kby       AvgBy/
17:21:21   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:21   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:21   VBDTO_1      175       44536     260599    0          0         0          175        43.5
17:21:21   VBDTO_2      69        2272      33718     0          0         0          69         2.2
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:22                          Kby       AvgBy/               Kby       AvgBy/
17:21:22   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:22   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:22   VBDTO_2      151       36976     250751    0          0         0          151        36.1
17:21:22   VBDTO_1      131       33408     261143    0          0         0          131        32.6
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:23                          Kby       AvgBy/               Kby       AvgBy/
17:21:23   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:23   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:23   VBDTO_2      133       33592     258633    0          0         0          133        32.8
17:21:23   VBDTO_1      121       31360     265394    0          0         0          121        30.6



Hyper-threading: measure its impact during SLOB runs thanks to numactl and turbostat

Introduction

As you probably know, a single physical CPU core with hyper-threading appears as two logical CPUs to an operating system. 

While the operating system sees two CPUs for each core, the actual CPU hardware only has a single set of execution resources for each core.

Hyper-threading can help speed your system up, but it’s nowhere near as good as having additional cores.

Out of curiosity, I checked the impact of hyper-threading on the number of logical IOs per second (LIOPS) during SLOB runs. The main idea is to compare:

  • 2 FULL CACHED SLOB sessions running on the same core
  • 2 FULL CACHED SLOB sessions running on two distinct cores

In this blog post, I just want to share how I took the measurements and my results (of course your results could differ depending on the CPUs, the SLOB configuration and so on).

Tests environment and setup

  • Oracle database 11.2.0.4.
  • SLOB with the following slob.conf in place (the rest of the file contains the default slob.conf values):
UPDATE_PCT=0
RUN_TIME=60
WORK_LOOP=0
SCALE=80M
WORK_UNIT=64
  • The cpu_topology is the following (the cpu_topology script comes from Karl Arao’s cputoolkit here):
$ sh ./cpu_topology
	
model name	: Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
processors  (OS CPU count)          0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
physical id (processor socket)      0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
siblings    (logical CPUs/socket)   16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16
core id     (# assigned to a core)  0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
cpu cores   (physical cores/socket) 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2893.187
BogoMIPS: 5785.23
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K

As you can see the box is made of 2 sockets, 16 cores and 32 threads (so that the database sees 32 cpus).

  • Turbo boost does not come into play (you’ll see the proof during the tests thanks to the turbostat command).
  • Obviously the same SLOB configuration has been used for the tests.

Launch the tests

Test 1: launch 2 SLOB sessions on 2 distinct sockets (so on 2 distinct cores) that way:

$ cd SLOB
$ numactl --physcpubind=4,9 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

CPUs 4 and 9 are on 2 distinct sockets (see the cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     241    8.34    2893    2893       0   91.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  116.66   88.59    0.00    0.00    0.00
       0       0       0      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      46      48    0.00    0.00    0.00    0.00   50.44   36.66    0.00    0.00    0.00
       0       0      16      70    2.41    2893    2893       0   97.59
       0       1       1      71    2.45    2893    2893       0   97.55    0.00    0.00    0.00      43
       0       1      17     131    4.54    2893    2893       0   95.46
       0       2       2      81    2.81    2893    2893       0   97.19    0.00    0.00    0.00      44
       0       2      18      73    2.53    2893    2893       0   97.47
       0       3       3      29    1.00    2893    2893       0   99.00    0.00    0.00    0.00      44
       0       3      19      21    0.72    2893    2893       0   99.28
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.39    2893    2893       0   98.61
       0       5       5      58    2.02    2893    2893       0   97.98    0.00    0.00    0.00      46
       0       5      21      82    2.82    2893    2893       0   97.18
       0       6       6     106    3.65    2893    2893       0   96.35    0.00    0.00    0.00      44
       0       6      22      42    1.46    2893    2893       0   98.54
       0       7       7      28    0.97    2893    2893       0   99.03    0.00    0.00    0.00      48
       0       7      23      15    0.52    2893    2893       0   99.48
       1       0       8     222    7.67    2893    2893       0   92.33    0.00    0.00    0.00      72      76    0.00    0.00    0.00    0.00   66.22   51.92    0.00    0.00    0.00
       1       0      24      63    2.17    2893    2893       0   97.83
       1       1       9    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      76
       1       1      25     108    3.73    2893    2893       0   96.27
       1       2      10      86    2.98    2893    2893       0   97.02    0.00    0.00    0.00      71
       1       2      26      78    2.71    2893    2893       0   97.29
       1       3      11      57    1.96    2893    2893       0   98.04    0.00    0.00    0.00      69
       1       3      27      41    1.40    2893    2893       0   98.60
       1       4      12      56    1.93    2893    2893       0   98.07    0.00    0.00    0.00      71
       1       4      28      30    1.03    2893    2893       0   98.97
       1       5      13      32    1.12    2893    2893       0   98.88    0.00    0.00    0.00      71
       1       5      29      23    0.79    2893    2893       0   99.21
       1       6      14      49    1.68    2893    2893       0   98.32    0.00    0.00    0.00      71
       1       6      30      93    3.20    2893    2893       0   96.80
       1       7      15      41    1.41    2893    2893       0   98.59    0.00    0.00    0.00      72
       1       7      31      21    0.72    2893    2893       0   99.28

Turbo boost did not come into play (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 955 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.95
              DB CPU(s):               1.9               5.6      0.00      2.89
      Redo size (bytes):          16,541.5          48,682.3
  Logical read (blocks):         955,750.1       2,812,818.1

Test 2: launch 2 SLOB sessions on 2 distinct cores on the same socket that way:

$ cd SLOB
$ numactl --physcpubind=4,5 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

CPUs 4 and 5 are on 2 distinct cores on the same socket (see the cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  119.34   91.17    0.00    0.00    0.00
       0       0       0     152    5.26    2893    2893       0   94.74    0.00    0.00    0.00      47      49    0.00    0.00    0.00    0.00   55.29   41.51    0.00    0.00    0.00
       0       0      16      59    2.04    2893    2893       0   97.96
       0       1       1      62    2.13    2893    2893       0   97.87    0.00    0.00    0.00      42
       0       1      17      49    1.70    2893    2893       0   98.30
       0       2       2      59    2.03    2893    2893       0   97.97    0.00    0.00    0.00      43
       0       2      18      55    1.90    2893    2893       0   98.10
       0       3       3      10    0.33    2893    2893       0   99.67    0.00    0.00    0.00      44
       0       3      19      16    0.56    2893    2893       0   99.44
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       4      20      40    1.40    2893    2893       0   98.60
       0       5       5    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      49
       0       5      21     133    4.59    2893    2893       0   95.41
       0       6       6      71    2.47    2893    2893       0   97.53    0.00    0.00    0.00      45
       0       6      22      57    1.97    2893    2893       0   98.03
       0       7       7      20    0.70    2893    2893       0   99.30    0.00    0.00    0.00      47
       0       7      23      14    0.47    2893    2893       0   99.53
       1       0       8     364   12.58    2893    2893       0   87.42    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   64.05   49.66    0.00    0.00    0.00
       1       0      24      93    3.20    2893    2893       0   96.80
       1       1       9     270    9.34    2893    2893       0   90.66    0.00    0.00    0.00      76
       1       1      25     105    3.62    2893    2893       0   96.38
       1       2      10     267    9.23    2893    2893       0   90.77    0.00    0.00    0.00      75
       1       2      26     283    9.77    2893    2893       0   90.23
       1       3      11      73    2.52    2893    2893       0   97.48    0.00    0.00    0.00      73
       1       3      27      92    3.19    2893    2893       0   96.81
       1       4      12      43    1.50    2893    2893       0   98.50    0.00    0.00    0.00      75
       1       4      28      44    1.53    2893    2893       0   98.47
       1       5      13      46    1.60    2893    2893       0   98.40    0.00    0.00    0.00      75
       1       5      29      48    1.67    2893    2893       0   98.33
       1       6      14      76    2.63    2893    2893       0   97.37    0.00    0.00    0.00      75
       1       6      30     105    3.64    2893    2893       0   96.36
       1       7      15     106    3.67    2893    2893       0   96.33    0.00    0.00    0.00      75

Turbo boost did not come into play (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 920 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.7      0.00      2.97
      Redo size (bytes):          16,014.6          47,163.1
  Logical read (blocks):         921,446.7       2,713,660.6

So, let’s say this is more or less the same as with 2 sessions on 2 distinct sockets.

Test 3: launch 2 SLOB sessions on the same core that way:

$ cd SLOB
$ numactl --physcpubind=4,20 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

CPUs 4 and 20 share the same core (see the cpu_topology output).

The turbostat command shows that they are 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     250    8.65    2893    2893       0   91.35    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00  114.06   85.95    0.00    0.00    0.00
       0       0       0     100    3.45    2893    2893       0   96.55    0.00    0.00    0.00      48      48    0.00    0.00    0.00    0.00   51.21   37.45    0.00    0.00    0.00
       0       0      16      41    1.41    2893    2893       0   98.59
       0       1       1      75    2.61    2893    2893       0   97.39    0.00    0.00    0.00      42
       0       1      17      58    2.01    2893    2893       0   97.99
       0       2       2      51    1.75    2893    2893       0   98.25    0.00    0.00    0.00      44
       0       2      18      40    1.38    2893    2893       0   98.62
       0       3       3      25    0.86    2893    2893       0   99.14    0.00    0.00    0.00      45
       0       3      19       9    0.30    2893    2893       0   99.70
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      47
       0       4      20    2893  100.00    2893    2893       0    0.00
       0       5       5     120    4.14    2893    2893       0   95.86    0.00    0.00    0.00      44
       0       5      21      73    2.54    2893    2893       0   97.46
       0       6       6      65    2.25    2893    2893       0   97.75    0.00    0.00    0.00      45
       0       6      22      51    1.76    2893    2893       0   98.24
       0       7       7      18    0.64    2893    2893       0   99.36    0.00    0.00    0.00      48
       0       7      23       7    0.25    2893    2893       0   99.75
       1       0       8     491   16.97    2893    2893       0   83.03    0.00    0.00    0.00      76      76    0.00    0.00    0.00    0.00   62.85   48.50    0.00    0.00    0.00
       1       0      24      88    3.04    2893    2893       0   96.96
       1       1       9      91    3.16    2893    2893       0   96.84    0.00    0.00    0.00      75
       1       1      25      80    2.75    2893    2893       0   97.25
       1       2      10      87    3.00    2893    2893       0   97.00    0.00    0.00    0.00      74
       1       2      26     104    3.61    2893    2893       0   96.39
       1       3      11      62    2.16    2893    2893       0   97.84    0.00    0.00    0.00      72
       1       3      27      80    2.77    2893    2893       0   97.23
       1       4      12      36    1.24    2893    2893       0   98.76    0.00    0.00    0.00      74
       1       4      28      36    1.25    2893    2893       0   98.75
       1       5      13      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      75
       1       5      29      75    2.61    2893    2893       0   97.39
       1       6      14      52    1.79    2893    2893       0   98.21    0.00    0.00    0.00      75
       1       6      30      92    3.17    2893    2893       0   96.83
       1       7      15      39    1.35    2893    2893       0   98.65    0.00    0.00    0.00      76
       1       7      31      32    1.11    2893    2893       0   98.89

Turbo boost did not come into play (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 600 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      3.02
              DB CPU(s):               1.9               5.6      0.00      2.96
      Redo size (bytes):          16,204.0          47,101.1
  Logical read (blocks):         603,430.3       1,754,028.2

So, as you can see, the SLOB runs (with the same SLOB configuration) produced about:

  • 955 000 LIOPS with the 2 sessions running on 2 distinct sockets (so on 2 distinct cores).
  • 920 000 LIOPS with the 2 sessions running on 2 distinct cores on the same socket.
  • 600 000 LIOPS with the 2 sessions running on the same core.

Out of curiosity, what if we disable the hyper-threading on one core and launch the 2 sessions on it?

Let’s put CPU 20 offline (to be sure there is no noise on it during the test):

echo 0 > /sys/devices/system/cpu/cpu20/online
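
You can check that the CPU is effectively offline that way:

$ cat /sys/devices/system/cpu/cpu20/online
0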

Then launch the 2 sessions on the same CPU (the remaining online one for this core, as the second one has just been put offline) that way:

$ cd SLOB
$ numactl --physcpubind=4 /bin/bash 
$ echo "startup" | sqlplus / as sysdba
$ ./runit.sh 2

The turbostat command shows that CPU 4 is 100% busy during the run:

Package    Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt RAMWatt   PKG_%   RAM_%
       -       -       -     153    5.30    2893    2893       0   94.70    0.00    0.00    0.00      77      77    0.00    0.00    0.00    0.00  112.76   84.64    0.00    0.00    0.00
       0       0       0      56    1.94    2893    2893       0   98.06    0.00    0.00    0.00      45      46    0.00    0.00    0.00    0.00   49.86   36.07    0.00    0.00    0.00
       0       0      16      42    1.45    2893    2893       0   98.55
       0       1       1      97    3.35    2893    2893       0   96.65    0.00    0.00    0.00      42
       0       1      17      61    2.11    2893    2893       0   97.89
       0       2       2      47    1.62    2893    2893       0   98.38    0.00    0.00    0.00      44
       0       2      18      45    1.54    2893    2893       0   98.46
       0       3       3      19    0.64    2893    2893       0   99.36    0.00    0.00    0.00      44
       0       3      19      25    0.85    2893    2893       0   99.15
       0       4       4    2893  100.00    2893    2893       0    0.00    0.00    0.00    0.00      46
       0       5       5       9    0.31    2893    2893       0   99.69    0.00    0.00    0.00      44
       0       5      21      14    0.49    2893    2893       0   99.51
       0       6       6      14    0.48    2893    2893       0   99.52    0.00    0.00    0.00      44
       0       6      22      12    0.41    2893    2893       0   99.59
       0       7       7      25    0.88    2893    2893       0   99.12    0.00    0.00    0.00      46
       0       7      23       9    0.33    2893    2893       0   99.67
       1       0       8     133    4.59    2893    2893       0   95.41    0.00    0.00    0.00      76      77    0.00    0.00    0.00    0.00   62.90   48.56    0.00    0.00    0.00
       1       0      24      69    2.37    2893    2893       0   97.63
       1       1       9      89    3.07    2893    2893       0   96.93    0.00    0.00    0.00      76
       1       1      25      73    2.52    2893    2893       0   97.48
       1       2      10      98    3.38    2893    2893       0   96.62    0.00    0.00    0.00      76
       1       2      26     113    3.92    2893    2893       0   96.08
       1       3      11      76    2.62    2893    2893       0   97.38    0.00    0.00    0.00      74
       1       3      27      78    2.71    2893    2893       0   97.29
       1       4      12     194    6.71    2893    2893       0   93.29    0.00    0.00    0.00      75
       1       4      28      27    0.94    2893    2893       0   99.06
       1       5      13      62    2.14    2893    2893       0   97.86    0.00    0.00    0.00      76
       1       5      29      95    3.29    2893    2893       0   96.71
       1       6      14     101    3.48    2893    2893       0   96.52    0.00    0.00    0.00      75
       1       6      30      92    3.19    2893    2893       0   96.81
       1       7      15      64    2.20    2893    2893       0   97.80    0.00    0.00    0.00      77
       1       7      31      20    0.68    2893    2893       0   99.32

and as you can see it is the only CPU available on that core.

Turbo boost did not come into play (Bzy_MHz = TSC_MHz).

Those 2 sessions produced about 460 000 LIOPS:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0               5.8      0.00      2.69
              DB CPU(s):               1.0               2.8      0.00      1.32
      Redo size (bytes):          18,978.1          55,154.9
  Logical read (blocks):         458,451.7       1,332,369.7

which makes full sense, as the 2 sessions on 2 distinct cores on the same socket produced about 920 000 LIOPS (Test 2): one core without hyper-threading delivers about half the throughput of two cores, while hyper-threading on that single core adds back about 30% (600 000 vs 460 000 LIOPS).

So what?

Imagine that you launched multiple full cached SLOB runs on your new box to measure its behaviour/scalability (from one session up to the cpu_count value).

Example: I launched 11 full cached SLOB runs with 2,4,6,8,12,16,18,20,24,28 and 32 sessions running. The LIOPS graph is the following:

[LIOPS graph]

Just check the shape of the curve (not the values). As you can see, once the number of sessions reaches the number of cores (16), the LIOPS still increases but its growth rate slows down. One of the reasons is the one we observed during the previous tests: 2 sessions running on the same core don’t perform as fast as 2 sessions running on 2 distinct cores.

Remarks

The absolute LIOPS numbers are relatively low. It would have been possible to increase those numbers during the tests by increasing the WORK_UNIT slob parameter (see this blog post).

To get the turbostat utility, copy/paste the turbostat.c and msr-index.h contents into a directory and compile turbostat.c that way:

$ gcc -Wall -DMSRHEADER='"./msr-index.h"' turbostat.c -o turbostat

Conclusion

During the tests I observed the following (of course your results could differ depending on the CPUs, the SLOB configuration and so on):

  • 955 000 LIOPS with the 2 sessions running on 2 distinct sockets (so on 2 distinct cores)
  • 920 000 LIOPS with the 2 sessions running on 2 distinct cores on the same socket
  • 600 000 LIOPS with the 2 sessions running on the same core
  • 460 000 LIOPS with the 2 sessions running on the same core with hyper-threading disabled

Hyper-threading helps to increase the overall LIOPS throughput of the system, but once the number of sessions is greater than the number of cores, the average LIOPS per session decreases.


Exadata cell metrics: Join the metrics and their descriptions on the fly

Introduction

Cell metrics are very useful but their names are not so friendly. A name is a concatenation of abbreviations for the type of component, delimited by the underscore character, so you have to understand the naming convention to work out the meaning of a metric name.

For example, knowing that:

  • CD stands for “Cell Disks metrics”
  • IO_BY stands for “Number of megabytes”
  • R stands for Read
  • LG stands for Large

You can conclude that the CD_IO_BY_R_LG metric is linked to the “Number of megabytes read in large blocks from a cell disk”.

Fortunately, the metrics are explained in the Oracle documentation and you can also retrieve their description from cellcli:

$ cellcli -e "list metricdefinition attributes name,description where name='CD_IO_BY_R_LG'"

CD_IO_BY_R_LG	 "Number of megabytes read in large blocks from a cell disk"

Lack of description

That said, as an example, let’s query the current metric values for a particular database that way:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'"

	 BDT	 DB_FC_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FC_IO_RQ       	 47,638 IO requests
	 BDT	 DB_FC_IO_RQ_SEC   	 2.1 IO/sec
	 BDT	 DB_FD_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FD_IO_LOAD     	 19,885
	 BDT	 DB_FD_IO_RQ_LG    	 36 IO requests
	 BDT	 DB_FD_IO_RQ_LG_SEC	 0.0 IO/sec
	 BDT	 DB_FD_IO_RQ_SM    	 47,602 IO requests
	 BDT	 DB_FD_IO_RQ_SM_SEC	 2.1 IO/sec
	 BDT	 DB_FL_IO_BY       	 0.000 MB
	 BDT	 DB_FL_IO_BY_SEC   	 0.000 MB/sec
	 BDT	 DB_FL_IO_RQ       	 0 IO requests
	 BDT	 DB_FL_IO_RQ_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_BY_SEC      	 0 MB/sec
	 BDT	 DB_IO_LOAD        	 0.0
	 BDT	 DB_IO_RQ_LG       	 0 IO requests
	 BDT	 DB_IO_RQ_LG_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_RQ_SM       	 19 IO requests
	 BDT	 DB_IO_RQ_SM_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_UTIL_LG     	 0 %
	 BDT	 DB_IO_UTIL_SM     	 0 %
	 BDT	 DB_IO_WT_LG       	 0 ms
	 BDT	 DB_IO_WT_LG_RQ    	 0.0 ms/request
	 BDT	 DB_IO_WT_SM       	 0 ms
	 BDT	 DB_IO_WT_SM_RQ    	 0.0 ms/request

As you can see the metric description is not there and there is no way to retrieve it from metriccurrent (or metrichistory) because this is not an attribute:

$ cellcli -e "describe metriccurrent"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	objectType

$ cellcli -e "describe metrichistory"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	metricValueAvg
	metricValueMax
	metricValueMin
	objectType

But if you send the result of our example to someone who doesn’t know (or doesn’t remember) the naming convention (or if you are not 100% sure of the definition of a particular metric), then they (or you) will have to:

  • go back to the oracle documentation
  • query the metricdefinition with cellcli

New script: exadata_metrics_desc.pl

Thanks to the exadata_metrics_desc.pl script, you can add (to the cellcli output) the description of the metric on the fly.

Let’s launch the same query (as the one used in the previous example) and add a call to exadata_metrics_desc.pl that way:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'" | ./exadata_metrics_desc.pl

  BDT   DB_FC_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash cache)          0 MB/sec
  BDT   DB_FC_IO_RQ (Number of IO requests issued by a database to flash cache)                           48,123 IO requests
  BDT   DB_FC_IO_RQ_SEC (Number of IO requests issued by a database to flash cache per second)            2.1 IO/sec
  BDT   DB_FD_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash disks)          0 MB/sec
  BDT   DB_FD_IO_LOAD (Average I/O load from this database for flash disks)                               4,419
  BDT   DB_FD_IO_RQ_LG (Number of large IO requests issued by a database to flash disks)                  36 IO requests
  BDT   DB_FD_IO_RQ_LG_SEC (Number of large IO requests issued by a database to flash disks per second)   0.0 IO/sec
  BDT   DB_FD_IO_RQ_SM (Number of small IO requests issued by a database to flash disks)                  48,087 IO requests
  BDT   DB_FD_IO_RQ_SM_SEC (Number of small IO requests issued by a database to flash disks per second)   2.1 IO/sec
  BDT   DB_FL_IO_BY (The number of MB written to the Flash Log)                                           0.000 MB
  BDT   DB_FL_IO_BY_SEC (The number of MB written per second to the Flash Log)                            0.000 MB/sec
  BDT   DB_FL_IO_RQ (The number of I/O requests issued to the Flash Log)                                  0 IO requests
  BDT   DB_FL_IO_RQ_SEC (The number of I/O requests per second issued to the Flash Log)                   0.0 IO/sec
  BDT   DB_IO_BY_SEC (Number of megabytes of I/O per second for this database to hard disks)              0 MB/sec
  BDT   DB_IO_LOAD (Average I/O load from this database for hard disks)                                   0.0
  BDT   DB_IO_RQ_LG (Number of large IO requests issued by a database to hard disks)                      0 IO requests
  BDT   DB_IO_RQ_LG_SEC (Number of large IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_RQ_SM (Number of small IO requests issued by a database to hard disks)                      19 IO requests
  BDT   DB_IO_RQ_SM_SEC (Number of small IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_UTIL_LG (Percentage of disk resources utilized by large requests from this database)        0 %
  BDT   DB_IO_UTIL_SM (Percentage of disk resources utilized by small requests from this database)        0 %
  BDT   DB_IO_WT_LG (IORM wait time for large IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_LG_RQ (Average IORM wait time per request for large IO requests issued by a database)    0.0 ms/request
  BDT   DB_IO_WT_SM (IORM wait time for small IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_SM_RQ (Average IORM wait time per request for small IO requests issued by a database)    0.0 ms/request

As you can see, the description of each metric from the initial output has been added.
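
For the curious, the core of the idea can be sketched in a few lines of perl (a hypothetical sketch, not the actual implementation: it builds a name-to-description map via cellcli and annotates whatever arrives on stdin):

#!/usr/bin/env perl
use strict;
use warnings;

# Build a metric name -> description map from cellcli.
my %desc;
for my $line (qx{cellcli -e "list metricdefinition attributes name,description"}) {
    $desc{$1} = $2 if $line =~ /^\s*(\S+)\s+"(.+)"\s*$/;
}

# Append its description to any known metric name found on stdin.
while (my $line = <STDIN>) {
    $line =~ s/\b([A-Z][A-Z0-9_]*)\b/exists $desc{$1} ? "$1 ($desc{$1})" : $1/ge;
    print $line;
}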

Remarks

  • You can download the script from this repository or from GitHub.
  • Feel free to build the query you want on the metrics. You just need to add a call to exadata_metrics_desc.pl to see the metric description being added on the fly (as long as the metric name appears in the output of your initial query).
  • The idea of this script is entirely to Martin Bach’s credit.
  • This script works with any input (could be a text file):

[screenshot]

Conclusion

The exadata_metrics_desc.pl script can be used to join, on the fly, the metric name and its value (and whatever attributes you would love to see) with its associated description.


Extract Exadata cells historical metrics in CSV format

Exadata provides a lot of useful metrics to monitor the cells, and you may want to retrieve historical values for some metrics. To do so, you can use the “LIST METRICHISTORY” command through CellCLI on the cell.

But as usual, visualising the metrics is even better. For this purpose, you can use a perl script (see the download link in the remarks section) that extracts the historical metrics in CSV format so that you can graph them with the visualisation tool of your choice.

Let’s see the help:

Usage: ./csv_exadata_metrics_history.pl [cell=|groupfile=] [serial] [type=] [name=] [objectname=] [name!=] [objectname!=] [ago_unit=] [ago_value=]

 Parameter                 Comment                                                      Default
 ---------                 -------                                                      -------
 CELL=                     comma-separated list of cells
 GROUPFILE=                file containing list of cells
 SERIAL                    serialize execution over the cells (default is no)
 TYPE=                     Metrics type to extract: Cumulative|Rate|Instantaneous       ALL
 NAME=                     Metrics to extract (wildcard allowed)                        ALL
 OBJECTNAME=               Objects to extract (wildcard allowed)                        ALL
 NAME!=                    Exclude metrics (wildcard allowed)                           EMPTY
 OBJECTNAME!=              Exclude objects (wildcard allowed)                           EMPTY
 AGO_UNIT=                 Unit to retrieve historical metrics back: day|hour|minute    HOUR
 AGO_VALUE=                Value associated to Unit to retrieve historical metrics back 1

utility assumes passwordless SSH from this cell node to the other cell nodes
utility assumes ORACLE_HOME has been set (with celladmin user for example)

Example : ./csv_exadata_metrics_history.pl cell=cell
Example : ./csv_exadata_metrics_history.pl groupfile=./cell_group
Example : ./csv_exadata_metrics_history.pl cell=cell objectname='CD_disk03_cell'
Example : ./csv_exadata_metrics_history.pl cell=cell name='.*BY.*' objectname='.*disk.*'
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=hour ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel01,enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4 serial

You have to set up passwordless SSH from one cell to the other cells (then you can launch the script from that cell and retrieve data from the other cells).
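
A minimal sketch of that setup, assuming the celladmin user and a remote cell named enkcel02:

$ ssh-keygen -t rsa                 # on the launching cell, generate a key pair (accept the defaults)
$ ssh-copy-id celladmin@enkcel02    # copy the public key to the other cell
$ ssh celladmin@enkcel02 hostname   # verify: no password prompt expected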

The main options/features are:

  1. You can specify the cells on which you want to collect the metrics thanks to the cell or groupfile parameter.
  2. You can choose to serialize the execution over the cells thanks to the serial parameter.
  3. You can choose the type of metrics you want to retrieve (Cumulative, Rate or Instantaneous) thanks to the type parameter.
  4. You can focus on some metrics thanks to the name parameter (wildcard allowed).
  5. You can exclude some metrics thanks to the name! parameter (wildcard allowed).
  6. You can focus on some metric object names thanks to the objectname parameter (wildcard allowed).
  7. You can exclude some metric object names thanks to the objectname! parameter (wildcard allowed).
  8. You can choose the unit to retrieve metrics back (day, hour, minute) thanks to the ago_unit parameter.
  9. You can choose the value associated to the unit to retrieve metrics back thanks to the ago_value parameter.

Let’s see an example:

I want to retrieve, in CSV format, the database-related metrics from 2 cells for the last 20 minutes:

$> ./csv_exadata_metrics_history.pl cell=enkx3cel01,enkx3cel02 name='DB_.*' ago_unit=minute ago_value=20
Cell;metricType;DateTime;name;objectname;value;unit
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ACSTBY;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ASM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BDT;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BIGDATA;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBFS;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMO;794,329;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMOX3;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;EXDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;WZSDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;_OTHER_DATABASE_;48,764;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ACSTBY;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ASM;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BDT;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BIGDATA;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBFS;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBM;15;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMO;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMOX3;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;EXDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;WZSDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;_OTHER_DATABASE_;0;MB/sec
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ACSTBY;2,318;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ASM;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BDT;2,966;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BIGDATA;25,415;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBFS;3,489;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBM;1,627,066;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMO;4,506;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMOX3;4,172;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;EXDB;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;WZSDB;4,378;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;_OTHER_DATABASE_;6,227;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ_LG;ACSTBY;0;IO requests
.
.
.

That way, you can visualize your data with whatever tool you feel comfortable with. For example, I used Tableau to create this “Database metrics dashboard” on the DB_* rate metrics:

[Screenshot: Tableau dashboard built on the DB_* rate metrics of the cells]

Remarks:

  • If you retrieve too much data, you could receive something like:
Error: enkx3cel02 is returning over 100000 lines; output is truncated !!!
 Command could be retried with the serialize option: --serial
 Killing child pid 15720 to enkx3cel02...

Then you can relaunch the script with the serial option (see the help).
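
For instance, reusing the earlier query, the serialized run would look like:

$> ./csv_exadata_metrics_history.pl cell=enkx3cel01,enkx3cel02 name='DB_.*' ago_unit=minute ago_value=20 serial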

Conclusion:

You probably already have a way to build your own graphs of the historical metrics. But if you don't, feel free to use this script with the visualisation tool of your choice.


12c Application Continuity with Data Guard in action

Introduction

Application Continuity enables replay, in a non-disruptive and rapid manner, of a database request when a recoverable error makes the database session unavailable. Application Continuity uses Transaction Guard. You can find more details on those features here.

With RAC, Application Continuity can be used to launch the replay on another Instance. It can also be used with Data Guard under some circumstances and conditions (Extract from here):

[Screenshot: extract from the Oracle documentation listing the circumstances and conditions under which Application Continuity can be used with Data Guard]

Well, I just want to see this feature in action during a Data Guard switchover. Let’s give it a try.

Test preparation

The test environment is made of:

  • NBDT database (12.1.0.2 and initially primary).
  • SNBDT database (12.1.0.2 and initially standby).

On the primary database, connected as user bdt, let’s create the following tables:

SQL> create table BDT_CONT as select rownum num,rpad('U',20) val from dual connect by level <= 500000;

Table created.

SQL> create table BDT_SPY (d date, instance_name varchar2(16), num number);

Table created.

And the following trigger:

create or replace trigger BDT_CONT_SPY_T
before update
on BDT_CONT
for each row
begin
insert into BDT_SPY values(sysdate,sys_context('USERENV', 'INSTANCE_NAME'), :old.num);
end;
/

Trigger created.

The test application will update the 500000 rows of the BDT_CONT table in one transaction. The aim of the trigger is to track on which Instance the update has been successfully executed.

Now let’s create a service on the primary and standby databases as follows:

# Primary
$ srvctl add service -d NBDT -s appco -preferred NBDT1,NBDT2 -l primary -policy automatic -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE

# Standby
$ srvctl add service -d SNBDT -s appco -preferred SNBDT1,SNBDT2  -l primary -policy automatic -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE

So that the appco service:

  • Is automatically started on the standby if it becomes primary.
  • Is defined for Application Continuity (see more details here on how to configure a service for Application Continuity).
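
To double-check the service definition afterwards, you can run, for example:

$ srvctl config service -d SNBDT -s appco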

Now let’s create a Java application that will be used to test Application Continuity:

import java.sql.*;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
import oracle.jdbc.replay.*;

public class AppCont {
    public String getInstanceName(Connection conn) throws SQLException {
        PreparedStatement preStatement = conn.prepareStatement("select instance_name from v$instance");
        String r=new String();
        ResultSet result = preStatement.executeQuery();

        while (result.next()){
            r=result.getString("instance_name");
        }
        return r;
    }

    public static void main(String args[]) throws SQLException {
         Connection conn = null;
         Statement stmt = null;
         try {
              // OracleDataSourceImpl instead of OracleDataSource.
              OracleDataSourceImpl ocpds = new OracleDataSourceImpl();
              // Create the database URL
              String dbURL =
              "jdbc:oracle:thin:@(DESCRIPTION_LIST="+
              "(LOAD_BALANCE=off)"+
              "(FAILOVER=on)"+
              "(DESCRIPTION=(CONNECT_TIMEOUT=90) (RETRY_COUNT=10)(RETRY_DELAY=3)"+
              "(ADDRESS_LIST="+
              "(LOAD_BALANCE=on)"+
              "(ADDRESS=(PROTOCOL=TCP) (HOST=rac-cluster-scan)(PORT=1521)))"+
              "(CONNECT_DATA=(SERVICE_NAME=appco)))"+
              "(DESCRIPTION=(CONNECT_TIMEOUT=90) (RETRY_COUNT=10)(RETRY_DELAY=10)"+
              "(ADDRESS_LIST="+
              "(LOAD_BALANCE=on)"+
	      "(ADDRESS=(PROTOCOL=TCP)(HOST=srac-cluster-scan)(PORT=1521)))"+
              "(CONNECT_DATA=(SERVICE_NAME=appco))))";

              ocpds.setURL(dbURL);
              ocpds.setUser("bdt");
              ocpds.setPassword("bdt");

              // Get a connection
              conn = ocpds.getConnection();

              // Instantiation
              AppCont ex = new AppCont();

              // On which Instance are we connected?
              System.out.println("Instance Name = "+ex.getInstanceName(conn));

              System.out.println("Performing transaction");
              /* Mark the beginning of the replayable request,
               * don't forget to disable autocommit,
               * and launch the transaction.
               */
              ((oracle.jdbc.replay.ReplayableConnection)conn).beginRequest();
              conn.setAutoCommit(false);
              stmt = conn.createStatement();
              String updsql = "UPDATE BDT_CONT " +
                              "SET val=UPPER(val)";
              stmt.executeUpdate(updsql);
              conn.commit();
              // Mark the end of the replayable request
              ((oracle.jdbc.replay.ReplayableConnection)conn).endRequest();

              stmt.close();

              // On which Instance are we connected now?
              System.out.println("Instance Name = "+ex.getInstanceName(conn));

              conn.close();
         }
         catch (Exception e) {
              e.printStackTrace();
        }
    }
}

The important parts are:

  • The use of OracleDataSourceImpl instead of OracleDataSource.
  • The database URL definition that includes the access to the appco service for both the primary and standby databases (NBDT is hosted on “rac-cluster” and SNBDT is hosted on “srac-cluster”).
  • Autocommit is disabled (Read this if you need more details).

We are ready to launch the application and test Application Continuity in a Data Guard context.

Run it

First, let’s check the status of the Data Guard configuration and that it is ready for the switchover:

DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/toto@NBDT
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - nbdt_dr

  Protection Mode: MaxPerformance
  Members:
  nbdt  - Primary database
    snbdt - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 24 seconds ago)

DGMGRL> validate database SNBDT;

  Database Role:     Physical standby database
  Primary Database:  nbdt

  Ready for Switchover:  Yes
  Ready for Failover:    Yes (Primary Running)

  Flashback Database Status:
    nbdt:   Off
    snbdt:  Off

Launch the application

$ cat launch.ksh
export CLASSPATH=/u01/app/oracle/product/12.1.0.2/jdbc/lib/ojdbc7.jar:.
time java -cp $CLASSPATH:. AppCont
$ ksh ./launch.ksh
Instance Name = NBDT1
Performing transaction

At this stage, launch the switchover (from another terminal):

DGMGRL> switchover to snbdt;
Performing switchover NOW, please wait...
Operation requires a connection to instance "SNBDT1" on database "snbdt"
Connecting to instance "SNBDT1"...
Connected as SYSDBA.
New primary database "snbdt" is opening...
Oracle Clusterware is restarting database "nbdt" ...
Switchover succeeded, new primary is "snbdt"

And then wait until the application ends:

$ ksh ./launch.ksh
Instance Name = NBDT1
Performing transaction
.
.   <= Switchover occurred here
.
Instance Name = SNBDT1

real	1m46.19s
user	0m1.33s
sys	0m0.13s

As we can see:

  • No errors have been reported.
  • The application terminated on the SNBDT1 instance (that is, on the new primary database).

Let’s check (thanks to the spy table) on which Instance the transaction did commit:

$ sqlplus bdt/bdt@SNBDT

SQL*Plus: Release 12.1.0.2.0 Production on Thu May 7 10:04:54 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Thu May 07 2015 09:59:05 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> select distinct(instance_name) from bdt_spy;

INSTANCE_NAME
----------------
SNBDT1

As we can see, the transaction has been replayed and committed on the SNBDT database (the new primary database).

Remarks

  1. Application Continuity is supported for Oracle Data Guard switchovers to physical standby databases. It is also supported for fast-start failover to physical standbys in maximum availability data protection mode.
  2. Don’t forget that the primary and standby databases must be licensed for Oracle RAC or Oracle Active Data Guard in order to use Application Continuity.
  3. The bdt oracle user that has been used during the test has been granted execute on DBMS_APP_CONT (see the statement below).
  4. This Java application has been created and shared by Laurent Leturgez during the Paris Oracle Meetup. User groups are always good, as you can share your work with others. Thanks again, Laurent!
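
For reference, the grant mentioned in remark 3 looks like this (to be run as a privileged user):

SQL> grant execute on DBMS_APP_CONT to bdt;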

Conclusion

We have seen the Application Continuity feature in action in a Data Guard context.
