SystemTap and Oracle RDBMS: Page Faults

Introduction

Now that I am able to aggregate SystemTap probes by Oracle database, let's focus on page faults.

For this purpose, a new SystemTap script (page_faults_per_db.stp) has been created and added to this github repository.

page_faults_per_db

This script tracks the page faults per database. It reports the total number of page faults and splits them into major and minor faults, as well as read and write access.

Usage:

$> stap -g ./page_faults_per_db.stp <oracle uid> <refresh time ms>

Output example:

$> stap -g ./page_faults_per_db.stp 54321 10000

---------------------------------------------------------------------------------------
DBNAME      : READS_PFLT   WRITES_PFLT  TOTAL_PFLT   MAJOR_PFLT   MINOR_PFLT
---------------------------------------------------------------------------------------
BDT         : 30418        22526        52944        22           52922
NOT_A_DB    : 773          1088         1861         4            1858
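
To give an idea of how such a classification can be done, here is a minimal SystemTap sketch (not the toolkit script itself) built on the vm.pagefault and vm.pagefault.return probe aliases. It keys on execname() instead of the database name and hard-codes a 10 second refresh, so treat it as an illustration only (behaviour of the fault_type flags can vary with the kernel version):

global fault_access, faults

probe vm.pagefault {
  # remember whether the fault is a read or a write access
  fault_access[tid()] = write_access ? "write" : "read"
}

probe vm.pagefault.return {
  if (!(tid() in fault_access)) next
  if (vm_fault_contains(fault_type, VM_FAULT_MAJOR))
    ftype = "major"
  else if (vm_fault_contains(fault_type, VM_FAULT_MINOR))
    ftype = "minor"
  else {
    delete fault_access[tid()]
    next
  }
  faults[execname(), fault_access[tid()], ftype] <<< 1
  delete fault_access[tid()]
}

probe timer.ms(10000) {
  foreach ([name, access, ftype] in faults)
    printf("%-15s %-6s %-6s %d\n", name, access, ftype, @count(faults[name, access, ftype]))
  delete faults
}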

Remarks

  • In this post the word database stands for “all the foreground and background processes linked to an oracle database”.
  • In a consolidated environment, having a view per database can be very useful.
  • Should you be interested in this subject, do read Hatem Mahmoud's post.

Conclusion

We are able to track and group the page faults at the database level.


SystemTap and Oracle RDBMS: Time in Various States, VFS I/O and Block I/O

Introduction

Now that I am able to aggregate SystemTap probes by Oracle database, it's time to gather several scripts into a toolkit. The toolkit is available in this github repository.

Let’s describe 3 new members of the toolkit:

  • schedtimes_per_db.stp: To track time databases spend in various states
  • vfsio_per_db.stp: To track I/O by database through the Virtual File System (vfs) layer
  • blkio_per_db.stp: To track I/O by database through the block IO layer

schedtimes_per_db

This script tracks the time databases spend in various states. It also reports the time spent by processes that are not part of an oracle database.

Usage:

$> stap -g ./schedtimes_per_db.stp <oracle uid> <refresh time ms>

Output example:

$> stap -g ./schedtimes_per_db.stp 54321 10000

------------------------------------------------------------------
DBNAME    :    run(us)  sleep(us) iowait(us) queued(us)  total(us)
------------------------------------------------------------------
NOT_A_DB  :     447327  200561911       1328     517522  201528088
BDT       :      42277  316189082          0      69355  316300714
VBDT      :      74426  326694570          0      77489  326846485
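
As a hint of how such state accounting can be approached, here is a minimal SystemTap sketch (not the toolkit script) that only measures on-CPU ("run") time per process name using the scheduler tapset; the real script goes further and also splits the off-CPU time into sleep, iowait and queued:

global on_cpu_since, runtime

probe scheduler.cpu_on {
  # a task has just been given the CPU
  if (!idle)
    on_cpu_since[tid()] = gettimeofday_us()
}

probe scheduler.cpu_off {
  # the task is about to be switched out: account for its run time
  if (tid() in on_cpu_since) {
    runtime[execname()] <<< gettimeofday_us() - on_cpu_since[tid()]
    delete on_cpu_since[tid()]
  }
}

probe timer.ms(10000) {
  foreach (name in runtime)
    printf("%-15s run(us): %d\n", name, @sum(runtime[name]))
  delete runtime
  delete on_cpu_since
}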


vfsio_per_db

This script tracks the database I/O through the VFS layer. It also reports the I/O in this layer for processes that are not part of an oracle database.

Usage:

$> stap -g ./vfsio_per_db.stp <oracle uid> <refresh time ms>

Output example:

$> stap -g ./vfsio_per_db.stp 54321 10000

------------------------------------------------------------------------
DBNAME      : NB_READ   READ_KB   NB_WRITE  WRITE_KB  NB_TOTAL  TOTAL_KB
------------------------------------------------------------------------
BDTS        : 110       347       6         96        116       443
NOT_A_DB    : 89        11        2         0         91        11
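
For illustration, a minimal SystemTap sketch (not the toolkit script) of I/O accounting at the VFS layer could look like the following; it keys on execname() and uses the vfs.read/vfs.write return probes, whereas the toolkit script maps the processes to databases:

global vfsreads, vfswrites

probe vfs.read.return {
  # $return is the number of bytes actually read (negative on error)
  if ($return > 0)
    vfsreads[execname()] <<< $return
}

probe vfs.write.return {
  if ($return > 0)
    vfswrites[execname()] <<< $return
}

probe timer.ms(10000) {
  foreach (name in vfsreads)
    printf("%-15s NB_READ: %d READ_KB: %d\n",
           name, @count(vfsreads[name]), @sum(vfsreads[name]) / 1024)
  foreach (name in vfswrites)
    printf("%-15s NB_WRITE: %d WRITE_KB: %d\n",
           name, @count(vfswrites[name]), @sum(vfswrites[name]) / 1024)
  delete vfsreads
  delete vfswrites
}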

blkio_per_db

This script tracks the database I/O through the block IO layer. It also reports the I/O in this layer for processes that are not part of an oracle database.

Usage:

$> stap -g ./blkio_per_db.stp <oracle uid> <refresh time ms>

Output example:

$> stap -g ./blkio_per_db.stp 54321 10000

------------------------------------------------------------------------
DBNAME      : NB_READ   READ_KB   NB_WRITE  WRITE_KB  NB_TOTAL  TOTAL_KB
------------------------------------------------------------------------
BDTS        : 9690      110768    18        192       9708      110960
NOT_A_DB    : 0         0         6         560       6         560
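
Similarly, here is a minimal SystemTap sketch (not the toolkit script) that aggregates the I/O submitted to the block layer per process name through the ioblock.request probe; the toolkit script additionally splits reads and writes and groups by database:

global blkio

probe ioblock.request {
  # size is the number of bytes of the submitted bio
  blkio[execname()] <<< size
}

probe timer.ms(10000) {
  foreach (name in blkio)
    printf("%-15s NB_TOTAL: %d TOTAL_KB: %d\n",
           name, @count(blkio[name]), @sum(blkio[name]) / 1024)
  delete blkio
}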

Remarks

  • The schedtimes_per_db script is mainly inspired by this one (full credit goes to the authors).
  • Why is it interesting to look at the vfs layer? Answers are in Brendan Gregg's awesome File System Latency series (see parts 1, 2, 3, 4 and 5).
  • In this post the word database stands for “all the foreground and background processes linked to an oracle database”.
  • In a consolidated environment, having a view per database can be very useful.

Conclusion

The toolkit has been created and now contains 3 scripts. Expect it to grow a lot.

SystemTap and Oracle RDBMS: Aggregate by database

Introduction

The purpose of this post is to share a way to aggregate by Oracle database within SystemTap probes. Let’s describe a simple use case to make things clear.

Use Case

Let's say that I want to get the number and the total size of TCP messages that have been sent and received by an Oracle database. To do so, let's use 2 probes:

  • tcp.recvmsg
  • tcp.sendmsg

and fetch the command line of the processes that trigger the event thanks to the cmdline_str() function. For a process related to an oracle database, the cmdline_str() output would look like one of these 2:

  • ora_<something>_<instance_name>
  • oracle<instance_name> (LOCAL=<something>

So let’s write two embedded C functions to extract the Instance name from each of the 2 strings described above.

Functions

get_oracle_name_b:

For example, this function would extract BDT from “ora_dbw0_BDT” or any “ora_<something>_BDT” string.

The code is the following:

function get_oracle_name_b:string (mystr:string) %{
  char *ptr;
  int  ch = '_';
  char *strargs = STAP_ARG_mystr;
  /* locate the second '_' (e.g. the one after "ora_dbw0" in "ora_dbw0_BDT") */
  ptr = strchr( strchr( strargs , ch) + 1 , ch);
  /* return everything after it, i.e. the instance name */
  snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%s", ptr + 1);
%}

get_oracle_name_f:

For example, this function would extract BDT from “oracleBDT (LOCAL=NO)”, “oracleBDT (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))” or any “oracleBDT (LOCAL=<something>” string.

The code is the following:

function get_oracle_name_f:string (mystr:string) %{
  char *ptr;
  int  ch = ' ';
  char substr_res[30];
  char *strargs = STAP_ARG_mystr;
  /* locate the first blank (the one right after "oracle<instance_name>") */
  ptr = strchr( strargs, ch );
  /* copy what sits between "oracle" (6 characters) and that blank: the instance name */
  strncpy (substr_res, strargs + 6, ptr - strargs - 6);
  substr_res[ptr - strargs - 6] = '\0';
  snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%s", substr_res);
%}

Bearing in mind that the SystemTap aggregation operator is "<<<" (as explained here), we can use those 2 functions within the probes to aggregate by Instance Name (passing cmdline_str() as a parameter), this way:

probe tcp.recvmsg
{
  if ( isinstr(cmdline_str() , "ora_" ) == 1 && uid() == orauid) {
    tcprecv[get_oracle_name_b(cmdline_str())] <<< size
  } else if ( isinstr(cmdline_str() , "LOCAL=" ) == 1 && uid() == orauid) {
    tcprecv[get_oracle_name_f(cmdline_str())] <<< size
  } else {
    tcprecv["NOT_A_DB"] <<< size
  }
}

probe tcp.sendmsg
{
  if ( isinstr(cmdline_str() , "ora_" ) == 1 && uid() == orauid) {
    tcpsend[get_oracle_name_b(cmdline_str())] <<< size
  } else if ( isinstr(cmdline_str() , "LOCAL=" ) == 1 && uid() == orauid) {
    tcpsend[get_oracle_name_f(cmdline_str())] <<< size
  } else {
    tcpsend["NOT_A_DB"] <<< size
  }
}

As you can see, anything that is not an oracle database process is recorded and displayed as “NOT_A_DB”.

Based on this, the tcp_by_db.stp script has been created.
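
For completeness, here is a minimal sketch (not the full tcp_by_db.stp) of how the tcpsend and tcprecv aggregates filled in the probes above could be reported every <refresh time ms> (the script's second argument); note that this simplified loop would miss a receive-only entry:

probe timer.ms($2) {
  printf("---------------------------------------------------------\n")
  printf("%-17s %-9s %-9s %-9s %-9s\n", "NAME", "NB_SENT", "SENT_KB", "NB_RECV", "RECV_KB")
  printf("---------------------------------------------------------\n")
  foreach (db in tcpsend) {
    nb_recv = (db in tcprecv) ? @count(tcprecv[db]) : 0
    kb_recv = (db in tcprecv) ? @sum(tcprecv[db]) / 1024 : 0
    printf("%-17s %-9d %-9d %-9d %-9d\n", db,
           @count(tcpsend[db]), @sum(tcpsend[db]) / 1024, nb_recv, kb_recv)
  }
  delete tcpsend
  delete tcprecv
}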

tcp_by_db.stp: script usage and output example

Usage:

$> stap -g ./tcp_by_db.stp <oracle uid> <refresh time ms>

Output Example:

$> stap -g ./tcp_by_db.stp 54321 10000
---------------------------------------------------------
NAME              NB_SENT   SENT_KB   NB_RECV   RECV_KB
---------------------------------------------------------
VBDT              5439      8231      10939     64154
NOT_A_DB          19        4         41        128
BDT               19        50        35        259

---------------------------------------------------------
NAME              NB_SENT   SENT_KB   NB_RECV   RECV_KB
---------------------------------------------------------
VBDT              267       109       391       2854
NOT_A_DB          102       19        116       680
BDT               26        55        47        326

---------------------------------------------------------
NAME              NB_SENT   SENT_KB   NB_RECV   RECV_KB
---------------------------------------------------------
VBDT              340       176       510       2940
NOT_A_DB          150       8         151       1165
BDT               42        77        61        423

Remarks:

  • The oracle uid on my machine is 54321
  • The refresh time has been set to 10 seconds
  • You can see the aggregation for 2 databases on my machine, and also for everything that is not an oracle database

Whole Script source code

The whole code is available at:

Conclusion

Thanks to the embedded C functions we have been able to aggregate by Oracle database within SystemTap probes.

Running 12.1.0.2 Oracle Database on Docker

Introduction

As you may have noticed, a few weeks ago Oracle released Docker build files for the Oracle Database on Github, along with an associated blog post. Should you read the blog post, you would notice that you need to download the oracle binaries manually.

The purpose of this post is to provide a modified version of the original build files (focusing on the 12.1.0.2 database EE only), so that the oracle binaries are downloaded during the build of the image (thanks to Maris Elsins’s getMOSPatch script).

Create the 12.1.0.2 database docker image

The manual steps are:

Install the docker engine (you can find detailed instructions here); basically, you have to add the yum repository and launch:

root@docker# yum install docker-engine

Change the maximum container size to 20 GB:

root@docker# docker daemon --storage-opt dm.basesize=20G

Clone those files from github:

root@docker# git clone https://github.com/bdrouvot/oracledb-docker.git

Update the oracledb-docker/Dockerfile file (ENV section only) with the appropriate values:

  • ORACLE_PWD=”<put_the_password_you_want>”
  • MOSU=”<your_oracle_support_username>”
  • MOSP=”<your_oracle_support_password>”

Build the Image:

root@docker# docker build --force-rm=true --no-cache=true -t oracle/database:12.1.0.2 .

That’s it, now we can:

Use the Image

Simply start a docker container that way:

root@docker# docker run -p 1521:1521 --name bdt12c oracle/database:12.1.0.2

The host that is running the docker engine is “docker”. You can connect to the database as you would normally, for example, using Oracle SQL Developer:

[Screenshot: Oracle SQL Developer connection to the container database]

Note that the Hostname is “docker”, that is to say the one that is hosting the docker engine.

Remarks

  • At the time of this writing Oracle Database on Docker is NOT supported by Oracle. Use these files at your own discretion.
  • If you are interested in this kind of stuff, then you should also read Frits Hoogland’s blog post.
  • The Dockerfile used is very close to the one provided by Oracle (Gerald Venzl). Only a few things have been modified to download the oracle binaries during the image creation.
  • Thanks to Maris Elsins for getMOSPatch.

Push oracle audit data into Elasticsearch and analyze/visualize it with Kibana

Introduction

Auditing the oracle database can produce a wide variety of information.

What about having all this information centralized? What about having the possibility to gather, format, search, analyze and visualize this information in near real time?

To achieve this, let's use the ELK stack:

  • Elasticsearch
  • Logstash
  • Kibana

We’ll focus on the audit information coming from:

  • The dba_audit_trail oracle view.
  • The audit files (linked to the audit_file_dest parameter).

You should first read this post: Push the oracle alert.log and listener.log into Elasticsearch and analyze/visualize their content with Kibana. The current post relies on it and gives less detail about the installation, setup and so on.

Installation

The installation of those 3 layers is the same as described in this blog post.

Configure logstash to push and format the dba_audit_trail records to elasticsearch the way we want to

To achieve this we'll use logstash's JDBC input (Robin Moffatt provided an interesting use case and explanation of it in this blog post) so that:

  • The @timestamp field reflects the timestamp at which the audit information has been recorded (rather than when logstash read the information).
  • It records the os_username, username, userhost, action_name, sessionid, returncode, priv_used and global_uid fields coming from the dba_audit_trail view into elasticsearch.
  • It traps the kind of authentication (database, directory password, ...) and the external name (if any) from the comment_text field of the dba_audit_trail view.

To trap and format this information, let’s create an audit_database.conf configuration file that looks like:

input {
    jdbc {
        jdbc_validate_connection => true
        jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521/PBDT"
        jdbc_user => "system"
        jdbc_password => "bdtbdt"
        jdbc_driver_library => "/u01/app/oracle/product/11.2.0/dbhome_1/jdbc/lib/ojdbc6.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        statement => "select os_username,username,userhost,timestamp,action_name,comment_text,sessionid,returncode,priv_used,global_uid from dba_audit_trail where timestamp > :sql_last_value"
        schedule => "*/2 * * * *"
       }
}

filter {
    # Set the timestamp to the one of dba_audit_trail
    mutate { convert => [ "timestamp" , "string" ]}
    date { match => ["timestamp", "ISO8601"]}

    if [comment_text] =~ /(?i)Authenticated by/ {

    grok {
    match => [ "comment_text","^.*(?i)Authenticated by: (?<authenticated_by>.*?)\;.*$" ]
     }

    if [comment_text] =~ /(?i)EXTERNAL NAME/ {
    grok {
    match => [ "comment_text","^.*(?i)EXTERNAL NAME: (?<external_name>.*?)\;.*$" ]
     }
     }
    }

    # remove temporary fields
    mutate { remove_field => ["timestamp"] }
}

output   {
elasticsearch {
hosts => ["elk:9200"]
index => "audit_databases_oracle-%{+YYYY.MM.dd}"
}
}

so that an entry into dba_audit_trail like:

[Screenshot: dba_audit_trail entry]

will be formatted and sent to elasticsearch this way:

{
         "os_username" => "bdt",
            "username" => "ORG_USER",
            "userhost" => "bdts-MacBook-Pro.local",
         "action_name" => "LOGON",
        "comment_text" => "Authenticated by: DIRECTORY PASSWORD;EXTERNAL NAME: cn=bdt_dba,cn=users,dc=bdt,dc=com; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=49515))",
           "sessionid" => 171615.0,
          "returncode" => 0.0,
           "priv_used" => "CREATE SESSION",
          "global_uid" => "4773e70a9c6f4316be03169d8a06ecab",
            "@version" => "1",
          "@timestamp" => "2016-07-01T06:47:51.000Z",
    "authenticated_by" => "DIRECTORY PASSWORD",
       "external_name" => "cn=bdt_dba,cn=users,dc=bdt,dc=com"
}

Configure logstash to push and format the *.aud files content to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the timestamp at which the audit information has been recorded (rather than when logstash read the information).
  • It records the action, the database user, the privilege, the client user, the client terminal, the status and the dbid into elasticsearch.

To trap and format this information, let’s create an audit_files.conf configuration file that looks like:

input {
        file {
                path => "/u01/app/oracle/admin/PBDT/adump/*.aud"
                }
        }

filter {

# Join lines based on the time
  multiline {
    pattern => "%{DAY} %{MONTH} *%{MONTHDAY} %{TIME} %{YEAR}.*"
    negate => true
    what => "previous"
  }

# Extract the date and the rest from the message
  grok {
    match => [ "message","%{DAY:day} %{MONTH:month} *%{MONTHDAY:monthday} %{TIME:time} %{YEAR:year}(?<audit_message>.*$)" ]
  }

  grok {
    match => [ "audit_message","^.*ACTION :\[[0-9]*\] (?<action>.*?)DATABASE USER:\[[0-9]*\] (?<database_user>.*?)PRIVILEGE :\[[0-9]*\] (?<privilege>.*?)CLIENT USER:\[[0-9]*\] (?<cl ient_user>.*?)CLIENT TERMINAL:\[[0-9]*\] (?<client_terminal>.*?)STATUS:\[[0-9]*\] (?<status>.*?)DBID:\[[0-9]*\] (?<dbid>.*$?)" ]
  }

if "_grokparsefailure" in [tags] { drop {} }

mutate {
       add_field => {
        "timestamp" => "%{year} %{month} %{monthday} %{time}"
       }
  }

# replace the timestamp by the one coming from the audit file
  date {
      locale => "en"
      match => [ "timestamp" , "yyyy MMM dd HH:mm:ss" ]
  }

  # remove temporary fields
  mutate { remove_field => ["audit_message","day","month","monthday","time","year","timestamp"] }

}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "audit_databases_oracle-%{+YYYY.MM.dd}"
}
}

so that an audit file content like:

Fri Jul  1 07:13:56 2016 +02:00
LENGTH : '160'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[5] 'pts/1'
STATUS:[1] '0'
DBID:[10] '3270644858'

will be formatted and sent to elasticsearch this way:

{
            "message" => "Fri Jul  1 07:13:56 2016 +02:00\nLENGTH : '160'\nACTION :[7] 'CONNECT'\nDATABASE USER:[1] '/'\nPRIVILEGE :[6] 'SYSDBA'\nCLIENT USER:[6] 'oracle'\nCLIENT TERMINAL:[5] 'pts/1'\nSTATUS:[1] '0'\nDBID:[10] '3270644858'\n",
           "@version" => "1",
         "@timestamp" => "2016-07-01T07:13:56.000Z",
               "path" => "/u01/app/oracle/admin/PBDT/adump/PBDT_ora_2387_20160701071356285876143795.aud",
               "host" => "Dprima",
               "tags" => [
        [0] "multiline"
    ],
             "action" => "'CONNECT'\n",
      "database_user" => "'/'\n",
          "privilege" => "'SYSDBA'\n",
        "client_user" => "'oracle'\n",
    "client_terminal" => "'pts/1'\n",
             "status" => "'0'\n",
               "dbid" => "'3270644858'\n"
}

Analyze and Visualize the data with Kibana

The Kibana configuration has already been described in this blog post.

Let’s see 2 examples of audit data visualisation:

  • Example 1: thanks to the dba_audit_trail data, let's graph the breakdown of connections to our databases by authentication type, username and returncode:

[Screenshot: Kibana graph of connections by authentication type, username and returncode]

As we can see most of the connections are authenticated by Directory Password and are successful.

  • Example 2: thanks to the *.aud files data, let's graph the sysdba connections over time and their status:

[Screenshot: Kibana graph of sysdba connections over time and their status]

As we can see, some of the sysdba connections are not successful between 07:57 am and 07:58 am. Furthermore, the number of unsuccessful connections is greater than the number of successful ones.

Conclusion

Thanks to the ELK stack you can gather, centralize, analyze and visualize the oracle audit data for your whole datacenter the way you want to.

Graphing Exadata cells metrics with Telegraf, InfluxDB and Grafana

Introduction

As a picture is worth a thousand words, we may want to visualise the Exadata cells metrics. What if you could do it the way you want and build your own graphs? Let's try to achieve this with 3 layers:

  • telegraf: to collect the Exadata metrics
  • InfluxDB: to store the time-series Exadata metrics
  • grafana: to visualise the Exadata metrics

You should first read this post: Graphing Oracle performance metrics with Telegraf, InfluxDB and Grafana. The current post relies on it and gives less detail about the installation, setup and so on.

Installation

  • The installation of those 3 layers is the same as described in this blog post.
  • Telegraf has been installed on one database node.
  • The dcli utility has also been copied onto this database node.

Setup

The setup is the same as described in this blog post, except for the script being used to collect the metrics:

  • The script collects the cells metrics from one database node and prints the output as InfluxDB line-protocol.
  • The script assumes passwordless ssh setup to connect from the database node to the cells.
  • It collects all metriccurrent metrics that are not cumulative: the dcli command being used is:
$> dcli -g ./cells_group cellcli -e "list metriccurrent attributes name,metricObjectName,metricValue,metricType where metricType !='cumulative'"

The perl script code is:

$> cat influxdb_exadata_metrics.pl
#!/usr/bin/env perl
#
# Author: Bertrand Drouvot
# influxdb_exadata_metrics.pl : V1.0 (2016/05)
#
# Utility used to extract exadata metrics in InfluxDB line protocol
#
#----------------------------------------------------------------#

BEGIN {
die "ORACLE_HOME not set\n" unless $ENV{ORACLE_HOME};
unless ($ENV{OrAcLePeRl}) {
$ENV{OrAcLePeRl} = "$ENV{ORACLE_HOME}/perl";
$ENV{PERL5LIB} = "$ENV{PERL5LIB}:$ENV{OrAcLePeRl}/lib:$ENV{OrAcLePeRl}/lib/site_perl";
$ENV{LD_LIBRARY_PATH} = "$ENV{LD_LIBRARY_PATH}:$ENV{ORACLE_HOME}/lib32:$ENV{ORACLE_HOME}/lib";
exec "$ENV{OrAcLePeRl}/bin/perl", $0, @ARGV;
}
}

use strict;
use Time::Local;

#
# Variables
#
my $nbmatch=-1;
my $help=0;
my $goodparam=0;

my $dclicomm='';
my $groupcell_pattern='';
my $metrictype_pattern='ALL';
my $result;

# Parameter parsing

foreach my $para (@ARGV) {

if ( $para =~ m/^help.*/i ) {
$nbmatch++;
$help=1;
}

if ( $para =~ m/^groupfile=(.*)$/i ) {
$nbmatch++;
$groupcell_pattern=$1;
$goodparam++;
}
}

# Check if groupfile is empty

if ((!$goodparam) | $goodparam > 1) {
print "\n Error while processing parameters : GROUPFILE parameter is mandatory! \n\n" unless ($help);
$help=1;
}

# Print usage if a difference exists between parameters checked
#
if ($nbmatch != $#ARGV | $help) {
print "\n Error while processing parameters \n\n" unless ($help);
print " \nUsage: $0 [groupfile=] \n\n";

printf (" %-25s %-60s %-10s \n",'Parameter','Comment','Default');
printf (" %-25s %-60s %-10s \n",'---------','-------','-------');
printf (" %-25s %-60s %-10s \n",'GROUPFILE=','file containing list of cells','');
print ("\n");
print ("utility assumes passwordless SSH from this node to the cell nodes\n");
print ("utility assumes ORACLE_HOME has been set \n");
print ("\n");
print ("Example : $0 groupfile=./cell_group\n");
print "\n\n";
exit 0;
}

# dcli command

$dclicomm="dcli -g ".$groupcell_pattern. " cellcli -e \"list metriccurrent attributes name,metricObjectName,metricValue,metricType where metricType !='cumulative'\"";

# Launch the dcli command
$result = `$dclicomm`;

if ( $? != 0 )
{
print "\n";
print "\n";
die "Something went wrong executing [$dclicomm]\n";
}

# Split the string into array
my @array_result = split(/\n/, $result);

foreach my $line ( @array_result ) {
# drop tab
$line =~ s/\t+/ /g;
# drop blanks
$line =~ s/\s+/ /g;

#Split each line on 6 pieces based on blanks
my @tab1 = split (/ +/,$line,6);

# Suppress : from the cell name
$tab1[0] =~ s/://;

# Suppress , from the value
$tab1[3] =~ s/,//g;

# Add "N/A" if no Unit
if ($tab1[5] eq "") {$tab1[5]="N/A"};

# Print
print "exadata_cell_metrics,cell=$tab1[0],metric_name=$tab1[1],metricObjectName=$tab1[2],metric_type=$tab1[5],metric_unit=$tab1[4] metric_value=$tab1[3]\n";
}

The output looks like (the output format is the InfluxDB line-protocol):

$> perl ./influxdb_exadata_metrics.pl groupfile=./cells_group

exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk01_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk02_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk03_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk04_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk05_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk06_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk07_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk08_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk09_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk10_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=FD_00_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=FD_01_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk01_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk02_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk03_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk04_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk05_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk06_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk07_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk08_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk09_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk10_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=FD_00_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=FD_01_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
.
.
.
  • The output has been cut for readability.
  • The groupfile parameter is mandatory: This file contains the list of the cells you want the metrics to be collected on.

On the database node, configure telegraf to execute the perl script at a 60-second interval and send the output to InfluxDB. Edit the /etc/telegraf/telegraf.conf file so that it contains:

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["http://influxgraf:8086"] # required
  database = "telegraf" # required
  precision = "s"
  timeout = "5s"

###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Exadata metrics
[[inputs.exec]]
  # Shell/commands array
  commands = ["/home/oracle/scripts/exadata_metrics.sh"]
  # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"
  interval = "60s"

The exadata_metrics.sh script contains the call to the perl script:

$> cat exadata_metrics.sh
#!/bin/env bash

export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export PATH=$PATH:/home/oracle/scripts

perl /home/oracle/scripts/influxdb_exadata_metrics.pl groupfile=/home/oracle/scripts/cells_group

Now you can connect to grafana and create a new dashboard with the Exadata cells metrics the way you want to.

Example

[Screenshot: grafana dashboard graphing Exadata cells metrics]

Remark

Nothing has been installed on the cells to collect those metrics with telegraf.

Conclusion

Thanks to:

  • telegraf and InfluxDB, we are able to collect and store the Exadata cells metrics we want to.
  • grafana, we are able to visualise the Exadata cells metrics the way we want to.