Running 12.1.0.2 Oracle Database on Docker

Introduction

As you may have noticed, a few weeks ago Oracle released Docker build files for the Oracle Database on GitHub, along with an associated blog post. If you read the blog post, you will notice that you need to download the Oracle binaries manually.

The purpose of this post is to provide a modified version of the original build files (focusing on the 12.1.0.2 database EE only), so that the Oracle binaries are downloaded during the build of the image (thanks to Maris Elsins's getMOSPatch script).

Create the 12.1.0.2 database docker image

The manual steps are:

Install the Docker engine (you can find detailed instructions here): basically, you have to add the yum repository and launch:

root@docker# yum install docker-engine

Change the maximum container size to 20 GB:

root@docker# docker daemon --storage-opt dm.basesize=20G
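
Running the daemon by hand like this is fine for a quick test. If you prefer to make the setting persistent, a minimal sketch (assuming the devicemapper storage driver and a Docker release recent enough to read /etc/docker/daemon.json; this file is not mentioned in the original post) would be:

root@docker# cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-opts": ["dm.basesize=20G"]
}
EOF
root@docker# systemctl restart docker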

Clone the build files from GitHub:

root@docker# git clone https://github.com/bdrouvot/oracledb-docker.git

Update the oracledb-docker/Dockerfile (ENV section only) with the appropriate values:

  • ORACLE_PWD="<put_the_password_you_want>"
  • MOSU="<your_oracle_support_username>"
  • MOSP="<your_oracle_support_password>"

Build the Image:

root@docker# docker build --force-rm=true --no-cache=true -t oracle/database:12.1.0.2 .

That’s it, now we can:

Use the Image

Simply start a Docker container this way:

root@docker# docker run -p 1521:1521 --name bdt12c oracle/database:12.1.0.2

The host that is running the Docker engine is "docker". You can connect to the database as you normally would, for example using Oracle SQL Developer:

Screen Shot 2016-08-18 at 18.46.30

Note that the Hostname is "docker", that is to say the host running the Docker engine.
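
For instance, a command line connection with SQL*Plus could look like the following sketch (the service name depends on how the database has been created inside the image, so replace it with the one defined in your build):

sqlplus system/<put_the_password_you_want>@//docker:1521/<your_service_name>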

Remarks

  • At the time of this writing Oracle Database on Docker is NOT supported by Oracle. Use these files at your own discretion.
  • If you are interested in this kind of stuff, then you should also read Frits Hoogland’s blog post.
  • The Dockerfile used is very close to the one provided by Oracle (Gerald Venzl). Only a few things have been modified to download the Oracle binaries during the image creation.
  • Thanks to Maris Elsins for getMOSPatch.

Push oracle audit data into Elasticsearch and analyze/visualize it with Kibana

Introduction

Auditing the Oracle database can produce a wide variety of information.

What about having all this information centralized? What about having the possibility to gather, format, search, analyze and visualize this information in near real time?

To achieve this, let's use the ELK stack (Elasticsearch, Logstash and Kibana).

We’ll focus on the audit information coming from:

  • The dba_audit_trail oracle view.
  • The audit files (linked to the audit_file_dest parameter).

You should read this post first: Push the oracle alert.log and listener.log into Elasticsearch and analyze/visualize their content with Kibana. The current post relies on it and gives less detail about the installation, setup and so on.

Installation

The installation of those 3 layers is the same as described in this blog post.

Configure logstash to push and format the dba_audit_trail records to elasticsearch the way we want to

To achieve this we'll use logstash's JDBC input (Robin Moffatt provided an interesting use case and explanation of the logstash JDBC input in this blog post) so that:

  • The @timestamp field reflects the timestamp at which the audit information has been recorded (rather than when logstash read the information).
  • It records the os_username, username, userhost, action_name, sessionid, returncode, priv_used and global_uid fields coming from the dba_audit_trail view into elasticsearch.
  • It traps the kind of authentication (database, directory password...) and the external name (if any) from the comment_text field of the dba_audit_trail view.

To trap and format this information, let’s create an audit_database.conf configuration file that looks like:

input {
    jdbc {
        jdbc_validate_connection => true
        jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521/PBDT"
        jdbc_user => "system"
        jdbc_password => "bdtbdt"
        jdbc_driver_library => "/u01/app/oracle/product/11.2.0/dbhome_1/jdbc/lib/ojdbc6.jar"
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        statement => "select os_username,username,userhost,timestamp,action_name,comment_text,sessionid,returncode,priv_used,global_uid from dba_audit_trail where timestamp > :sql_last_value"
        schedule => "*/2 * * * *"
       }
}

filter {
    # Set the timestamp to the one of dba_audit_trail
    mutate { convert => [ "timestamp" , "string" ]}
    date { match => ["timestamp", "ISO8601"]}

    if [comment_text] =~ /(?i)Authenticated by/ {

    grok {
    match => [ "comment_text","^.*(?i)Authenticated by: (?<authenticated_by>.*?)\;.*$" ]
     }

    if [comment_text] =~ /(?i)EXTERNAL NAME/ {
    grok {
    match => [ "comment_text","^.*(?i)EXTERNAL NAME: (?<external_name>.*?)\;.*$" ]
     }
     }
    }

    # remove temporary fields
    mutate { remove_field => ["timestamp"] }
}

output   {
elasticsearch {
hosts => ["elk:9200"]
index => "audit_databases_oracle-%{+YYYY.MM.dd}"
}
}

so that an entry into dba_audit_trail like:

Screen Shot 2016-07-01 at 06.58.40

will be formatted and sent to elasticsearch this way:

{
         "os_username" => "bdt",
            "username" => "ORG_USER",
            "userhost" => "bdts-MacBook-Pro.local",
         "action_name" => "LOGON",
        "comment_text" => "Authenticated by: DIRECTORY PASSWORD;EXTERNAL NAME: cn=bdt_dba,cn=users,dc=bdt,dc=com; Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=49515))",
           "sessionid" => 171615.0,
          "returncode" => 0.0,
           "priv_used" => "CREATE SESSION",
          "global_uid" => "4773e70a9c6f4316be03169d8a06ecab",
            "@version" => "1",
          "@timestamp" => "2016-07-01T06:47:51.000Z",
    "authenticated_by" => "DIRECTORY PASSWORD",
       "external_name" => "cn=bdt_dba,cn=users,dc=bdt,dc=com"
}
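
As in the post mentioned above, logstash has to be started with this configuration file so that the indexing begins. A sketch, assuming the file has been placed under /etc/logstash/conf.d:

[root@dprima ~]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/audit_database.conf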

Configure logstash to push and format the *.aud files content to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the timestamp at which the audit information has been recorded (rather than when logstash read the information).
  • It records the action, the database user, the privilege, the client user, the client terminal, the status and the dbid into elasticsearch.

To trap and format this information, let’s create an audit_files.conf configuration file that looks like:

input {
        file {
                path => "/u01/app/oracle/admin/PBDT/adump/*.aud"
                }
        }

filter {

# Join lines based on the time
  multiline {
    pattern => "%{DAY} %{MONTH} *%{MONTHDAY} %{TIME} %{YEAR}.*"
    negate => true
    what => "previous"
  }

# Extract the date and the rest from the message
  grok {
    match => [ "message","%{DAY:day} %{MONTH:month} *%{MONTHDAY:monthday} %{TIME:time} %{YEAR:year}(?<audit_message>.*$)" ]
  }

  grok {
    match => [ "audit_message","^.*ACTION :\[[0-9]*\] (?<action>.*?)DATABASE USER:\[[0-9]*\] (?<database_user>.*?)PRIVILEGE :\[[0-9]*\] (?<privilege>.*?)CLIENT USER:\[[0-9]*\] (?<cl ient_user>.*?)CLIENT TERMINAL:\[[0-9]*\] (?<client_terminal>.*?)STATUS:\[[0-9]*\] (?<status>.*?)DBID:\[[0-9]*\] (?<dbid>.*$?)" ]
  }

if "_grokparsefailure" in [tags] { drop {} }

mutate {
       add_field => {
        "timestamp" => "%{year} %{month} %{monthday} %{time}"
       }
  }

# replace the timestamp by the one coming from the audit file
  date {
      locale => "en"
      match => [ "timestamp" , "yyyy MMM dd HH:mm:ss" ]
  }

  # remove temporary fields
  mutate { remove_field => ["audit_message","day","month","monthday","time","year","timestamp"] }

}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "audit_databases_oracle-%{+YYYY.MM.dd}"
}
}

so that an audit file content like:

Fri Jul  1 07:13:56 2016 +02:00
LENGTH : '160'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[5] 'pts/1'
STATUS:[1] '0'
DBID:[10] '3270644858'

will be formatted and sent to elasticsearch this way:

{
            "message" => "Fri Jul  1 07:13:56 2016 +02:00\nLENGTH : '160'\nACTION :[7] 'CONNECT'\nDATABASE USER:[1] '/'\nPRIVILEGE :[6] 'SYSDBA'\nCLIENT USER:[6] 'oracle'\nCLIENT TERMINAL:[5] 'pts/1'\nSTATUS:[1] '0'\nDBID:[10] '3270644858'\n",
           "@version" => "1",
         "@timestamp" => "2016-07-01T07:13:56.000Z",
               "path" => "/u01/app/oracle/admin/PBDT/adump/PBDT_ora_2387_20160701071356285876143795.aud",
               "host" => "Dprima",
               "tags" => [
        [0] "multiline"
    ],
             "action" => "'CONNECT'\n",
      "database_user" => "'/'\n",
          "privilege" => "'SYSDBA'\n",
        "client_user" => "'oracle'\n",
    "client_terminal" => "'pts/1'\n",
             "status" => "'0'\n",
               "dbid" => "'3270644858'\n"
}
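
Once the pipelines are running, a quick way to check that the documents reach elasticsearch is to query the index directly. A sketch (the index name comes from the output sections above; the query string is just an example):

curl -s 'http://elk:9200/audit_databases_oracle-*/_search?q=action_name:LOGON&size=1&pretty'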

Analyze and Visualize the data with Kibana

The Kibana configuration has already been described in this blog post.

Let’s see 2 examples of audit data visualisation:

  • Example 1: thanks to the dba_audit_trail data, let's graph the distribution of connections to our databases by authentication type, username and returncode:

Screen Shot 2016-07-01 at 07.42.20

As we can see most of the connections are authenticated by Directory Password and are successful.

  • Example 2: thanks to the *.aud files data, let's graph the sysdba connections over time and their status:

Screen Shot 2016-07-01 at 08.06.11

As we can see, some of the sysdba connections were not successful between 07:57 am and 07:58 am. Furthermore, the number of unsuccessful connections is greater than the number of successful ones.

Conclusion

Thanks to the ELK stack you can gather, centralize, analyze and visualize the oracle audit data for your whole datacenter the way you want to.


Graphing Exadata cells metrics with Telegraf, InfluxDB and Grafana

Introduction

As a picture is worth a thousand words, we may want to visualise the Exadata cells metrics. What if you could do it the way you want? Build your own graph? Let's try to achieve this with 3 layers:

  • telegraf: to collect the Exadata metrics
  • InfluxDB: to store the time-series Exadata metrics
  • grafana: to visualise the Exadata metrics

You should read this post first: Graphing Oracle performance metrics with Telegraf, InfluxDB and Grafana. The current post relies on it and gives less detail about the installation, setup and so on.

Installation

  • The installation of those 3 layers is the same as described in this blog post.
  • Telegraf has been installed on one database node.
  • The dcli utility has also been copied onto this database node.

Setup

The setup is the same as described in this blog post, except for the script being used to collect the metrics:

  • The script collects the cells metrics from one database node and prints the output as InfluxDB line-protocol.
  • The script assumes passwordless ssh setup to connect from the database node to the cells.
  • It collects all the metriccurrent metrics that are not cumulative; the dcli command being used is:
$> dcli -g ./cells_group cellcli -e "list metriccurrent attributes name,metricObjectName,metricValue,metricType where metricType !='cumulative'"

The perl script code is:

$> cat influxdb_exadata_metrics.pl
#!/usr/bin/env perl
#
# Author: Bertrand Drouvot
# influxdb_exadata_metrics.pl : V1.0 (2016/05)
#
# Utility used to extract exadata metrics in InfluxDB line protocol
#
#----------------------------------------------------------------#

BEGIN {
die "ORACLE_HOME not set\n" unless $ENV{ORACLE_HOME};
unless ($ENV{OrAcLePeRl}) {
$ENV{OrAcLePeRl} = "$ENV{ORACLE_HOME}/perl";
$ENV{PERL5LIB} = "$ENV{PERL5LIB}:$ENV{OrAcLePeRl}/lib:$ENV{OrAcLePeRl}/lib/site_perl";
$ENV{LD_LIBRARY_PATH} = "$ENV{LD_LIBRARY_PATH}:$ENV{ORACLE_HOME}/lib32:$ENV{ORACLE_HOME}/lib";
exec "$ENV{OrAcLePeRl}/bin/perl", $0, @ARGV;
}
}

use strict;
use Time::Local;

#
# Variables
#
my $nbmatch=-1;
my $help=0;
my $goodparam=0;

my $dclicomm='';
my $groupcell_pattern='';
my $metrictype_pattern='ALL';
my $result;

# Parameter parsing

foreach my $para (@ARGV) {

if ( $para =~ m/^help.*/i ) {
$nbmatch++;
$help=1;
}

if ( $para =~ m/^groupfile=(.*)$/i ) {
$nbmatch++;
$groupcell_pattern=$1;
$goodparam++;
}
}

# Check if groupfile is empty

if ((!$goodparam) || $goodparam > 1) {
print "\n Error while processing parameters : GROUPFILE parameter is mandatory! \n\n" unless ($help);
$help=1;
}

# Print usage if a difference exists between parameters checked
#
if ($nbmatch != $#ARGV || $help) {
print "\n Error while processing parameters \n\n" unless ($help);
print " \nUsage: $0 [groupfile=] \n\n";

printf (" %-25s %-60s %-10s \n",'Parameter','Comment','Default');
printf (" %-25s %-60s %-10s \n",'---------','-------','-------');
printf (" %-25s %-60s %-10s \n",'GROUPFILE=','file containing list of cells','');
print ("\n");
print ("utility assumes passwordless SSH from this node to the cell nodes\n");
print ("utility assumes ORACLE_HOME has been set \n");
print ("\n");
print ("Example : $0 groupfile=./cell_group\n");
print "\n\n";
exit 0;
}

# dcli command

$dclicomm="dcli -g ".$groupcell_pattern. " cellcli -e \"list metriccurrent attributes name,metricObjectName,metricValue,metricType where metricType !='cumulative'\"";

# Launch the dcli command
$result = `$dclicomm`;

if ( $? != 0 )
{
print "\n";
print "\n";
die "Something went wrong executing [$dclicomm]\n";
}

# Split the string into array
my @array_result = split(/\n/, $result);

foreach my $line ( @array_result ) {
# drop tab
$line =~ s/\t+/ /g;
# drop blanks
$line =~ s/\s+/ /g;

#Split each line on 6 pieces based on blanks
my @tab1 = split (/ +/,$line,6);

# Suppress : from the cell name
$tab1[0] =~ s/://;

# Suppress , from the value
$tab1[3] =~ s/,//g;

# Add "N/A" if no Unit
if ($tab1[5] eq "") {$tab1[5]="N/A"};

# Print
print "exadata_cell_metrics,cell=$tab1[0],metric_name=$tab1[1],metricObjectName=$tab1[2],metric_type=$tab1[5],metric_unit=$tab1[4] metric_value=$tab1[3]\n";
}

The output looks like (the output format is the InfluxDB line-protocol):

$> perl ./influxdb_exadata_metrics.pl groupfile=./cells_group

exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk01_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk02_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk03_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk04_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk05_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk06_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk07_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk08_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk09_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=CD_disk10_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=FD_00_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_BY_FC_DIRTY,metricObjectName=FD_01_cell12,metric_type=Instantaneous,metric_unit=MB metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk01_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk02_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk03_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk04_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk05_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk06_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk07_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk08_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk09_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=CD_disk10_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=FD_00_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
exadata_cell_metrics,cell=cell1,metric_name=CD_IO_BY_R_LG_SEC,metricObjectName=FD_01_cell12,metric_type=Rate,metric_unit=MB/sec metric_value=0.000
.
.
.
  • The output has been cut for readability.
  • The groupfile parameter is mandatory: this file contains the list of cells you want the metrics to be collected from (see the example below).
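
For reference, the group file is simply a plain text file listing the cell hostnames, one per line. A sketch with hypothetical cell names:

$> cat ./cells_group
cell1
cell2
cell3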

On the database node, configure telegraf to execute the perl script at a 60-second interval and send the output to InfluxDB. Edit the /etc/telegraf/telegraf.conf file so that it contains:

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
urls = ["http://influxgraf:8086"] # required
  database = "telegraf" # required
  precision = "s"
  timeout = "5s"

###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Exadata metrics
[[inputs.exec]]
  # Shell/commands array
  commands = ["/home/oracle/scripts/exadata_metrics.sh"]
  # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"
  interval = "60s"

The exadata_metrics.sh script contains the call to the perl script:

$> cat exadata_metrics.sh
#!/bin/env bash

export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export PATH=$PATH:/home/oracle/scripts

perl /home/oracle/scripts/influxdb_exadata_metrics.pl groupfile=/home/oracle/scripts/cells_group
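
Then make the wrapper script executable and launch telegraf on the database node, as in the post mentioned above (a sketch; you may prefer to run telegraf as a service):

$> chmod +x /home/oracle/scripts/exadata_metrics.sh
$> telegraf -config /etc/telegraf/telegraf.conf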

Now you can connect to grafana and create a new dashboard with the Exadata cells metrics the way you want to.

Example

Screen Shot 2016-05-05 at 16.37.40

Remark

Nothing has been installed on the cells to collect those metrics with telegraf.

Conclusion

Thanks to:

  • telegraf and InfluxDB, we are able to collect and store the Exadata cells metrics we want to.
  • grafana, we are able to visualise the Exadata cells metrics the way we want to.

Oracle Unified Directory and user security: quick demo

Introduction

Oracle Unified Directory (OUD) can be used to centrally manage database users across the enterprise.

It allows us to manage roles and privileges across various databases registered with the directory.

Users connect to the database by providing credentials that are stored in Oracle Unified Directory, then the database executes LDAP search operations to query user specific authentication and authorization information.

This post does not cover the OUD installation.

Setup Steps

So, once the OUD has been installed, the remaining steps are:

  1. Register the database into the OUD.
  2. Create global roles and global users into the database.
  3. Create groups and users into the OUD.
  4. Link OUD groups with database roles.

Let's go through these steps.

Step 1: Register the database into the OUD

>$ cat $ORACLE_HOME/network/admin/ldap.ora
DIRECTORY_SERVERS=(oud:1389:1636)
DEFAULT_ADMIN_CONTEXT="dc=bdt,dc=com"
DIRECTORY_SERVER_TYPE=OID

>$ cat register_database_oud.ksh
dbca -silent -configureDatabase -sourceDB PBDT -registerWithDirService true -dirServiceUserName "cn=orcladmin" -dirServicePassword "bdtbdt" -walletPassword "monster123#"

>$ ksh ./register_database_oud.ksh
Preparing to Configure Database
6% complete
13% complete
66% complete
Completing Database Configuration
100% complete
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/PBDT/PBDT20.log" for further details.

Step 2: Create global roles and global users into the database

SQL> !cat prepare_oud_users.sql
create role org_dba identified globally;
grant dba to org_dba;
create role org_connect identified globally;
grant create session to org_connect;
create user org_user identified globally;

SQL> @prepare_oud_users.sql

Role created.


Grant succeeded.


Role created.


Grant succeeded.


User created.

As you can see:

  • DBA has been granted to the ORG_DBA role.
  • CREATE SESSION has been granted to the ORG_CONNECT role.
  • The user ORG_USER has not been granted any privileges.

Step 3: Create groups and users into the OUD

As an example, an ldif file has been created to:

  • create 2 users: bdt_dba and bdt_connect.
  • create 2 groups: DBA_GROUP and CONNECT_GROUP.
  • assign bdt_dba to the DBA_GROUP and bdt_connect to the CONNECT_GROUP.
>$ cat newgroupsusers.ldif
dn: cn=groups,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: groupOfNames
cn: groups

dn: cn=users,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: groupOfNames
cn: users

dn: cn=bdt_connect,cn=users,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
cn: bdt_connect
sn: bdt_connect
uid: bdt_connect
userpassword: bdtconnect

dn: cn=CONNECT_GROUP,cn=groups,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: groupOfNames
cn: CONNECT_GROUP
member: cn=bdt_connect,cn=users,dc=bdt,dc=com

dn: cn=bdt_dba,cn=users,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
cn: bdt_dba
sn: bdt_dba
uid: bdt_dba
userpassword: bdtdba

dn: cn=DBA_GROUP,cn=groups,dc=bdt,dc=com
changetype: add
objectclass: top
objectclass: groupOfNames
cn: DBA_GROUP
member: cn=bdt_dba,cn=users,dc=bdt,dc=com

and then launch:

$> cat create_ldap_groups_users.ksh
ldapadd -h oud -p 1389 -D "cn=orcladmin" -w bdtbdt -f ./newgroupsusers.ldif -v

$> ksh ./create_ldap_groups_users.ksh

The users and groups have now been created in the OUD.
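
You can also check the result from the command line with an ldapsearch against the OUD. A sketch, reusing the bind credentials of the ldapadd script above:

$> ldapsearch -h oud -p 1389 -D "cn=orcladmin" -w bdtbdt -b "cn=groups,dc=bdt,dc=com" "(objectclass=groupOfNames)" cn member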

A graphical view of what has been done in the OUD (thanks to Apache Directory Studio) is:

Screen Shot 2016-04-30 at 13.48.21

Step 4: Link OUD groups with database roles.

>$ ksh ./mapdb_ldap.ksh
+ echo 'Mapping User'
Mapping User
+ eusm createMapping database_name=PBDT realm_dn='dc=bdt,dc=com' map_type=SUBTREE map_dn='cn=users,dc=bdt,dc=com' schema=ORG_USER ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ echo 'Create Enterprise role'
Create Enterprise role
+ eusm createRole enterprise_role=PBDT_dba_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ eusm createRole enterprise_role=PBDT_connect_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ echo 'Link Roles'
Link Roles
+ eusm addGlobalRole enterprise_role=PBDT_dba_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' database_name=PBDT global_role=ORG_DBA dbuser=system dbuser_password=bdtbdt dbconnect_string=dprima:1521:PBDT ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ eusm addGlobalRole enterprise_role=PBDT_connect_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' database_name=PBDT global_role=ORG_CONNECT dbuser=system dbuser_password=bdtbdt dbconnect_string=dprima:1521:PBDT ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ echo 'Grant Roles'
Grant Roles
+ eusm grantRole enterprise_role=PBDT_dba_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' group_dn='cn=DBA_GROUP,cn=groups,dc=bdt,dc=com' ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt
+ eusm grantRole enterprise_role=PBDT_connect_role domain_name=OracleDefaultDomain realm_dn='dc=bdt,dc=com' group_dn='cn=CONNECT_GROUP,cn=groups,dc=bdt,dc=com' ldap_host=oud ldap_port=1389 ldap_user_dn='cn=orcladmin' ldap_user_password=bdtbdt

As a result, for this database only, there is a mapping between:

  • The database ORG_DBA role (created in step 2) and the OUD DBA_GROUP group (created in step 3).
  • The database ORG_CONNECT role (created in step 2) and the OUD CONNECT_GROUP group (created in step 3).

Authentication and authorization results:

You can view the result in this video:

As you can see:

  • There are no bdt_dba or bdt_connect Oracle users in the database.
  • I logged in with the bdt_dba OUD user, was connected as ORG_USER in the database and was able to query the dba_users view.
  • I logged in with the bdt_connect OUD user, was connected as ORG_USER in the database and (as expected) was not able to query the dba_users view due to the lack of permission.
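
For example, the bdt_dba check shown in the video boils down to something like the following sketch (assuming the database is reachable through the PBDT service; show user should report ORG_USER, and the dba_users query should succeed for bdt_dba but fail for bdt_connect):

$> sqlplus bdt_dba@//dprima:1521/PBDT

SQL> show user
SQL> select count(*) from dba_users;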

Remark

Frank Van Bortel already covered this subject in this blog post.


Push the oracle alert.log and listener.log into Elasticsearch and analyze/visualize their content with Kibana

Introduction

The oracle alert.log and listener.log contain useful information that can answer questions like:

  • When did the Instance start?
  • When has the Instance been shutdown?
  • When did ORA- errors occur? With which codes?
  • Which IP client did connect to the Instance? With which user?
  • How did it connect? Through a service? Through the SID?
  • Which program has been used to connect?
  • A connection storm occurred: what is its source?

What about having all this information centralized? What about having the possibility to gather, format, search, analyze and visualize this information in real time?

To achieve this, let's use the ELK stack (Elasticsearch, Logstash and Kibana).

Installation

The installation is very simple.

  • Install elasticsearch
[root@elk ~]# wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.2.1/elasticsearch-2.2.1.rpm
[root@elk ~]# yum localinstall elasticsearch-2.2.1.rpm
  • Edit the configuration file to mention on which host it has been installed (namely elk in my case):
[root@elk ~]# grep network.host /etc/elasticsearch/elasticsearch.yml
network.host: elk
  • Start elasticsearch:
[root@elk ~]# /etc/init.d/elasticsearch start
Starting elasticsearch: [ OK ]
  • Install Kibana:
[root@elk ~]# wget https://download.elastic.co/kibana/kibana/kibana-4.4.2-linux-x64.tar.gz
[root@elk ~]# tar -xf kibana-4.4.2-linux-x64.tar.gz --directory /opt
[root@elk ~]# mv /opt/kibana-4.4.2-linux-x64 /opt/kibana
  • Edit the configuration file so that the url is updated accordingly:
[root@elk ~]# grep elasticsearch.url /opt/kibana/config/kibana.yml
elasticsearch.url: "http://elk:9200"
  • Start Kibana:
[root@elk ~]# /opt/kibana/bin/kibana
  • Install logstash on the oracle host (namely dprima in my case):
[root@dprima ~]# wget https://download.elastic.co/logstash/logstash/packages/centos/logstash-2.2.2-1.noarch.rpm
[root@dprima ~]# yum localinstall logstash-2.2.2-1.noarch.rpm

Configure logstash to push and format the alert.log to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the timestamp at which the log entry was created (rather than when logstash read the log entry).
  • It traps ORA- entries and creates a field ORA- when one occurs.
  • It traps the start of the Instance (and fills the oradb_status field accordingly).
  • It traps the shutdown of the Instance (and fills the oradb_status field accordingly).
  • It traps the fact that the Instance is running (and fills the oradb_status field accordingly).

New fields are being created so that we can analyze/visualize them later on with Kibana.

  • To trap and format this information, let’s create an alert_log.conf configuration file that looks like (the filter part contains the important stuff):
input {
  file {
      path => "/u01/app/oracle/diag/rdbms/pbdt/PBDT/trace/alert_PBDT.log"
  }
}

filter {

# Join lines based on the time
  multiline {
    pattern => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
    negate => true
    what => "previous"
  }

# Create new field: oradb_status: starting,running,shutdown
 if [message] =~ /Starting ORACLE instance/ {
    mutate {
        add_field => [ "oradb_status", "starting" ]
    }
 } else if [message] =~ /Instance shutdown complete/ {
    mutate {
        add_field => [ "oradb_status", "shutdown" ]
    }
 } else {
      mutate {
        add_field => [ "oradb_status", "running" ]
    }
 }

# Search for ORA- and create field if match

if [message] =~ /ORA-/ {
 grok {
   match => [ "message","(?<ORA->ORA-[0-9]*)" ]
 }
}

# Extract the date and the rest from the message
  grok {
    match => [ "message","%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} %{YEAR:year}(?<log_message>.*$)" ]
  }

  mutate {
       add_field => {
        "timestamp" => "%{year} %{month} %{monthday} %{time}"
       }
  }
# replace the timestamp by the one coming from the alert.log
  date {
      locale => "en"
      match => [ "timestamp" , "yyyy MMM dd HH:mm:ss" ]
  }

# replace the message (remove the date)
  mutate { replace => [ "message", "%{log_message}" ]  }

  mutate {
      remove_field => [ "time" ,"month","monthday","year","timestamp","day","log_message"]
  }

}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "oracle-%{+YYYY.MM.dd}"
}
}
  • Start logstash with this configuration file:
[root@dprima ~]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/alert_log.conf
  • So that for example an entry in the alert.log file like:
Sat Mar 26 08:30:26 2016
ORA-1653: unable to extend table SYS.BDT by 8 in                 tablespace BDT

will be formatted and sent to elasticsearch this way:

{
         "message" => "\nORA-1653: unable to extend table SYS.BDT by 8 in                 tablespace BDT ",
        "@version" => "1",
      "@timestamp" => "2016-03-26T08:30:26.000Z",
            "path" => "/u01/app/oracle/diag/rdbms/pbdt/PBDT/trace/alert_PBDT.log",
            "host" => "Dprima",
            "tags" => [
        [0] "multiline"
    ],
    "oradb_status" => "running",
            "ORA-" => "ORA-1653"
}

Configure logstash to push and format the listener.log to elasticsearch the way we want to

So that:

  • The @timestamp field reflects the timestamp at which the log entry was created (rather than when logstash read the log entry).
  • It traps the connections and records the program into a dedicated field program.
  • It traps the connections and records the user into a dedicated field user.
  • It traps the connections and records the ip of the client into a dedicated field ip_client.
  • It traps the connections and records the destination into a dedicated field dest.
  • It traps the connections and records the destination type (SID or service_name) into a dedicated field dest_type.
  • It traps the command (stop, status, reload) and records it into a dedicated field command.

New fields are being created so that we can analyze/visualize them later on with Kibana.

  • To trap and format this information, let’s create a lsnr_log.conf configuration file that looks like (the filter part contains the important stuff):
input {
 file {
   path => "/u01/app/oracle/diag/tnslsnr/Dprima/listener/trace/listener.log"
  }
}

filter {

 if [message] =~ /(?i)CONNECT_DATA/ {

  # Extract the date and the rest from the message
  grok {
    match => [ "message","(?<the_date>.*%{TIME})(?<lsnr_message>.*$)" ]
  }

  # Extract COMMAND (like status,reload,stop) and add a field
  if [message] =~ /(?i)COMMAND=/ {

   grok {
   match => [ "lsnr_message","^.*(?i)COMMAND=(?<command>.*?)\).*$" ]
   }

  } else {

  # Extract useful Info (USER,PROGRAM,IPCLIENT) and add fields
   grok {
   match => [ "lsnr_message","^.*PROGRAM=(?<program>.*?)\).*USER=(?<user>.*?)\).*ADDRESS.*HOST=(?<ip_client>%{IP}).*$" ]
   }
  }

  # replace the timestamp by the one coming from the listener.log
  date {
      locale => "en"
      match => [ "the_date" , "dd-MMM-yyyy HH:mm:ss" ]
  }

  # replace the message (remove the date)
  mutate { replace => [ "message", "%{lsnr_message}" ]  }

  # remove temporary fields
  mutate { remove_field => [ "the_date","lsnr_message"] }

  # search for SID or SERVICE_NAME, collect dest and add dest type
  if [message] =~ /(?i)SID=/ {
  grok { match => [ "message","^.*(?i)SID=(?<dest>.*?)\).*$" ] }
  mutate { add_field => [ "dest_type", "SID" ] }
  }

  if [message] =~ /(?i)SERVICE_NAME=/ {
  grok { match => [ "message","^.*(?i)SERVICE_NAME=(?<dest>.*?)\).*$" ] }
  mutate { add_field => [ "dest_type", "SERVICE" ] }
  }

  } else {
   drop {}
  }
}

output {
elasticsearch {
hosts => ["elk:9200"]
index => "oracle-%{+YYYY.MM.dd}"
}
}
  • Start logstash with this configuration file:
[root@Dprima conf.d]# /opt/logstash/bin/logstash -f /etc/logstash/conf.d/lsnr_log.conf
  • So that for example an entry in the listener.log file like:
26-MAR-2016 08:34:57 * (CONNECT_DATA=(SID=PBDT)(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=bdt))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=50379)) * establish * PBDT * 0

will be formatted and sent to elasticsearch this way:

{
       "message" => " * (CONNECT_DATA=(SID=PBDT)(CID=(PROGRAM=SQL Developer)(HOST=__jdbc__)(USER=bdt))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.1)(PORT=50380)) * establish * PBDT * 0",
      "@version" => "1",
    "@timestamp" => "2016-03-26T08:34:57.000Z",
          "path" => "/u01/app/oracle/diag/tnslsnr/Dprima/listener/trace/listener.log",
          "host" => "Dprima",
       "program" => "SQL Developer",
          "user" => "bdt",
     "ip_client" => "192.168.56.1",
          "dest" => "PBDT",
     "dest_type" => "SID"
}

Analyze and Visualize the data with Kibana

  • Connect to the elk host (http://elk:5601) and create an index pattern (Pic 1):

elk-index-pattern

  • Check that all our custom fields have been indexed (this is the default behaviour) (Pic 2):

all_indices

so that we can now visualize them.

  • Example 1: thanks to the listener.log data, let's graph the distribution of connections to our databases by program and by dest_type (Pic 3):

kibana_example

  • Example 2: thanks to the listener.log data, visualize when a connection “storm” occurred and where it came from (ip_client field):

storm

Remarks

  • As you can see (in Pic 2) the index on the program field has not been analyzed. This way, a connection to the database with "SQL Developer" will be stored in the index as "SQL Developer", which is what we want, while an analyzed index would have stored 2 distinct values ("SQL" and "Developer"). The same applies to the ORA- field: ORA-1653 would store ORA and 1653 if analyzed (this is why it is specified as not analyzed as well). You can find more details here.
  • To get the indexes on the program and ORA- fields not analyzed, a template has been created this way (a quick way to check the resulting template and mapping is shown after this list):
curl -XDELETE elk:9200/oracle*

curl -XPUT elk:9200/_template/oracle_template -d ' {
 "template" : "oracle*",
 "settings" : {
   "number_of_shards" : 1
   },
 "mappings" : {
  "oracle" : {
   "properties" : {
    "program" : {"type" : "string", "index": "not_analyzed" },
    "ORA-" : {"type" : "string", "index": "not_analyzed" }
    }
   }
  }
}'
  • The configuration files are using grok. You can find more information about it here.
  • You can find much more information about the ELK stack (and another way to use it) in this blog post from Robin Moffatt.
  • All you need to do to visualize the data is to extract the fields of interest from the log files and make sure an index is created on each field you want to visualize.
  • Having all this information from all the machines of a datacenter centralized in a single place is a gold mine, from my point of view.
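
As mentioned in the template bullet above, you can check that the template is in place and that the program and ORA- fields are mapped as not analyzed with, for example:

curl -s 'http://elk:9200/_template/oracle_template?pretty'
curl -s 'http://elk:9200/oracle-*/_mapping?pretty'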

Conclusion

Thanks to the ELK stack you can gather, centralize, analyze and visualize the content of the alert.log and listener.log files for your whole datacenter the way you want to. This information is a gold mine and the imagination is the only limit.


Graphing Oracle performance metrics with Telegraf, InfluxDB and Grafana

Introduction

As a picture is worth a thousand words, we may want to visualise the oracle performance metrics. There are already several tools to do so, but what if you could do it the way you want? Build your own graph? Let's try to achieve this with 3 layers:

  • telegraf: to collect the oracle metrics
  • InfluxDB: to store the time-series oracle metrics
  • grafana: to visualise the oracle time-series metrics

Installation

Let’s install InfluxDB and grafana that way (for example on the same host, namely influxgraf):

  • Install InfluxDB (detail here):
[root@influxgraf ~]# wget https://s3.amazonaws.com/influxdb/influxdb-0.10.2-1.x86_64.rpm

[root@influxgraf ~]# yum localinstall influxdb-0.10.2-1.x86_64.rpm
  • Install grafana (detail here):
[root@influxgraf ~]# yum install https://grafanarel.s3.amazonaws.com/builds/grafana-2.6.0-1.x86_64.rpm

Let’s install telegraf on the oracle host (namely Dprima) that way (detail here):

[root@Dprima ~]# wget http://get.influxdb.org/telegraf/telegraf-0.10.4.1-1.x86_64.rpm

[root@Dprima ~]# yum localinstall telegraf-0.10.4.1-1.x86_64.rpm

Setup

  • Create a telegraf database into InfluxDB (using the web interface):

Screen Shot 2016-03-05 at 09.06.10

  • Create a root user into InfluxDB (using the web interface):

Screen Shot 2016-03-05 at 09.08.38

  • Write a script to collect the oracle metrics. I am using python but this is not mandatory at all. Only the output of the script matters: it has to be in the InfluxDB line-protocol. The script queries the v$sysmetric and v$eventmetric views to get the wait class and wait event metrics during the last minute. I am not reinventing the wheel, I am using Kyle Hailey's queries. The python code is:
[oracle@Dprima scripts]$ cat oracle_metrics.py
import os
import sys
import cx_Oracle
import argparse
import re

class OraMetrics():
    def __init__(self, user, passwd, sid):
        import cx_Oracle
        self.user = user
        self.passwd = passwd
        self.sid = sid
        self.connection = cx_Oracle.connect( self.user , self.passwd , self.sid )
        cursor = self.connection.cursor()
        cursor.execute("select HOST_NAME from v$instance")
        for hostname in cursor:
            self.hostname = hostname[0]

    def waitclassstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
        select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
        from   v$waitclassmetric  m, v$system_wait_class n
        where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle'
        union
        select  'CPU', round(value/100,3) AAS
        from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2
        union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
        from
            ( select value busy
                from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                and group_id=2 ) prcnt,
                ( select value cpu_count from v$parameter where name='cpu_count' )  parameter,
                ( select  'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas
        """)
        for wait in cursor:
            wait_name = wait[0]
            wait_value = wait[1]
            print "oracle_wait_class,host=%s,db=%s,wait_class=%s wait_value=%s" % (self.hostname, sid,re.sub(' ', '_', wait_name), wait_value )

    def waitstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
	select
	n.wait_class wait_class,
       	n.name wait_name,
       	m.wait_count cnt,
       	round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
	from v$eventmetric m,
     	v$event_name n
	where m.event_id=n.event_id
  	and n.wait_class <> 'Idle' and m.wait_count > 0 order by 1 """)
        for wait in cursor:
         wait_class = wait[0]
         wait_name = wait[1]
         wait_cnt = wait[2]
         wait_avgms = wait[3]
         print "oracle_wait_event,host=%s,db=%s,wait_class=%s,wait_event=%s count=%s,latency=%s" % (self.hostname, sid,re.sub(' ', '_', wait_class),re.sub(' ', '_', wait_name)
, wait_cnt,wait_avgms)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', help="Username", required=True)
    parser.add_argument('-p', '--passwd', required=True)
    parser.add_argument('-s', '--sid', help="SID", required=True)

    args = parser.parse_args()

    stats = OraMetrics(args.user, args.passwd, args.sid)
    stats.waitclassstats(args.user, args.passwd, args.sid)
    stats.waitstats(args.user, args.passwd, args.sid)

The output looks like (the output format is the InfluxDB line-protocol):

[oracle@Dprima scripts]$ python "/home/oracle/scripts/oracle_metrics.py" "-u" "system" "-p" "bdtbdt" "-s" PBDT
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Administrative wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=CPU wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=CPU_OS wait_value=0.035
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Commit wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Concurrency wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Configuration wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Network wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Other wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=Scheduler wait_value=0
oracle_wait_class,host=Dprima,db=PBDT,wait_class=System_I/O wait_value=0.005
oracle_wait_class,host=Dprima,db=PBDT,wait_class=User_I/O wait_value=0
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=control_file_sequential_read count=163,latency=0.009
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=control_file_parallel_write count=60,latency=3.933
oracle_wait_event,host=Dprima,db=PBDT,wait_class=System_I/O,wait_event=log_file_parallel_write count=60,latency=1.35
oracle_wait_event,host=Dprima,db=PBDT,wait_class=User_I/O,wait_event=Disk_file_operations_I/O count=16,latency=0.037
oracle_wait_event,host=Dprima,db=PBDT,wait_class=User_I/O,wait_event=Parameter_File_I/O count=16,latency=0.004
  • On the oracle host, configure telegraf to execute the python script at a 60-second interval and send the output to InfluxDB. Edit the /etc/telegraf/telegraf.conf file so that it contains:
###############################################################################
#                                  OUTPUTS                                    #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["http://influxgraf:8086"] # required
  database = "telegraf" # required
  precision = "s"
  timeout = "5s"


###############################################################################
#                                  INPUTS                                     #
###############################################################################

# Oracle metrics
[[inputs.exec]]
  # Shell/commands array
  commands = ["/home/oracle/scripts/oracle_metrics.sh"]
  # Data format to consume. This can be "json", "influx" or "graphite" (line-protocol)
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"
  interval = "60s"

The oracle_metrics.sh script contains the call to the python script:

#!/bin/env bash

export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/dbhome_1//lib
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1/

python "/home/oracle/scripts/oracle_metrics.py" "-u" "system" "-p" "bdtbdt" "-s" PBDT
  • Launch telegraf:
[root@Dprima ~]# telegraf -config /etc/telegraf/telegraf.conf

Graph the metrics

  • First check that the metrics are stored the way we want in InfluxDB:

Screen Shot 2016-03-05 at 09.49.09
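
As an alternative to the web interface, the same check can be done with the influx command line client. A sketch, assuming the CLI is available on the InfluxDB host:

[root@influxgraf ~]# influx -host influxgraf -database telegraf -execute "select * from oracle_wait_class where time > now() - 5m limit 5"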

 

  • Configure InfluxDB as datasource in grafana:

Screen Shot 2016-03-05 at 10.03.55

  • In grafana, create a dashboard and create some variables (hosts, db and wait_class):

Screen Shot 2016-03-05 at 10.59.00

Screen Shot 2016-03-05 at 10.59.57

  • Now let’s create a graph:

Screen Shot 2016-03-05 at 11.04.09

  • Get the metrics:

Screen Shot 2016-03-05 at 11.06.18

  • So that the graph looks like:

Screen Shot 2016-03-05 at 11.20.18

Remarks

  • Thanks to the variables defined in grafana, we can filter on the host, database and wait class (instead of visualising all of them).
  • We can also graph the wait events we collected, or whatever you want to collect.

Conclusion

Thanks to:

  • telegraf and InfluxDB, we are able to collect and store the metrics we want to.
  • grafana, we are able to visualise the metrics the way we want to.

Update 04/21/2016:

As another example, with more information collected, we can create a dashboard that looks like:

dash1

dash2

dash3

dash4

dash5

New version of oracle_metrics.py used to collect the related metrics:

import os
import sys
import cx_Oracle
import argparse
import re

class OraMetrics():
    def __init__(self, user, passwd, sid):
        import cx_Oracle
        self.user = user
        self.passwd = passwd
        self.sid = sid
        self.connection = cx_Oracle.connect( self.user , self.passwd , self.sid )
        cursor = self.connection.cursor()
        cursor.execute("select HOST_NAME from v$instance")
        for hostname in cursor:
            self.hostname = hostname[0]

    def waitclassstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
        select n.wait_class, round(m.time_waited/m.INTSIZE_CSEC,3) AAS
        from   v$waitclassmetric  m, v$system_wait_class n
        where m.wait_class_id=n.wait_class_id and n.wait_class != 'Idle'
        union
        select  'CPU', round(value/100,3) AAS
        from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2
        union select 'CPU_OS', round((prcnt.busy*parameter.cpu_count)/100,3) - aas.cpu
        from
            ( select value busy
                from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                and group_id=2 ) prcnt,
                ( select value cpu_count from v$parameter where name='cpu_count' )  parameter,
                ( select  'CPU', round(value/100,3) cpu from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2) aas
        """)
        for wait in cursor:
            wait_name = wait[0]
            wait_value = wait[1]
            print "oracle_wait_class,host={0},db={1},wait_class={2} wait_value={3}".format(self.hostname,sid,re.sub(' ', '_',wait_name),wait_value)

    def waitstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
	select
	n.wait_class wait_class,
       	n.name wait_name,
       	m.wait_count cnt,
       	round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
	from v$eventmetric m,
     	v$event_name n
	where m.event_id=n.event_id
  	and n.wait_class <> 'Idle' and m.wait_count > 0 order by 1 """)
        for wait in cursor:
         wait_class = wait[0]
         wait_name = wait[1]
         wait_cnt = wait[2]
         wait_avgms = wait[3]
         print "oracle_wait_event,host={0},db={1},wait_class={2},wait_event={3} count={4},latency={5}".format(self.hostname,sid,re.sub(' ', '_',wait_class),re.sub(' ', '_', wait_nam
e),wait_cnt,wait_avgms)

    def sysmetrics(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
        select metric_name,value,metric_unit from v$sysmetric where group_id=2
        """)
        for metric in cursor:
         metric_name = metric[0]
         metric_value = metric[1]
         print "oracle_sysmetric,host={0},db={1},metric_name={2} metric_value={3}".format(self.hostname,sid,re.sub(' ', '_',metric_name),metric_value)

    def tbsstats(self, user, passwd, sid):
        cursor = self.connection.cursor()
        cursor.execute("""
	select tablespace_name,total_space,free_space,perc_used,percextend_used,max_size_mb,free_space_extend
	from(
	select t1.tablespace_name,
       	round(used_space/1024/1024) total_space,
       	round(nvl(lib,0)/1024/1024) free_space,
       	round(100*(used_space-nvl(lib,0))/used_space,1) perc_used,
       	round(100*(used_space-nvl(lib,0))/smax_bytes,1) percextend_used,
       	round(nvl(smax_bytes,0)/1024/1024) max_size_mb,
       	round(nvl(smax_bytes-(used_space-nvl(lib,0)),0)/1024/1024) free_space_extend,
       	nb_ext nb_ext
  	from (select tablespace_name,sum(bytes) used_space from dba_data_files i
         group by tablespace_name) t1,
       	(select tablespace_name,
               sum(bytes) lib,
               max(bytes) max_nb ,
               count(bytes) nb_ext
        from dba_free_space
        group by tablespace_name) t2,
        (select tablespace_name,sum(max_bytes) smax_bytes
        from (select tablespace_name, case when autoextensible = 'YES' then greatest(bytes,maxbytes)
                else bytes end max_bytes
                from dba_data_files i)
        group by tablespace_name ) t3
  	where t1.tablespace_name=t2.tablespace_name(+)
        and t1.tablespace_name=t3.tablespace_name(+)
	)
        """)
        for tbs in cursor:
         tbs_name = tbs[0]
         total_space_mb = tbs[1]
         free_space_mb = tbs[2]
         percent_used = tbs[3]
         percent_used_autoext = tbs[4]
         max_size_mb = tbs[5]
         free_space_autoextend_mb = tbs[6]
         print "oracle_tablespaces,host={0},db={1},tbs_name={2} total_space_mb={3},free_space_mb={4},percent_used={5},percent_used_autoext={6},max_size_mb={7},free_space_autoextend_
mb={8}".format(self.hostname,sid,re.sub(' ', '_',tbs_name),total_space_mb,free_space_mb,percent_used,percent_used_autoext,max_size_mb,free_space_autoextend_mb)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', '--user', help="Username", required=True)
    parser.add_argument('-p', '--passwd', required=True)
    parser.add_argument('-s', '--sid', help="SID", required=True)

    args = parser.parse_args()

    stats = OraMetrics(args.user, args.passwd, args.sid)
    stats.waitclassstats(args.user, args.passwd, args.sid)
    stats.waitstats(args.user, args.passwd, args.sid)
    stats.sysmetrics(args.user, args.passwd, args.sid)
    stats.tbsstats(args.user, args.passwd, args.sid)

Monitor the dNFS activity as of 11.2.0.4

As you may know, there are some views available since 11g to monitor the dNFS activity:

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

One interesting thing is that the v$dnfs_stats view provides two new columns as of 11.2.0.4:

  • NFS_READBYTES: Number of bytes read from NFS server
  • NFS_WRITEBYTES: Number of bytes written to NFS server

See the oracle documentation here.

Then, by sampling the view we can provide those metrics:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second (as of 11.2.0.4).
  • AvgBy/Read: Average bytes per read (as of 11.2.0.4).
  • Writes/s: Number of writes per second.
  • KbyWrite/s: Kbytes written per second (as of 11.2.0.4).
  • AvgBy/Write: Average bytes per write (as of 11.2.0.4).

To do so, I just created a very simple db_io_dnfs_metrics.pl utility. It basically takes a snapshot each second (default interval) from the gv$dnfs_stats view and computes the delta with the previous snapshot. The utility is RAC aware.
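
To give an idea of the data being sampled, the aggregation over gv$dnfs_stats behind those metrics could look like the following sketch (NFS_READBYTES and NFS_WRITEBYTES are the columns quoted above; the operation count columns are assumed here to be NFS_READ and NFS_WRITE, so double check them against the documentation for your version):

select inst_id,
       sum(nfs_read)       reads,
       sum(nfs_write)      writes,
       sum(nfs_readbytes)  read_bytes,
       sum(nfs_writebytes) write_bytes
from   gv$dnfs_stats
group by inst_id;

The utility takes two such snapshots (one second apart by default) and reports the per-instance deltas.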

Let’s see the help:

$>./db_io_dnfs_metrics.pl -help

Usage: ./db_io_dnfs_metrics.pl [-interval] [-count] [-inst] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_dnfs_metrics.pl
Example: ./db_io_dnfs_metrics.pl  -inst=BDT_1
Example: ./db_io_dnfs_metrics.pl  -sort_field=reads

This is a very simple utility; the options/features are:

  1. You can choose the number of snapshots to display and the time to wait between the snapshots.
  2. In case of RAC, you can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  4. In case of RAC, you can sort the instances based on the number of reads, number of writes, number of IOPS (reads+writes) thanks to the -SORT_FIELD= parameter.

Examples:

Collecting on a single Instance:

$>./db_io_dnfs_metrics.pl
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:18                          Kby       AvgBy/               Kby       AvgBy/
14:26:18   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:18   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:18   VSBDT        321       2568      8192      0          0         0          321        2.5
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:19                          Kby       AvgBy/               Kby       AvgBy/
14:26:19   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:19   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:19   VSBDT        320       2560      8192      1          16        16384      321        2.5

Collecting on a RAC database, sorting the Instances by the number of reads:

$>./db_io_dnfs_metrics.pl -sort_field=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:21                          Kby       AvgBy/               Kby       AvgBy/
17:21:21   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:21   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:21   VBDTO_1      175       44536     260599    0          0         0          175        43.5
17:21:21   VBDTO_2      69        2272      33718     0          0         0          69         2.2
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:22                          Kby       AvgBy/               Kby       AvgBy/
17:21:22   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:22   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:22   VBDTO_2      151       36976     250751    0          0         0          151        36.1
17:21:22   VBDTO_1      131       33408     261143    0          0         0          131        32.6
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:23                          Kby       AvgBy/               Kby       AvgBy/
17:21:23   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:23   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:23   VBDTO_2      133       33592     258633    0          0         0          133        32.8
17:21:23   VBDTO_1      121       31360     265394    0          0         0          121        30.6

Remarks:
