Exadata cell metrics: Join the metrics and their descriptions on the fly

Introduction

Cell metrics are very useful, but their names are not very friendly. Each name is a concatenation of abbreviations for the type of component, delimited by the underscore character, so you have to know the naming convention to understand what a metric name means.

For example, knowing that:

  • CD stands for “Cell Disks metrics”
  • IO_BY stands for “Number of megabytes”
  • R stands for Read
  • LG stands for Large

You can conclude that the CD_IO_BY_R_LG metric gives the “Number of megabytes read in large blocks from a cell disk”.
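This kind of decoding is mechanical enough to sketch in a few lines of Python. The sketch below is only an illustration and is not part of any Oracle tool. The abbreviation table is hypothetical and holds only the four entries listed above. Any abbreviation it does not know is flagged with "?" instead of being guessed.

```python
# Hypothetical, partial abbreviation table: only the four entries listed above.
ABBREVIATIONS = {
    "CD": "Cell Disks metrics",
    "IO_BY": "Number of megabytes",
    "R": "Read",
    "LG": "Large",
}


def decode_metric_name(name):
    """Split a metric name into known abbreviations, longest match first."""
    tokens = name.split("_")
    parts = []
    i = 0
    while i < len(tokens):
        # Try the longest run of tokens first, so that a two-token
        # abbreviation such as IO_BY is tried before its first token alone.
        for j in range(len(tokens), i, -1):
            candidate = "_".join(tokens[i:j])
            if candidate in ABBREVIATIONS:
                parts.append((candidate, ABBREVIATIONS[candidate]))
                i = j
                break
        else:
            # Unknown abbreviation: keep it, flagged with "?", instead of guessing.
            parts.append((tokens[i], "?"))
            i += 1
    return parts


for abbreviation, meaning in decode_metric_name("CD_IO_BY_R_LG"):
    print(abbreviation, "->", meaning)
```

The "?" entries show the limit of this approach. Any abbreviation missing from your table leaves part of the name unexplained, which is why the official description remains the safer reference.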

Fortunately, the metrics are explained in the Oracle documentation, and you can also retrieve their descriptions from cellcli:

$ cellcli -e "list metricdefinition attributes name,description where name='CD_IO_BY_R_LG'"

CD_IO_BY_R_LG	 "Number of megabytes read in large blocks from a cell disk"
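You may want to reuse these descriptions in your own tooling. In that case, the metricdefinition output can be loaded into a dictionary. Below is a minimal Python 3.7+ sketch. It assumes each output line holds the metric name followed by its description, quoted as above. The function names are hypothetical. `load_metric_definitions()` only works on a host where cellcli is available.

```python
import re
import subprocess

# One line of "list metricdefinition attributes name,description" output:
# the metric name, whitespace, then the description (quoted when it has spaces).
DEFINITION_LINE = re.compile(r'^\s*(\S+)\s+"?(.*?)"?\s*$')


def parse_metric_definitions(text):
    """Return a {metric_name: description} dict from cellcli output text."""
    definitions = {}
    for line in text.splitlines():
        match = DEFINITION_LINE.match(line)
        if match:
            definitions[match.group(1)] = match.group(2)
    return definitions


def load_metric_definitions():
    """Run cellcli (assumed to be on the PATH) and parse its output."""
    result = subprocess.run(
        ["cellcli", "-e", "list metricdefinition attributes name,description"],
        capture_output=True, text=True, check=True,
    )
    return parse_metric_definitions(result.stdout)


sample = 'CD_IO_BY_R_LG\t "Number of megabytes read in large blocks from a cell disk"\n'
print(parse_metric_definitions(sample))
```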

Lack of description

Now, as an example, let’s query the current metric values for a particular database:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'"

	 BDT	 DB_FC_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FC_IO_RQ       	 47,638 IO requests
	 BDT	 DB_FC_IO_RQ_SEC   	 2.1 IO/sec
	 BDT	 DB_FD_IO_BY_SEC   	 0 MB/sec
	 BDT	 DB_FD_IO_LOAD     	 19,885
	 BDT	 DB_FD_IO_RQ_LG    	 36 IO requests
	 BDT	 DB_FD_IO_RQ_LG_SEC	 0.0 IO/sec
	 BDT	 DB_FD_IO_RQ_SM    	 47,602 IO requests
	 BDT	 DB_FD_IO_RQ_SM_SEC	 2.1 IO/sec
	 BDT	 DB_FL_IO_BY       	 0.000 MB
	 BDT	 DB_FL_IO_BY_SEC   	 0.000 MB/sec
	 BDT	 DB_FL_IO_RQ       	 0 IO requests
	 BDT	 DB_FL_IO_RQ_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_BY_SEC      	 0 MB/sec
	 BDT	 DB_IO_LOAD        	 0.0
	 BDT	 DB_IO_RQ_LG       	 0 IO requests
	 BDT	 DB_IO_RQ_LG_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_RQ_SM       	 19 IO requests
	 BDT	 DB_IO_RQ_SM_SEC   	 0.0 IO/sec
	 BDT	 DB_IO_UTIL_LG     	 0 %
	 BDT	 DB_IO_UTIL_SM     	 0 %
	 BDT	 DB_IO_WT_LG       	 0 ms
	 BDT	 DB_IO_WT_LG_RQ    	 0.0 ms/request
	 BDT	 DB_IO_WT_SM       	 0 ms
	 BDT	 DB_IO_WT_SM_RQ    	 0.0 ms/request

As you can see, the metric description is not displayed. There is no way to retrieve it from metriccurrent (or metrichistory) because it is not one of their attributes:

$ cellcli -e "describe metriccurrent"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	objectType

$ cellcli -e "describe metrichistory"
	name
	alertState
	collectionTime
	metricObjectName
	metricType
	metricValue
	metricValueAvg
	metricValueMax
	metricValueMin
	objectType

So you might send the result of our example to someone who doesn’t know (or doesn’t remember) the naming convention, or you might not be 100% sure of the definition of a particular metric. Either way, they (or you) will have to:

  • go back to the Oracle documentation
  • query the metricdefinition with cellcli

New script: exadata_metrics_desc.pl

The exadata_metrics_desc.pl script adds the description of each metric to the cellcli output on the fly.

Let’s launch the same query as in the previous example and pipe its output to exadata_metrics_desc.pl:

$ cellcli -e "list metriccurrent attributes metricObjectName,name,metricValue where name like 'DB.*' and metricObjectName='BDT'" | ./exadata_metrics_desc.pl

  BDT   DB_FC_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash cache)          0 MB/sec
  BDT   DB_FC_IO_RQ (Number of IO requests issued by a database to flash cache)                           48,123 IO requests
  BDT   DB_FC_IO_RQ_SEC (Number of IO requests issued by a database to flash cache per second)            2.1 IO/sec
  BDT   DB_FD_IO_BY_SEC (Number of megabytes of I/O per second for this database to flash disks)          0 MB/sec
  BDT   DB_FD_IO_LOAD (Average I/O load from this database for flash disks)                               4,419
  BDT   DB_FD_IO_RQ_LG (Number of large IO requests issued by a database to flash disks)                  36 IO requests
  BDT   DB_FD_IO_RQ_LG_SEC (Number of large IO requests issued by a database to flash disks per second)   0.0 IO/sec
  BDT   DB_FD_IO_RQ_SM (Number of small IO requests issued by a database to flash disks)                  48,087 IO requests
  BDT   DB_FD_IO_RQ_SM_SEC (Number of small IO requests issued by a database to flash disks per second)   2.1 IO/sec
  BDT   DB_FL_IO_BY (The number of MB written to the Flash Log)                                           0.000 MB
  BDT   DB_FL_IO_BY_SEC (The number of MB written per second to the Flash Log)                            0.000 MB/sec
  BDT   DB_FL_IO_RQ (The number of I/O requests issued to the Flash Log)                                  0 IO requests
  BDT   DB_FL_IO_RQ_SEC (The number of I/O requests per second issued to the Flash Log)                   0.0 IO/sec
  BDT   DB_IO_BY_SEC (Number of megabytes of I/O per second for this database to hard disks)              0 MB/sec
  BDT   DB_IO_LOAD (Average I/O load from this database for hard disks)                                   0.0
  BDT   DB_IO_RQ_LG (Number of large IO requests issued by a database to hard disks)                      0 IO requests
  BDT   DB_IO_RQ_LG_SEC (Number of large IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_RQ_SM (Number of small IO requests issued by a database to hard disks)                      19 IO requests
  BDT   DB_IO_RQ_SM_SEC (Number of small IO requests issued by a database to hard disks per second)       0.0 IO/sec
  BDT   DB_IO_UTIL_LG (Percentage of disk resources utilized by large requests from this database)        0 %
  BDT   DB_IO_UTIL_SM (Percentage of disk resources utilized by small requests from this database)        0 %
  BDT   DB_IO_WT_LG (IORM wait time for large IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_LG_RQ (Average IORM wait time per request for large IO requests issued by a database)    0.0 ms/request
  BDT   DB_IO_WT_SM (IORM wait time for small IO requests issued by a database)                           0 ms
  BDT   DB_IO_WT_SM_RQ (Average IORM wait time per request for small IO requests issued by a database)    0.0 ms/request

As you can see, the description of each metric present in the initial output has been added.
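The join is simple in principle: look up every metric name found in a line and append its description. The Python sketch below illustrates that idea. It is not the exadata_metrics_desc.pl code. The `DESCRIPTIONS` dictionary is hard-coded with two descriptions copied from the output above; in practice you would load it from list metricdefinition. Metric names are assumed to be upper-case words made of letters, digits and underscores. The file name `annotate.py` in the docstring is hypothetical.

```python
import re
import sys

# Two descriptions copied from the output above. In practice the dictionary
# would be filled from "list metricdefinition attributes name,description".
DESCRIPTIONS = {
    "DB_FC_IO_RQ": "Number of IO requests issued by a database to flash cache",
    "DB_IO_LOAD": "Average I/O load from this database for hard disks",
}

# Assumed shape of a metric name: an upper-case word made of A-Z, 0-9 and "_".
# The pattern is greedy, so DB_FC_IO_RQ_SEC is consumed as one whole word.
METRIC_WORD = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def annotate(line, descriptions):
    """Append "(description)" after every known metric name found in line."""
    def add_description(match):
        name = match.group(0)
        if name in descriptions:
            return "%s (%s)" % (name, descriptions[name])
        return name
    return METRIC_WORD.sub(add_description, line)


def main():
    """Filter mode: cellcli -e "list metriccurrent ..." | python3 annotate.py"""
    for line in sys.stdin:
        sys.stdout.write(annotate(line, DESCRIPTIONS))


print(annotate("BDT   DB_FC_IO_RQ   47,638 IO requests", DESCRIPTIONS))
```

Matching whole words matters. DB_FC_IO_RQ_SEC is not annotated with the description of DB_FC_IO_RQ, because the regular expression always consumes the full name. Unlike the script output above, this sketch does not re-align the columns. To use it as a pipe filter, call `main()` from an `if __name__ == "__main__":` guard.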

Remarks

  • You can download the script from this repository or from GitHub.
  • Feel free to build any query you want on the metrics. Just pipe its output to exadata_metrics_desc.pl and the metric descriptions are added on the fly (as long as the metric names appear in the output of your initial query).
  • All credit for the idea of this script goes to Martin Bach.
  • The script works with any input (a text file, for example):

[Screenshot: exadata_metrics_desc.pl used with a text file as input]

Conclusion

The exadata_metrics_desc.pl script can be used to join, on the fly, the metric name and its value (and any other attribute you would like to see) with the metric's description.


Extract Exadata cells historical metrics in CSV format

Exadata provides a lot of useful metrics to monitor the cells, and you may want to retrieve historical values for some of them. To do so, you can use the “LIST METRICHISTORY” command through CellCLI on the cell.

But, as usual, visualising the metrics is even better. For this purpose, you can use a Perl script (see the download link in the remarks section) that extracts the historical metrics in CSV format. You can then graph them with the visualisation tool of your choice.

Let’s see the help:

Usage: ./csv_exadata_metrics_history.pl [cell=|groupfile=] [serial] [type=] [name=] [objectname=] [name!=] [objectname!=] [ago_unit=] [ago_value=]

 Parameter                 Comment                                                      Default
 ---------                 -------                                                      -------
 CELL=                     comma-separated list of cells
 GROUPFILE=                file containing list of cells
 SERIAL                    serialize execution over the cells (default is no)
 TYPE=                     Metrics type to extract: Cumulative|Rate|Instantaneous       ALL
 NAME=                     Metrics to extract (wildcard allowed)                        ALL
 OBJECTNAME=               Objects to extract (wildcard allowed)                        ALL
 NAME!=                    Exclude metrics (wildcard allowed)                           EMPTY
 OBJECTNAME!=              Exclude objects (wildcard allowed)                           EMPTY
 AGO_UNIT=                 Unit to retrieve historical metrics back: day|hour|minute    HOUR
 AGO_VALUE=                Value associated to Unit to retrieve historical metrics back 1

utility assumes passwordless SSH from this cell node to the other cell nodes
utility assumes ORACLE_HOME has been set (with celladmin user for example)

Example : ./csv_exadata_metrics_history.pl cell=cell
Example : ./csv_exadata_metrics_history.pl groupfile=./cell_group
Example : ./csv_exadata_metrics_history.pl cell=cell objectname='CD_disk03_cell'
Example : ./csv_exadata_metrics_history.pl cell=cell name='.*BY.*' objectname='.*disk.*'
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=hour ago_value=4
Example : ./csv_exadata_metrics_history.pl cell=enkcel01,enkcel02 type='Instantaneous' name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*' ago_unit=minute ago_value=4 serial
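The AGO_UNIT and AGO_VALUE parameters in the help above define a time window. One plausible way to turn them into a CellCLI filter is to compute a cutoff timestamp and compare it with the collectionTime attribute. This is an assumption about how such a script could work, not a description of csv_exadata_metrics_history.pl internals. The Python 3.6+ sketch below uses hypothetical function names. It computes the cutoff in the same ISO 8601 format as the DateTime column of the CSV output.

```python
from datetime import datetime, timedelta, timezone

# Length of one AGO_UNIT, for the three units listed in the help.
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60}


def cutoff_timestamp(ago_unit="hour", ago_value=1, now=None):
    """ISO 8601 timestamp located ago_value * ago_unit before now."""
    if now is None:
        # Current local time, including its UTC offset.
        now = datetime.now(timezone.utc).astimezone()
    delta = timedelta(seconds=UNIT_SECONDS[ago_unit.lower()] * ago_value)
    return (now - delta).isoformat(timespec="seconds")


def collection_time_filter(ago_unit="hour", ago_value=1, now=None):
    """Assumed CellCLI condition restricting LIST METRICHISTORY to the window."""
    return "collectionTime > '%s'" % cutoff_timestamp(ago_unit, ago_value, now)


# ago_unit=minute ago_value=20, evaluated at the collection time of the CSV sample.
sample_now = datetime(2015, 7, 1, 8, 57, 59, tzinfo=timezone(timedelta(hours=-5)))
print(collection_time_filter("minute", 20, sample_now))
```

The defaults (hour, 1) mirror the defaults shown in the help.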

You have to set up passwordless SSH from one cell to the other cells. You can then launch the script from this cell and retrieve data from the other cells.

The main options/features are:

  1. You can specify the cells from which you want to collect the metrics thanks to the cell or groupfile parameter.
  2. You can choose to serialize the execution over the cells thanks to the serial parameter.
  3. You can choose the type of metrics you want to retrieve (Cumulative, Rate or Instantaneous) thanks to the type parameter.
  4. You can focus on some metrics thanks to the name parameter (wildcard allowed).
  5. You can exclude some metrics thanks to the name!= parameter (wildcard allowed).
  6. You can focus on some metricObjectName values thanks to the objectname parameter (wildcard allowed).
  7. You can exclude some metricObjectName values thanks to the objectname!= parameter (wildcard allowed).
  8. You can choose the unit used to go back in time (day, hour or minute) thanks to the ago_unit parameter.
  9. You can choose how many of those units to go back thanks to the ago_value parameter.
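The include and exclude parameters combine as a chain of tests. The examples in the help, such as '.*BY.*', are regular expressions rather than shell wildcards. The Python sketch below reproduces the filter of the enkcel02 example from the help. It assumes each pattern must match the whole name, hence the leading and trailing .* in the examples. `keep_row` is a hypothetical helper, not a function of the script.

```python
import re


def keep_row(name, objectname, include_name=None, include_object=None,
             exclude_name=None, exclude_object=None):
    """Mimic NAME=, OBJECTNAME=, NAME!= and OBJECTNAME!= as whole-string regex tests."""
    if include_name and not re.fullmatch(include_name, name):
        return False          # NAME= given and not matched
    if include_object and not re.fullmatch(include_object, objectname):
        return False          # OBJECTNAME= given and not matched
    if exclude_name and re.fullmatch(exclude_name, name):
        return False          # NAME!= matched
    if exclude_object and re.fullmatch(exclude_object, objectname):
        return False          # OBJECTNAME!= matched
    return True


rows = [
    ("DB_IO_LOAD", "BDT"),
    ("DB_IO_RQ_SM", "BDT"),
    ("DB_IO_LOAD", "ASM"),
    ("CD_IO_BY_R_LG", "CD_disk03_cell"),
]
# Same filters as: name='.*DB_IO.*' objectname!='ASM' name!='.*RQ.*'
kept = [row for row in rows
        if keep_row(*row, include_name=".*DB_IO.*",
                    exclude_name=".*RQ.*", exclude_object="ASM")]
print(kept)  # [('DB_IO_LOAD', 'BDT')]
```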

Let’s see an example:

I want to retrieve, in CSV format, the database-related metrics from 2 cells for the last 20 minutes:

$> ./csv_exadata_metrics_history.pl cell=enkx3cel01,enkx3cel02 name='DB_.*' ago_unit=minute ago_value=20
Cell;metricType;DateTime;name;objectname;value;unit
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ACSTBY;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ASM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BDT;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;BIGDATA;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBFS;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DBM;15,779;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMO;794,329;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;DEMOX3;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;EXDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;WZSDB;0.000;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;_OTHER_DATABASE_;48,764;MB
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ACSTBY;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;ASM;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BDT;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;BIGDATA;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBFS;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DBM;15;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMO;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;DEMOX3;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;EXDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;WZSDB;0;MB/sec
enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_IO_BY_SEC;_OTHER_DATABASE_;0;MB/sec
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ACSTBY;2,318;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;ASM;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BDT;2,966;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;BIGDATA;25,415;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBFS;3,489;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBM;1,627,066;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMO;4,506;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DEMOX3;4,172;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;EXDB;0;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;WZSDB;4,378;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;_OTHER_DATABASE_;6,227;IO requests
enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ_LG;ACSTBY;0;IO requests
.
.
.
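The fields are separated by semicolons, so the commas inside the values are thousands separators (for example, 1,627,066 IO requests). You will probably want to strip them before treating the values as numbers. If you post-process the file in Python before graphing it, a minimal sketch with the standard csv module could look like the one below. `read_metrics_csv` and `parse_value` are hypothetical helpers, and the sample rows are copied from the output above.

```python
import csv
import io


def parse_value(text):
    """'1,627,066' -> 1627066.0; the comma is a thousands separator here."""
    return float(text.replace(",", ""))


def read_metrics_csv(stream):
    """Yield one dict per data row of the semicolon-separated output."""
    for row in csv.DictReader(stream, delimiter=";"):
        if not row.get("value"):
            continue  # skip lines that have no value field
        row["value"] = parse_value(row["value"])
        yield row


sample = io.StringIO(
    "Cell;metricType;DateTime;name;objectname;value;unit\n"
    "enkx3cel01;Instantaneous;2015-07-01T08:57:59-05:00;DB_FC_BY_ALLOCATED;ASM;15,779;MB\n"
    "enkx3cel01;Cumulative;2015-07-01T08:57:59-05:00;DB_FC_IO_RQ;DBM;1,627,066;IO requests\n"
)
for row in read_metrics_csv(sample):
    print(row["objectname"], row["name"], row["value"], row["unit"])
```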

That way, you can visualize your data with the tool you are comfortable with. For example, I used Tableau to create this “Database metrics dashboard” on the DB_* rate metrics:

[Image: “Database metrics dashboard” built with Tableau (cells_metrics)]
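Your visualisation tool may expect one series per database rather than the long format of the CSV output. In that case, the rows can be pivoted first. Below is a minimal Python sketch. It assumes the rows were already parsed into dicts with a numeric value, and it hard-codes two rows from the output above. `pivot` is a hypothetical helper name.

```python
from collections import defaultdict


def pivot(rows, metric):
    """Return {objectname: [(DateTime, value), ...]} for one metric name."""
    series = defaultdict(list)
    for row in rows:
        if row["name"] == metric:
            series[row["objectname"]].append((row["DateTime"], row["value"]))
    for points in series.values():
        # ISO 8601 strings sharing the same format and UTC offset sort chronologically.
        points.sort()
    return dict(series)


rows = [
    {"DateTime": "2015-07-01T08:57:59-05:00", "name": "DB_FC_IO_BY_SEC",
     "objectname": "DBM", "value": 15.0},
    {"DateTime": "2015-07-01T08:57:59-05:00", "name": "DB_FC_IO_BY_SEC",
     "objectname": "BDT", "value": 0.0},
]
print(pivot(rows, "DB_FC_IO_BY_SEC"))
```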

Remarks:

  • If you retrieve too much data, you may get an error like:
Error: enkx3cel02 is returning over 100000 lines; output is truncated !!!
 Command could be retried with the serialize option: --serial
 Killing child pid 15720 to enkx3cel02...

In that case, launch the script with the serial option (see the help).

Conclusion:

You probably already have a way to build your own graphs of the historical metrics. But if you don’t, feel free to use this script and the visualisation tool of your choice.