Exadata real-time metrics extracted from cumulative metrics

Exadata provides a lot of useful metrics to monitor the Cells.

The metrics can be of various types:

  • Cumulative: Cumulative statistics since the metric was created.
  • Instantaneous: Value at the time that the metric is collected.
  • Rate: Rates computed by averaging statistics over observation periods.
  • Transition: Collected at the time the value of the metric changes; typically captures important transitions in hardware status.

You can find some information on how to exploit those metrics in these posts:

UWE HESSE’s post 

TANEL PODER’s post

But I think those types of metrics are not enough to answer all the basic questions.

Let me explain why with 2 examples:

Let’s have a look at the metrics GD_IO_RQ_W_SM and GD_IO_RQ_W_SM_SEC (restricted to one Grid Disk for readability):

dcli -c cell1 cellcli -e "list metriccurrent attributes name,metricType,metricObjectName,metricValue where name like \'.*GD_IO_RQ_W_SM.*\' and metricObjectName ='data_CD_disk01_cell'"

cell1: GD_IO_RQ_W_SM 		Cumulative 	data_CD_disk01_cell 	2,930 IO requests
cell1: GD_IO_RQ_W_SM_SEC 	Rate 		data_CD_disk01_cell 	0.3 IO/sec

So we can observe that this “cumulative” metric shows the number of small write I/O requests, while its associated “rate” metric shows the number of small write I/O requests per second.

or

Let’s have a look at the metrics CD_IO_TM_W_SM and CD_IO_TM_W_SM_RQ (restricted to one Cell Disk for readability):

dcli -c cell1 cellcli -e "list metriccurrent attributes name,metricType,metricObjectName,metricValue where name like \'.*CD_IO_TM_W.*SM.*\' and metricobjectname='CD_disk07_cell'"

cell1: CD_IO_TM_W_SM	 	Cumulative 	CD_disk07_cell 		1,512,939 us
cell1: CD_IO_TM_W_SM_RQ 	Rate 		CD_disk07_cell 		168 us/request

So we can observe that this “cumulative” metric shows the total small write I/O latency in us, while its associated “rate” metric shows the small write I/O latency in us per request.

But how can I answer those questions:

  1. How many small write I/O requests have been done during the last 80 seconds? (Unfortunately 0.3 * 80 will not necessarily provide the right answer, as it depends on the “observation period” of the rate metric.)
  2. What was the small write I/O latency during the last 80 seconds?

You could ask the same kind of questions about all the cumulative metrics.

To answer all those questions I created a Perl script, exadata_metrics.pl (click on the link and then on the view source button to copy/paste the source code), that extracts real-time Exadata metric information from the cumulative metrics.

That is to say, the script works with all the cumulative metrics (the following command lists all of them):

cellcli -e "list metriccurrent attributes name,metricType where metricType='Cumulative'"

To extract real-time information the script takes a snapshot of cumulative metrics each second (default interval) and computes the differences with the previous snapshot.
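The snapshot-and-difference idea can be sketched as follows. This is a minimal Python illustration, not the Perl script itself: the function names `parse_snapshot` and `delta` are hypothetical, the line format is assumed to match the dcli output shown earlier, and the second sample value (2,950) is made up for the example.

```python
import re

# Matches one line of dcli/cellcli output such as:
#   cell1: GD_IO_RQ_W_SM  Cumulative  data_CD_disk01_cell  2,930 IO requests
LINE_RE = re.compile(
    r"^(?P<cell>\S+):\s+(?P<name>\S+)\s+(?P<type>\S+)\s+(?P<obj>\S+)\s+"
    r"(?P<value>[\d,.]+)\s*(?P<unit>.*)$"
)

def parse_snapshot(text):
    """Return {(cell, name, object): (value, unit)} for the cumulative metrics in text."""
    snapshot = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("type") == "Cumulative":
            key = (match.group("cell"), match.group("name"), match.group("obj"))
            # cellcli prints thousands separators ("2,930"), so strip them first
            value = float(match.group("value").replace(",", ""))
            snapshot[key] = (value, match.group("unit").strip())
    return snapshot

def delta(previous, current):
    """Return the per-metric difference between two snapshots."""
    return {
        key: (value - previous[key][0], unit)
        for key, (value, unit) in current.items()
        if key in previous
    }

# First sample: the output shown earlier. Second sample: a made-up later value.
before = parse_snapshot(
    "cell1: GD_IO_RQ_W_SM \t\tCumulative \tdata_CD_disk01_cell \t2,930 IO requests"
)
after = parse_snapshot(
    "cell1: GD_IO_RQ_W_SM \t\tCumulative \tdata_CD_disk01_cell \t2,950 IO requests"
)
print(delta(before, after))
```

Keying each value on (cell, name, object) lets the same difference be computed for every cumulative metric at once, whatever its unit.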

So, to get the answer to our first question:

./exadata_metrics.pl 80 cell=cell1 name='GD_IO_RQ_W_SM' metricobjectname='data_CD_disk01_cell'

04:30:38 CELL 	NAME 			OBJECTNAME 		VALUE
04:30:38 cell1 	GD_IO_RQ_W_SM 		data_CD_disk01_cell 	0.00 IO requests
--------------------------------------> NEW
04:31:58 CELL 	NAME 			OBJECTNAME 		VALUE
04:31:58 cell1 	GD_IO_RQ_W_SM 		data_CD_disk01_cell 	20.00 IO requests

As you can see, 20 small write I/O requests have been generated during the last 80 seconds (which is different from 0.3 * 80 = 24).
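The gap between the two approaches is simple arithmetic. The short Python sketch below uses only the numbers shown in this post; keep in mind that the 0.3 IO/sec rate was sampled at a different moment than the 80-second window, so the comparison is illustrative only.

```python
rate_per_sec = 0.3      # GD_IO_RQ_W_SM_SEC value shown earlier
window_sec = 80         # interval passed to the script
observed_delta = 20     # difference of GD_IO_RQ_W_SM between the two snapshots

# Extrapolating the rate assumes it held constant over the whole window.
extrapolated = rate_per_sec * window_sec
# The delta gives the rate that was actually observed over the window.
effective_rate = observed_delta / window_sec

print(f"extrapolated from rate: {extrapolated:.0f} requests")
print(f"measured from delta:    {observed_delta} requests ({effective_rate:.2f} IO/sec)")
```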

To get the answer to our second question:

./exadata_metrics.pl 80 cell=cell1 name_like='.*CD_IO_TM_W.*SM.*' metricobjectname='CD_disk07_cell'

06:48:33 CELL 	NAME 			OBJECTNAME 		VALUE
06:48:33 cell1 	CD_IO_TM_W_SM 		CD_disk07_cell 		0.00 us
--------------------------------------> NEW
06:49:53 CELL 	NAME 			OBJECTNAME 		VALUE
06:49:53 cell1 	CD_IO_TM_W_SM 		CD_disk07_cell 		3613.00 us

As you can see, the cumulative small write I/O latency has been 3613 us during the last 80 seconds.
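If an average latency per request over the interval is wanted, the latency delta can be divided by the matching request-count delta over the same interval. The Python sketch below assumes the request count comes from CD_IO_RQ_W_SM for the same Cell Disk; the value of 20 requests is hypothetical and was not measured in this post, and `avg_latency_us` is a made-up helper name.

```python
def avg_latency_us(latency_delta_us, request_delta):
    """Average latency per request over the interval, or None if no request was issued."""
    if request_delta <= 0:
        return None
    return latency_delta_us / request_delta

latency_delta_us = 3613.0   # CD_IO_TM_W_SM delta from the run above
request_delta = 20          # hypothetical CD_IO_RQ_W_SM delta, not taken from the post

print(avg_latency_us(latency_delta_us, request_delta))
```

Returning None when no request was issued avoids a division by zero on idle disks.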

Let’s see the help of the script:

./exadata_metrics.pl help

Usage: ./exadata_metrics.pl [Interval [Count]] [cell=] [top=] [name=] [metricobjectname=] [name_like=] [metricobjectname_like=]

Default Interval : 1 second.
Default Count : Unlimited

Parameter 		Comment 					Default
--------- 		------- 					-------
CELL= 			comma separated list of cell to display
TOP= 			Number of rows to display 			10
NAME= 			ALL - Show all cumulative metrics 		ALL
NAME_LIKE= 		ALL - Show all cumulative metrics 		ALL
METRICOBJECTNAME= 	ALL - Show all objects 				ALL
METRICOBJECTNAME_LIKE= 	ALL - Show all objects 				ALL

Example: ./exadata_metrics.pl cell=cell1,cell2 name_like='.*FC.*'
Example: ./exadata_metrics.pl cell=cell1,cell2 name='CD_IO_BY_W_LG'
Example: ./exadata_metrics.pl cell=cell1,cell2 name='CD_IO_BY_W_LG' metricobjectname_like='.*disk.*'

The script is based on the dcli and cellcli commands and their regular expressions (which are described in Kerry Osborne’s post).

  • You can choose the number of snapshots to display and the time to wait between snapshots.
  • You can choose to filter on name and metricobjectname based on like or equal predicates.
  • You can work on all the cells or a subset thanks to the mandatory CELL parameter.
  • A cell OS user allowed to run dcli without a password (celladmin for example) can launch the script (ORACLE_HOME must be set).
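To illustrate how the equal and like filters could map onto a cellcli statement, here is a minimal Python sketch. It is an assumption about one possible approach, not the script's actual code: `build_cellcli_command` is a hypothetical helper, and the predicates simply mirror the `where` clauses used in the commands shown earlier in this post.

```python
def build_cellcli_command(name=None, name_like=None,
                          metricobjectname=None, metricobjectname_like=None):
    """Build a 'list metriccurrent' statement restricted to cumulative metrics."""
    predicates = ["metricType='Cumulative'"]
    if name:
        predicates.append(f"name='{name}'")
    if name_like:
        predicates.append(f"name like '{name_like}'")
    if metricobjectname:
        predicates.append(f"metricObjectName='{metricobjectname}'")
    if metricobjectname_like:
        predicates.append(f"metricObjectName like '{metricobjectname_like}'")
    return (
        "list metriccurrent attributes name,metricType,metricObjectName,metricValue "
        "where " + " and ".join(predicates)
    )

# Mirrors: ./exadata_metrics.pl cell=cell1,cell2 name='CD_IO_BY_W_LG' metricobjectname_like='.*disk.*'
print(build_cellcli_command(name="CD_IO_BY_W_LG", metricobjectname_like=".*disk.*"))
```

The resulting statement would still need shell quoting when passed through dcli, as in the escaped quotes of the commands above.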

Please don’t hesitate to tell me if this is useful for you and if you find any issues with this script.
