Exadata Cell metrics: collectionTime attribute, something that matters

Exadata provides a lot of useful metrics to monitor the Cells.

The Metrics can be of various types:

  • Cumulative: Cumulative statistics since the metric was created.
  • Instantaneous: Value at the time that the metric is collected.
  • Rate: Rates computed by averaging statistics over observation periods.
  • Transition: Collected when the value of the metric changes; these typically capture important transitions in hardware status.

One attribute of the cumulative metrics is collectionTime.

For example, let’s have a look at one of them:

CellCLI> list METRICCURRENT DB_IO_WT_SM detail
         name:                   DB_IO_WT_SM
         alertState:             normal
         collectionTime:         2013-09-12T23:46:14+02:00
         metricObjectName:       EXABDT
         metricType:             Cumulative
         metricValue:            120 ms
         objectType:             IORM_DATABASE

The collectionTime attribute is the time at which the metric was collected.

Why does it matter?

Based on it, we can compute the delta in seconds between two collections.
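To make that concrete, here is a minimal Python sketch (not the Perl code of my script) that reads the collectionTime out of a "detail" output like the one above and computes the delta in seconds with a later collection. The function names parse_detail and collection_delta_seconds are made up for the illustration, and the second timestamp is a hypothetical value, not a real collection.

```python
from datetime import datetime

# Detail output copied from the CellCLI example above.
SAMPLE = """\
         name:                   DB_IO_WT_SM
         alertState:             normal
         collectionTime:         2013-09-12T23:46:14+02:00
         metricObjectName:       EXABDT
         metricType:             Cumulative
         metricValue:            120 ms
         objectType:             IORM_DATABASE
"""

def parse_detail(text):
    """Turn 'key: value' lines into a dict, splitting on the first colon only
    (the collectionTime value itself contains colons)."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs

def collection_delta_seconds(earlier, later):
    """Delta in whole seconds between two collectionTime strings (ISO 8601 with offset)."""
    t0 = datetime.fromisoformat(earlier)
    t1 = datetime.fromisoformat(later)
    return int((t1 - t0).total_seconds())

first = parse_detail(SAMPLE)
# Hypothetical second collection, one minute later (not taken from a real cell).
second_time = "2013-09-12T23:47:14+02:00"
print(collection_delta_seconds(first["collectionTime"], second_time))
```

Splitting on the first colon only is the one design point worth noting: a naive split on every colon would cut the timestamp into pieces.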

Let’s see two use cases.

First use case: Suppose you decide to extract real-time metrics from the cumulative ones. To do so, you create a script that takes a snapshot of the cumulative metrics every second (the default interval) and computes the delta with the previous snapshot (yes, I am describing my exadata_metrics.pl script introduced in this post 🙂 ).

Then, if the delta value of the metric is 0, you need to know why (two explanations are possible as we’ll see).

Let’s see an example: I’ll take snapshots of two IORM cumulative metrics at a 40-second interval:

./exadata_metrics.pl 40 cell=exacell1  name='DB_IO_WT_.*' objectname='EXABDT'
--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

00:19:21   CELL                    NAME                         OBJECTNAME                                                  VALUE
00:19:21   ----                    ----                         ----------                                                  -----
00:19:21   exacell1                DB_IO_WT_LG                  EXABDT                                                      0.00 ms
00:19:21   exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms

Well, as you can see, the computed (delta) value is 0.00 ms, but:

  • does it mean that no IO has been queued by the IORM?
  • or does it mean that the two snaps are based on the same collectionTime? (This could be the case if the collection interval is greater than the interval you are using with my script.)

To answer those questions, I modified the script so that it takes the collectionTime into account: it computes the delta in seconds between the collectionTime values recorded in the snapshots.

Let’s see it in action:

Enable the IORM plan:

CellCLI> alter iormplan objective=auto;
IORMPLAN successfully altered

and launch the script with a 40-second interval:

./exadata_metrics.pl 40 cell=exacell1  name='DB_IO_WT_.*' objectname='EXABDT'

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
61         exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms
61         exacell1                DB_IO_WT_LG                  EXABDT                                                      1444922.00 ms

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
60         exacell1                DB_IO_WT_SM                  EXABDT                                                      1.00 ms
60         exacell1                DB_IO_WT_LG                  EXABDT                                                      2573515.00 ms

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
0          exacell1                DB_IO_WT_LG                  EXABDT                                                      0.00 ms
0          exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms

Look at the DELTA(s) column: it shows the delta in seconds between the collectionTime values of the two snapshots.

So:

  • DELTA(s) > 0: the snaps come from two distinct collections, so the metric value can be trusted (a value of 0 really means the counter did not move).
  • DELTA(s) = 0: the snaps come from the same collection, so a metric value of 0 is expected and tells us nothing.
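The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual logic of exadata_metrics.pl: the snapshot layout (a dict mapping a metric name to a collectionTime string and a cumulative value), the function name diff_snapshots and the sample numbers are all assumptions.

```python
from datetime import datetime

def diff_snapshots(prev, curr):
    """Compare two snapshots of cumulative metrics.

    prev and curr map a metric name to (collectionTime, cumulative value).
    Returns one (delta_seconds, name, delta_value, note) tuple per metric
    present in both snapshots.
    """
    rows = []
    for name, (curr_time, curr_value) in curr.items():
        if name not in prev:
            continue
        prev_time, prev_value = prev[name]
        elapsed = datetime.fromisoformat(curr_time) - datetime.fromisoformat(prev_time)
        delta_s = int(elapsed.total_seconds())
        delta_v = curr_value - prev_value
        if delta_s == 0:
            note = "same collection, value delta is not meaningful"
        elif delta_v == 0:
            note = "distinct collections, the counter did not move"
        else:
            note = "distinct collections, value delta is usable"
        rows.append((delta_s, name, delta_v, note))
    return rows

# Hypothetical snapshots: the values and timestamps are invented for the example.
prev = {"DB_IO_WT_SM": ("2013-09-13T00:19:21+02:00", 120.0)}
curr = {"DB_IO_WT_SM": ("2013-09-13T00:19:21+02:00", 120.0)}
for row in diff_snapshots(prev, curr):
    print(row)
```

With these two snapshots the collectionTime is identical, so the row is flagged as coming from the same collection rather than reported as "no IO queued".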

Second use case:

As we now have the DELTA(s) value, we can compute the associated rate (_SEC) metrics on our own.

For example, from:

./exadata_metrics_orig_new.pl 10 cell=exacell1 name='DB_IO_.*' objectname='EXABDT'
--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE                
--------   ----                    ----                         ----------                                                  -----                
60         exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms        
60         exacell1                DB_IO_RQ_SM                  EXABDT                                                      153.00 IO requests
60         exacell1                DB_IO_RQ_LG                  EXABDT                                                      292.00 IO requests
60         exacell1                DB_IO_WT_LG                  EXABDT                                                      830399.00 ms

We can conclude that:

  • the number of large IO requests per second is 292/60 = 4.87.
  • the number of small IO requests per second is 153/60 = 2.55.
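The same arithmetic as a small Python sketch, using the request deltas and the 60-second DELTA(s) from the output above. The helper name per_second is hypothetical, and returning None when DELTA(s) is 0 is my own choice for the illustration: with two snaps from the same collection there is no elapsed time to divide by.

```python
def per_second(delta_value, delta_seconds):
    """Rate over the collection window, or None when both snaps
    come from the same collection (DELTA(s) = 0)."""
    if delta_seconds <= 0:
        return None
    return delta_value / delta_seconds

# Deltas taken from the DB_IO_RQ_* rows above, with DELTA(s) = 60.
deltas = {"DB_IO_RQ_LG": 292.0, "DB_IO_RQ_SM": 153.0}
delta_s = 60

for name, value in deltas.items():
    rate = per_second(value, delta_s)
    print(f"{name}_SEC ~ {rate:.2f} IO/sec")
```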

Let’s verify those numbers with their associated rate metrics (DB_IO_RQ_LG_SEC and DB_IO_RQ_SM_SEC):

cellcli -e "list metriccurrent attributes name,metrictype,metricobjectname,metricvalue,collectionTime where name like 'DB_IO_.*' and metricobjectname='EXABDT' and metrictype='Rate'"
         DB_IO_RQ_LG_SEC         Rate    EXABDT  4.9 IO/sec              2013-09-13T16:13:40+02:00
         DB_IO_RQ_SM_SEC         Rate    EXABDT  2.6 IO/sec              2013-09-13T16:13:40+02:00
         DB_IO_WT_LG_RQ          Rate    EXABDT  2,844 ms/request        2013-09-13T16:13:40+02:00
         DB_IO_WT_SM_RQ          Rate    EXABDT  0.0 ms/request          2013-09-13T16:13:40+02:00

Great, those are the same numbers (CellCLI simply displays them rounded to one decimal).

Conclusion:

The collectionTime metric attribute can be very useful when you extract real-time metrics from the cumulative ones, as:

  • it provides a way to interpret the results.
  • it provides a way to derive the rate metrics (_SEC) from their cumulative counterparts.

Regarding the script:

  • You are able to collect real-time metrics based on cumulative metrics.
  • You can choose the number of snapshots to display and the time to wait between snapshots.
  • You can choose to filter on name and objectname based on predicates (see the help).
  • You can work on all the cells or a subset thanks to the CELL or the GROUPFILE parameter.
  • You can decide how the metrics are computed: with no aggregation, or aggregated on cell, objectname or both.

You can download the exadata_metrics.pl script from this repository.
