ASM performance metrics visualization: Use cases

Now that I can graph ASM performance metrics, let’s see some use cases.

To display the ASM metrics, I’ll use the csv file generated by the csv_asm_metrics utility, and Tableau for the visualization. Of course, you could use the visualization tool of your choice.

Use case 1: Display the TOP IO consumers

Suppose you consolidated several databases on the same machine and you want to visualize which database generates most of the Read IO throughput. You can visualize it this way with Tableau (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 14.46.40

I can see that one of my databases is generating most of the throughput.

Should you use RAC, you could also split those metrics per ASM instance:

Screen Shot 2014-07-12 at 14.47.58

I can see that most of the activity is recorded on ASM2, which makes sense as my RAC services are configured as preferred/available (Active/Passive configuration) and started on the *_2 database instances (linked to ASM2).

Use case 2: I want to see the Read IO distribution by Failgroup

You can visualize it this way with Tableau (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 10.09.08

We can see that the IOs are evenly distributed over the Failgroups. This is the expected behaviour, as I am not using the ASM Preferred Read feature.

Use case 3: Should I use the ASM Preferred Read feature on my extended RAC?

Suppose the host running the ASM1 instance is close to the disk array on which the “WIN” failgroup has been created, and, in the same way, the host running the ASM2 instance is close to the disk array on which the “JMO” failgroup has been created. Let’s look at the Read IO latency between the ASM instances and the failgroups.

You can visualize it this way with Tableau (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 10.17.16

As you can see, the ASM1 instance reports faster reads on the “WIN” failgroup and ASM2 reports faster reads on the “JMO” failgroup, which makes sense given our configuration. I can also check whether the read performance is good enough when the reads are done on the farthest disk array (ASM1 “reading” from the JMO failgroup and ASM2 “reading” from the WIN failgroup) and then decide whether the ASM Preferred Read feature needs to be implemented.

Remarks:

  • ASM does not perform any reads for the database itself; it records metrics for the database instances it services.
  • You can imagine a lot of use cases thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the asm_metrics and the csv_asm_metrics utilities from this repository.


Graphing ASM performance metrics

ASM metrics are a goldmine: they provide a lot of information. As you may know, the asm_metrics utility extracts them in real time.

But sometimes it is not easy to understand the values without the help of a graph. As the saying goes: “If I can’t picture it, I can’t understand it.”

So, depending on your needs and on what you are looking for in the ASM metrics, a picture may help.

So let’s graph the output of the asm_metrics utility. For this, I created the csv_asm_metrics utility, which produces a csv file from the asm_metrics output.

Once you get the csv file you can graph the metrics with your favourite visualization tool (I’ll use Tableau as an example).

First, you have to launch the asm_metrics utility with the following options (to ensure that all the fields are displayed):

  • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
  • -show=inst,fg,dg,dsk for ASM < 11g

and redirect the output to a text file:

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk > asm_metrics.txt

Remark: You can use the -interval parameter to collect with an interval greater than one second (the default), as a one-second interval could produce a huge output file.
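For example, to take one snapshot every 10 seconds (assuming the -interval=N syntax):

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk -interval=10 > asm_metrics.txt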

The output file looks like:

............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:48:54                                                                              Kby       Avg       AvgBy/               Kby       Avg        AvgBy/ 
13:48:54   INST     DBINST        DG            FG           DSK            Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write  
13:48:54   ------   -----------   -----------   ----------   ----------     -------   -------   -------   ------    ------     -------   --------   ------ 
13:48:54   +ASM1                                                            6731      54224     1.4       8249      42         579       3.0        14117  
13:48:54   +ASM1    BDT10_1                                                 2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA                                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1D   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1D   2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA                                       0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31       HOST31CC8D0F   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32       HOST32CC8D0F   0         0         0.0       0         0          0         0.0        0      

and so on...

Now let’s produce the csv file with the csv_asm_metrics utility. Let’s see the help:

./csv_asm_metrics.pl -help

Usage: ./csv_asm_metrics.pl [-if] [-of] [-d] [-help]

  Parameter         Comment
  ---------         -------
  -if=              Input file name (output of asm_metrics)
  -of=              Output file name (the csv file)
  -d=               Day of the first snapshot (YYYY/MM/DD)

Example: ./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

and generate the csv file this way:

./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

The csv file looks like:

Snap Time,INST,DBINST,DG,FG,DSK,Reads/s,Kby Read/s,ms/Read,By/Read,Writes/s,Kby Write/s,ms/Write,By/Write
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1D,2,32,0.2,16384,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST31,HOST31CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST32,HOST32CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST31,HOST31CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST32,HOST32CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST31,HOST31CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST32,HOST32CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,2,16,0.5,8448

As you can see:

  1. The day has been added (to create a date) and the following days will be computed automatically (should the snapshots cross midnight).
  2. Only the rows that contain all the fields have been recorded into the csv file (the script skips the other ones, as they represent aggregated values). A minimal sketch of this logic follows.
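Here is that logic as a minimal sketch in Perl (an illustration only, not the actual csv_asm_metrics code; it assumes the 11g+ output format shown above, a hard-coded -d day, and the asm_metrics output on stdin):

#!/usr/bin/perl
# Minimal sketch: prepend the -d day to the HH:MM:SS column, bump the day
# when the clock wraps past midnight, and keep only the fully-qualified rows.
use strict;
use warnings;
use Time::Local qw(timelocal);
use POSIX qw(strftime);

my $day  = timelocal(0, 0, 0, 4, 6, 2014);   # midnight of -d='2014/07/04' (month is 0-based)
my $prev = -1;

print "Snap Time,INST,DBINST,DG,FG,DSK,Reads/s,Kby Read/s,ms/Read,By/Read,Writes/s,Kby Write/s,ms/Write,By/Write\n";

while (<STDIN>) {
    my @f = split;
    # a fully-qualified row = time + 5 dimensions (INST, DBINST, DG, FG, DSK) + 8 measures
    next unless @f == 14 && $f[6] =~ /^[0-9]/;      # skips the column-header and dashes lines
    next unless $f[0] =~ /^(\d\d):(\d\d):(\d\d)$/;
    my $secs = $1 * 3600 + $2 * 60 + $3;
    $day += 86400 if $prev >= 0 && $secs < $prev;   # the snapshots crossed midnight
    $prev = $secs;
    my $date = strftime('%Y/%m/%d', localtime($day));
    print join(',', "$date $f[0]", @f[1 .. $#f]), "\n";
}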

Now I can import this csv file into Tableau.

You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

Let’s graph the throughput and latency per failgroup for example.

Important remark regarding the computation/display of some averages:

The ms/Read and By/Read measures depend on the number of reads, so the averages have to be calculated as weighted averages (the same applies to ms/Write and By/Write). For example, 1000 reads at 1 ms and 10 reads at 10 ms average out to about 1.1 ms, not 5.5 ms.

Let’s create the calculated field in Tableau for those Weighted Averages:

Screen Shot 2014-07-07 at 20.13.28

so that Weighted Average ms/Read is:

Screen Shot 2014-07-07 at 20.16.19

Weighted Average By/Read:

Screen Shot 2014-07-07 at 20.21.12
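(The formulas are in the screenshots above; by analogy with the write versions below, they should be sum([ms/Read]*[Reads/s])/sum([Reads/s]) and sum([By/Read]*[Reads/s])/sum([Reads/s]) respectively.)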

In the same way, you have to create:

  • Weighted Average ms/Write = sum([ms/Write]*[Writes/s])/sum([Writes/s])
  • Weighted Average By/Write = sum([By/Write]*[Writes/s])/sum([Writes/s])

Now let’s display the average read latency by Failgroup (using the previous calculated weighted average):

Drag the Snap Time dimension to the “columns” shelf and choose “exact date”:

Screen Shot 2014-07-07 at 20.27.23

Drag the Weighted Average ms/Read calculated field to the “Rows” shelf:

Screen Shot 2014-07-07 at 20.29.41

Drag the FG dimension to the “Color Marks” shelf:

Screen Shot 2014-07-07 at 20.32.33

So that the graph looks like:

Screen Shot 2014-07-07 at 20.33.15

Create the same graph for the “Kby Read/s” measure (except that I want to see the sum, i.e. the throughput, not the average) and put those two graphs into the same dashboard:

Screen Shot 2014-07-07 at 20.39.42

Here we are.

Conclusion:

  • We can create a csv file from the output of the asm_metrics utility thanks to csv_asm_metrics.
  • To do so, we have to collect all the fields of asm_metrics with those options:
      • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
      • -show=inst,fg,dg,dsk for ASM < 11g
  • Once you have loaded the csv file into your favourite visualization tool, don’t forget to compute weighted averages for ms/Read, By/Read, ms/Write and By/Write if you plan to graph the averages.
  • You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the csv_asm_metrics utility from this repository or copy the source code from this page.


Visualize the IO source thanks to Tableau and AWR

As you know, the wait event “db file sequential read” records “single block” IO that could not be satisfied from the database buffer cache. But where does the IO come from:

  • Filesystem cache (If any and used)
  • Disk Array cache
  • SSD
  • Spindle Disks
  • …..

It could be interesting to visualize the distribution of the IO source:

  • Should you migrate from a cached filesystem to ASM (you may need to increase the database cache to absorb the IOs previously served by the filesystem cache).
  • Should you use Dynamic Tiering and want to figure out where the IOs come from (SSD, Spindle Disks..).

To do so, I’ll use the AWR data coming from the dba_hist_event_histogram view and Tableau. I’ll also extract the data coming from dba_hist_snapshot (to get the begin_interval_time).

alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
alter session set nls_timestamp_format='YYYY/MM/DD HH24:MI:SS';

select * from dba_hist_event_histogram where
snap_id >= (select min(snap_id) from dba_hist_snapshot
where begin_interval_time >= to_date ('2014/06/01 00:00','YYYY/MM/DD HH24:MI'))
and event_name='db file sequential read';

select * from dba_hist_snapshot where begin_interval_time >= to_date ('2014/06/01 00:00','YYYY/MM/DD HH24:MI');

As you can see, there is no computation. This is just a simple extraction of the data.

Then I put that data into 2 csv files (awr_snap_for_june.csv and awr_event_histogram.csv).

1) Now, launch Tableau and select the csv files and add an inner join between those files:

Screen Shot 2014-06-28 at 13.45.19
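Remark: the inner join is presumably on SNAP_ID (both extracts contain it); should several databases or instances share the same AWR repository, DBID and INSTANCE_NUMBER would need to be part of the join as well.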

2) Go to the worksheet and put the “begin interval time” dimension on the “Columns” shelf, changing it to an “exact date” (instead of Year):

Screen Shot 2014-06-28 at 13.48.13

3) Put the “Wait count” measure into the “Rows” and create a table calculation on it:

Screen Shot 2014-06-28 at 13.55.33

Choose “difference”, as the “WAIT_COUNT” field is cumulative and we want to see the delta between AWR snapshots.

4) My graph now looks like:

Screen Shot 2014-06-28 at 14.02.38

On Jun 14 and Jun 20 the database was restarted, hence the difference is < 0.

5) Let’s modify the formula to take database restarts into account in the delta computation:

Screen Shot 2014-06-28 at 14.04.31

and click “Customize”:

Screen Shot 2014-06-28 at 14.05.19

Name: “Delta Wait Count”, and change ZN(SUM([Wait Count])) - LOOKUP(ZN(SUM([Wait Count])), -1) to max(ZN(SUM([Wait Count])) - LOOKUP(ZN(SUM([Wait Count])), -1), 0):

Screen Shot 2014-06-28 at 14.07.17

So that now the graph looks like:

Screen Shot 2014-06-28 at 14.09.11

6) Now we have to “split” those wait counts into 2 categories, based on the wait_time_milli measure coming from dba_hist_event_histogram. Let’s say that:

  • “db file sequential read” <= 4 ms are not coming from spindle disks (So from caching, SSD..).
  • “db file sequential read” > 4 ms are coming from spindle disks.

Let’s implement this in Tableau with a calculated field:

Screen Shot 2014-06-28 at 14.18.36

Name: “IO Source” and use this formula:

Screen Shot 2014-06-28 at 14.21.23
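In case the screenshot is hard to read, the calculated field is presumably something like this (assuming the wait_time_milli column is exposed in Tableau as [Wait Time Milli]):

IF [Wait Time Milli] <= 4 THEN 'Not spindle disks (cache, SSD...)' ELSE 'Spindle disks' END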

Feel free to modify this formula according to your environment.

Now take the “IO Source” Dimension and put it into the Color marks:

Screen Shot 2014-06-28 at 14.23.37

So that we can now visualize the IO source distribution:

Screen Shot 2014-06-28 at 14.25.08


Remarks:

  • Karl Arao presented another example usage of Tableau in this blog post.
  • Should you need to retrieve “db file sequential read” buckets < 1 ms, then you can use oracle_trace_parsing from Kyle Hailey.



db_io*metrics family

In the last three posts I introduced three utilities to report database physical IO metrics in real time.

Screen Shot 2014-05-09 at 08.40.54

I just want to present those three utilities again in a single post as:

  • You may have missed one of them.
  • They are all members of the same family.

The three utilities are:

  1. db_io_metrics for reads and writes metrics related to data files and temp files.
  2. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.
  3. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

If a new member joins this family, I will let you know.

Remarks:

  • One member of the family looks like SLOB’s logo (see its “official” logo below), designed by flashdba’s children, don’t you think? ;-)

Screen Shot 2014-05-09 at 11.54.47

  • If you are interested in IO metrics related to ASM, you can have a look at the asm_metrics utility.

Report database physical IO activity per database function (LGWR, DBWR, Smart Scan…) and IO type in real time

Welcome to db_io_function_metrics, a utility used to display database physical IO real-time metrics per database function (DBWR, LGWR, Direct Reads, Smart Scan and so on). It basically takes a snapshot each second (default interval) from the gv$iostat_function cumulative view and computes the delta with the previous snapshot. The utility is RAC aware (it does not need to be multitenant aware, as con_id=0 in gv$iostat_function in 12.1 in any case, even if PDBs are created).
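For reference, here is a sketch of the kind of query being sampled each interval (based on the documented gv$iostat_function columns; not necessarily the exact query used by the script):

select inst_id, function_name,
       small_read_megabytes, small_write_megabytes,
       large_read_megabytes, large_write_megabytes,
       small_read_reqs, small_write_reqs,
       large_read_reqs, large_write_reqs,
       number_of_waits, wait_time
from gv$iostat_function;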

This utility:

  • provides useful metrics.
  • is RAC aware.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, reads and writes) and per database function:

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Function

Let’s see the help:

./db_io_function_metrics.pl -help

Usage: ./db_io_function_metrics.pl [-interval] [-count] [-inst] [-function] [-io_type] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -FUNCTION=        IO Function to collect (wildcard allowed)                                   ALL
  -IO_TYPE=         IO Type to collect: reads,writes,small,large                                READS
  -SHOW=            What to show: inst,function (comma separated list)                          INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_function_metrics.pl
Example: ./db_io_function_metrics.pl  -inst=CBDT1
Example: ./db_io_function_metrics.pl  -show=inst,function
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct%
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct% -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can choose on which Database Function to collect the metrics thanks to the -FUNCTION= parameter (wildcard allowed).
  4. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  5. You can aggregate the results on the database instances and database functions level thanks to the -SHOW= parameter.
  6. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  7. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the -SORT_FIELD= parameter.

Examples:

Report the IO metrics for “small” IO per database instances and functions during a SLOB run:

./db_io_function_metrics.pl -show=inst,function -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

09:49:55                                       SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   R+W       Avg ms/
09:49:55   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   IO/s      R+W IO
09:49:55   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   -------
09:49:55   BDT_1                               147       18817     0         2         0.008     0.000     18847     0.04
09:49:55   BDT_1        ARCH                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Archive Manager        0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Buffer Cache Reads     147       18811     0         0         0.008     0.000     18843     0.04
09:49:55   BDT_1        DBWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Data Pump              0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Reads           0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Writes          0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        LGWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Others                 0         6         0         2         0.000     0.000     4         0.25
09:49:55   BDT_1        RMAN                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Recovery               0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Smart Scan             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Streams AQ             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        XDB                    0         0         0         0         0.000     0.000     0         0.00


Report the IO metrics for “reads” IO per database instance and function, for the RMAN function, during an RMAN backup:

./db_io_function_metrics.pl -show=inst,function -io_type=reads -function=RMAN
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:04:58                                       SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   R+W       Avg ms/
10:04:58   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   IO/s      R+W IO
10:04:58   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   ---------
10:04:58   BDT_1                               3         159       248       62        0.019     4.0       176       1.10
10:04:58   BDT_1        RMAN                   3         159       248       62        0.019     4.0       176       1.10

You can also report the metrics for Smart Scan operations, Data Pump operations and so on…

You can download it from this repository or copy the source code from this page.

Conclusion:

Four utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.
  4. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

Report database physical IO activity per IO type and oracle files type in real time

Introduction

After I published the blog post related to the db_io_metrics utility (based on gv$filestat and gv$tempstat), Luca Canali suggested that I create a version based on the gv$iostat_file cumulative view (available since 11.1).

The advantages of this view are:

  • it displays information about disk I/O statistics for a lot of file types (including data files, temp files, control files, log files, archive logs, and so on).
  • it takes the IO type into account (small, large, reads, writes and synchronous).

Then based on this, I created a new utility: db_io_type_metrics

db_io_type_metrics description

The db_io_type_metrics.pl utility is used to display database physical IO type (small, large, synchronous, reads and writes) real-time metrics. It basically takes a snapshot each second (default interval) from the gv$iostat_file cumulative view and computes the delta with the previous snapshot. The utility is RAC and Multitenant aware.
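For reference, here is a sketch of the kind of query being sampled each interval (based on the documented gv$iostat_file columns; not necessarily the exact query used by the script):

select inst_id, filetype_name,
       small_read_megabytes, small_write_megabytes,
       large_read_megabytes, large_write_megabytes,
       small_read_reqs, small_write_reqs,
       large_read_reqs, large_write_reqs,
       small_read_servicetime, small_write_servicetime,
       large_read_servicetime, large_write_servicetime
from gv$iostat_file;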

This utility:

  • provides useful metrics.
  • is RAC aware.
  • detects if it is connected to a multitenant database and is then able to display the container metrics.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, synchronous, reads and writes):

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Database container.
  • File Type (log file, control file, data file, temp file…) or tablespace.

Let’s see the help:

./db_io_type_metrics.pl -help

Usage: ./db_io_type_metrics.pl [-interval] [-count] [-inst] [-cont] [-file_type_tbs] [-io_type] [-file_type] [-tbs] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -CONT=            Container to collect (wildcard allowed)                                     ALL
  -FILE_TYPE_TBS=   Collect on File Type or on Tablespace: file_type,tbs                        FILE_TYPE
  -IO_TYPE=         IO Type to collect: reads,writes,small,large,synch                          READS
  -FILE_TYPE=       File Type to collect (in case FILE_TYPE_TBS=file_type) (wildcard allowed)   NONE
  -TBS=             Tablespace to collect (in case FILE_TYPE_TBS=tbs) (wildcard allowed)        NONE
  -SHOW=            What to show: inst,cont,file_type_tbs (comma separated list)                INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_type_metrics.pl
Example: ./db_io_type_metrics.pl  -inst=CBDT1
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type=%Data%
Example: ./db_io_type_metrics.pl  -show=inst -io_type=large
Example: ./db_io_type_metrics.pl  -show=inst -io_type=small -sort_field=small_reads
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%USE%
Example: ./db_io_type_metrics.pl  -show=inst,cont
Example: ./db_io_type_metrics.pl  -show=inst,cont -cont=%P%
Example: ./db_io_type_metrics.pl  -show=inst,cont,file_type_tbs -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can choose on which database container to collect the metrics thanks to the -CONT= parameter.
  4. You can choose to collect on File Type or tablespace thanks to the -FILE_TYPE_TBS= parameter.
  5. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  6. You can choose on which File Type to collect the metrics thanks to the -FILE_TYPE= parameter (wildcard allowed).
  7. You can choose on which Tablespace to collect the metrics thanks to the -TBS= parameter (wildcard allowed).
  8. You can aggregate the results at the database instance, container, and file type or tablespace levels thanks to the -SHOW= parameter.
  9. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  10. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the -SORT_FIELD= parameter.

Let’s see some use cases:

Report the IO type metrics for “reads” IO per database instance and tablespace, for the SLOB tablespace:

 ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%SLOB% -io_type=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:52:54                                               SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:52:54   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   SMALL R   LARGE R
10:52:54   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:52:54   BDT_1                                       144       18390     0         0         0.008     0.0       0.08      0.00
10:52:54   BDT_1        SLOB                           144       18390     0         0         0.008     0.0       0.08      0.00


Report the IO type metrics for “small” IO and all file types:

./db_io_type_metrics.pl  -show=file_type_tbs -file_type_tbs=file_type -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:57:01                                               SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:57:01   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
10:57:01   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:57:01                Archive Log                    0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Archive Log Backup             0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Control File                   1         66        0         0         0.015     0.000     0.00      0.00
10:57:01                Data File                      125       16230     0         0         0.008     0.000     0.15      0.00
10:57:01                Data File Backup               0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Copy                 0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Incremental Backup   0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data Pump Dump File            0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Flashback Log                  0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Log File                       0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Other                          0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Temp File                      0         0         0         0         0.000     0.000     0.00      0.00


Report the IO type metrics for “small” IO per database instances and containers:

./db_io_type_metrics.pl -show=inst,cont -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:07:46                                                                 SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
13:07:46   INST         CONT              FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
13:07:46   ----------   ---------------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
13:07:46   CBDT1                                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        CDB$ROOT                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2                                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        CDB$ROOT                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00


Report the IO type metrics per… I let you finish the sentence as the utility is customizable enough ;-)

You can download it from this repository or copy the source code from this page.

Conclusion:

Three utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.

Enjoy, and don’t hesitate to come back to me with any questions, issues or suggestions.


Welcome to db_io_metrics, a new utility to display database physical IO metrics in real time

The db_io_metrics.pl utility is used to display database physical IO real-time metrics. It basically takes a snapshot each second (default interval) from the gv$filestat and gv$tempstat cumulative views and computes the delta with the previous snapshot. The utility is RAC and Multitenant aware.
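For reference, here is a sketch of the kind of counters being sampled each interval (based on the documented gv$filestat columns, joined to gv$datafile for the file name; not necessarily the exact query used by the script). Note that READTIM and WRITETIM are expressed in hundredths of a second:

select fs.inst_id, df.name,
       fs.phyrds, fs.phyblkrd, fs.readtim,
       fs.phywrts, fs.phyblkwrt, fs.writetim
from gv$filestat fs, gv$datafile df
where fs.inst_id = df.inst_id
and fs.file# = df.file#;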

This utility:

  • provides useful metrics.
  • is RAC aware.
  • detects if it is connected to a multitenant database and is then able to display the container metrics.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second.
  • Avg ms/Read: Average ms per read.
  • AvgBy/Read: Average bytes per read.
  • The same metrics are provided for write operations.

At the following levels:

  • Database Instance.
  • Database container.
  • Filesystem or ASM Diskgroup.
  • Tablespace.
  • Datafile.

Let’s see the help:

./db_io_metrics.pl -help

Usage: ./db_io_metrics.pl [-interval] [-count] [-inst] [-cont] [-fs_dg] [-tbs] [-file] [-fs_delimiter] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -CONT=            Container to collect (wildcard allowed)                           ALL
  -FS_DG=           Filesystem or Diskgroup to collect (wildcard allowed)             ALL
  -TBS=             Tablespace to collect (wildcard allowed)                          ALL
  -FILE=            Datafile to collect (wildcard allowed)                            ALL
  -FS_DELIMITER=    Folder which follows the FS mount point                           ORADATA
  -SHOW=            What to show: inst,cont,fs_dg,tbs,file (comma separated list)     INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_metrics.pl
Example: ./db_io_metrics.pl  -inst=BDT_1
Example: ./db_io_metrics.pl  -show=inst,tbs
Example: ./db_io_metrics.pl  -show=inst,tbs -tbs=%UNDO%
Example: ./db_io_metrics.pl  -show=fs_dg
Example: ./db_io_metrics.pl  -show=inst,fs_dg -display=avg
Example: ./db_io_metrics.pl  -show=inst,fs_dg -sort_field=reads
Example: ./db_io_metrics.pl  -show=inst,tbs,cont
Example: ./db_io_metrics.pl  -show=inst,tbs,cont -cont=%P_%
Example: ./db_io_metrics.pl  -show=inst,cont -sort_field=iops

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can choose on which database container to collect the metrics thanks to the -CONT= parameter (wildcard allowed).
  4. You can choose on which Diskgroup or Filesystem to collect the metrics thanks to the -FS_DG= parameter (wildcard allowed).
  5. You can choose on which tablespace to collect the metrics thanks to the -TBS= parameter (wildcard allowed).
  6. You can choose on which datafile to collect the metrics thanks to the -FILE= parameter (wildcard allowed).
  7. You can choose which folder is your Filesystem delimiter thanks to the -FS_DELIMITER= parameter.
  8. You can aggregate the results at the database instance, container, diskgroup or filesystem, and tablespace levels thanks to the -SHOW= parameter.
  9. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  10. You can sort based on the number of reads, number of writes or number of IOPS (reads+writes) thanks to the -SORT_FIELD= parameter.

Let’s see some use cases:

Report the physical IO metrics for the database instances:

./db_io_metrics.pl -show=inst
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

12:55:31                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
12:55:31   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
12:55:31   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
12:55:31   CBDT1                                                              376.0     3008     5.8       8192      0.0        0         0.0        0
12:55:31   CBDT2                                                              346.0     2768     15.6      8192      0.0        0         0.0        0

Report the physical IO metrics per database instances and per containers and sort by iops:

./db_io_metrics.pl -show=inst,cont -sort_field=iops
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

12:57:59                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
12:57:59   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
12:57:59   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
12:57:59   CBDT2                                                              293.0     2344     18.8      8192      0.0        0         0.0        0
12:57:59   CBDT2        CDB$ROOT                                              150.0     1200     18.4      8192      0.0        0         0.0        0
12:57:59   CBDT2        P_1                                                   143.0     1144     19.2      8192      0.0        0         0.0        0
12:57:59   CBDT2        PDB$SEED                                              0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT2        P_2                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT2        P_3                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1                                                              274.0     2192     9.1       8192      0.0        0         0.0        0
12:57:59   CBDT1        CDB$ROOT                                              274.0     2192     9.1       8192      0.0        0         0.0        0
12:57:59   CBDT1        PDB$SEED                                              0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_1                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_2                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_3                                                   0.0       0        0.0       0         0.0        0         0.0        0

Report the physical IO metrics per database instances and per filesystem or ASM diskgroup:

./db_io_metrics.pl -show=inst,fs_dg
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:00:39                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
13:00:39   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
13:00:39   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
13:00:39   CBDT1                                                              349.0     2792     8.3       8192      0.0        0         0.0        0
13:00:39   CBDT1                     DATA                                     349.0     2792     8.3       8192      0.0        0         0.0        0
13:00:39   CBDT2                                                              272.0     2176     25.4      8192      0.0        0         0.0        0
13:00:39   CBDT2                     DATA                                     272.0     2176     25.4      8192      0.0        0         0.0        0

Report the physical IO metrics per… I let you finish the sentence as the utility is customizable enough ;-)

Remarks:

  • The metrics are “only” linked to the datafiles and tempfiles (see the Oracle documentation for more details).
  • In case your database is on filesystems, you have to change the “FS_DELIMITER” argument to aggregate the metrics at the filesystem level. For example, if the FS are:
Filesystem             size   used  avail capacity  Mounted on
devora11-data/u500     960G   927G    33G    97%    /ec/dev/server/oracle/devora11/u500
devora11-data/u501     960G   767G   193G    80%    /ec/dev/server/oracle/devora11/u501
devora11-data/u502     500G   445G    55G    89%    /ec/dev/server/oracle/devora11/u502

And the datafiles are located “behind” the oradata folder:

/ec/dev/server/oracle/devora11/u500/oradata/BDT
/ec/dev/server/oracle/devora11/u501/oradata/BDT
/ec/dev/server/oracle/devora11/u502/oradata/BDT

Then you can launch the utility that way:

./db_io_metrics.pl -show=inst,fs_dg -fs_delimiter=oradata
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

11:00:35                                                                              Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
11:00:35   INST         FS_DG                                  TBS   FILE   Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
11:00:35   ----------   ------------------------------------   ---   ----   -------   ------   -------   ------    --------   -------   --------   ------
11:00:35   BDT                                                              129.0     1032     8.3       8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u500/                67.0      536      10.0      8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u501/                22.0      176      5.5       8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u502/                40.0      320      7.0       8192      0.0        0         0.0        0


  • If you don’t want to see the FS (do not use -show=fs_dg), then there is no need to specify the -fs_delimiter argument.
  • Reading the good article A Closer Look at CALIBRATE_IO from Luca Canali gave me the idea to create this utility.
  • If you are interested in those real-time metrics at the ASM level, you can have a look at the asm_metrics utility.

You can download it from this repository or copy the source code from this page.
