Simulate and Visualize the impact of the ASM preferred read feature on the read IOPS and throughput

Suppose that you decided to put the ASM preferred read feature in place because you observed that the read latency is too high on the farthest disk array (see use case 3 in this post for how to reach that conclusion).

So, you want to enable the ASM preferred read feature so that:

  1. The +ASM1 instance “prefers” to read from the “WIN” failgroup.
  2. The +ASM2 instance “prefers” to read from the “JMO” failgroup.

But doing so may have an impact on the number of read IOPS and on the throughput distribution per host/disk array because:

  1. The reads that +ASM1 previously performed on the “JMO” array will now be done on the “WIN” array.
  2. The reads that +ASM2 previously performed on the “WIN” array will now be done on the “JMO” array.

Of course, the total number of read operations and the total throughput will not change, but their distribution across the failgroups (disk arrays) may change with the ASM preferred read feature in place.

Question:

  • Is the architecture able to deal with this new read distribution?

To answer this question I will:

  1. Collect the ASM metrics for a period of time (without the ASM preferred read in place) and produce a csv file as described here.
  2. Visualize the ASM metrics with Tableau and simulate the impact of the preferred read feature on the read IOPS and throughput distribution.

Once the csv file is ready (meaning you have collected a representative workload), let’s check what the current workload looks like (without the ASM preferred read in place).

For the Kby Read/s measure:

We can visualize it this way with Tableau (I keep the “Columns”, “Rows” and “Marks” shelves in the screenshot so that you can reproduce it).

Screen Shot 2014-08-10 at 18.45.03

For the Reads/s measure:

Screen Shot 2014-08-11 at 11.07.01

We can see the read IOPS and throughput distribution by failgroup and ASM instance: the read IOPS and the throughput are equally distributed over the failgroups (this is the expected behaviour without the ASM preferred read in place).

Now, what if we implement the ASM preferred read feature? What would be the impact on the read IOPS and throughput distribution?

To simulate and visualize the impact, let’s create this “New FG for Read operations” calculated field:

Screen Shot 2014-08-11 at 11.10.01

Basically, it simulates the ASM preferred read feature by assigning the read failgroup per ASM instance.
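
As the screenshot may be hard to read, here is a minimal sketch of what such a calculated field could look like, using the INST and FG dimensions of the csv file (the exact formula in the screenshot may differ):

IF [INST] = '+ASM1' THEN 'WIN'
ELSEIF [INST] = '+ASM2' THEN 'JMO'
ELSE [FG]
END

The ELSE branch simply falls back to the original FG dimension for any other case.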

Now, let’s simulate and visualize the impact of the ASM preferred read feature (should it be implemented) using the same csv file and this calculated field as a dimension.

For the Kby Read/s measure:

Screen Shot 2014-08-11 at 11.12.56

Note that the throughput distribution would not be the same and that the peaks are higher (more than 200 MB/s compared to about 130 MB/s without the ASM preferred read).

For the Reads/s measure:

Screen Shot 2014-08-11 at 11.14.31

Note that the read IOPS distribution would not be the same and that the peak on the WIN failgroup is higher (about 8000 Reads/s compared to about 5000 Reads/s without the ASM preferred read).

Now you can check (with your system and storage administrators) whether your current architecture would be able to deal with this new distribution.

Remarks:

  • ASM does not perform any reads for the database itself; it records metrics for the database instances that it is servicing.

Conclusion:

We have been able to simulate and visualize the impact of the ASM preferred read feature on the read IOPS and throughput distribution without actually implementing it.


ASM performance metrics visualization: Use cases

Now that I can graph ASM performance metrics, let’s see some use cases.

To display the ASM metrics I’ll use the csv file generated by the csv_asm_metrics utility and Tableau for the visualization. Of course you could use the visualization tool of your choice.

Use case 1: Display the TOP IO consumers

You consolidated several databases on the same machine and you want to visualize which database generates most of the read IO throughput. You can visualize this with Tableau (I keep the “Columns”, “Rows” and “Marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 14.46.40

I can see that one of my databases is generating most of the throughput.

Should you use RAC, you could also split those metrics per ASM instance:

Screen Shot 2014-07-12 at 14.47.58

I can see that most of the activity is recorded on ASM2, which makes sense as my RAC services are configured as preferred/available (Active/Passive configuration) and started on the *_2 database instances (linked to ASM2).

Use case 2: Display the Read IO distribution by failgroup

You can visualize this with Tableau (I keep the “Columns”, “Rows” and “Marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 10.09.08

We can see that the IOs are equally distributed over the Failgroups. It is the expected behaviour as I am not using the ASM Preferred Read feature.

Use case 3: Should I use the ASM Preferred Read feature on my extended RAC?

Suppose the host hosting the ASM1 instance is close to the disk array on which the “WIN” failgroup has been created. Likewise, the host hosting the ASM2 instance is close to the disk array on which the “JMO” failgroup has been created. Let’s see the read IO latency between the ASM instances and the failgroups.

You can visualize this with Tableau (I keep the “Columns”, “Rows” and “Marks” shelves in the screenshot so that you can reproduce it with Tableau):

Screen Shot 2014-07-12 at 10.17.16

As you can see, the ASM1 instance reports faster reads on the “WIN” failgroup and ASM2 reports faster reads on the “JMO” failgroup, which makes sense given our configuration. I can also check whether the read performance is good enough when the reads are done on the farthest disk array (ASM1 “reading” from the JMO failgroup and ASM2 “reading” from the WIN failgroup) and then decide whether the ASM Preferred Read feature needs to be implemented.

Use case 4: Simulate and Visualize the impact of the ASM preferred read feature on the read IOPS and throughput (see this blog post).

Remarks:

  • ASM does not perform any reads for the database itself; it records metrics for the database instances that it is servicing.
  • You can imagine a lot of use cases thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the asm_metrics and the csv_asm_metrics utilities from this repository.


Graphing ASM performance metrics

ASM metrics are a goldmine: they provide a lot of information. As you may know, the asm_metrics utility extracts them in real time.

But sometimes it is not easy to understand the values without the help of a graph. Look at this example: if I can’t picture it, I can’t understand it.

So, depending on your needs and on what you are looking for in the ASM metrics, a picture may help.

So let’s graph the output of the asm_metrics utility. For this, I created the csv_asm_metrics utility, which produces a csv file from the output of asm_metrics.

Once you get the csv file you can graph the metrics with your favourite visualization tool (I’ll use Tableau as an example).

First, you have to launch the asm_metrics utility with one of these options (to ensure that all the fields are displayed):

  • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
  • -show=inst,fg,dg,dsk for ASM < 11g

and redirect the output to a text file:

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk > asm_metrics.txt

Remark: as the default one-second interval could produce a huge output file, you can use the -interval parameter to collect data with a larger interval.

The output file looks like:

............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:48:54                                                                              Kby       Avg       AvgBy/               Kby       Avg        AvgBy/ 
13:48:54   INST     DBINST        DG            FG           DSK            Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write  
13:48:54   ------   -----------   -----------   ----------   ----------     -------   -------   -------   ------    ------     -------   --------   ------ 
13:48:54   +ASM1                                                            6731      54224     1.4       8249      42         579       3.0        14117  
13:48:54   +ASM1    BDT10_1                                                 2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA                                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1D   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1D   2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA                                       0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31       HOST31CC8D0F   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32       HOST32CC8D0F   0         0         0.0       0         0          0         0.0        0      

and so on...

Now let’s produce the csv file with the csv_asm_metrics utility. Let’s see the help:

./csv_asm_metrics.pl -help

Usage: ./csv_asm_metrics.pl [-if] [-of] [-d] [-help]

  Parameter         Comment
  ---------         -------
  -if=              Input file name (output of asm_metrics)
  -of=              Output file name (the csv file)
  -d=               Day of the first snapshot (YYYY/MM/DD)

Example: ./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

and generate the csv file that way:

./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

The csv file looks like:

Snap Time,INST,DBINST,DG,FG,DSK,Reads/s,Kby Read/s,ms/Read,By/Read,Writes/s,Kby Write/s,ms/Write,By/Write
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1D,2,32,0.2,16384,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST31,HOST31CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST32,HOST32CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST31,HOST31CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST32,HOST32CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST31,HOST31CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST32,HOST32CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,2,16,0.5,8448

As you can see:

  1. The day has been added (to create a full date), and the following days will be computed automatically (should the snapshots cross multiple days).
  2. Only the rows that contain all the fields are recorded into the csv file (the script skips the other ones as they represent aggregated values).

Now I can import this csv file into Tableau.

You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

Let’s graph the throughput and latency per failgroup for example.

Important remark regarding the computation/display of some averages:

The ms/Read and By/Read measures depend on the number of reads, so the averages have to be calculated as weighted averages (the same applies to ms/Write and By/Write).

Let’s create the calculated field in Tableau for those Weighted Averages:

Screen Shot 2014-07-07 at 20.13.28

so that Weighted Average ms/Read is:

Screen Shot 2014-07-07 at 20.16.19

Weighted Average By/Read:

Screen Shot 2014-07-07 at 20.21.12
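
For reference, should the screenshots be hard to read, those two read calculated fields follow the same weighted-average pattern as the write ones listed below:

  • Weighted Average ms/Read = sum([ms/Read]*[Reads/s])/sum([Reads/s])
  • Weighted Average By/Read = sum([By/Read]*[Reads/s])/sum([Reads/s])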

In the same way, you have to create:

  • Weighted Average ms/Write = sum([ms/Write]*[Writes/s])/sum([Writes/s])
  • Weighted Average By/Write = sum([By/Write]*[Writes/s])/sum([Writes/s])

Now let’s display the average read latency by Failgroup (using the previous calculated weighted average):

Drag the Snap Time dimension to the “columns” shelf and choose “exact date”:

Screen Shot 2014-07-07 at 20.27.23

Drag the Weighted Average ms/Read calculated field to the “Rows” shelf:

Screen Shot 2014-07-07 at 20.29.41

Drag the FG dimension to the “Color Marks” shelf:

Screen Shot 2014-07-07 at 20.32.33

So that the graph looks like:

Screen Shot 2014-07-07 at 20.33.15

Create the same graph for the “Kby Read/s” measure (except that I want to see the sum, i.e. the throughput, and not the average) and put those two graphs into the same dashboard:

Screen Shot 2014-07-07 at 20.39.42

Here we are.

Conclusion:

  • We can create a csv file from the output of the asm_metrics utility thanks to csv_asm_metrics.
  • To do so, we have to collect all the fields of asm_metrics with those options:
      • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
      • -show=inst,fg,dg,dsk for ASM < 11g
  • Once you have loaded the csv file into your favourite visualization tool, don’t forget to compute weighted averages for ms/Read, By/Read, ms/Write and By/Write if you plan to graph the averages.
  • You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the csv_asm_metrics utility from this repository or copy the source code from this page.

UPDATE: You can see some use cases here.


Visualize the IO source thanks to Tableau and AWR

As you know, the wait event “db file sequential read” records “single block” IO performed outside the database buffer cache. But does the IO come from:

  • Filesystem cache (If any and used)
  • Disk Array cache
  • SSD
  • Spindle Disks
  • …..

It could be interesting to visualize the distribution of the IO source:

  • Should you migrate from a cached filesystem to ASM (you may need to increase the database cache to hold the IOs previously served by the filesystem cache).
  • Should you use Dynamic Tiering and want to figure out where the IOs come from (SSD, spindle disks…).

To do so, I’ll use Tableau and the AWR data coming from the dba_hist_event_histogram view. I’ll also extract the data coming from dba_hist_snapshot (to get the begin_interval_time).

alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
alter session set nls_timestamp_format='YYYY/MM/DD HH24:MI:SS';

select * from dba_hist_event_histogram where
snap_id >= (select min(snap_id) from dba_hist_snapshot
where begin_interval_time >= to_date ('2014/06/01 00:00','YYYY/MM/DD HH24:MI'))
and event_name='db file sequential read';

select * from dba_hist_snapshot where begin_interval_time >= to_date ('2014/06/01 00:00','YYYY/MM/DD HH24:MI');

As you can see, there is no computation. This is just a simple extraction of the data.

Then I put that data into two csv files (awr_snap_for_june.csv and awr_event_histogram.csv).

1) Now, launch Tableau, select the csv files and add an inner join between them:

Screen Shot 2014-06-28 at 13.45.19

2) Go to the worksheet, put the “begin interval time” dimension into the “Columns” shelf and change it to “exact date” (instead of Year):

Screen Shot 2014-06-28 at 13.48.13

3) Put the “Wait count” measure into the “Rows” shelf and create a table calculation on it:

Screen Shot 2014-06-28 at 13.55.33

Choose “difference”, as the “WAIT_COUNT” field is cumulative and we want to see the delta between the AWR snapshots.

4) My graph now looks like:

Screen Shot 2014-06-28 at 14.02.38

On Jun 14 and Jun 20 the database was restarted, so the difference is < 0.

5) Let’s modify the formula to take database restarts into account in the delta computation:

Screen Shot 2014-06-28 at 14.04.31

Click “Customize”:

Screen Shot 2014-06-28 at 14.05.19

Name it “Delta Wait Count” and change ZN(SUM([Wait Count])) - LOOKUP(ZN(SUM([Wait Count])), -1) to max(ZN(SUM([Wait Count])) - LOOKUP(ZN(SUM([Wait Count])), -1), 0):

Screen Shot 2014-06-28 at 14.07.17
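
Written out on its own line, the resulting calculated field is:

max(ZN(SUM([Wait Count])) - LOOKUP(ZN(SUM([Wait Count])), -1), 0)

The max(…, 0) simply clamps the negative deltas caused by the instance restarts to zero.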

So that now the graph looks like:

Screen Shot 2014-06-28 at 14.09.11

6) Now we have to “split” those wait counts into two categories based on the wait_time_milli measure coming from dba_hist_event_histogram. Let’s say that:

  • “db file sequential read” <= 4 ms are not coming from spindle disks (So from caching, SSD..).
  • “db file sequential read” > 4 ms are coming from spindle disks.

Let’s implement this in Tableau with a calculated field:

Screen Shot 2014-06-28 at 14.18.36

Name: “IO Source” and use this formula:

Screen Shot 2014-06-28 at 14.21.23
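
As a sketch, and assuming the wait_time_milli column is exposed in Tableau as a “Wait Time Milli” measure (the exact field name depends on how you imported the csv), the formula could look like:

IF [Wait Time Milli] <= 4 THEN 'Not coming from spindle disks (caching, SSD...)'
ELSE 'Coming from spindle disks'
END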

Feel free to modify this formula according to your environment.

Now take the “IO Source” dimension and put it on the Color Marks shelf:

Screen Shot 2014-06-28 at 14.23.37

So that we can now visualize the IO source distribution:

Screen Shot 2014-06-28 at 14.25.08

 

Remarks:

  • Karl Arao presented another example usage of Tableau in this blog post.
  • Should you need to retrieve “db file sequential read” buckets < 1 ms, then you can use oracle_trace_parsing from Kyle Hailey.

Update 1: An example of oracle_trace_parsing usage can be found in the “Oracle “Physical I/O” ? not always physical” blog post.

 


db_io*metrics family

In the last three posts I introduced three utilities to report database physical IO metrics in real time.

Screen Shot 2014-05-09 at 08.40.54

I just want to present those three utilities again in a single post as:

  • You may have missed one of them.
  • They are all members of the same family.

The three utilities are:

  1. db_io_metrics for reads and writes metrics related to data files and temp files.
  2. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.
  3. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

If a new member joins this family, I will let you know.

Remarks:

  • One member of the family looks like SLOB‘s logo (see its “official” logo below, designed by flashdba‘s children), don’t you think? ;-)

Screen Shot 2014-05-09 at 11.54.47

  • If you are interested in IO metrics related to ASM, you can have a look at the asm_metrics utility.

Report database physical IO activity per database function (LGWR, DBWR, Smart Scan…) and IO type in real time

Welcome to db_io_function_metrics, a utility used to display real-time database physical IO metrics per database function (DBWR, LGWR, Direct Reads, Smart Scan and so on). It basically takes a snapshot each second (the default interval) from the gv$iostat_function cumulative view and computes the delta with the previous snapshot. The utility is RAC aware (it does not need to be multitenant aware, as con_id=0 in gv$iostat_function in 12.1 in any case, even if PDBs are created).

This utility:

  • provides useful metrics.
  • is RAC aware.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, reads and writes) and database functions:

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Function

Let’s see the help:

./db_io_function_metrics.pl -help

Usage: ./db_io_function_metrics.pl [-interval] [-count] [-inst] [-function] [-io_type] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -FUNCTION=        IO Function to collect (wildcard allowed)                                   ALL
  -IO_TYPE=         IO Type to collect: reads,writes,small,large                                READS
  -SHOW=            What to show: inst,function (comma separated list)                          INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_function_metrics.pl
Example: ./db_io_function_metrics.pl  -inst=CBDT1
Example: ./db_io_function_metrics.pl  -show=inst,function
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct%
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct% -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can choose on which Database Function to collect the metrics thanks to the -FUNCTION= parameter (wildcard allowed).
  4. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  5. You can aggregate the results on the database instances and database functions level thanks to the -SHOW= parameter.
  6. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  7. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the -SORT_FIELD= parameter.

Examples:

Report the IO metrics for “small” IO per database instances and functions during a SLOB run:

./db_io_function_metrics.pl -show=inst,function -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

09:49:55                                       SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   R+W       Avg ms/
09:49:55   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   IO/s      R+W IO
09:49:55   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   -------
09:49:55   BDT_1                               147       18817     0         2         0.008     0.000     18847     0.04
09:49:55   BDT_1        ARCH                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Archive Manager        0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Buffer Cache Reads     147       18811     0         0         0.008     0.000     18843     0.04
09:49:55   BDT_1        DBWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Data Pump              0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Reads           0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Writes          0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        LGWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Others                 0         6         0         2         0.000     0.000     4         0.25
09:49:55   BDT_1        RMAN                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Recovery               0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Smart Scan             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Streams AQ             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        XDB                    0         0         0         0         0.000     0.000     0         0.00

 

Report the IO metrics for “reads” IO per database instances and functions for the RMAN function during RMAN backup:

./db_io_function_metrics.pl -show=inst,function -io_type=reads -function=RMAN
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:04:58                                       SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   R+W       Avg ms/
10:04:58   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   IO/s      R+W IO
10:04:58   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   ---------
10:04:58   BDT_1                               3         159       248       62        0.019     4.0       176       1.10
10:04:58   BDT_1        RMAN                   3         159       248       62        0.019     4.0       176       1.10

You can also report the metrics for Smart Scan operations, Data Pump operations and so on…

You can download it from this repository or copy the source code from this page.

Conclusion:

Four utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.
  4. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

Report database physical IO activity per IO type and oracle files type in real time

Introduction

After I published the blog post related to the db_io_metrics utility (based on gv$filestat and gv$tempstat), Luca Canali suggested that I create a version based on the gv$iostat_file cumulative view (available since 11.1).

The advantages of this view are:

  • it displays disk I/O statistics for a lot of file types (including data files, temp files, control files, log files, archive logs, and so on).
  • it takes the IO type into account (small, large, reads, writes and synchronous).

Based on this, I created a new utility: db_io_type_metrics.

db_io_type_metrics description

The db_io_type_metrics.pl utility is used to display real-time database physical IO metrics per IO type (small, large, synchronous, reads and writes). It basically takes a snapshot each second (the default interval) from the gv$iostat_file cumulative view and computes the delta with the previous snapshot. The utility is RAC and Multitenant aware.

This utility:

  • provides useful metrics.
  • is RAC aware.
  • detects if it is connected to a multitenant database and then is able to display the containers metrics.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, synchronous, reads and writes):

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Database container.
  • File Type (log file, control file, data file, temp file…) or tablespace.

Let’s see the help:

./db_io_type_metrics.pl -help

Usage: ./db_io_type_metrics.pl [-interval] [-count] [-inst] [-cont] [-file_type_tbs] [-io_type] [-file_type] [-tbs] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -CONT=            Container to collect (wildcard allowed)                                     ALL
  -FILE_TYPE_TBS=   Collect on File Type or on Tablespace: file_type,tbs                        FILE_TYPE
  -IO_TYPE=         IO Type to collect: reads,writes,small,large,synch                          READS
  -FILE_TYPE=       File Type to collect (in case FILE_TYPE_TBS=file_type) (wildcard allowed)   NONE
  -TBS=             Tablespace to collect (in case FILE_TYPE_TBS=tbs) (wildcard allowed)        NONE
  -SHOW=            What to show: inst,cont,file_type_tbs (comma separated list)                INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_type_metrics.pl
Example: ./db_io_type_metrics.pl  -inst=CBDT1
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type=%Data%
Example: ./db_io_type_metrics.pl  -show=inst -io_type=large
Example: ./db_io_type_metrics.pl  -show=inst -io_type=small -sort_field=small_reads
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%USE%
Example: ./db_io_type_metrics.pl  -show=inst,cont
Example: ./db_io_type_metrics.pl  -show=inst,cont -cont=%P%
Example: ./db_io_type_metrics.pl  -show=inst,cont,file_type_tbs -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the -INST= parameter.
  3. You can choose on which database container to collect the metrics thanks to the -CONT= parameter.
  4. You can choose to collect on File Type or tablespace thanks to the -FILE_TYPE_TBS= parameter.
  5. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  6. You can choose on which File Type to collect the metrics thanks to the -FILE_TYPE= parameter (wildcard allowed).
  7. You can choose on which Tablespace to collect the metrics thanks to the -TBS= parameter (wildcard allowed).
  8. You can aggregate the results on the database instances, containers, file type or tablespace level thanks to the -SHOW= parameter.
  9. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the -DISPLAY= parameter.
  10. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the -SORT_FIELD= parameter.

Let’s see some use cases:

Report the IO type metrics for “reads” IO per database instance and tablespace, for the SLOB tablespace:

 ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%SLOB% -io_type=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:52:54                                               SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:52:54   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   SMALL R   LARGE R
10:52:54   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:52:54   BDT_1                                       144       18390     0         0         0.008     0.0       0.08      0.00
10:52:54   BDT_1        SLOB                           144       18390     0         0         0.008     0.0       0.08      0.00

 

Report the IO type metrics for “small” IO and all file types:

./db_io_type_metrics.pl  -show=file_type_tbs -file_type_tbs=file_type -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:57:01                                               SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:57:01   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
10:57:01   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:57:01                Archive Log                    0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Archive Log Backup             0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Control File                   1         66        0         0         0.015     0.000     0.00      0.00
10:57:01                Data File                      125       16230     0         0         0.008     0.000     0.15      0.00
10:57:01                Data File Backup               0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Copy                 0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Incremental Backup   0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data Pump Dump File            0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Flashback Log                  0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Log File                       0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Other                          0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Temp File                      0         0         0         0         0.000     0.000     0.00      0.00

 

Report the IO type metrics for “small” IO per database instances and containers:

./db_io_type_metrics.pl -show=inst,cont -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:07:46                                                                 SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
13:07:46   INST         CONT              FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
13:07:46   ----------   ---------------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
13:07:46   CBDT1                                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        CDB$ROOT                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2                                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        CDB$ROOT                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00

 

Report the IO type metrics per… I let you finish the sentence as the utility is customizable enough ;-)

You can download it from this repository or copy the source code from this page.

Conclusion:

Three utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files, control files, log files, archive logs, and so on.

Enjoy, and don’t hesitate to get back to me with any questions, issues or suggestions.
