Monitor the dNFS activity as of 11.2.0.4

As you may know, there is some views available since 11g to monitor the dnfs activity:

  • v$dnfs_servers: Shows a table of servers accessed using Direct NFS.
  • v$dnfs_files: Shows a table of files now open with Direct NFS.
  • v$dnfs_channels: Shows a table of open network paths (or channels) to servers for which Direct NFS is providing files.
  • v$dnfs_stats: Shows a table of performance statistics for Direct NFS.

One interesting thing is that the v$dnfs_stats view provides two new columns as of 11.2.0.4:

  • NFS_READBYTES: Number of bytes read from NFS server
  • NFS_WRITEBYTES: Number of bytes written to NFS server

See the oracle documentation here.

Then, by sampling the view we can provide those metrics:

  • Reads/s: Number of read per second.
  • KbyRead/s: Kbytes read per second (as of 11.2.0.4).
  • AvgBy/Read: Average bytes per read (as of 11.2.0.4).
  • Writes/s: Number of Write per second.
  • KbyWrite/s: Kbytes write per second (as of 11.2.0.4).
  • AvgBy/Write: Average bytes per write (as of 11.2.0.4).

To do so, I just created a very simple db_io_dnfs_metrics.pl utility. It basically takes a snapshot each second (default interval) from the gv$dnfs_stats view and computes the delta with the previous snapshot. The utility is RAC aware.

Let’s see the help:

$>./db_io_dnfs_metrics.pl -help

Usage: ./db_io_dnfs_metrics.pl [-interval] [-count] [-inst] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_dnfs_metrics.pl
Example: ./db_io_dnfs_metrics.pl  -inst=BDT_1
Example: ./db_io_dnfs_metrics.pl  -sort_field=reads

This is a very simple utility, the options/features are:

  1. You can choose the number of snapshots to display and the time to wait between the snapshots.
  2. In case of RAC, you can choose on which database instance to collect the metrics thanks to the –INST= parameter.
  3. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the –DISPLAY= parameter.
  4. In case of RAC, you can sort the instances based on the number of reads, number of writes, number of IOPS (reads+writes) thanks to the –SORT_FIELD= parameter.

Examples:

Collecting on a single Instance:

$>./db_io_dnfs_metrics.pl
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:18                          Kby       AvgBy/               Kby       AvgBy/
14:26:18   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:18   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:18   VSBDT        321       2568      8192      0          0         0          321        2.5
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

14:26:19                          Kby       AvgBy/               Kby       AvgBy/
14:26:19   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
14:26:19   ----------   -------   -------   -------   --------   -------   --------   --------   --------
14:26:19   VSBDT        320       2560      8192      1          16        16384      321        2.5

Collecting on a RAC database, sorting the Instances by the number of read:

$>./db_io_dnfs_metrics.pl -sort_field=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:21                          Kby       AvgBy/               Kby       AvgBy/
17:21:21   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:21   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:21   VBDTO_1      175       44536     260599    0          0         0          175        43.5
17:21:21   VBDTO_2      69        2272      33718     0          0         0          69         2.2
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:22                          Kby       AvgBy/               Kby       AvgBy/
17:21:22   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:22   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:22   VBDTO_2      151       36976     250751    0          0         0          151        36.1
17:21:22   VBDTO_1      131       33408     261143    0          0         0          131        32.6
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

17:21:23                          Kby       AvgBy/               Kby       AvgBy/
17:21:23   INST         Reads/s   Read/s    Read      Writes/s   Write/s   Write      IOPS       MB/s
17:21:23   ----------   -------   -------   -------   --------   -------   --------   --------   --------
17:21:23   VBDTO_2      133       33592     258633    0          0         0          133        32.8
17:21:23   VBDTO_1      121       31360     265394    0          0         0          121        30.6

Remarks:

Advertisements

db_io*metrics family

Into the last three posts I introduced three utilities to report database physical IO metrics in real time.

Screen Shot 2014-05-09 at 08.40.54I just want to introduce again those three utilities into a single post as:

  • You may have missed one of them.
  • They are all members of the same family.

The three utilities are:

  1. db_io_metrics for reads and writes metrics related to data files and temp files.
  2. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files,control files, log files, archive logs, and so on.
  3. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

If a new member join this family, I will let you know.

Remarks:

  • One member of the family looks like SLOB‘s logo (see below its “official” logo) designed by flashdba‘s childs, don’t you think? 😉

Screen Shot 2014-05-09 at 11.54.47

  •  If you are interested in IO metrics related to ASM, you can have a look to the asm_metrics utility.

Report database physical IO activity per database function (LGWR, DBWR, Smart Scan…) and IO type in real time

Welcome to db_io_function_metrics, an utility used to display database physical IO per database functions (DBWR, LGWR, Direct Reads, Smart Scan and so on) real-time metrics. It basically takes a snapshot each second (default interval) from the gv$iostat_function cumulative view and computes the delta with the previous snapshot. The utility is RAC aware (No need to be multitenant aware as con_id=0 into gv$iostat_function for 12.1 in any cases even if pdbs are created).

This utility:

  • provides useful metrics.
  • is RAC aware.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, reads and writes) and database functions:

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Function

Let’s see the help:

./db_io_function_metrics.pl -help

Usage: ./db_io_function_metrics.pl [-interval] [-count] [-inst] [-function] [-io_type] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -FUNCTION=        IO Function to collect (wildcard allowed)                                   ALL
  -IO_TYPE=         IO Type to collect: reads,writes,small,large                                READS
  -SHOW=            What to show: inst,function (comma separated list)                          INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_function_metrics.pl
Example: ./db_io_function_metrics.pl  -inst=CBDT1
Example: ./db_io_function_metrics.pl  -show=inst,function
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct%
Example: ./db_io_function_metrics.pl  -show=inst,function -function=%Direct% -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the –INST= parameter.
  3. You can choose on which Database Function to collect the metric thanks to the -FUNCTION=parameter (wilcard allowed).
  4. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  5. You can aggregate the results on the database instances and database functions level thanks to the –SHOW= parameter.
  6. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the –DISPLAY= parameter.
  7. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the –SORT_FIELD= parameter.

Examples:

Report the IO metrics for “small” IO per database instances and functions during a SLOB run:

./db_io_function_metrics.pl -show=inst,function -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

09:49:55                                       SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   R+W       Avg ms/
09:49:55   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   IO/s      R+W IO
09:49:55   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   -------
09:49:55   BDT_1                               147       18817     0         2         0.008     0.000     18847     0.04
09:49:55   BDT_1        ARCH                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Archive Manager        0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Buffer Cache Reads     147       18811     0         0         0.008     0.000     18843     0.04
09:49:55   BDT_1        DBWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Data Pump              0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Reads           0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Direct Writes          0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        LGWR                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Others                 0         6         0         2         0.000     0.000     4         0.25
09:49:55   BDT_1        RMAN                   0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Recovery               0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Smart Scan             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        Streams AQ             0         0         0         0         0.000     0.000     0         0.00
09:49:55   BDT_1        XDB                    0         0         0         0         0.000     0.000     0         0.00

 

Report the IO metrics for “reads” IO per database instances and functions for the RMAN function during RMAN backup:

./db_io_function_metrics.pl -show=inst,function -io_type=reads -function=RMAN
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:04:58                                       SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   R+W       Avg ms/
10:04:58   INST         FUNCTION               MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   IO/s      R+W IO
10:04:58   ----------   --------------------   -------   -------   -------   -------   -------   -------   -------   ---------
10:04:58   BDT_1                               3         159       248       62        0.019     4.0       176       1.10
10:04:58   BDT_1        RMAN                   3         159       248       62        0.019     4.0       176       1.10

You can also report the metrics for Smart Scan operations, Data Pump operations and so on…

You can download it from this repository or copy the source code from this page.

Conclusion:

Four utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files,control files, log files, archive logs, and so on.
  4. db_io_function_metrics for reads, writes, small and large metrics related to database functions (LGWR, DBWR, Smart Scan and so on).

Report database physical IO activity per IO type and oracle files type in real time

Introduction

After I published the blog post related to the db_io_metrics utiliy (based on gv$filestat and gv$tempstat), Luca Canali suggested me to create a version based on the gv$iostat_file cumulative view (available since 11.1).

The advantages of this view are:

  • it displays information about disk I/O statistics on a lot of file types (including data files, temp files,control files, log files, archive logs, and so on).
  • it takes care of the IO type (small,large,reads, writes and synchronous).

Then based on this, I created a new utility: db_io_type_metrics

db_io_type_metrics description

The db_io_type_metrics.pl utility is used to display database physical IO type (small, large, synchronous, reads and writes) real-time metrics. It basically takes a snapshot each second (default interval) from the gv$iostat_file cumulative view and computes the delta with the previous snapshot. The utility is RAC and Multitenant aware.

This utility:

  • provides useful metrics.
  • is RAC aware.
  • detects if it is connected to a multitenant database and then is able to display the containers metrics.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics per IO Type (small, large, synchronous, reads and writes):

  • MB/s: Megabytes per second.
  • RQ/s: Requests per second.
  • Avg MB/RQ: Average Megabytes per request.
  • Avg ms/RQ: Average ms per request.

At the following levels:

  • Database Instance.
  • Database container.
  • File Type (log file, control file, data file, temp file…) or tablespace.

Let’s see the help:

./db_io_type_metrics.pl -help

Usage: ./db_io_type_metrics.pl [-interval] [-count] [-inst] [-cont] [-file_type_tbs] [-io_type] [-file_type] [-tbs] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                                     Default
  ---------         -------                                                                     -------
  -INST=            ALL - Show all Instance(s)                                                  ALL
                    CURRENT - Show Current Instance
  -CONT=            Container to collect (wildcard allowed)                                     ALL
  -FILE_TYPE_TBS=   Collect on File Type or on Tablespace: file_type,tbs                        FILE_TYPE
  -IO_TYPE=         IO Type to collect: reads,writes,small,large,synch                          READS
  -FILE_TYPE=       File Type to collect (in case FILE_TYPE_TBS=file_type) (wildcard allowed)   NONE
  -TBS=             Tablespace to collect (in case FILE_TYPE_TBS=tbs) (wildcard allowed)        NONE
  -SHOW=            What to show: inst,cont,file_type_tbs (comma separated list)                INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                            SNAP
  -SORT_FIELD=      small_reads,small_writes,large_reads,large_writes                           NONE

Example: ./db_io_type_metrics.pl
Example: ./db_io_type_metrics.pl  -inst=CBDT1
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type=%Data%
Example: ./db_io_type_metrics.pl  -show=inst -io_type=large
Example: ./db_io_type_metrics.pl  -show=inst -io_type=small -sort_field=small_reads
Example: ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%USE%
Example: ./db_io_type_metrics.pl  -show=inst,cont
Example: ./db_io_type_metrics.pl  -show=inst,cont -cont=%P%
Example: ./db_io_type_metrics.pl  -show=inst,cont,file_type_tbs -io_type=small -sort_field=small_reads

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the –INST= parameter.
  3. You can choose on which database container to collect the metrics thanks to the -CONT= parameter.
  4. You can choose to collect on File Type or tablespace thanks to the -FILE_TYPE_TBS=parameter.
  5. You can choose on which IO Type to collect the metrics thanks to the -IO_TYPE= parameter.
  6. You can choose on which File Type to collect the metric thanks to the -FILE_TYPE=parameter (wilcard allowed).
  7. You can choose on which Tablespace to collect the metric thanks to the -TBS=parameter (wilcard allowed).
  8. You can aggregate the results on the database instances, containers, file type or tablespace level thanks to the –SHOW= parameter.
  9. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the –DISPLAY= parameter.
  10. You can sort based on the number of small_reads, number of small_writes, number of large_reads or number of large_writes thanks to the –SORT_FIELD= parameter.

Let’s see some use cases:

Report the IO type metrics for “reads” IO per database instances, tablespaces for the SLOB tablespace:

 ./db_io_type_metrics.pl  -show=inst,file_type_tbs -file_type_tbs=tbs -tbs=%SLOB% -io_type=reads
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:52:54                                               SMALL R   SMALL R   LARGE R   LARGE R   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:52:54   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   LARGE R   SMALL R   LARGE R
10:52:54   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:52:54   BDT_1                                       144       18390     0         0         0.008     0.0       0.08      0.00
10:52:54   BDT_1        SLOB                           144       18390     0         0         0.008     0.0       0.08      0.00

 

Report the IO type metrics for “small” IO” and all file type:

./db_io_type_metrics.pl  -show=file_type_tbs -file_type_tbs=file_type -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

10:57:01                                               SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
10:57:01   INST         FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
10:57:01   ----------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
10:57:01                Archive Log                    0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Archive Log Backup             0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Control File                   1         66        0         0         0.015     0.000     0.00      0.00
10:57:01                Data File                      125       16230     0         0         0.008     0.000     0.15      0.00
10:57:01                Data File Backup               0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Copy                 0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data File Incremental Backup   0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Data Pump Dump File            0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Flashback Log                  0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Log File                       0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Other                          0         0         0         0         0.000     0.000     0.00      0.00
10:57:01                Temp File                      0         0         0         0         0.000     0.000     0.00      0.00

 

Report the IO type metrics for “small” IO per database instances and containers:

./db_io_type_metrics.pl -show=inst,cont -io_type=small
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:07:46                                                                 SMALL R   SMALL R   SMALL W   SMALL W   Avg MB/   Avg MB/   Avg ms/   Avg ms/
13:07:46   INST         CONT              FILE_TYPE_TBS                  MB/s      RQ/s      MB/s      RQ/s      SMALL R   SMALL W   SMALL R   SMALL W
13:07:46   ----------   ---------------   ----------------------------   -------   -------   -------   -------   -------   -------   -------   -------
13:07:46   CBDT1                                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        CDB$ROOT                                         2         97        0         1         0.021     0.000     0.03      1.00
13:07:46   CBDT1        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT1        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2                                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        CDB$ROOT                                         1         96        0         1         0.010     0.000     0.09      2.00
13:07:46   CBDT2        PDB$SEED                                         0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_1                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_2                                              0         0         0         0         0.000     0.000     0.00      0.00
13:07:46   CBDT2        P_3                                              0         0         0         0         0.000     0.000     0.00      0.00

 

Report the IO type metrics per… I let you finish the sentence as the utility is customizable enough 😉

You can download it from this repository or copy the source code from this page.

Conclusion:

Three utilities are now available to report IO metrics in real time depending on our needs:

  1. asm_metrics for reads and writes metrics related to ASM.
  2. db_io_metrics for reads and writes metrics related to data files and temp files.
  3. db_io_type_metrics for reads, writes, small, large and synchronous metrics related to data files, temp files,control files, log files, archive logs, and so on.

Enjoy and don’t hesitate to come back to me in case of questions, issues or suggestions.

Welcome to db_io_metrics, a new utility to display database physical IO metrics in real time

The db_io_metrics.pl utility is used to display database physical IO real-time metrics. It basically takes a snapshot each second (default interval) from the gv$filestat and gv$tempstat cumulative views and computes the delta with the previous snapshot. The utility is RAC and Multitenant aware.

This utility:

  • provides useful metrics.
  • is RAC aware.
  • detects if it is connected to a multitenant database and then is able to display the containers metrics.
  • is fully customizable: you can aggregate the results depending on your needs.
  • does not install anything into the database.

It displays the following metrics:

  • Reads/s: Number of read per second.
  • KbyRead/s: Kbytes read per second.
  • Avg ms/Read: ms per read in average.
  • AvgBy/Read: Average Bytes per read.
  • Same metrics are provided for Write Operations.

At the following levels:

  • Database Instance.
  • Database container.
  • Filesystem or ASM Diskgroup.
  • Tablespace.
  • Datafile.

Let’s see the help:

./db_io_metrics.pl -help

Usage: ./db_io_metrics.pl [-interval] [-count] [-inst] [-cont] [-fs_dg] [-tbs] [-file] [-fs_delimiter] [-show] [-display] [-sort_field] [-help]

 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
  -CONT=            Container to collect (wildcard allowed)                           ALL
  -FS_DG=           Filesystem or Diskgroup to collect (wildcard allowed)             ALL
  -TBS=             Tablespace to collect (wildcard allowed)                          ALL
  -FILE=            Datafile to collect (wildcard allowed)                            ALL
  -FS_DELIMITER=    Folder which follows the FS mount point                           ORADATA
  -SHOW=            What to show: inst,cont,fs_dg,tbs,file (comma separated list)     INST
  -DISPLAY=         What to display: snap,avg (comma separated list)                  SNAP
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./db_io_metrics.pl
Example: ./db_io_metrics.pl  -inst=BDT_1
Example: ./db_io_metrics.pl  -show=inst,tbs
Example: ./db_io_metrics.pl  -show=inst,tbs -tbs=%UNDO%
Example: ./db_io_metrics.pl  -show=fs_dg
Example: ./db_io_metrics.pl  -show=inst,fs_dg -display=avg
Example: ./db_io_metrics.pl  -show=inst,fs_dg -sort_field=reads
Example: ./db_io_metrics.pl  -show=inst,tbs,cont
Example: ./db_io_metrics.pl  -show=inst,tbs,cont -cont=%P_%
Example: ./db_io_metrics.pl  -show=inst,cont -sort_field=iops

The main options/features are:

  1. You can choose the number of snapshots to display and the time to wait between snapshots.
  2. You can choose on which database instance to collect the metrics thanks to the –INST= parameter.
  3. You can choose on which database container to collect the metrics thanks to the -CONT= parameter (wilcard allowed).
  4. You can choose on which Diskgroup or Filesystem to collect the metrics thanks to the -FS_DG= parameter (wildcard allowed).
  5. You can choose on which tablespace to collect the metrics thanks to the -TBS= parameter (wildcard allowed).
  6. You can choose on which datafile to collect the metrics thanks to the -FILE= parameter (wildcard allowed).
  7. You can choose which folder is your Filesystem delimiter thanks to the -FS_DELIMITER= parameter.
  8. You can aggregate the results on the database instances, containers, diskgroups or filesystems, tablespaces level thanks to the –SHOW= parameter.
  9. You can display the metrics per snapshot, the average metrics value since the collection began (that is to say since the script has been launched) or both thanks to the –DISPLAY= parameter.
  10. You can sort based on the number of reads, number of writes or number of IOPS (reads+writes) thanks to the –SORT_FIELD= parameter.

Let’s see some use cases:

Report the physical IO metrics for the database instances:

./db_io_metrics.pl -show=inst
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

12:55:31                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
12:55:31   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
12:55:31   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
12:55:31   CBDT1                                                              376.0     3008     5.8       8192      0.0        0         0.0        0
12:55:31   CBDT2                                                              346.0     2768     15.6      8192      0.0        0         0.0        0

Report the physical IO metrics per database instances and per containers and sort by iops:

./db_io_metrics.pl -show=inst,cont -sort_field=iops
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

12:57:59                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
12:57:59   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
12:57:59   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
12:57:59   CBDT2                                                              293.0     2344     18.8      8192      0.0        0         0.0        0
12:57:59   CBDT2        CDB$ROOT                                              150.0     1200     18.4      8192      0.0        0         0.0        0
12:57:59   CBDT2        P_1                                                   143.0     1144     19.2      8192      0.0        0         0.0        0
12:57:59   CBDT2        PDB$SEED                                              0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT2        P_2                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT2        P_3                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1                                                              274.0     2192     9.1       8192      0.0        0         0.0        0
12:57:59   CBDT1        CDB$ROOT                                              274.0     2192     9.1       8192      0.0        0         0.0        0
12:57:59   CBDT1        PDB$SEED                                              0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_1                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_2                                                   0.0       0        0.0       0         0.0        0         0.0        0
12:57:59   CBDT1        P_3                                                   0.0       0        0.0       0         0.0        0         0.0        0

Report the physical IO metrics per database instances and per filesystem or ASM diskgroup:

./db_io_metrics.pl -show=inst,fs_dg
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:00:39                                                                                Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
13:00:39   INST         CONT         FS_DG      TBS            FILE           Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
13:00:39   ----------   ----------   --------   ------------   ------------   -------   ------   -------   ------    --------   -------   --------   ------
13:00:39   CBDT1                                                              349.0     2792     8.3       8192      0.0        0         0.0        0
13:00:39   CBDT1                     DATA                                     349.0     2792     8.3       8192      0.0        0         0.0        0
13:00:39   CBDT2                                                              272.0     2176     25.4      8192      0.0        0         0.0        0
13:00:39   CBDT2                     DATA                                     272.0     2176     25.4      8192      0.0        0         0.0        0

Report the physical IO metrics per… I let you finish the sentence as the utility is customizable enough 😉

Remarks:

  • The metrics are “only” linked to the datafiles and tempfiles (See the oracle documentation for more details).
  • In case your database is on Filesystems, you have to change the “FS_DELIMITER” argument to aggregate the metrics at the Filesystem level. For example, if the FS are :
Filesystem             size   used  avail capacity  Mounted on
devora11-data/u500     960G   927G    33G    97%    /ec/dev/server/oracle/devora11/u500
devora11-data/u501     960G   767G   193G    80%    /ec/dev/server/oracle/devora11/u501
devora11-data/u502     500G   445G    55G    89%    /ec/dev/server/oracle/devora11/u502

And the datafiles are located “behind” the oradata folder:

/ec/dev/server/oracle/devora11/u500/oradata/BDT
/ec/dev/server/oracle/devora11/u501/oradata/BDT
/ec/dev/server/oracle/devora11/u502/oradata/BDT

Then you can launch the utility that way:

./db_io_metrics.pl -show=inst,fs_dg -fs_delimiter=oradata
............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

11:00:35                                                                               			    Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
11:00:35   INST         FS_DG              			TBS              FILE             Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
11:00:35   ----------   ----------------   			--------------   --------------   -------   ------   -------   ------    --------   -------   --------   ------
11:00:35   BDT                                                           			  129.0     1032     8.3       8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u500/                                      67.0      536      10.0      8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u501/                                      22.0      176      5.5       8192      0.0        0         0.0        0
11:00:35   BDT          /ec/dev/server/oracle/devora11/u502/                                      40.0      320      7.0       8192      0.0        0         0.0        0

 

  • If you don’t want to see the FS (do not use -show=fs_dg), then there is no need to specify the -fs_delimiter argument.
  • Reading the good article A Closer Look at CALIBRATE_IO from Luca Canali gave me the idea to create this utility.
  • If you are interested in those real-time metrics at the ASM level, you can have a look to the asm_metrics utility.

You can download it from this repository or copy the source code from this page.

Recovering after the loss of Redo Log files thanks to the file descriptor

This morning when I came at work I discovered that all the redo log files (current, active, inactive..) have been removed from one of our database (useless to say by mistake) resulting in:

Errors in file /ec/prod/server/oracle/orabdt/u000/admin/BDT/diag/rdbms/bdt/BDT/trace/BDT_m000_17002.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3

As the database is a large one, I tried to avoid to restore it:

  1. My first reflex is to check if the flashback database is set to on: Unfortunately it was not the case.
  2. Then I remembered this blog post of Jason Arneil: Recovering from rm -rf on a datafile.

And then I decided to give it a try for the redo log files that way:

  • First I searched for the lgwr process ID associated to this database
ps -ef | grep -i lgwr | grep -i BDT
  oracle 17321  8422   0   Sep 10 ?         291:07 ora_lgwr_BDT
  • Now let’s search for the file descriptor linked to the redo logs files: As I don’t have access to lsof (nor pfiles) and as I want to know the Seq#, I did the research that way (redo logs contain the “Thread and Seq# strings) (output truncated):
cd /proc/17321/fd
for i in *
> do
> echo "for fd: $i"
> strings $i | egrep "Thread.*Seq#"
> done
for fd: 258
Thread 0001, Seq# 0000005285, SCN
for fd: 259
Thread 0001, Seq# 0000005285, SCN
for fd: 260
Thread 0001, Seq# 0000005284, SCN
for fd: 261
Thread 0001, Seq# 0000005284, SCN
  • Then based on the Seq#, and the v$log and v$logfile output I recreated the redo logs that way:
cat 258 > /ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT1a.log
cat 259 > /ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log
cat 260 > /ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT2a.log
cat 261 > /ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT2b.log
  • At that time we received:
Tue Dec 10 08:06:43 2013
Archived Log entry 5285 added for thread 1 sequence 5285 ID 0xb2ffdbd dest 1:
Archiver process freed from errors. No longer stopped
Tue Dec 10 08:06:43 2013
Beginning log switch checkpoint up to RBA [0x14a7.2.10], SCN: 9310332806181
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT2b.log'
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT2a.log'
.....
.....
ORA-00314: log 1 of thread 1, expected sequence# 5287 doesn't match 5285
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log'
ORA-00314: log 1 of thread 1, expected sequence# 5287 doesn't match 5285
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT1a.log'
  • So in our case, the “restore” has not been enough. Then we created new redo log files and we cleared and dropped the ones we recovered thanks to the file descriptor that way:
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
    BEFORE 12/10/2013 08:10:00 (CHANGE 9310332823323) CANNOT BE USED FOR RECOVERY.
Clearing online log 1 of thread 1 sequence number 5287
Archived Log entry 5287 added for thread 1 sequence 5288 ID 0xb2ffdbd dest 1:
Archiver process freed from errors. No longer stopped
Tue Dec 10 08:14:44 2013
Completed: ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2
Tue Dec 10 08:14:54 2013
Beginning global checkpoint up to RBA [0x14a9.5cea.10], SCN: 9310382261905
Completed checkpoint up to RBA [0x14a9.5cea.10], SCN: 9310382261905
Tue Dec 10 08:14:55 2013
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
    BEFORE 12/10/2013 08:06:43 (CHANGE 9310332806181) CANNOT BE USED FOR RECOVERY.
Clearing online log 2 of thread 1 sequence number 5286
Completed: ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2
ALTER DATABASE DROP LOGFILE GROUP 1
Completed: ALTER DATABASE DROP LOGFILE GROUP 1
ALTER DATABASE DROP LOGFILE GROUP 2
Completed: ALTER DATABASE DROP LOGFILE GROUP 2

And then the database has been able to work as expected and we launched a full backup of it.

Remarks:

  1. Thanks to Jason Jarneil and Frits Hoogland for their initial findings/blog post related to the restore of a lost datafile thanks to the file descriptor.
  2. In our case restoring the redo log files from the file descriptors has not been enough, but it’s worth trying as a last chance.
  3. Adding new redo logs should not be necessary. Clearing the “dropped/restored” redo logs with “ALTER DATABASE CLEAR (UNARCHIVED) LOGFILE GROUP <n>” could be enough as stated here:

loss_redo

Conclusion:

  1. We have been able to restore the service without the need of a database restore/recover (even if the “restore” from the file descriptor has not been enough).
  2. From my point of view you should try this as a last chance before restoring/stopping the database: restore from the file descriptor and if this is not enough launch the “alter database clear logfile group <n>” or “alter database clear unarchived logfile group <n>” on those restored redo logs.