Exadata: Storage Cells IO performance metrics and IO distribution with DB servers

Yesterday I was reading Uwe Hesse’s blog post “Appliance? How #Exadata will impact your IT Organization” (you should read it too, by the way 🙂) and then (I don’t know why) my mind switched back to my asmiostat utility and how it could be useful for the Exadata community.

Let me explain:

If you read my previous blog post “ASM Preferred Read: Collect performance metrics”, you saw that thanks to my asmiostat utility we are able to measure the IO distribution between ASM instances and failure groups. (ASM itself does not perform any IO, by the way; this is just a short way of saying “the IOs generated by the databases linked to the ASM instance”.)

You also saw that we are able to measure the IO performance metrics of the failure groups (and of their associated disks if needed); see the first post related to my asmiostat utility.

That said, now think about Exadata for which:

  • One ASM instance is running per DB server.
  • Each storage cell constitutes a separate failure group (in the most common Exadata configuration; see the Expert Oracle Exadata book for more details).

So now go back to the first sentences of the explanation and simply change a few words for Exadata:

You see that thanks to my asmiostat utility we are able to measure the IO distribution between DB servers (ASM instances) and Storage cells (failure groups).

You see that we are also able to measure the IO performance metrics of the Storage cells (failure groups), and of their associated Grid Disks (disks) if needed.

Remarks:  

  • In case your Exadata configuration does not follow the rule of one failure group per storage cell, just be aware that I will update my asmiostat utility so that it can group by storage cell in any case (thanks to the IP address embedded in the disk paths). I’ll keep you posted once it is ready.
  • To get the asmiostat utility included in the real_time.pl script: click on the link, then on the view source button, and then copy/paste the source code. You can also download the script from this repository to avoid the copy/paste (click on the link).
  • For a full description of my asmiostat utility see this post.
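
The grouping by storage cell mentioned in the first remark could look roughly like the sketch below. It is only an illustration and not the utility's code: it assumes grid disk paths of the form o/<cell IP>/<grid disk name> with a single IP, and the function names and sample paths are made up for the example.

```python
import re
from collections import defaultdict

# Assumed path layout: o/<cell IP>/<grid disk name> (sample values are made up).
CELL_PATH = re.compile(r"^o/(\d{1,3}(?:\.\d{1,3}){3})/")

def cell_ip_from_path(path):
    """Return the cell IP embedded in a disk path, or None if there is none."""
    match = CELL_PATH.match(path)
    return match.group(1) if match else None

def reads_per_cell(disks):
    """Sum read counts per storage cell; disks is a list of (path, reads)."""
    totals = defaultdict(int)
    for path, reads in disks:
        totals[cell_ip_from_path(path) or "UNKNOWN"] += reads
    return dict(totals)

sample = [
    ("o/192.168.10.3/DATA_CD_00_cell01", 120),
    ("o/192.168.10.3/DATA_CD_01_cell01", 80),
    ("o/192.168.10.4/DATA_CD_00_cell02", 50),
]
print(reads_per_cell(sample))
```

Grouping on the IP rather than on the failure group name means the result does not depend on how the failure groups were defined.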

Update: The asmiostat utility is now able to deal with Exadata Cell’s IPs (see this post)

ASM Preferred Read: Collect performance metrics

The purpose of this post is not to explain the ASM Preferred Read feature or the way to put it in place (for that purpose you can have a look at this oracle-base post or Christian Bilien’s one).

The purpose is to give a way to see this feature in action and collect related performance metrics. To do this:

  • I set asm_preferred_read_failure_groups to DATA.WIN on Instance +ASM1
  • I set asm_preferred_read_failure_groups to DATA.JMO on Instance +ASM2
  • I use Kevin Closson’s SLOB Kit to generate I/O on the database
  • I use my asmiostat utility included in real_time.pl (see this post for more information) with a filter on the DATA diskgroup (-dg=data), showing the metrics at the instance and failgroup levels (-show=inst,fg)

First test:

Let’s run SLOB to generate read IOs from a database located on the same host as the +ASM1 instance. The result of “./real_time.pl -type=asmiostat -show=inst,fg -dg=data” is the following:

[Screenshot: asm_prefer1]

As you can see the Read IOs come from the WIN failgroup (as expected). You also get the performance metrics of the failgroup.

Second test:

Let’s run SLOB to generate read IOs from a database located on the same host as the +ASM2 instance. The result of “./real_time.pl -type=asmiostat -show=inst,fg -dg=data” is the following:

[Screenshot: asm_prefer2]

As you can see the Read IOs come from the JMO failgroup (as expected). You also get the performance metrics of the failgroup.

Conclusion:

Thanks to the -show option of my asmiostat utility, we have a simple way to collect in real time the performance metrics related to your ASM preferred read configuration. (You can also check that it is working as expected, that is to say that the read IOs come from the right failgroup.)
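
To make the "right failgroup" check concrete, here is a small sketch that turns read counts per instance and failgroup into percentages. It is not part of the utility: the function name is hypothetical and the figures are invented, only the instance and failgroup names come from the tests above.

```python
def read_share_by_failgroup(rows):
    """Return {inst: {fg: percent of that instance's reads}}.

    rows is a list of (inst, fg, reads) tuples.
    """
    totals = {}
    for inst, fg, reads in rows:
        per_fg = totals.setdefault(inst, {})
        per_fg[fg] = per_fg.get(fg, 0) + reads
    shares = {}
    for inst, per_fg in totals.items():
        inst_total = sum(per_fg.values())
        shares[inst] = {
            fg: (100.0 * reads / inst_total if inst_total else 0.0)
            for fg, reads in per_fg.items()
        }
    return shares

# Invented read counts for one interval.
sample = [
    ("+ASM1", "WIN", 500), ("+ASM1", "JMO", 0),
    ("+ASM2", "WIN", 0), ("+ASM2", "JMO", 300),
]
shares = read_share_by_failgroup(sample)
print(shares)
```

With a working preferred read configuration, almost all the reads of an instance should come from its preferred failgroup.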

To get the asmiostat utility included in the real_time.pl script: click on the link, then on the view source button, and then copy/paste the source code. You can also download the script from this repository to avoid the copy/paste (click on the link).

Updates: 

  1. Check how it can be useful for Exadata in this post.
  2. SLOB update 2 has been released since this post. Check how we can use it in this post.
  3. The asmiostat utility is not part of the real_time.pl script anymore. A new utility called asm_metrics.pl has been created. See “ASM metrics are a gold mine. Welcome to asm_metrics.pl, a new utility to extract and to manipulate them in real time” for more information.

ASM I/O Statistics Utility

When I need to deal with ASM I/O statistics, the tools provided by Oracle (asmcmd iostat and asmiostat.sh from MOS [ID 437996.1]) do not suit my needs.

So I decided to create my own asmiostat utility, which is helpful for 3 main reasons:

  1. It provides useful real-time metrics.
  2. You can aggregate the results according to your needs in a customizable way.
  3. It does not need any change to the source: Simply download it and use it.

The script takes a snapshot each second (default interval) from the gv$asm_disk_stat cumulative view (rather than gv$asm_disk, because the information is exactly the same) and computes the differences with the previous snapshot.

The only difference is that gv$asm_disk_stat returns the information available in memory, while gv$asm_disk accesses the disks to re-collect some information. Since the metrics needed here do not require re-collecting anything from the disks, gv$asm_disk_stat is more appropriate.
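
The snapshot/difference idea can be sketched as follows. This is a simplified illustration, not the script itself: the snapshots are plain dictionaries filled with made-up values instead of rows fetched from the view, and the function name is hypothetical.

```python
def snapshot_delta(previous, current):
    """Subtract the cumulative counters of two snapshots, disk by disk.

    Each snapshot maps an (inst, dg, fg, dsk) tuple to a dict of counters.
    """
    delta = {}
    for disk, now in current.items():
        before = previous.get(disk)
        if before is None:
            continue  # disk not present in the previous snapshot: no delta yet
        delta[disk] = {name: now[name] - before[name] for name in now}
    return delta

# Made-up cumulative counters for one disk, taken one second apart.
disk = ("+ASM1", "DATA", "WIN", "WIN0000")
first = {disk: {"reads": 1000, "bytes_read": 8192000}}
second = {disk: {"reads": 1200, "bytes_read": 9830400}}
print(snapshot_delta(first, second))
```

Because the view is cumulative, only the difference between two snapshots says what happened during the interval.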

So, let’s have a look at the metrics collected by the script:

[Screenshot: asm_metrics]

Description is the following:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second.
  • Avg ms/Read: Average time per read, in ms.
  • AvgBy/Read: Average number of bytes per read.
  • Read Errors: Number of read errors.
  • Same metrics are provided for Write Operations.
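
As a worked example, the read metrics above can be derived from the difference between two snapshots along these lines. This is a sketch under assumptions: the counter names mirror the reads, bytes_read, read_time and read_errs columns, read_time is assumed to be expressed in seconds, and the sample figures are invented.

```python
def read_metrics(delta, interval_s=1.0):
    """Compute the per-interval read metrics from counter differences."""
    reads = delta["reads"]
    return {
        "Reads/s": reads / interval_s,
        "KbyRead/s": delta["bytes_read"] / 1024 / interval_s,
        # read_time is assumed to be in seconds, hence the * 1000 to get ms
        "Avg ms/Read": delta["read_time"] * 1000 / reads if reads else 0.0,
        "AvgBy/Read": delta["bytes_read"] / reads if reads else 0.0,
        "Read Errors": delta["read_errs"],
    }

# Invented differences over a one-second interval: 200 reads of 8 KB each.
sample = {"reads": 200, "bytes_read": 1638400, "read_time": 0.5, "read_errs": 0}
metrics = read_metrics(sample)
print(metrics)
```

The guard on reads avoids a division by zero when a disk was idle during the interval.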

The interesting part is that you can decide how those metrics have to be calculated/aggregated: I will give an example below and explain how to use the script to get this result.

Suppose I want to display the metrics by diskgroup (the default behavior). The output will look like this:

[Screenshot: asm_dg]

You see the blank values for: INST (instance), FG (Failgroup) and DSK (disks)? It means that those values have been aggregated.

Of course, you can display them as well. For example, let’s display INST too:

[Screenshot: asm_inst_dg]

As you can see you now have the metrics for the diskgroups by Instance and also for the Instance itself (The row with the blank DG Field).

You can “play” with those fields (Inst, dg, fg, dsk) as you want. Display them (or not) to get their metrics using the <-show> argument of the script.
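
The effect of the fields passed to -show can be pictured with the sketch below. It is a simplification and not the script's logic: it sums a single reads counter, it only produces one level of grouping (not the extra per-instance row mentioned above), and the function name and sample rows are made up.

```python
from collections import defaultdict

FIELDS = ("inst", "dg", "fg", "dsk")

def aggregate(rows, show):
    """Sum reads over rows, keeping only the fields listed in show.

    Fields that are not shown are blanked, like the blank columns in the output.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] if f in show else "" for f in FIELDS)
        totals[key] += row["reads"]
    return dict(totals)

# Made-up per-disk read counts.
rows = [
    {"inst": "+ASM1", "dg": "DATA", "fg": "WIN", "dsk": "WIN0000", "reads": 10},
    {"inst": "+ASM1", "dg": "DATA", "fg": "JMO", "dsk": "JMO0000", "reads": 5},
    {"inst": "+ASM2", "dg": "DATA", "fg": "JMO", "dsk": "JMO0000", "reads": 7},
]
print(aggregate(rows, {"dg"}))
print(aggregate(rows, {"inst", "dg"}))
```

Showing more fields simply makes the grouping key finer, down to one row per disk when all four are displayed.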

You can also filter on INST, DG and FG. For example, let’s display the metrics for the DATA diskgroup and its associated disks and failgroups:

[Screenshot: asm_inst_dg_fg_dsk]

Now let’s see the utility usage:

The utility has been implemented as part of the real_time.pl script. Click on the link, then on the view source button, and then copy/paste the source code. You can also download the script from this repository to avoid the copy/paste (click on the link).

This script also collects a lot of other useful real-time metrics: see the description of the script in this post.

The help associated with asmiostat:

 ./real_time.pl -type=asmiostat -help

Usage: ./real_time.pl -type=asmiostat [-interval] [-count] [-inst] [-dg] [-fg] [-show] [-help]
 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter    Comment                                                      Default
  ---------    -------                                                      -------
  -INST=       ALL - Show all Instance(s)                                   ALL
               CURRENT - Show Current Instance
               INSTANCE_NAME,... - choose Instance(s) to display

  -DG=         Diskgroup to collect (comma separated list)                  ALL
  -FG=         Failgroup to collect (comma separated list)                  ALL
  -SHOW=       What to show: inst,fg,dg,dsk (comma separated list)          DG

Example: ./real_time.pl -type=asmiostat
Example: ./real_time.pl -type=asmiostat -inst=+ASM1
Example: ./real_time.pl -type=asmiostat -dg=DATA -show=dg
Example: ./real_time.pl -type=asmiostat -dg=data -show=inst,dg,fg
Example: ./real_time.pl -type=asmiostat -show=dg,dsk
Example: ./real_time.pl -type=asmiostat -show=inst,dg,fg,dsk

  • You can choose the number of snapshots to display and the time to wait between snapshots.
  • You can choose to filter on INST, DG and FG (by default no filter is applied).
  • You can customize the output (that is, the fields on which the metrics are reported) according to your needs thanks to the <-show> argument.
  • You have to set the environment (oraenv) for an ASM instance before running the script.
  • The script has been tested on Linux, Unix and Windows.

I hope this will be useful for you. If you have any suggestions, or any metrics that you would like to see integrated, please do not hesitate to get back to me.

UPDATES: