Flex ASM 12c (12.1): be careful to “invisible” I/O !

This blog post started with a conversation with my Twitter friend Martin Berger (@martinberx): he suggested that I test the Flex ASM behavior with a different ASM disk path on each machine.

As the documentation states:

Different nodes might see the same disks under different names, however each instance must be able to use its ASM_DISKSTRING to discover the same physical media as the other nodes in the cluster.

I am not saying this is good practice, but since everything that is not forbidden is allowed, let’s give it a try this way:

  • My Flex ASM lab is a 3-node RAC.
  • The ASM_DISKSTRING is set to /dev/asm* on the ASM instances.
  • I’ll add a new disk with udev rules in place on the 3 machines so that the new disk will be identified as:
        • /dev/asm1-disk10 on racnode1
        • /dev/asm2-disk10 on racnode2
        • /dev/asm3-disk10 on racnode3

As you can see, the ASM_DISKSTRING (/dev/asm*) is able to discover this new disk on all three nodes. Please note that this is the same shared disk; it is just identified by a different path on each node.
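To illustrate why a single discovery string can match a different device name on each node, here is a minimal Python sketch. It only mimics shell-style wildcard matching with the standard `fnmatch` module; it is not how ASM itself performs discovery, and the `paths_by_node` dictionary and the `discovered` helper are hypothetical names that simply restate the lab's udev naming.

```python
from fnmatch import fnmatch

# Discovery string used by the ASM instances in this lab.
ASM_DISKSTRING = "/dev/asm*"

# Hypothetical stand-in for the udev naming on each node (same shared disk).
paths_by_node = {
    "racnode1": "/dev/asm1-disk10",
    "racnode2": "/dev/asm2-disk10",
    "racnode3": "/dev/asm3-disk10",
}

def discovered(diskstring, paths):
    """Return, per node, whether the local path matches the discovery string."""
    return {node: fnmatch(path, diskstring) for node, path in paths.items()}

result = discovered(ASM_DISKSTRING, paths_by_node)
for node, ok in result.items():
    print(node, paths_by_node[node], "discovered" if ok else "not discovered")
```

A narrower pattern such as /dev/asm1* would match on racnode1 only, which is why the wildcard has to cover the naming used on every node.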

On my Flex ASM lab, 2 ASM instances are running:

srvctl status asm
ASM is running on racnode2,racnode1

Let’s create a diskgroup IOPS on this new disk (from the +ASM1 instance):

. oraenv
ORACLE_SID = [+ASM1] ? +ASM1
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@racnode1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.1.0 Production on Tue Jul 16 12:03:59 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create diskgroup IOPS external redundancy disk '/dev/asm1-disk10';

Diskgroup created.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

[oracle@racnode1 ~]$ srvctl start diskgroup -g IOPS

[oracle@racnode1 ~]$ srvctl status diskgroup -g IOPS
Disk Group IOPS is running on racnode2,racnode1

So everything went fine. Let’s check the disk from the ASM point of view:

SQL> l
  1  select
  2  i.instance_name,g.name,d.path
  3  from
  4  gv$instance i,gv$asm_diskgroup g, gv$asm_disk d
  5  where
  6  i.inst_id=g.inst_id
  7  and g.inst_id=d.inst_id
  8  and g.group_number=d.group_number
  9  and g.name='IOPS'
 10*
SQL> /

INSTANCE_NAME    NAME                           PATH
---------------- ------------------------------ --------------------
+ASM1            IOPS                           /dev/asm1-disk10
+ASM2            IOPS                           /dev/asm2-disk10

As you can see +ASM1 discovered /dev/asm1-disk10 and +ASM2 discovered /dev/asm2-disk10. This is expected and everything is ok so far.

Now let’s move to the third node, racnode3, where no ASM instance is running.

Remember that on racnode3 the new disk is /dev/asm3-disk10. Let’s connect to the NOPBDT3 database instance and create a tablespace IOPS on the IOPS diskgroup.

SQL> create tablespace IOPS datafile '+IOPS' size 1g;

Tablespace created.

Perfect, everything is ok.  Now check v$asm_disk from the NOPBDT3 database instance:

SQL> select path from v$asm_disk where path like '%10';

PATH
--------------------------------------------------------------------------------
/dev/asm2-disk10

As you can see, the NOPBDT3 database instance is linked to the +ASM2 instance (as it reports /dev/asm2-disk10).

But the NOPBDT3 database instance, located on racnode3, accesses /dev/asm3-disk10.

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
NOPBDT3

SQL> !ls -l /dev/asm2-disk10
ls: cannot access /dev/asm2-disk10: No such file or directory

SQL> !ls -l /dev/asm3-disk10
brw-rw----. 1 oracle dba 8, 193 Jul 16 15:35 /dev/asm3-disk10

Ooooh wait! The NOPBDT3 database instance accesses the disk /dev/asm3-disk10, a path that is not recorded in gv$asm_disk.

So if I launch SLOB locally on the NOPBDT3 database instance, are the metrics recorded?

First, let’s setup SLOB on the IOPS tablespace:

[oracle@racnode3 SLOB]$ ./setup.sh IOPS 3

Now, launch SLOB and check the I/O metrics with my asmiostat utility this way:

 ./real_time.pl -type=asmiostat -show=inst,dbinst -dg=IOPS

With the following output:

[Screenshot: asmiostat output (metrics_not_recorded)]

As you can see, the metrics have not been recorded, although the I/Os did take place (/dev/asm3-disk10 is /dev/sdm):

egrep -i "sdm|device" iostat.out | tail -4
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdm               0.00     0.00  101.64    0.00     0.79     0.00    16.00     1.80   17.74   6.05  61.52
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdm               0.00     0.00  109.34    0.00     0.85     0.00    16.00     1.82   16.66   5.70  62.28

Of course, the same test launched from the NOPBDT2 database instance, linked to the +ASM2 instance, would produce the following output:

[Screenshot: asmiostat output (metrics_recorded)]

So the metrics are recorded when the database instance does its I/O on devices that its ASM instance is aware of (NOPBDT2 linked to +ASM1 would produce the “invisible” metrics).

Important remark:

The metrics are not visible in the gv$asm_disk view (from either the ASM or the database instance), but there is one place where they are recorded: the gv$asm_disk_iostat view queried from the database instance (not from the ASM one).

Conclusion:

The “invisible” I/O issue (metrics not recorded in the gv$asm_disk view) occurs if:

  1. you set a different disk path per machine (and therefore per ASM instance) in a Flex ASM (12.1) configuration,
  2. and the database instance is attached to a remote ASM instance (which then uses a different path).

I would therefore suggest using the same path on every machine for the ASM disks in a Flex ASM (12.1) configuration to avoid this issue.
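The two conditions can be restated as a small check. The Python sketch below is only an illustration: `path_on_node`, `node_of` and `io_is_invisible` are hypothetical names, the paths come from the lab above, and placing NOPBDT2 on racnode2 is an assumption inferred from the text rather than something shown in the transcripts.

```python
# Disk path seen on each node for the same shared disk (from the lab above).
path_on_node = {
    "racnode1": "/dev/asm1-disk10",
    "racnode2": "/dev/asm2-disk10",
    "racnode3": "/dev/asm3-disk10",
}

# Node hosting each instance. NOPBDT2 on racnode2 is an assumption.
node_of = {
    "+ASM1": "racnode1",
    "+ASM2": "racnode2",
    "NOPBDT2": "racnode2",
    "NOPBDT3": "racnode3",
}

def io_is_invisible(db_instance, asm_instance):
    """True when the database instance does its I/O on a path that differs
    from the one discovered by the ASM instance it is attached to."""
    db_path = path_on_node[node_of[db_instance]]
    asm_path = path_on_node[node_of[asm_instance]]
    return db_path != asm_path

for db, asm in [("NOPBDT3", "+ASM2"), ("NOPBDT2", "+ASM2"), ("NOPBDT2", "+ASM1")]:
    print(db, "->", asm, "invisible" if io_is_invisible(db, asm) else "recorded")
```

With identical paths on every machine, the comparison can never differ, which is the reasoning behind the suggestion above.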

Update: The asmiostat utility is not part of the real_time.pl script anymore. A new utility called asm_metrics.pl has been created. See “ASM metrics are a gold mine. Welcome to asm_metrics.pl, a new utility to extract and to manipulate them in real time” for more information.

ASM I/O Statistics Utility V2

Some days ago I wrote about a side effect of Flex ASM 12c (12.1) that I called “unpreferred read”. While writing that post, I thought the demonstration of the side effect would be even clearer if my asmiostat utility could display database instances as well.

This is now possible with my asmiostat utility V2, which provides these new features:

  • Ability to display database instances (as of 11gr1).
  • Ability to sort based on the number of reads.
  • Ability to sort based on the number of writes.

The following metrics are still collected:

  • Reads/s: Number of reads per second.
  • KbyRead/s: Kbytes read per second.
  • Avg ms/Read: Average time per read, in ms.
  • AvgBy/Read: Average bytes per read.
  • Same metrics are provided for Write Operations.
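
As an illustration of how such figures can be derived, here is a minimal Python sketch that computes the four read metrics from two cumulative snapshots. It is not the utility's actual code: the `read_metrics` function, the dictionary keys, the millisecond unit of the time counter and the sample numbers are all assumptions made for the example.

```python
def read_metrics(prev, curr, interval_s):
    """Derive per-interval read metrics from two cumulative snapshots.

    Each snapshot is a dict with hypothetical keys:
      reads        - cumulative number of reads
      bytes_read   - cumulative bytes read
      read_time_ms - cumulative read time, assumed here to be in milliseconds
    """
    d_reads = curr["reads"] - prev["reads"]
    d_bytes = curr["bytes_read"] - prev["bytes_read"]
    d_time_ms = curr["read_time_ms"] - prev["read_time_ms"]
    return {
        "Reads/s": d_reads / interval_s,
        "KbyRead/s": d_bytes / 1024 / interval_s,
        # Guard against an interval without any read.
        "Avg ms/Read": d_time_ms / d_reads if d_reads else 0.0,
        "AvgBy/Read": d_bytes / d_reads if d_reads else 0.0,
    }

# Made-up sample values, one second apart.
prev = {"reads": 1000, "bytes_read": 8_192_000, "read_time_ms": 5000}
curr = {"reads": 1100, "bytes_read": 9_011_200, "read_time_ms": 6800}
m = read_metrics(prev, curr, interval_s=1)
print(m)
```

The same arithmetic applies to the write counters.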

Of course the old features remain (see this post for more details about the previous features):

  • Ability to display/aggregate/filter according to your needs on ASM instances, diskgroups, failgroups and disks (and now on database instances as well).
  • Ability to display Exadata Cells IPs instead of ASM Failgroup.

Let’s have a look at two examples using the V2 features:

First one: I want to know which database instance is generating most of the read I/O requests per ASM instance, and I also want to see the performance metrics.

Fine, let’s launch my utility this way:

./real_time.pl -type=asmiostat -show=inst,dbinst -sort_field=reads

With the following output:

[Screenshot: asmiostat output (asmiostatv2_most_reads)]

As you can see, the BDTO_2 database instance is generating most of the read I/O requests through +ASM1.

Second one: I want to see Flex ASM 12c (12.1) “unpreferred” read in action.

Well, I am using the same setup as the one described in the “unpreferred read” post.

I launch Kevin Closson’s SLOB2 Kit to generate physical I/O locally on the NOPBDT3 database instance. I check the behavior with my asmiostat utility this way:

./real_time.pl -type=asmiostat -show=inst,dbinst,dg,fg -dg=DATAP -dbinst='%NOP%'

With the following output:

[Screenshot: asmiostat output (asmiostatv2_unpreff_reads)]

As you can see, the NOPBDT3 database instance (located in SITEB) is using the +ASM1 instance, which prefers to read from SITEA. As a result, the NOPBDT3 database instance reads from SITEA, which is bad.

Remarks and conclusion:

  • My asmiostat utility V2 is helpful to see which database instance is using which ASM instance (and also collect the performance metrics).
  • This will be very useful with Flex ASM in place but it can also be used with non Flex ASM (See the first example).
  • You can download my asmiostat utility (which is part of the real_time.pl script) from this repository.
  • The utility V2 still works with 10gr2 ASM but the “Database instance” feature is triggered as of 11gr1 (as it is based on the gv$asm_disk_iostat view).
  • I have not had the chance to play with pluggable databases yet: this will be the next step for my utility.
  • If you hit this issue:
./real_time.pl 
: No such file or directory
  • Then launch it this way:
perl ./real_time.pl

UPDATE: The asmiostat utility is not part of the real_time.pl script anymore. A new utility called asm_metrics.pl has been created. See “ASM metrics are a gold mine. Welcome to asm_metrics.pl, a new utility to extract and to manipulate them in real time” for more information.

Flex ASM 12c (12.1) and Extended Rac: be careful to “unpreferred” read !

Update 2015/03/06: The following has been recognized as “unpublished BUG 17045279 – ASM_PREFERRED_READ DOES NOT WORK WITH FLEX ASM”, which is planned to be fixed in the next upcoming release.

Update 2017/05/20: As of 12.2, preferred reads are site-aware (extract from Markus Michalewicz’s presentation, available here), so the issue described in this blog post has been addressed.

 

Introduction

As you know, Oracle 11g introduced a new feature called “ASM Preferred Read”. It is very useful in extended RAC as it allows each node to define a preferred failure group, so that nodes access local failure groups in preference to remote ones. This is done with the “asm_preferred_read_failure_groups” parameter.

Fine, but remember:

  1. This parameter has to be set at the ASM instance level (not the database instance one).
  2. It is the database instance (or its shadow processes) that performs the I/Os (not the ASM instance).

Why is it important? Because with Flex ASM in place, database instances are connection load balanced across the set of available ASM instances (which of course are not necessarily “local” to the database instance anymore):

[Image: flex_asm1]

And then you could hit what I call the “unpreferred” read behavior.

Let me explain more in depth with an example:

Suppose that you have an extended 3-node RAC:

  • racnode1 located in SITE A
  • racnode2 located in SITE A
  • racnode3 located in SITE B

And 2 active ASM instances:

  • +ASM1 located in SITE A
  • +ASM3 located in SITE B

srvctl status asm
ASM is running on racnode3,racnode1

So you created 2 failgroups, SITEA and SITEB, and you set the asm_preferred_read_failure_groups parameter this way for the DATAP diskgroup:

SQL> alter system set asm_preferred_read_failure_groups='DATAP.SITEB' sid='+ASM3';
System altered.

SQL> alter system set asm_preferred_read_failure_groups='DATAP.SITEA' sid='+ASM1';
System altered.

This way +ASM3 prefers to read from SITEB and +ASM1 from SITEA (which makes perfect sense from the ASM point of view).

But what if a database instance located in SITEB (racnode3) is using +ASM1, located in SITEA?

SQL>  select I.INSTANCE_NAME,C.INSTANCE_NAME,C.DB_NAME
  2  from gv$instance I, gv$asm_client C 
  3  where C.INST_ID=I.INST_ID and C.instance_name='NOPBDT3';

INSTANCE_NAME    INSTANCE_NAME                                                    DB_NAME
---------------- ---------------------------------------------------------------- --------
+ASM1            NOPBDT3                                                          NOPBDT

As you can see, the NOPBDT3 database instance is using the +ASM1 instance, although it is running on racnode3:

srvctl status instance -i NOPBDT3 -d NOPBDT
Instance NOPBDT3 is running on node racnode3

This means that NOPBDT3, located in SITEB, will prefer to request read I/O from SITEA, which is of course very bad.
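
The reasoning above boils down to comparing the site of the database instance's node with the failure group preferred by the ASM instance it is connected to. The Python sketch below restates it with the values of this example; the dictionaries and the `is_unpreferred` helper are hypothetical names, and the code only models the logic; it does not query the cluster.

```python
# Site of each node and preferred failure group of each ASM instance,
# taken from the example above.
site_of_node = {"racnode1": "SITEA", "racnode2": "SITEA", "racnode3": "SITEB"}
preferred_fg = {"+ASM1": "SITEA", "+ASM3": "SITEB"}

def is_unpreferred(db_node, asm_instance):
    """True when the ASM instance serving the database instance prefers
    a failure group that is not on the database instance's own site."""
    return preferred_fg[asm_instance] != site_of_node[db_node]

# NOPBDT3 runs on racnode3 (SITEB) but is a client of +ASM1 (prefers SITEA).
print("racnode3 via +ASM1:", "unpreferred" if is_unpreferred("racnode3", "+ASM1") else "preferred")
print("racnode3 via +ASM3:", "unpreferred" if is_unpreferred("racnode3", "+ASM3") else "preferred")
```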

Let’s check this with my asmiostat utility and Kevin Closson’s SLOB2 kit:

Let’s launch SLOB locally on NOPBDT3 only:

[oracle@racnode3 SLOB]$ ./runit.sh 3
NOTIFY: 
UPDATE_PCT == 0
RUN_TIME == 300
WORK_LOOP == 0
SCALE == 10000
WORK_UNIT == 256
ADMIN_SQLNET_SERVICE == ""
ADMIN_CONNECT_STRING == "/ as sysdba"
NON_ADMIN_CONNECT_STRING == ""
SQLNET_SERVICE_MAX == "0"

And check the I/O metrics with my asmiostat utility this way (I want to see Instance, Diskgroup and Failgroup):

./real_time.pl -type=asmiostat -show=inst,dg,fg -dg=DATAP

With the following output:

[Screenshot: asmiostat output (flex_asm_pref_read_12c)]

As I am the only one working on this lab, you can see beyond doubt that the I/O metrics coming from the NOPBDT3 instance are recorded in the +ASM1 instance, and they clearly indicate that the read I/Os were done on SITEA.

How can we “fix” this ?

You can temporarily fix this as follows (connected to the +ASM1 instance):

SQL> ALTER SYSTEM RELOCATE CLIENT 'NOPBDT3:NOPBDT';
System altered.

SQL> select I.INSTANCE_NAME,C.INSTANCE_NAME,C.DB_NAME
  2  from gv$instance I, gv$asm_client C 
  3   where C.INST_ID=I.INST_ID and C.instance_name='NOPBDT3';

INSTANCE_NAME    INSTANCE_NAME                                                    DB_NAME
---------------- ---------------------------------------------------------------- --------
+ASM3            NOPBDT3                                                          NOPBDT
+ASM3            NOPBDT3                                                          NOPBDT

That way the NOPBDT3 database instance uses the +ASM3 instance and should then issue its read I/O on SITEB.

But I had to bounce the NOPBDT3 database instance to make it issue its read I/O on SITEB (otherwise it was still using SITEA; maybe a subject for another post).

Conclusion:

Flex ASM is a great feature, but you have to be careful if you want to use it in an extended RAC with preferred read in place. Otherwise you may hit the “unpreferred” read behavior.

UPDATES:

[Image: unpref_12102]