Watch out for optimizer_adaptive_features, as it may have a huge negative impact

Let me describe how I discovered that optimizer_adaptive_features may have a huge negative impact (especially on RAC databases).

I upgraded an 11gR2 database to 12.1.0.2 and then converted this database to a PDB. To do so, I launched “@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql” (see MOS Doc ID 1564657.1 for more details).
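For reference, here is a minimal sketch of the plug-in steps (the PDB name, file path and clauses are hypothetical; MOS Doc ID 1564657.1 remains the reference):

-- From the non-CDB (shut down cleanly and re-opened read only), generate the XML manifest:
exec DBMS_PDB.DESCRIBE(pdb_descr_file => '/tmp/mynoncdb.xml');

-- From the CDB root, plug the non-CDB in as a new PDB:
create pluggable database MYPDB using '/tmp/mynoncdb.xml' nocopy tempfile reuse;

-- From the new PDB, run the conversion script:
alter session set container = MYPDB;
@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql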

This SQL script took about 75 minutes to complete on my 2-node RAC database. This is quite long, so I decided to try to reduce this duration.

To diagnose the issue, I dropped the PDB, plugged it back in, re-launched noncdb_to_pdb.sql and enabled a 10046 trace on the session.
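As a reminder, a 10046 trace can be enabled like this in the session that runs the script (level 12 captures binds and waits; the tracefile_identifier is just there to find the trace file easily):

alter session set tracefile_identifier = 'noncdb_to_pdb';
alter session set events '10046 trace name context forever, level 12';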

The top SQL (sorted by elapsed time) for this session is the following:

SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    70502      4.02       4.25          0          0          0           0
Execute  70502    264.90     759.33          0          0          0           0
Fetch    70502    215.77    1848.46          0          0          0       70502
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   211506    484.70    2612.05          0          0          0       70502

Misses in library cache during parse: 17
Misses in library cache during execute: 17
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 70502

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=76 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us)
         0          0          1    PX COORDINATOR  (cr=0 pr=0 pw=0 time=76 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=0 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Join ACK                           281741        0.03        368.85
  PX Deq: reap credit                       1649736        0.00         25.22
  IPC send completion sync                   141000        0.02        135.07
  PX Deq: Parse Reply                        141004        0.04        170.64
  PX Deq: Execute Reply                      141004        0.08       1366.42
  reliable message                            70502        0.00        125.51
  PX Deq: Signal ACK EXT                     140996        0.03         13.99
  PX Deq: Slave Session Stats                140996        0.02         25.54
  enq: PS - contention                        70605        0.11        145.80
  KJC: Wait for msg sends to complete          2584        0.00          0.04
  latch free                                     16        0.00          0.00
  PX qref latch                                  14        0.00          0.00
  latch: shared pool                              4        0.00          0.00
  latch: active service list                      1        0.00          0.00
  row cache lock                                127        0.00          0.08
  library cache lock                             36        0.00          0.04
  library cache pin                              36        0.00          0.04
  gc cr grant 2-way                               1        0.00          0.00
  db file sequential read                         1        0.00          0.00
  oracle thread bootstrap                         1        0.03          0.03

As you can see, this SQL’s elapsed time is about 2,600 seconds in total (for about 70,000 executions), most of the wait time comes from “PX%” events, and this SQL queries the GV$SQL view.

But wait:

Why the hell is this SQL (which looks like a query that collects metrics for a particular sql_id) part of the session that launched the noncdb_to_pdb.sql script? Does it make sense?

I really don’t think so: this SQL (sql_id “frjd8zfy2jfdq”) must have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql itself.

Let’s prove it with a simple test case (I am alone on the database):

alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

With the following result:

SQL> @test_case

System altered.


  COUNT(*)
----------
         0

Connected.

PL/SQL procedure successfully completed.


  COUNT(*)
----------
         2

Did you see that the sql_id “frjd8zfy2jfdq” has been produced by this simple test case? (If you trace the session, you will see this query in the trace file.)

I reproduced this behavior on:

  • 12.1.0.2 database with CDB/PDB.
  • 12.1.0.2 database (Non CDB).

and was not able to reproduce it on an 11gR2 database.

So, it looks like this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to “SQL metrics collection”, I disabled the 12cR1 features linked to this area one by one until I found the “culprit”.

Nothing changed until I set the optimizer_adaptive_features parameter to false (true is the default).

Here is the result:

SQL> !cat test_case.sql
alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba
alter session set optimizer_adaptive_features=false;

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

SQL> @test_case

System altered.

  COUNT(*)
----------
         0

Connected.

Session altered.

PL/SQL procedure successfully completed.

  COUNT(*)
----------
         0

BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!

Well, now what is the impact on noncdb_to_pdb.sql on my 2-node RAC database?

I edited the script and added:

alter session set optimizer_adaptive_features=false;

just above this line:

exec dbms_pdb.noncdb_to_pdb(1);

Guess what?

  1. noncdb_to_pdb.sql took about 20 minutes to execute (compared to about 75 minutes without setting optimizer_adaptive_features to false).
  2. The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.

Remarks:

  • The impact of this SQL is much more “visible” on a RAC database, as it queries a GLOBAL V$ view (GV$SQL) and therefore needs parallel query to run (if more than one instance is up). With only one instance up, the 10046 trace file produced:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    69204      2.58       2.78          0          0          0           0
Execute  69204     23.93      25.45          0          0          0           0
Fetch    69204      2.68       2.64          0          0          0       69204
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   207612     29.20      30.88          0          0          0       69204

Misses in library cache during parse: 18
Misses in library cache during execute: 18
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 62

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=120 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us)
         0          0          0    PX COORDINATOR  (cr=0 pr=0 pw=0 time=100 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=43 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                                  1        0.00          0.00

As you can see: an elapsed time of about 30 seconds (for about 70,000 executions) and no “PX%” wait events.

  • I checked which parameters (hidden or not) change when optimizer_adaptive_features is set to false (a query to list them is sketched after these remarks). I then tried those parameters one by one with my test case and discovered that setting “_optimizer_dsdir_usage_control” to 0 is enough to get rid of the sql_id “frjd8zfy2jfdq”.
  • During the run of noncdb_to_pdb.sql, it is the large number of executions (about 70,000) of the sql_id “frjd8zfy2jfdq” that produced this long overall duration and highlighted the fact that this query had to be removed somehow.
  • You may come across this SQL on any 12cR1 database (CDB/PDB or non-CDB).
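Here is a sketch of the kind of query that can be used to list the optimizer underscore parameters and their current session values (run as SYS, since it relies on X$ tables); running it before and after setting optimizer_adaptive_features to false and diffing the output shows which hidden parameters changed:

select p.ksppinm  as parameter_name,
       v.ksppstvl as session_value,
       p.ksppdesc as description
from   x$ksppi p, x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm like '\_optimizer%' escape '\'
order by p.ksppinm;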

Conclusion:

  • optimizer_adaptive_features (set to true) may have a huge negative impact on performance (especially on RAC databases). I provided an example of such an impact when running the noncdb_to_pdb.sql script.
  • Should you come across the sql_id “frjd8zfy2jfdq” in your database and want to get rid of it (because you observe a negative impact), simply set optimizer_adaptive_features to false (or “_optimizer_dsdir_usage_control” to 0) at the session or system level (see the sketch just below).
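For example (to be adapted to your context; underscore parameters should only be changed after validation with Oracle Support):

-- At the session level:
alter session set optimizer_adaptive_features = false;

-- Or at the system level:
alter system set optimizer_adaptive_features = false scope=both sid='*';

-- Or the more targeted hidden parameter:
alter session set "_optimizer_dsdir_usage_control" = 0;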

Issue installing the 11gR2 database software after the 12.1.0.2 clusterware setup

If you don’t plan to install the 11gR2 database software after a 12.1.0.2 clusterware installation, I guess there is no need for you to read this post.

I just want to share the issue I hit and the way you can work around it. The purpose of this post is simply to save you some time, should you hit it too.

So, after a 12.1.0.2 clusterware installation (on a 2-node RAC cluster), I decided to install the 11.2.0.4 database software. I launched runInstaller and followed the install process until Step 4:

[Screenshot: 11.2.0.4 database software installation, Step 4 (no cluster nodes proposed)]

As you can see, the nodes that are part of the cluster are not proposed, and clicking Next produces “CRS is not installed on any of the nodes”.

The first question you may ask is: is the 11gR2 database compatible with CRS 12.1? Yes, it is, as you can see here:

[Screenshot: supported clusterware and database version combinations]

As I am curious, I canceled the 11gR2 database software installation and gave the 12.1.0.2 database software a try. Step 4 produced:

[Screenshot: 12.1.0.2 database software installation, Step 4 (cluster nodes proposed)]

As you can see, the nodes that are part of the cluster are proposed.

I canceled the 12.1.0.2 database software installation and had a look at the inventory. I discovered that the CRS=”true” flag was not set for the 12.1.0.2 GI home:

$ cat /etc/oraInst.loc | grep inventory_loc
inventory_loc=/ec/poc/server/oracle/olrpoc1/u000/oraInventory

$ cat /ec/poc/server/oracle/olrpoc1/u000/oraInventory/ContentsXML/inventory.xml
..
<HOME NAME="OraGI12Home1" LOC="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" TYPE="O" IDX="1"/>
..

Then I added the CRS=”true” flag this way:

$GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" CRS=true

So that:

$ cat /ec/poc/server/oracle/olrpoc1/u000/oraInventory/ContentsXML/inventory.xml
..
<HOME NAME="OraGI12Home1" LOC="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" TYPE="O" IDX="1" CRS="true"/>
..

Then I relaunched the 11.2.0.4 database software installation, and Step 4 produced:

[Screenshot: 11.2.0.4 database software installation, Step 4 (cluster nodes now proposed)]

As you can see, the nodes are now proposed, and I was able to complete the installation successfully.

Remarks:

  1. MOS 1053393.1 provides more details about the CRS=”true” flag.
  2. I guess the “issue” will be the same for database software >= 10.1 and < 12.1 (but I did not test it).
  3. On another RAC cluster, I upgraded the GI from 12.1.0.1 to 12.1.0.2, and the CRS=”true” flag was set automatically for the 12.1.0.2 GI.

Conclusion: After a 12.1.0.2 CRS installation,

  1. You may need to set the CRS=”true” flag to avoid this issue during an 11gR2 database software installation.
  2. You don’t need to set the CRS=”true” flag for a 12.1.0.2 database software installation, as the issue does not appear there.
  3. It looks like you won’t hit the issue if the GI has been upgraded to 12.1.0.2 (see the third remark).

ASM Rebalance: Why is the avg By/Read equal to 1MB while the allocation unit is 4MB?

During the “A closer look at ASM rebalance” series, I observed that the average By/Read extracted from the v$asm_disk_stat view during a rebalance is equal to 1MB even when the ASM allocation unit is set to 4MB.

Example:

[Screenshot: Avg By/Read and Avg By/Write observed during the rebalance]

The question is: why?

To try to answer this question, I decided to use strace (the Linux system call tracing utility) on the ASM arb process (the process that rebalances data extents within an ASM disk group) during the rebalance of the DATA diskgroup (AU size of 4MB).
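For example (a sketch, assuming the ARB0 process of the +ASM1 instance does the work and that its OS process name follows the usual asm_arb0_+ASM1 pattern):

# find the OS pid of the ARB0 process of the +ASM1 instance
ARB0_PID=$(pgrep -f 'asm_arb0_\+ASM1')

# trace only the asynchronous IO system calls to keep the output small
strace -p "$ARB0_PID" -e trace=io_submit,io_getevents -o /tmp/arb0_strace.out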

From strace I can see:

  • That a very large majority (more than 80% in my case) of the IOs are submitted that way:
io_submit(140691479359488, 4, {{0x7ff547f6c210, 0, 0, 0, 261}, {0x7ff547f6d0b0, 0, 0, 0, 261}, {0x7ff547f6c960, 0, 0, 0, 261}, {0x7ff547f6cbd0, 0, 0, 0, 261}}) = 4

We can see that 4 IOs have been submitted at the same time (with a single io_submit call).

  • And that about 100% of all the IOs submitted are 1MB (look at the IO size which is 1048576 bytes):
io_getevents(140691479359488, 1, 128, {{0x7ff547f6c960, 0x7ff547f6c960, 1048576, 0}, {0x7ff547f6cbd0, 0x7ff547f6cbd0, 1048576, 0}, {0x7ff547f6d0b0, 0x7ff547f6d0b0, 1048576, 0}, {0x7ff547f6c210, 0x7ff547f6c210, 1048576, 0}}, {600, 0}) = 4

I also straced arb during the rebalance of the DATA1M diskgroup (Allocation unit of 1MB) and observed:

  • That about 80% of the IOs are submitted that way:
io_submit(139928633700352, 1, {{0x7f43aadbd210, 0, 1, 0, 262}}) = 1

So 1 IO is submitted per io_submit call.

  • And that about 100% of all the IOs submitted are 1MB:
io_getevents(139928633700352, 3, 128, {{0x7f43aadbd6f0, 0x7f43aadbd6f0, 1048576, 0}}, {0, 0}) = 1

So it makes sense to conclude that:

  1. The arb process always requests 1MB IOs (whatever the allocation unit is).
  2. The arb IO requests are grouped and submitted together depending on the diskgroup allocation unit (4 IOs per io_submit call with an allocation unit of 4MB, 1 IO per io_submit call with an allocation unit of 1MB).

Based on this, I think it makes sense that the v$asm_disk_stat view reports Avg By/Read of 1MB during the rebalance process (whatever the allocation unit is).

But now one more question: What is the impact of the Linux Maximum IO size?

For this, I launched the rebalance twice (on both diskgroups: DATA with AU=4MB and DATA1M with AU=1MB): once with max_sectors_kb set to 512 and once with max_sectors_kb set to 4096.
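For the record, a sketch of how max_sectors_kb can be changed (to be repeated for each device backing the ASM disks; sdak is one of the devices shown in the iostat output below, so adjust to your own device names):

# check the current value
cat /sys/block/sdak/queue/max_sectors_kb

# first run: limit the device IO size to 512 KB
echo 512 > /sys/block/sdak/queue/max_sectors_kb

# second run: allow IOs up to 4096 KB (cannot exceed max_hw_sectors_kb)
echo 4096 > /sys/block/sdak/queue/max_sectors_kb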

And observed that:

  • Oracle still does 1MB IOs only and groups the IOs depending on the allocation unit (whatever max_sectors_kb is).
  • With max_sectors_kb set to 512: the IOs issued on the devices linked to the ASM disks are limited to 512 KB (even though Oracle requested 1MB IOs). This can be seen in the iostat output (looking at the avgrq-sz field):
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdak              0.00     0.00    0.00   13.65     0.00 13919.15  1019.72     0.04    3.11   3.11   4.25
sdbd              0.00     0.00    0.10    0.00   102.05     0.00  1020.50     0.00    2.00   2.00   0.02
sddh              0.00     0.00    0.15    0.00   153.25     0.00  1021.67     0.00    4.33   4.33   0.07
sdej              0.00     0.00    0.00   12.35     0.00 12640.10  1023.49     0.04    3.01   3.01   3.72
sdcr              0.00     0.00    0.00   11.80     0.00 12083.20  1024.00     0.03    2.71   2.71   3.20
sdfi              0.00     0.00    0.00   12.35     0.00 12646.40  1024.00     0.07    5.55   5.55   6.85
sdg               0.00     0.00    0.00   11.70     0.00 11980.80  1024.00     0.06    5.35   5.34   6.25

The avgrq-sz is the average size (in sectors) of the requests that were issued to the device. The sector size I am using is 512 bytes (not 4K) so that the IOs are about 1024*512 bytes = 512 KB (which is our max_sectors_kb size).

  • With max_sectors_kb set to 4096: the IOs issued on the devices linked to the ASM disks can be greater than 1MB (even though Oracle requested 1MB IOs):
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdi               0.00     0.00    3.75    2.05 20173.15  2253.60  3866.68     0.17   29.72  28.79  16.70
sdas              0.00     0.00    3.90    1.95 20992.35  1638.80  3868.57     0.18   31.08  29.91  17.50
sdgj              0.00     0.00    3.75    2.00 20173.85  2253.20  3900.36     0.20   34.71  33.74  19.40

about 2MB in my case (3900 * 512 bytes).

Remarks:

  • It has been tested with ASM 11.2.0.4 on Linux x86-64 (without asmlib).
  • I’ll update this post with ASM 12c results as soon as I can (if anything new needs to be reported).

Conclusion:

  • The arb process always requests 1MB IOs (whatever the allocation unit is).
  • The arb process always requests 1MB IOs (whatever max_sectors_kb is): so it looks like arb does not probe the IO capabilities of the associated device.
  • The arb IO requests are grouped and submitted together depending on the diskgroup allocation unit.
  • The kernel splits the arb IO requests if max_sectors_kb is < 1MB.
  • The kernel tries to merge the arb IO requests if max_sectors_kb is > 1MB.

Thanks to Frits Hoogland for his “Extra huge database IOs” series and for the time he spent answering my questions.


A closer look at ASM rebalance, Part III: Disks have been added and dropped (at the same time)

This article is the third Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading the two blog posts written by Bane Radulovic on this topic.

In this Part III, I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been added and dropped (at the same time).

To do so, on a 2-node extended RAC cluster (11.2.0.4), I added 2 disks to and dropped 2 disks from the DATA diskgroup with a single command (the diskgroup was created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 02:11 PM).
  2. alter diskgroup DATA rebalance power 6; (At 02:24 PM).
  3. alter diskgroup DATA rebalance power 11; (At 02:34 PM).

Then I waited until it finished (meaning v$asm_operation returned no rows for the DATA diskgroup; see the query below).
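For reference, here is the kind of query that can be used to follow the rebalance progress (the rebalance is finished for the diskgroup once it returns no rows):

select dg.name, o.operation, o.state, o.power, o.sofar, o.est_work, o.est_minutes
from   v$asm_operation o, v$asm_diskgroup dg
where  o.group_number = dg.group_number
and    dg.name = 'DATA';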

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics (with the asm_metrics utility) for the DATA diskgroup only.

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

[Screenshot: rebalance read and write activity per ASM instance]

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1 .
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 3 sets of disks: the disks that have been dropped, the disks that have been added and the other disks already in the DATA diskgroup.

To do so, let’s create in Tableau 3 groups:

[Screenshot: creating the 3 groups in Tableau]

Let’s call it “3 Groups”

[Screenshot: the “3 Groups” group in Tableau]

So that now we are able to display the ASM metrics for those 3 sets of disks.

I will filter the metrics on +ASM1 only (to avoid any noise coming from +ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

[Screenshot: Reads/s and Writes/s for the 3 groups of disks]

We can see that during the 3 rebalances:

  1. No writes on the dropped disks.
  2. No reads on the new disks.
  3. Number of Reads/s increasing on the dropped disks depending on the power values.
  4. Number of Writes/s increasing on the new disks depending on the power values.
  5. Reads/s and Writes/s both increasing on the other disks depending on the power values.
  6. As of 03:06 PM, no activity on the dropped and new disks while there is still activity on the other disks.
  • Are 1, 2, 3, 4 and 5 surprising? No.
  • What happened for 6? I’ll answer later on.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

[Screenshot: Kby Read/s and Kby Write/s for the 3 groups of disks]

We can see that during the 3 rebalances:

  1. No Kby Write/s on the dropped disks.
  2. No Kby Read/s on the new disks.
  3. Number of Kby Read/s increasing on the dropped disks depending on the power values.
  4. Number of Kby Write/s increasing on the new disks depending on the power values.
  5. Kby Read/s and Kby Write/s both increasing on the other disks depending on the power values.
  6. Kby Read/s and Kby Write/s are very close on the other disks (it was not the case in Part I).
  7. As of 03:06 PM, no activity on the dropped and new disks while there is still activity on the other disks.
  • Are 1, 2, 3, 4, 5 and 6 surprising? No.
  • What happened for 7? I’ll answer later on.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the computation and display of the averages: the By/Read and By/Write measures depend on the number of reads and writes, so the averages have to be calculated as weighted averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

[Screenshot: the By/Read weighted average calculated field in Tableau]

The same has to be done for the By/Write Weighted Average.
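To illustrate the weighted average idea outside of Tableau (a sketch with hypothetical table and column names): dividing the summed bytes by the summed number of reads weights each sample by its number of reads, which a plain average of the per-sample By/Read values would not do.

select sum(bytes_read) / nullif(sum(reads), 0) as avg_bytes_per_read
from   asm_metrics_samples;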

Let’s see the result:

[Screenshot: Avg By/Read and Avg By/Write for the 3 groups of disks]

We can see:

  1. The Avg By/Read on the dropped disks is about the same (about 1MB) whatever the power value is.
  2. The Avg By/Write on the new disks is about the same (about 1MB) whatever the power value is.
  3. The Avg By/Read and Avg By/Write on the other disks is about the same (about 1MB) whatever the power value is.
  • Are 1,2 and 3 surprising? No for the behaviour,Yes (at least for me) for the 1MB value as the ASM allocation unit is 4MB.

Now that we have seen all those metrics, we can ask:

Q1: So what the hell happened at 03:06 pm?

Let’s check the alert_+ASM1.log file at that time:

Mon Aug 25 15:06:13 2014
NOTE: membership refresh pending for group 4/0x1e089b59 (DATA)
GMON querying group 4 at 396 for pid 18, osid 67864
GMON querying group 4 at 397 for pid 18, osid 67864
NOTE: Disk DATA_0006 in mode 0x0 marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 4/0x1e089b59 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Mon Aug 25 15:07:16 2014
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 4/0x1e089b59 (DATA)

We can see that the ASM rebalance started the compacting phase (see Bane Radulovic’s blog post for more details about the ASM rebalance phases).

Q2: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB. Why?

I don’t have the answer yet, it will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view; it is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value greater than your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behaviour for all those metrics.

Conclusion of Part III:

  • Nothing surprising, except (at least for me) that the Avg By/Read is stuck at 1MB (while the allocation unit is 4MB).
  • We visualized that the compacting phase of the rebalance operation generates much more activity on the other disks, compared to near-zero activity on the dropped and new disks.
  • I’ll update this post with ASM 12c results as soon as I can (if anything new needs to be reported).

A closer look at ASM rebalance, Part II: Disks have been dropped

This article is the second Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading the two blog posts written by Bane Radulovic on this topic.

In this Part II, I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been dropped.

To do so, on a 2-node extended RAC cluster (11.2.0.4), I dropped 2 disks from the DATA diskgroup (created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 09:09 AM).
  2. alter diskgroup DATA rebalance power 6; (At 09:19 AM).
  3. alter diskgroup DATA rebalance power 11; (At 09:29 AM).

Then I waited until it finished (meaning v$asm_operation returned no rows for the DATA diskgroup).

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics (with the asm_metrics utility) for the DATA diskgroup only.

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

[Screenshot: rebalance read and write activity per ASM instance]

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1 .
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 2 sets of disks: the disks that have been dropped and the disks that will remain in the DATA diskgroup.

To do so, let’s create in Tableau a SET that contains the 2 dropped disks.

[Screenshot: creating the set of dropped disks in Tableau]

Let’s call it “Dropped Disks”

[Screenshot: the “Dropped Disks” set in Tableau]

Now we are able to display the ASM metrics IN this set (the 2 dropped disks) and OUT of this set (the disks that will remain in the DATA diskgroup).

I will filter the metrics on +ASM1 only (to avoid any noise coming from +ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

[Screenshot: Reads/s and Writes/s, dropped disks vs remaining disks]

We can see that during the 3 rebalances:

  1. No writes on the dropped disks.
  2. Number of Reads/s increasing on the dropped disks depending on the power values.
  3. Reads/s and Writes/s both increasing on the remaining disks depending on the power values.
  • Are 1, 2 and 3 surprising? No.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

[Screenshot: Kby Read/s and Kby Write/s, dropped disks vs remaining disks]

We can see that during the 3 rebalances:

  1. No Kby Write/s on the dropped disks.
  2. Number of Kby Read/s increasing on the dropped disks depending on the power values.
  3. Kby Read/s and Kby Write/s both increasing on the remaining disks depending on the power values.

Are 1, 2 and 3 surprising? No.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the computation and display of the averages: the By/Read and By/Write measures depend on the number of reads and writes, so the averages have to be calculated as weighted averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

[Screenshot: the By/Read weighted average calculated field in Tableau]

The same has to be done for the By/Write Weighted Average.

Let’s see the result:

[Screenshot: Avg By/Read and Avg By/Write, dropped disks vs remaining disks]

We can see:

  1. The Avg By/Read on the dropped disks is about the same (about 1MB) whatever the power value is.
  2. The Avg By/Read and Avg By/Write on the remaining disks is about the same (about 1MB) whatever the power value is.
  • Are 1 and 2 surprising? No for the behaviour; yes (at least for me) for the 1MB value, as the ASM allocation unit is 4MB.

Now that we have seen all those metrics, we can ask:

Q1: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB. Why?

I don’t have the answer yet, it will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view; it is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value greater than your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behaviour for all those metrics.

Conclusion of Part II:

  • Nothing surprising, except (at least for me) that the Avg By/Read is stuck at 1MB (while the allocation unit is 4MB).
  • I’ll update this post with ASM 12c results as soon as I can (if anything new needs to be reported).

A closer look at ASM rebalance, Part I: Disks have been added

This article is the first Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading the two blog posts written by Bane Radulovic on this topic.

In this Part I, I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been added (no dropped disks yet: that will be for Parts II and III).

To do so, on a 2-node extended RAC cluster (11.2.0.4), I added 2 disks to the DATA diskgroup (created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 11:55 AM).
  2. alter diskgroup DATA rebalance power 6; (At 12:05 PM).
  3. alter diskgroup DATA rebalance power 11; (At 12:15 PM).

Then I waited until it finished (meaning v$asm_operation returned no rows for the DATA diskgroup).

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics (with the asm_metrics utility) for the DATA diskgroup only.

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce it).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

[Screenshot: rebalance read and write activity per ASM instance]

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1 .
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 2 sets of disks: the disks that have been added and the disks that are already part of the DATA diskgroup.

To do so, let’s create in Tableau a SET that contains the 2 new disks.

[Screenshot: creating the set of new disks in Tableau]

Let’s call it “New Disks”

[Screenshot: the “New Disks” set in Tableau]

Now we are able to display the ASM metrics IN this set (the 2 new disks) and OUT of this set (the disks that are already part of the DATA diskgroup).

I will filter the metrics on +ASM1 only (to avoid any noise coming from +ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

[Screenshot: Reads/s and Writes/s, new disks vs existing disks]

We can see that during the 3 rebalances:

  1. No Reads on the new disks (at least until about 12:40 pm).
  2. Number of Writes/s increasing on the new disks depending on the power values.
  3. Reads/s and Writes/s both increasing on the already existing disks depending on the power values.
  4. As of 12:40 pm, activity on the existing disks while near-zero activity on the new ones.
  5. As of 12:40 pm, the number of Writes/s >= Reads/s on the existing disks (while it was the opposite before).
  • Are 1, 2 and 3 surprising? No.
  • What happened for 4 and 5? I’ll answer later on.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

[Screenshot: Kby Read/s and Kby Write/s, new disks vs existing disks]

We can see that during the 3 rebalances:

  1. No Kby Read/s on the new disks.
  2. Number of Kby Write/s increasing on the new disks depending on the power values.
  3. Kby Read/s and Kby Write/s both increasing on the existing disks depending on the power values.
  4. As of 12:40 pm, activity on the existing disks while no activity on the new ones.
  5. As of 12:40 pm, the same amount of Kby Read/s and Kby Write/s on the existing disks (while it was not the case before).
  • Are 1, 2 and 3 surprising? No.
  • What happened for 4 and 5? I’ll answer later on.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the computation and display of the averages: the By/Read and By/Write measures depend on the number of reads and writes, so the averages have to be calculated as weighted averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

[Screenshot: the By/Read weighted average calculated field in Tableau]

The same has to be done for the By/Write Weighted Average.

Let’s see the result:

[Screenshot: Avg By/Read and Avg By/Write, new disks vs existing disks]

We can see:

  1. The Avg By/Write on the new disks is about the same (about 1MB) whatever the power value is (before 12:40 pm).
  2. The Avg By/Write tends to increase with the power on the already existing disks.
  3. The Avg By/Read on the existing disks is about the same (about 1MB) whatever the power value is.
  • Is 1 surprising? No.
  • Is 2 surprising? Yes (at least for me).
  • Is 3 surprising? No.

Now that we have seen all those metrics, we can ask:

Q1: So what the hell happened at 12:40 pm?

Let’s check the alert_+ASM1.log file at that time:

Mon Aug 25 12:15:44 2014
ARB0 started with pid=33, OS id=1187132
NOTE: assigning ARB0 to group 4/0x1e089b59 (DATA) with 11 parallel I/Os
Mon Aug 25 12:15:47 2014
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Mon Aug 25 12:39:52 2014
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=4
Mon Aug 25 12:40:03 2014
GMON updating for reconfiguration, group 4 at 372 for pid 35, osid 1225810
NOTE: group DATA: updated PST location: disk 0014 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0015 (PST copy 1)
Mon Aug 25 12:40:03 2014
NOTE: group 4 PST updated.
Mon Aug 25 12:40:03 2014
NOTE: membership refresh pending for group 4/0x1e089b59 (DATA)
GMON querying group 4 at 373 for pid 18, osid 67864
SUCCESS: refreshed membership for 4/0x1e089b59 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Mon Aug 25 12:45:24 2014
NOTE: F1X0 copy 2 relocating from 18:44668 to 18:20099 for diskgroup 4 (DATA)
Mon Aug 25 12:53:49 2014
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 4/0x1e089b59 (DATA)

We can see that the ASM rebalance started the compacting phase (see Bane Radulovic’s blog post for more details about the ASM rebalance phases).

Q2: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB. Why?

I guess this is somehow related to the max_sectors_kb and max_hw_sectors_kb SYSFS parameters. It will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view; it is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value greater than your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behavior for all those metrics.

Conclusion of Part I:

  • We visualized that the compacting phase of the rebalance operation generates much more activity on the existing disks, compared to near-zero activity on the new disks.
  • We visualized that the compacting phase of the rebalance operation generates the same amount of Kby Read/s and Kby Write/s on the existing disks (while it was not the case before).
  • We visualized that during the compacting phase the number of Writes/s >= Reads/s on the existing disks (while it was the opposite before).
  • We visualized that the Avg By/Read does not exceed 1MB on the existing disks (while the ASM allocation unit has been set to 4MB on my diskgroup).
  • We visualized that the Avg By/Write tends to increase with the power on the already existing disks (quite surprising to me).
  • I’ll update this post with ASM 12c results as soon as I can (if anything new needs to be reported).

Are ASM rebalance and preferred read working together?

Introduction:

If I add disks to a diskgroup, then during the rebalance operation ASM needs to read the data from the disks already part of the diskgroup in order to rebalance it across all the disks (including the new ones).

Question:

If I add 2 disks (one into each failgroup), is the preferred read feature taken into account for the rebalance process? (“for” means “for the reads generated by the rebalance operation”).

Let’s see:

  • Set the preferred read on +ASM1 (so that +ASM1 “prefers” to read from the “HOST31” failgroup; a query to check the setting is sketched after these steps):
SQL> alter system set asm_preferred_read_failure_groups='DATA.HOST31';

System altered.
  • Add 2 disks (one into each failgroup) into the DATA diskgroup (connected on +ASM1):
SQL> alter diskgroup DATA add failgroup HOST31 disk '/dev/san/HOST31CA8D0D' failgroup HOST32 disk '/dev/san/HOST32CA8D0D';

Diskgroup altered.
  • Check that the ASM compatibility is high enough (>=11.1) to use the preferred read feature:
SQL> select COMPATIBILITY from v$asm_diskgroup where NAME='DATA';

COMPATIBILITY
------------------------------------------------------------
11.2.0.2.0
  • Launch the rebalance:
SQL> alter diskgroup DATA rebalance power 2;

Diskgroup altered.
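As a side note, one way to see how the preferred read setting is applied by the local instance is to query the PREFERRED_READ column of v$asm_disk (run on +ASM1):

select failgroup, name, preferred_read
from   v$asm_disk
where  group_number = (select group_number from v$asm_diskgroup where name = 'DATA')
order by failgroup, name;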

Now, let’s collect the ASM metrics and visualize the result with Tableau.

Note: during the rebalance, near-zero database activity occurred, so nearly 100% of the activity comes from the rebalance process.

Result:

[Screenshot: rebalance reads per failgroup for each ASM instance]

As you can see:

  1. The +ASM1 instance reads from both the HOST31 and HOST32 failgroups: it did not take the preferred read setting into account.
  2. I changed the power during the rebalance just for fun ;-)

Remark:

It has been tested on an 11.2.0.4 extended RAC cluster (I still need to test on 12c).

Conclusion:

  • The ASM preferred read feature is not taken into account for the rebalance process.
  • I guess it is still taken into account for the reads coming from the databases during the rebalance process.