ASM Rebalance: Why is the avg By/Read equal to 1MB while the allocation unit is 4MB?

During the “A closer look at ASM rebalance” series, I observed that the average By/Read extracted from the v$asm_disk_stat view during a rebalance is equal to 1MB, even though the ASM allocation unit is set to 4MB.

Example:

Screen Shot 2014-08-25 at 18.38.07

The question is: why?

To try to answer this question, I used strace (the Linux system call trace utility) on the ASM arb process (the process that rebalances data extents within an ASM disk group) during the rebalance of the DATA diskgroup (AU size of 4MB).
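As a side note, attaching strace to the rebalance slave could look like this (a minimal sketch, assuming a single arb0 slave named asm_arb0_<SID> is running on the host and that only the asynchronous IO calls are of interest; adjust the pattern and the strace options to your environment):

# find the arb0 slave of the ASM instance and trace its async IO system calls
strace -f -p $(pgrep -f asm_arb0) -e trace=io_submit,io_getevents -o /tmp/arb0.strace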

From strace I can see:

  • That a very large majority (more than 80% in my case) of the IOs are submitted this way:
io_submit(140691479359488, 4, {{0x7ff547f6c210, 0, 0, 0, 261}, {0x7ff547f6d0b0, 0, 0, 0, 261}, {0x7ff547f6c960, 0, 0, 0, 261}, {0x7ff547f6cbd0, 0, 0, 0, 261}}) = 4

We can see that 4 IOs have been submitted at the same time (with a single io_submit call).

  • And that nearly 100% of the IOs submitted are 1MB (look at the IO size, which is 1048576 bytes):
io_getevents(140691479359488, 1, 128, {{0x7ff547f6c960, 0x7ff547f6c960, 1048576, 0}, {0x7ff547f6cbd0, 0x7ff547f6cbd0, 1048576, 0}, {0x7ff547f6d0b0, 0x7ff547f6d0b0, 1048576, 0}, {0x7ff547f6c210, 0x7ff547f6c210, 1048576, 0}}, {600, 0}) = 4

I also straced arb during the rebalance of the DATA1M diskgroup (Allocation unit of 1MB) and observed:

  • That about 80% of the IOs are submitted this way:
io_submit(139928633700352, 1, {{0x7f43aadbd210, 0, 1, 0, 262}}) = 1

So 1 IO is submitted per io_submit call.

  • And that nearly 100% of the IOs submitted are 1MB:
io_getevents(139928633700352, 3, 128, {{0x7f43aadbd6f0, 0x7f43aadbd6f0, 1048576, 0}}, {0, 0}) = 1

So it makes sense to conclude that:

  1. The arb process always requests IOs of 1MB (whatever the allocation unit is).
  2. The arb IO requests are grouped and submitted together depending on the diskgroup allocation unit (4 IOs per io_submit call with the allocation unit set to 4MB, 1 IO per io_submit call with the allocation unit set to 1MB).

Based on this, I think it makes sense that the v$asm_disk_stat view reports an Avg By/Read of 1MB during the rebalance process (whatever the allocation unit is).

But now one more question: What is the impact of the Linux Maximum IO size?

For this, I launched the rebalance twice (on both diskgroups: DATA with AU=4MB and DATA1M with AU=1MB): once with max_sectors_kb set to 512 and once with max_sectors_kb set to 4096.
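Changing this setting could look like this (a hedged sketch: max_sectors_kb lives under sysfs, is set per block device and cannot exceed max_hw_sectors_kb; sdak is one of the devices visible in the iostat output below, and the same change has to be repeated for each device backing the ASM disks):

cat /sys/block/sdak/queue/max_hw_sectors_kb    # upper bound allowed by the driver/HBA
echo 512  > /sys/block/sdak/queue/max_sectors_kb
echo 4096 > /sys/block/sdak/queue/max_sectors_kb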

And I observed that:

  • Oracle still does 1MB IOs only and groups the IOs depending on the allocation unit (whatever the max_sectors_kb is).
  • With max_sectors_kb set to 512: the IOs issued on the devices linked to the ASM disks are limited to 512 KB (even though Oracle requested 1MB IOs). This can be seen in the iostat output (looking at the avgrq-sz field):
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdak              0.00     0.00    0.00   13.65     0.00 13919.15  1019.72     0.04    3.11   3.11   4.25
sdbd              0.00     0.00    0.10    0.00   102.05     0.00  1020.50     0.00    2.00   2.00   0.02
sddh              0.00     0.00    0.15    0.00   153.25     0.00  1021.67     0.00    4.33   4.33   0.07
sdej              0.00     0.00    0.00   12.35     0.00 12640.10  1023.49     0.04    3.01   3.01   3.72
sdcr              0.00     0.00    0.00   11.80     0.00 12083.20  1024.00     0.03    2.71   2.71   3.20
sdfi              0.00     0.00    0.00   12.35     0.00 12646.40  1024.00     0.07    5.55   5.55   6.85
sdg               0.00     0.00    0.00   11.70     0.00 11980.80  1024.00     0.06    5.35   5.34   6.25

The avgrq-sz is the average size (in sectors) of the requests that were issued to the device. The sector size I am using is 512 bytes (not 4K), so the IOs are about 1024 * 512 bytes = 512 KB (which is our max_sectors_kb value).

  • With max_sectors_kb set to 4096: the IOs issued on the devices linked to the ASM disks can be greater than 1MB (even though Oracle requested 1MB IOs):
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdi               0.00     0.00    3.75    2.05 20173.15  2253.60  3866.68     0.17   29.72  28.79  16.70
sdas              0.00     0.00    3.90    1.95 20992.35  1638.80  3868.57     0.18   31.08  29.91  17.50
sdgj              0.00     0.00    3.75    2.00 20173.85  2253.20  3900.36     0.20   34.71  33.74  19.40

about 2MB in my case (3900 * 512 bytes).
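For reference, the iostat outputs above could be captured with something like this (a sketch: extended device statistics every 5 seconds; the output can then be filtered on the devices backing the ASM disks, and avgrq-sz is expressed in sectors):

iostat -x 5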

Remarks:

  • It has been tested with ASM 11.2.0.4 on Linux x86-64 (without asmlib).
  • I’ll update this post with ASM 12c results as soon as I can (if there is anything new to report).

Conclusion:

  • The arb process always requests IOs of 1MB (whatever the allocation unit is).
  • The arb process always requests IOs of 1MB (whatever the max_sectors_kb is): it looks like arb does not probe the IO capabilities of the associated device.
  • The arb IO requests are grouped and submitted together depending on the diskgroup allocation unit.
  • The kernel splits the arb IO requests if max_sectors_kb is < 1MB.
  • The kernel tries to merge the arb IO requests if max_sectors_kb is > 1MB.

Thanks to Frits Hoogland for his “Extra huge database IOs series” and for the time he spent answering the questions I asked.


A closer look at ASM rebalance, Part III: Disks have been added and dropped (at the same time)

This article is the third Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading these two blog posts written by Bane Radulovic:

In this Part III I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been added and dropped (at the same time).

To do so, on a 2-node extended RAC cluster (11.2.0.4), I added 2 disks to and dropped 2 disks from the DATA diskgroup (with a single command; the diskgroup was created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 02:11 PM).
  2. alter diskgroup DATA rebalance power 6; (At 02:24 PM).
  3. alter diskgroup DATA rebalance power 11; (At 02:34 PM).

And then I waited until it finished (meaning v$asm_operation returns no rows for the DATA diskgroup).
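For example, a query like this one (the same pattern is used later in this blog for the BDT diskgroup) returns no rows once the rebalance is over:

SQL> select * from v$asm_operation where group_number = (select group_number from v$asm_diskgroup where name='DATA');

no rows selected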

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics that way for the DATA diskgroup only (a collection sketch is shown below).
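The collection itself could look like this (a sketch, assuming the asm_metrics utility accepts a -dg filter; otherwise collect everything and filter on the DATA diskgroup in the visualization tool):

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk -dg=DATA > asm_metrics_rebalance.txt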

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

Screen Shot 2014-09-01 at 20.29.42

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1.
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 3 sets of disks: the disks that have been dropped, the disks that have been added and the other existing disks in the DATA diskgroup.

To do so, let’s create in Tableau 3 groups:

Screen Shot 2014-09-01 at 21.00.06

Let’s call it “3 Groups”

Screen Shot 2014-09-01 at 20.58.24

So now we are able to display the ASM metrics for those 3 sets of disks.

I will filter the metrics on ASM1 only (to avoid any “noise” coming from ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

Screen Shot 2014-09-01 at 21.03.04

We can see that during the 3 rebalances:

  1. No writes on the dropped disks.
  2. No reads on the new disks.
  3. Number of Reads/s increasing on the dropped disks depending on the power values.
  4. Number of Writes/s increasing on the new disks depending on the power values.
  5. Reads/s and Writes/s both increasing on the other disks depending on the power values.
  6. As of 03:06 PM, no activity on the dropped and new disks while there is still activity on the other disks.
  • Are 1, 2, 3, 4 and 5 surprising? No.
  • What happened for 6? I’ll answer later on.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

Screen Shot 2014-09-01 at 21.12.44

We can see that during the 3 rebalances:

  1. No Kby Write/s on the dropped disks.
  2. No Kby Read/s on the new disks.
  3. Number of Kby Read/s increasing on the dropped disks depending on the power values.
  4. Number of Kby Write/s increasing on the new disks depending on the power values.
  5. Kby Read/s and Kby Write/s both increasing on the other disks depending on the power values.
  6. Kby Read/s and Kby Write/s are very close on the other disks (it was not the case in Part I).
  7. As of 03:06 PM, no activity on the dropped and new disks while there is still activity on the other disks.
  • Are 1, 2, 3, 4, 5 and 6 surprising? No.
  • What happened for 7? I’ll answer later on.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the averages computation/display: the By/Read and By/Write measures depend on the number of reads and writes respectively. So the averages have to be calculated using Weighted Averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

Screen Shot 2014-08-20 at 21.56.49

The same has to be done for the By/Write Weighted Average.
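For reference, those two calculated fields are the ones defined in the “Graphing ASM performance metrics” post at the end of this blog:

  • Weighted Average By/Read = sum([By/Read]*[Reads/s])/sum([Reads/s])
  • Weighted Average By/Write = sum([By/Write]*[Writes/s])/sum([Writes/s])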

Let’s see the result:

Screen Shot 2014-09-01 at 21.22.10

We can see:

  1. The Avg By/Read on the dropped disks is about the same (about 1MB) whatever the power value is.
  2. The Avg By/Write on the new disks is about the same (about 1MB) whatever the power value is.
  3. The Avg By/Read and Avg By/Write on the other disks is about the same (about 1MB) whatever the power value is.
  • Are 1, 2 and 3 surprising? No for the behaviour, yes (at least for me) for the 1MB value, as the ASM allocation unit is 4MB.

Now that we have seen all those metrics, we can ask:

Q1: So what the hell happened at 03:06 pm?

Let’s check the alert_+ASM1.log file at that time:

Mon Aug 25 15:06:13 2014
NOTE: membership refresh pending for group 4/0x1e089b59 (DATA)
GMON querying group 4 at 396 for pid 18, osid 67864
GMON querying group 4 at 397 for pid 18, osid 67864
NOTE: Disk DATA_0006 in mode 0x0 marked for de-assignment
NOTE: Disk DATA_0007 in mode 0x0 marked for de-assignment
SUCCESS: refreshed membership for 4/0x1e089b59 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Mon Aug 25 15:07:16 2014
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 4/0x1e089b59 (DATA)

We can see that the ASM rebalance started the compacting phase (See Bane Radulovic’s blog post for more details about the ASM rebalances phases).
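As a side note, a quick way to extract that window from the alert log could be (a sketch with a hypothetical ADR path; adjust it to your diagnostic_dest):

grep -n -A 10 'Aug 25 15:06' /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log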

Q2: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB, why?

I don’t have the answer yet, it will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view. It is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value > your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behaviour for all those metrics.

Conclusion of Part III:

  • Nothing surprising except (at least for me) that the Avg By/Read is stuck at 1MB (while the allocation unit is 4MB).
  • We visualized that the compacting phase of the rebalance operation generates much more activity on the other disks compared to near zero activity on the dropped and new disks.
  • I’ll update this post with ASM 12c results as soon as I can (if there is anything new to report).

A closer look at ASM rebalance, Part II: Disks have been dropped

This article is the second Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading these two blog posts written by Bane Radulovic:

In this Part II I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been dropped.

To do so, on a 2-node extended RAC cluster (11.2.0.4), I dropped 2 disks from the DATA diskgroup (created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 09:09 AM).
  2. alter diskgroup DATA rebalance power 6; (At 09:19 AM).
  3. alter diskgroup DATA rebalance power 11; (At 09:29 AM).

And then I waited until it finished (meaning v$asm_operation returns no rows for the DATA diskgroup).

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics that way for the DATA diskgroup only.

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

Screen Shot 2014-08-31 at 18.50.48

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1.
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 2 sets of disks: the disks that have been dropped and the disks that will remain in the DATA diskgroup.

To do so, let’s create in Tableau a SET that contains the 2 dropped disks.

Screen Shot 2014-08-20 at 21.27.34

Let’s call it “Dropped Disks”

Screen Shot 2014-08-31 at 18.53.04

So now we are able to display the ASM metrics IN this set (the 2 dropped disks) and OUT of this set (the disks that will remain in the DATA diskgroup).

I will filter the metrics on ASM1 only (to avoid any “noise” coming from ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

Screen Shot 2014-08-31 at 18.57.08

We can see that during the 3 rebalances:

  1. No writes on the dropped disks.
  2. Number of Reads/s increasing on the dropped disks depending on the power values.
  3. Reads/s and Writes/s both increasing on the remaining disks depending on the power values.
  • Are 1, 2 and 3 surprising? No.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

Screen Shot 2014-08-31 at 19.00.33

We can see that during the 3 rebalances:

  1. No Kby Write/s on the dropped disks.
  2. Number of Kby Read/s increasing on the dropped disks depending on the power values.
  3. Kby Read/s and Kby Write/s both increasing on the remaining disks depending on the power values.

Are 1, 2 and 3 surprising? No.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the averages computation/display: the By/Read and By/Write measures depend on the number of reads and writes respectively. So the averages have to be calculated using Weighted Averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

Screen Shot 2014-08-20 at 21.56.49

The same has to be done for the By/Write Weighted Average.

Let’s see the result:

Screen Shot 2014-09-01 at 21.33.48

We can see:

  1. The Avg By/Read on the dropped disks is about the same (about 1MB) whatever the power value is.
  2. The Avg By/Read and Avg By/Write on the remaining disks is about the same (about 1MB) whatever the power value is.
  • Are 1 and 2 surprising? No for the behaviour, yes (at least for me) for the 1MB value, as the ASM allocation unit is 4MB.

Now that we have seen all those metrics, we can ask:

Q1: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB, why?

I don’t have the answer yet, it will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view. It is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value > your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behaviour for all those metrics.

Conclusion of Part II:

  • Nothing surprising except (at least for me) that the Avg By/Read is stuck at 1MB (while the allocation unit is 4MB).
  • I’ll update this post with ASM 12c results as soon as I can (if there is anything new to report).

A closer look at ASM rebalance, Part I: Disks have been added

This article is the first Part of the “A closer look at ASM rebalance” series:

  1. Part I: Disks have been added.
  2. Part II: Disks have been dropped.
  3. Part III: Disks have been added and dropped (at the same time).

If you are not familiar with ASM rebalance, I would suggest first reading these two blog posts written by Bane Radulovic:

In this Part I, I want to visualize the rebalance operation (with 3 power values: 2, 6 and 11) after disks have been added (no dropped disks yet: that will be for Parts II and III).

To do so, on a 2-node extended RAC cluster (11.2.0.4), I added 2 disks to the DATA diskgroup (created with an ASM allocation unit of 4MB) and launched (connected to +ASM1):

  1. alter diskgroup DATA rebalance power 2; (At 11:55 AM).
  2. alter diskgroup DATA rebalance power 6; (At 12:05 PM).
  3. alter diskgroup DATA rebalance power 11; (At 12:15 PM).

And then I waited until it finished (meaning v$asm_operation returns no rows for the DATA diskgroup).

Note that 2) and 3) interrupted the rebalance in progress and launched a new one with a new power.

During this time I collected the ASM performance metrics that way for the DATA diskgroup only.

I’ll present the results with Tableau (for each graph I’ll keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce).

Note: There is no database activity on the Host where the rebalance has been launched.

Here are the results:

First let’s verify that the whole rebalance activity has been done on the +ASM1 instance (As I launched the rebalance operations from it).

Screen Shot 2014-08-25 at 18.19.34

We can see:

  1. That all Read and Write rebalance activity has been done on +ASM1.
  2. That the read throughput is very close to the write throughput on +ASM1.
  3. The impact of the power values (2,6 and 11) on the throughput.

Now I would like to compare the behavior of 2 sets of disks: the disks added and the disks that are already part of the DATA diskgroup.

To do so, let’s create in Tableau a SET that contains the 2 new disks.

Screen Shot 2014-08-20 at 21.27.34

Let’s call it “New Disks”

Screen Shot 2014-08-20 at 21.29.42

So now we are able to display the ASM metrics IN this set (the 2 new disks) and OUT of this set (the already existing disks, i.e. the disks already part of the DATA diskgroup).

I will filter the metrics on ASM1 only (to avoid any “noise” coming from ASM2).

Let’s visualize the Reads/s and Writes/s metrics:

Screen Shot 2014-08-25 at 18.26.10

We can see that during the 3 rebalances:

  1. No Reads on the new disks (at least until about 12:40 pm).
  2. Number of Writes/s increasing on the new disks depending on the power values.
  3. Reads/s and Writes/s both increasing on the already existing disks depending on the power values.
  4. As of 12:40 pm, activity on the existing disks while near zero activity on the new ones.
  5. As of 12:40 pm, number of Writes/s >= Reads/s on the existing disks (while it was the opposite before).
  • Are 1, 2 and 3 surprising? No.
  • What happened for 4 and 5? I’ll answer later on.

Let’s visualize the Kby Read/s and Kby Write/s metrics:

Screen Shot 2014-08-25 at 18.31.59

We can see that during the 3 rebalances:

  1. No Kby Read/s on the new disks.
  2. Number of Kby Write/s increasing on the new disks depending on the power values.
  3. Kby Read/s and Kby Write/s both increasing on the existing disks depending on the power values.
  4. As of 12:40 pm, activity on the existing disks while no activity on the new ones.
  5. As of 12:40 pm, same amount of Kby Read/s and Kby Write/s on the existing disks (while it was not the case before).
  • Are 1, 2 and 3 surprising? No.
  • What happened for 4 and 5? I’ll answer later on.

Let’s visualize the Average By/Read and Average By/Write metrics:

Important remark regarding the averages computation/display: the By/Read and By/Write measures depend on the number of reads and writes respectively. So the averages have to be calculated using Weighted Averages.

Let’s create the calculated field in Tableau for the By/Read Weighted Average:

Screen Shot 2014-08-20 at 21.56.49

The same has to be done for the By/Write Weighted Average.

Let’s see the result:

Screen Shot 2014-08-25 at 18.38.07

We can see:

  1. The Avg By/Write on the new disks is about the same (about 1MB) whatever the power value is (before 12:40 pm).
  2. The Avg By/Write tends to increase with the power on the already existing disks.
  3. The Avg By/Read on the existing disks is about the same (about 1MB) whatever the power value is.
  • Is 1 surprising? No.
  • Is 2 surprising? Yes (at least for me).
  • Is 3 surprising? No.

Now that we have seen all those metrics, we can ask:

Q1: So what the hell happened at 12:40 pm?

Let’s check the alert_+ASM1.log file at that time:

Mon Aug 25 12:15:44 2014
ARB0 started with pid=33, OS id=1187132
NOTE: assigning ARB0 to group 4/0x1e089b59 (DATA) with 11 parallel I/Os
Mon Aug 25 12:15:47 2014
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
cellip.ora not found.
Mon Aug 25 12:39:52 2014
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=4
Mon Aug 25 12:40:03 2014
GMON updating for reconfiguration, group 4 at 372 for pid 35, osid 1225810
NOTE: group DATA: updated PST location: disk 0014 (PST copy 0)
NOTE: group DATA: updated PST location: disk 0015 (PST copy 1)
Mon Aug 25 12:40:03 2014
NOTE: group 4 PST updated.
Mon Aug 25 12:40:03 2014
NOTE: membership refresh pending for group 4/0x1e089b59 (DATA)
GMON querying group 4 at 373 for pid 18, osid 67864
SUCCESS: refreshed membership for 4/0x1e089b59 (DATA)
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. No voting file found.
Mon Aug 25 12:45:24 2014
NOTE: F1X0 copy 2 relocating from 18:44668 to 18:20099 for diskgroup 4 (DATA)
Mon Aug 25 12:53:49 2014
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 4/0x1e089b59 (DATA)

We can see that the ASM rebalance started the compacting phase (See Bane Radulovic’s blog post for more details about the ASM rebalances phases).

Q2: The ASM allocation unit size is 4MB and the Avg By/Read is stuck at 1MB, why?

I guess this is somehow related to the max_sectors_kb and max_hw_sectors_kb SYSFS parameters. It will be the subject of another post.

Two remarks before concluding:

  1. The ASM rebalance activity is not recorded in the v$asm_disk_iostat view. It is recorded in the v$asm_disk_stat view. So, if you are using the asm_metrics utility, you have to change the asm_feature_version variable to a value > your ASM instance version.
  2. I tested with compatible.asm set to 10.1 and 11.2.0.2 and observed the same behavior for all those metrics.

Conclusion of Part I:

  • We visualized that the compacting phase of the rebalance operation generates much more activity on the existing disks compared to near zero activity on the new disks.
  • We visualized that the compacting phase of the rebalance operation generates the same amount of Kby Read/s and Kby Write/s on the existing disks (while it was not the case before).
  • We visualized that during the compacting phase the number of Writes/s >= Reads/s on the existing disks (while it was the opposite before).
  • We visualized that the Avg By/Read does not exceed 1MB on the existing disks (while the ASM allocation unit has been set to 4MB on my diskgroup).
  • We visualized that the Avg By/Write tends to increase with the power on the already existing disks (quite surprising to me).
  • I’ll update this post with ASM 12c results as soon as I can (if there is anything new to report).

Are ASM rebalance and preferred read working together?

Introduction:

If I add disks into a diskgroup, then during the rebalance operation, ASM needs to read the data coming from the disks already part of the diskgroup to rebalance them on all the disks (including the new ones).

Question:

If I add 2 disks (one into each failgroup), is the preferred read feature taken into account for the rebalance process? (“for” means “for the reads generated by the rebalance operation”).

Let’s see:

  • Set the preferred read on +ASM1 (so that +ASM1 “prefers” to read from the “HOST31” failgroup):
SQL> alter system set asm_preferred_read_failure_groups='DATA.HOST31';

System altered.
  • Add 2 disks (one into each failgroup) into the DATA diskgroup (connected on +ASM1):
SQL> alter diskgroup DATA add failgroup HOST31 disk '/dev/san/HOST31CA8D0D' failgroup HOST32 disk '/dev/san/HOST32CA8D0D';

Diskgroup altered.
  • Check that the ASM compatibility is high enough (>=11.1) to use the preferred read feature:
SQL> select COMPATIBILITY from v$asm_diskgroup where NAME='DATA';

COMPATIBILITY
------------------------------------------------------------
11.2.0.2.0
  • Launch the rebalance:
SQL> alter diskgroup DATA rebalance power 2;

Diskgroup altered.
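As a side note, the preferred read setting can also be checked at the disk level with a query like this one (a sketch; the PREFERRED_READ column of v$asm_disk is expected to show ‘Y’ for the disks of the preferred failgroup):

SQL> select failgroup, path, preferred_read from v$asm_disk where group_number = (select group_number from v$asm_diskgroup where name='DATA');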

Now, let’s collect the ASM metrics that way and visualize the result with Tableau.

Note: During the rebalance near zero database activity occurred so that near 100% of the activity is coming from the rebalance process.

Result:

Screen Shot 2014-08-23 at 18.08.39

As you can see:

  1. The +ASM1 instance reads from both the HOST31 and the HOST32 failgroups: it did not take the preferred read into account.
  2. I changed the power during the rebalance just for the fun 😉

Remark:

It has been tested on an 11.2.0.4 extended RAC (I still need to test on 12c).

Conclusion:

  • The ASM preferred read feature is not taken into account for the rebalance process.
  • I guess it is still taken into account for the reads coming from the databases during the rebalance process.

Simulate and Visualize the impact of the ASM preferred feature on the read IOPS and throughput

Suppose that you decided to put the ASM preferred read feature in place because you observed that the read latency is too high on the farthest disk array (you can find how to reach this conclusion with use case 3 in this post).

So, you want to enable the ASM preferred read feature so that:

  1. The +ASM1 instance “prefers” to read from the “WIN” failgroup.
  2. The +ASM2 instance “prefers” to read from the “JMO” failgroup.

But doing so may have an impact on the number of read IOPS and the throughput distribution per host/disk array because:

  1. The “previous” ASM1 to JMO reads will now be done on the “WIN” array.
  2. The “previous” ASM2 to WIN reads will now be done on the “JMO” array.

Of course, the total number of read operations and the throughput will not change, but the distribution across the failgroups (disk arrays) may change with the ASM preferred read feature in place.

Question:

  • Is the architecture able to deal with this new read repartition?

To answer this question I will:

  1. Collect the ASM metrics during a certain amount of time (without the ASM preferred read in place) and produce a csv file as described here (the two commands are sketched just after this list).
  2. Visualize the ASM metrics with Tableau and simulate the impact of the preferred read feature on the read IOPS and the throughput distribution.
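The collection and the csv generation boil down to the two commands described in the “Graphing ASM performance metrics” post at the end of this blog (a sketch; adjust the -d day to your first snapshot):

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk > asm_metrics.txt
./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'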

Once the csv file is ready (meaning you collected a representative workload), let’s check what the current workload is (without the ASM preferred read in place).

For the Kby Read/s measure:

We can visualize it with Tableau as follows (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce).

Screen Shot 2014-08-10 at 18.45.03

For the Reads/s measure:

Screen Shot 2014-08-11 at 11.07.01

We can see the read IOPS and the throughput distribution by failgroup and ASM instance. We can see that the read IOPS and the throughput are equally distributed over the failgroups (it is the expected behaviour without the ASM preferred read in place).

Now, what if we implement the ASM preferred read feature? What would be the impact on the read IOPS and the throughput distribution?

To simulate and visualize the impact, let’s create this “New FG for Read operations” calculated field:

Screen Shot 2014-08-11 at 11.10.01

Basically it simulates the ASM preferred read in place by assigning the preferred failgroup per ASM instance.
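The calculated field could look like this (a hedged sketch in Tableau’s formula syntax, reusing the instance and failgroup names of this post):

IF [INST] = '+ASM1' THEN 'WIN' ELSE 'JMO' END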

Now, let’s simulate and visualize the impact of the ASM preferred read feature (should it be implemented) using the same csv file and this calculated field as a dimension.

For the Kby Read/s measure:

Screen Shot 2014-08-11 at 11.12.56

Note that the throughput distribution would not be the same and that the peaks are higher (> 200 MB/s compared to about 130 MB/s without the ASM preferred read).

For the Reads/s measure:

Screen Shot 2014-08-11 at 11.14.31

Note that the read IOPS distribution would not be the same and that the peak on the WIN failgroup is higher (about 8000 Reads/s compared to about 5000 Reads/s without the ASM preferred read).

Now you can check (with your system and storage administrators) whether your current architecture would be able to deal with this new distribution.

Remarks:

  • ASM is not performing any reads for the databases; it records metrics for the database instances that it is servicing.

Conclusion:

We have been able to simulate and visualize the impact of the ASM preferred read feature on the read IOPS and the throughput distribution without actually implementing it.

ASM performance metrics visualization: Use cases

Now that I can graph ASM performance metrics, let’s see some use cases.

To display the ASM metrics I’ll use the csv file generated by the csv_asm_metrics utility and Tableau for the visualization. Of course you could use the visualization tool of your choice.

Use case 1: Display the TOP IO consumers

You consolidated several databases on the same machine and you want to visualize which database is generating most of the IO throughput for reads. You can visualize this with Tableau as follows (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce with Tableau):

Screen Shot 2014-07-12 at 14.46.40

I can see that one of my databases is generating most of the throughput.

Should you use RAC, you could split those metrics per ASM instance as well:

Screen Shot 2014-07-12 at 14.47.58

I can see that most of the activity is recorded on ASM2, which makes sense as my RAC services are configured as preferred/available (Active/Passive configuration) and started on the *_2 database instances (linked to ASM2).

Use case 2: I want to see the Read IO distributions by Failgroups

You can visualize this with Tableau as follows (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce with Tableau):

Screen Shot 2014-07-12 at 10.09.08

We can see that the IOs are equally distributed over the Failgroups. It is the expected behaviour as I am not using the ASM Preferred Read feature.

Use case 3: Should I use the ASM Preferred Read feature on my extended RAC?

Suppose the host hosting the ASM1 instance is close to the disk array on which the “WIN” failgroup has been created. The same way, the host hosting the ASM2 instance is close to the disk array on which the “JMO” failgroup has been created. Let’s see the Read IO latency between the ASM instances and the failgroups.

You can visualize this with Tableau as follows (I keep the “columns”, “rows” and “marks” shelves in the screenshot so that you can reproduce with Tableau):

Screen Shot 2014-07-12 at 10.17.16

As you can see, the ASM1 instance reports faster reads on the “WIN” failgroup and ASM2 reports faster reads on the “JMO” failgroup, which makes sense according to our configuration. I can also check whether the read performance is good enough when the reads are done on the farthest disk array (ASM1 “reading” from the JMO failgroup and ASM2 “reading” from the WIN failgroup) and then decide if the ASM Preferred Read feature needs to be implemented.

Use case 4: Simulate and Visualize the impact of the ASM preferred feature on the read IOPS and throughput (See this blog post).

Use case 5: Are ASM rebalance and preferred read working together? (See this blog post)

Use case 6: A closer look at ASM rebalance, Part I: Disks have been added. (See this blog post)

Use case 7: A closer look at ASM rebalance, Part II: Disks have been dropped. (See this blog post)

Use case 8: A closer look at ASM rebalance, Part III: Disks have been added and dropped (at the same time). (See this blog post)

Remarks:

  • ASM is not performing any reads for the databases; it records metrics for the database instances that it is servicing.
  • You can imagine a lot of use cases thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the asm_metrics and the csv_asm_metrics utilities from this repository.

Graphing ASM performance metrics

ASM metrics are a goldmine: they provide a lot of information. As you may know, the asm_metrics utility extracts them in real time.

But sometimes it is not easy to understand the values without the help of a graph. Look at this example: if I can’t picture it, I can’t understand it.

So, depending on your needs and on what you are looking for with the ASM metrics, a picture may help.

So let’s graph the output of the asm_metrics utility: for this I created the csv_asm_metrics utility, which produces a csv file from the asm_metrics output.

Once you get the csv file you can graph the metrics with your favourite visualization tool (I’ll use Tableau as an example).

First you have to launch the asm_metrics utility this way (to ensure that all the fields are displayed):

  • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
  • -show=inst,fg,dg,dsk for ASM < 11g

and redirect the output to a text file:

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk > asm_metrics.txt

Remark: You can use the -interval parameter to collect data with an interval greater than one second (the default interval), as a one-second interval could produce a huge output file.
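For example (a sketch, assuming the parameter takes a number of seconds; the output shown below was produced with the default one-second interval):

./asm_metrics.pl -show=inst,dbinst,fg,dg,dsk -interval=10 > asm_metrics.txt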

The output file looks like:

............................
Collecting 1 sec....
............................

......... SNAP TAKEN AT ...................

13:48:54                                                                              Kby       Avg       AvgBy/               Kby       Avg        AvgBy/ 
13:48:54   INST     DBINST        DG            FG           DSK            Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write  
13:48:54   ------   -----------   -----------   ----------   ----------     -------   -------   -------   ------    ------     -------   --------   ------ 
13:48:54   +ASM1                                                            6731      54224     1.4       8249      42         579       3.0        14117  
13:48:54   +ASM1    BDT10_1                                                 2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA                                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST31       HOST31CA0D1D   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32                      2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1C   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       DATA          HOST32       HOST32CA0D1D   2         32        0.2       16384     0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA                                       0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST31       HOST31CC8D0F   0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32                      0         0         0.0       0         0          0         0.0        0      
13:48:54   +ASM1    BDT10_1       FRA           HOST32       HOST32CC8D0F   0         0         0.0       0         0          0         0.0        0      

and so on...

Now let’s produce the csv file with the csv_asm_metrics utility. Let’s see the help:

./csv_asm_metrics.pl -help

Usage: ./csv_asm_metrics.pl [-if] [-of] [-d] [-help]

  Parameter         Comment
  ---------         -------
  -if=              Input file name (output of asm_metrics)
  -of=              Output file name (the csv file)
  -d=               Day of the first snapshot (YYYY/MM/DD)

Example: ./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

and generate the csv file that way:

./csv_asm_metrics.pl -if=asm_metrics.txt -of=asm_metrics.csv -d='2014/07/04'

The csv file looks like:

Snap Time,INST,DBINST,DG,FG,DSK,Reads/s,Kby Read/s,ms/Read,By/Read,Writes/s,Kby Write/s,ms/Write,By/Write
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,DATA,HOST32,HOST32CA0D1D,2,32,0.2,16384,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST31,HOST31CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,FRA,HOST32,HOST32CC8D0F,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST31,HOST31CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO1,HOST32,HOST32CC0D13,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST31,HOST31CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT10_1,REDO2,HOST32,HOST32CC0D12,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1C,0,0,0.0,0,0,0,0.0,0
2014/07/04 13:48:54,+ASM1,BDT11_1,DATA,HOST31,HOST31CA0D1D,0,0,0.0,0,2,16,0.5,8448

As you can see:

  1. The day has been added (to create a date) and the next ones will be calculated (should the snaps cross multiple days).
  2. Only the rows that contain all the fields have been recorded in the csv file (the script does not record the other ones, as they represent aggregated values).

Now I can import this csv file into Tableau.

You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

Let’s graph the throughput and latency per failgroup for example.

Important remark regarding some averages computation/display:

The ms/Read and By/Read measures depend on the number of reads. So the averages have to be calculated using Weighted Averages (the same applies to ms/Write and By/Write).

Let’s create the calculated field in Tableau for those Weighted Averages:

Screen Shot 2014-07-07 at 20.13.28

so that the Weighted Average ms/Read is:

Screen Shot 2014-07-07 at 20.16.19

Weighted Average By/Read:

Screen Shot 2014-07-07 at 20.21.12

In the same way, you have to create:

  • Weighted Average ms/Write = sum([ms/Write]*[Writes/s])/sum([Writes/s])
  • Weighted Average By/Write = sum([By/Write]*[Writes/s])/sum([Writes/s])

Now let’s display the average read latency by Failgroup (using the previous calculated weighted average):

Drag the Snap Time dimension to the “columns” shelf and choose “exact date”:

Screen Shot 2014-07-07 at 20.27.23

Drag the Weighted Average ms/Read calculated field to the “Rows” shelf:

Screen Shot 2014-07-07 at 20.29.41

Drag the FG dimension to the “Color Marks” shelf:

Screen Shot 2014-07-07 at 20.32.33

So that the graph looks like:

Screen Shot 2014-07-07 at 20.33.15

Create the same graph for the “Kby Read/s” measure (except that I want to see the sum, i.e. the throughput and not the average) and put those 2 graphs into the same dashboard:

Screen Shot 2014-07-07 at 20.39.42

Here we are.

Conclusion:

  • We can create a csv file from the output of the asm_metrics utility thanks to csv_asm_metrics.
  • To do so, we have to collect all the fields of asm_metrics with those options:
      • -show=inst,dbinst,fg,dg,dsk for ASM >= 11g
      • -show=inst,fg,dg,dsk for ASM < 11g
  • Once you uploaded the csv file into your favourite visualization tool, don’t forget to calculate weighted averages for ms/Read, By/Read, ms/Write and By/Write if you plan to graph the averages.
  • You can imagine a lot of graphs thanks to the measures collected (Reads/s, Kby Read/s, ms/Read, By/Read, Writes/s, Kby Write/s, ms/Write, By/Write) and all those dimensions (Snap Time, INST, DBINST, DG, FG, DSK).

You can download the csv_asm_metrics utility from this repository or copy the source code from this page.

UPDATE: You can see some use cases here.

ASM free and usable space in case of Imbalanced Diskgroup

In a “normal” situation (I mean no imbalanced disks) you should be able to easily know how much space is available in the ASM diskgroup.

Harald van Breederode already explained it in detail in this blog post.

But now suppose the diskgroup has imbalanced disks: it can occur for several reasons:

  • A rebalance operation has been aborted/halted.
  • A rebalance operation is waiting.
  • Disks are not of the same size.

Let’s simulate an Imbalanced situation and compute the usable free space:

First I created a “balanced” diskgroup with normal redundancy.

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C52D06                      oracle   dba  
   3:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   4:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   5:      56320 Mb /dev/san/WIN7C52D06                      oracle   dba  
   6:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> create diskgroup BDT normal redundancy failgroup F1 disk '/dev/san/JMO7C52D06' failgroup F2 disk '/dev/san/WIN7C52D06';

Diskgroup created.

Let’s check its free space with asmcmd and with asm_free_usable_imbalance.sql (I’ll share the code later in this post), built to report the free/usable space of an imbalanced diskgroup:

SQL> !asmcmd lsdg BDT
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576    112640   112538                0           56269              0             N  BDT/

SQL> @asm_free_usable_imbalance.sql

NAME                              FREE_MB
------------------------------ ----------
BDT                                 56269

Well, we can see from asmcmd that 56269 MB (112538 / 2) is free/usable and that asm_free_usable_imbalance.sql reports the same value.

So the asm_free_usable_imbalance.sql script works at least with a “balanced” diskgroup ;-).

Now let’s produce an imbalanced situation this way:

  • Add datafiles into the diskgroup.
  • Add disks into the failgroups without triggering a rebalance operation (asm_power_limit = 0).
SQL> create tablespace BDT datafile '+BDT' size 30g;

Tablespace created.

SQL> alter tablespace BDT add datafile '+BDT' size 10g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 20g;
alter tablespace BDT add datafile '+BDT' size 20g
*
ERROR at line 1:
ORA-01119: error in creating database file '+BDT'
ORA-17502: ksfdcre:4 Failed to create file +BDT
ORA-15041: diskgroup "BDT" space exhausted

So, we can’t add a 20 GB datafile anymore (sounds obvious as the free space was about 55 GB and we already added 40 GB into the diskgroup).
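To make sure that adding the disks does not trigger a rebalance, asm_power_limit can be set to 0 beforehand on the ASM instance (it is a dynamic parameter):

SQL> alter system set asm_power_limit=0;

System altered.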

Now add disks of different sizes without triggering a rebalance operation:

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   3:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   4:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> alter diskgroup BDT add failgroup F1 disk '/dev/san/JMO7C43D06','/dev/san/JMO7C61D06' failgroup F2 disk '/dev/san/WIN7C43D06','/dev/san/WIN7C61D06';

Diskgroup altered.

Verify that the rebalance is waiting:

SQL> select * from v$asm_operation where group_number = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT');

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
           8 REBAL WAIT          0

and the disk usage:

SQL> select failgroup,total_mb,FREE_MB from v$asm_disk where group_number = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT');

FAILGROUP                        TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
F1                                  56320      56318
F1                                 225280     225277
F1                                  56320      15304
F2                                 225280     225277
F2                                  56320      15304
F2                                  56320      56318

Here we are: how much usable/free space is “really” available in this “imbalanced” diskgroup?

Let’s see what asmcmd and asm_free_usable_imbalance.sql report:

SQL> !asmcmd lsdg BDT
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  1048576    675840   593798           225280          184259              0             N  BDT/

SQL> @asm_free_usable_imbalance.sql

NAME                              FREE_MB
------------------------------ ----------
BDT                                 91824

Now we have to remember that, for primary extents, ASM will allocate new extents in such a way as to distribute each file equally and evenly across all disks and to fill all disks evenly. Thus every disk is maintained at the same percentage full, regardless of the size of the disk.

So it will write 4 times more primary extents into the 225280 MB disks than into the 56320 MB disks (this will not necessarily be the case for mirrored extents, as you’ll see later in this post).

So, asmcmd reports 593798/2 MB of free space: this space is “just” the sum of the free space of the disks that belong to the diskgroup. So it is not fully “usable” due to the imbalanced diskgroup and the rule explained above.

My utility reports 91824 MB of free/usable space (less than 90 GB).

Let’s verify how much space is “usable”:

SQL> alter tablespace BDT add datafile '+BDT' size 30g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 30g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 30g;
alter tablespace BDT add datafile '+BDT' size 30g
*
ERROR at line 1:
ORA-01119: error in creating database file '+BDT'
ORA-17502: ksfdcre:4 Failed to create file +BDT
ORA-15041: diskgroup "BDT" space exhausted

SQL> alter tablespace BDT add datafile '+BDT' size 29g;

Tablespace altered.

So, it was not possible to add 90 GB while it was possible to add 89 GB: my utility reported the right “free/usable” value.

Let’s see the asm_free_usable_imbalance.sql:

SQL> !cat asm_free_usable_imbalance.sql
select /* EXTERNAL REDUNDANCY */
g.name,
sum(d.TOTAL_MB) * min(d.FREE_MB / d.total_mb) /
decode(g.type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
and g.type = 'EXTERN'
group by g.name, g.type
union
select /* NON EXTERNAL REDUNDANCY WITH SYMMETRIC FG */
g.name,
sum(d.TOTAL_MB) * min(d.FREE_MB / d.total_mb) /
decode(g.type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
and g.group_number not in /* KEEP SYMMETRIC*/
(select distinct (group_number)
from (select group_number,
failgroup,
TOTAL_MB,
count_dsk,
greatest(lag(count_dsk, 1, 0)
over(partition by TOTAL_MB,
group_number order by TOTAL_MB,
FAILGROUP),
lead(count_dsk, 1, 0)
over(partition by TOTAL_MB,
group_number order by TOTAL_MB,
FAILGROUP)) as max_lag_lead,
count(distinct(failgroup)) over(partition by group_number, TOTAL_MB) as nb_fg_per_size,
count_fg
from (select group_number,
failgroup,
TOTAL_MB,
count(*) over(partition by group_number, failgroup, TOTAL_MB) as count_dsk,
count(distinct(failgroup)) over(partition by group_number) as count_fg
from v$asm_disk))
where count_dsk <> max_lag_lead
or nb_fg_per_size <> count_fg)
and g.type <> 'EXTERNAL'
group by g.name, g.type
union
select /* NON EXTERNAL REDUNDANCY WITH NON SYMMETRIC FG
AND DOES EXIST AT LEAST ONE DISK WITH PARTNERS OF DIFFERENT SIZE*/
name,
min(free) / decode(type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from (select name,
disk_number,
free_mb / (factor / sum(factor) over(partition by name)) as free,
type
from (select name,
disk_number,
avg(free_mb) as free_mb,
avg(total_mb) as total_mb,
sum(factor_disk + factor_partner) as factor,
type
from (SELECT g.name,
g.type,
d.group_number as group_number,
d.disk_number disk_number,
d.total_mb as total_mb,
d.free_mb as free_mb,
p.number_kfdpartner "Partner disk#",
f.factor as factor_disk,
fp.factor as factor_partner
FROM x$kfdpartner p,
v$asm_disk d,
v$asm_diskgroup g,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') f,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') fp
WHERE p.disk = d.disk_number
and p.grp = d.group_number
and f.disk_number = d.disk_number
and f.group_number = d.group_number
and fp.disk_number = p.number_kfdpartner
and fp.group_number = p.grp
and d.group_number = g.group_number
and g.type <> 'EXTERN'
and g.group_number in /* KEEP NON SYMMETRIC */
(select distinct (group_number)
from (select group_number,
failgroup,
TOTAL_MB,
count_dsk,
greatest(lag(count_dsk, 1, 0)
over(partition by
TOTAL_MB,
group_number order by
TOTAL_MB,
FAILGROUP),
lead(count_dsk, 1, 0)
over(partition by
TOTAL_MB,
group_number order by
TOTAL_MB,
FAILGROUP)) as max_lag_lead,
count(distinct(failgroup)) over(partition by group_number, TOTAL_MB) as nb_fg_per_size,
count_fg
from (select group_number,
failgroup,
TOTAL_MB,
count(*) over(partition by group_number, failgroup, TOTAL_MB) as count_dsk,
count(distinct(failgroup)) over(partition by group_number) as count_fg
from v$asm_disk))
where count_dsk <> max_lag_lead
or nb_fg_per_size <> count_fg)
and d.group_number not in /* KEEP DG THAT DOES NOT CONTAIN AT LEAST ONE DISK HAVING PARTNERS OF DIFFERENT SIZE*/
(select distinct (group_number)
from (select d.group_number as group_number,
d.disk_number disk_number,
p.number_kfdpartner "Partner disk#",
f.factor as factor_disk,
fp.factor as factor_partner,
greatest(lag(fp.factor, 1, 0)
over(partition by
d.group_number,
d.disk_number order by
d.group_number,
d.disk_number),
lead(fp.factor, 1, 0)
over(partition by
d.group_number,
d.disk_number order by
d.group_number,
d.disk_number)) as max_lag_lead,
count(p.number_kfdpartner) over(partition by d.group_number, d.disk_number) as nb_partner
FROM x$kfdpartner p,
v$asm_disk d,
v$asm_diskgroup g,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') f,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') fp
WHERE p.disk = d.disk_number
and p.grp = d.group_number
and f.disk_number = d.disk_number
and f.group_number = d.group_number
and fp.disk_number =
p.number_kfdpartner
and fp.group_number = p.grp
and d.group_number = g.group_number
and g.type <> 'EXTERN')
where factor_partner <> max_lag_lead
and nb_partner > 1))
group by name, disk_number, type))
group by name, type;

Don’t be afraid ;-): The SQL is composed of 3 cases:

  1. Diskgroup with external redundancy: in that case the calculation is based on the smallest free space for a disk within the diskgroup.
  2. Diskgroup with “non external” redundancy and symmetric failgroups (same number of disks grouped by size across failgroups): in that case the calculation is based on the smallest free space for a disk within the diskgroup.
  3. Diskgroup with “non external” redundancy and non symmetric failgroups: in that case the calculation is based on a weighting factor depending on the size of the disks and their partners.

Warning: The SQL does not yet cover the case (it will not return any result) of “non external” redundancy with non symmetric failgroups where at least one disk has partners of different sizes.

This is due to the following remark.

Remark:

I said that the ASM allocator ensures balanced disk utilization for primary extents and not necessarily for mirrored extents.

Proof:

Create a normal redundancy diskgroup with 4 failgroups whose disks are not all of the same size.

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C52D06                      oracle   dba  
   3:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   4:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   5:      56320 Mb /dev/san/WIN7C52D06                      oracle   dba  
   6:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> create diskgroup BDT normal redundancy failgroup F1 disk '/dev/san/JMO7C43D06' failgroup F2 disk '/dev/san/JMO7C52D06' failgroup F3 disk '/dev/san/JMO7C61D06' failgroup F4 disk '/dev/san/WIN7C43D06';

Diskgroup created.

Now create a 30 GB tablespace in the database:

SQL> create tablespace BDT datafile '+BDT' size 30g;

Tablespace created.

And check the primary and mirrored extent distribution within the disks in ASM:

SQL> select d.group_number,d.disk_number,d.total_mb,d.free_mb,d.failgroup,decode(LXN_KFFXP,0,'P',1,'M','MM') ,count(*)
from  X$KFFXP, v$asm_disk d
where 
GROUP_KFFXP = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT')
and d.group_number=GROUP_KFFXP
and d.disk_number=DISK_KFFXP
group by d.group_number,d.disk_number,d.total_mb,d.free_mb,d.failgroup,LXN_KFFXP
order by d.failgroup,d.total_mb; 

GROUP_NUMBER DISK_NUMBER   TOTAL_MB    FREE_MB FAILGROUP                      DE   COUNT(*)
------------ ----------- ---------- ---------- ------------------------------ -- ----------
           8           0     225280     202020 F1                             P       12310
           8           0     225280     202020 F1                             M       10945
           8           0     225280     202020 F1                             MM          2
           8           1      56320      48778 F2                             P        3076
           8           1      56320      48778 F2                             M        4442
           8           1      56320      48778 F2                             MM         22
           8           2      56320      48779 F3                             P        3077
           8           2      56320      48779 F3                             M        4443
           8           2      56320      48779 F3                             MM         19
           8           3     225280     202018 F4                             P       12309
           8           3     225280     202018 F4                             M       10942
           8           3     225280     202018 F4                             MM          8

P stands for primary extents and M for mirrored “2 way” extents.

So, as you can see, the ASM allocator writes about 4 times more primary extents into the 225280 MB disks than into the 56320 MB disks (12310/3076 ≈ 4), while this is not the case for the mirrored extents (10945/4442 ≈ 2.5 times).

This is the main reason why this case is still not covered in asm_free_usable_imbalance.sql. If you have any clues about the mirrored extent distribution, please let me know 😉

Conclusion:

  • The SQL provides a way to calculate the free space in case of an imbalanced diskgroup.
  • The calculation is based on the smallest free space for a disk within the diskgroup and a weighting factor that depends on the disk sizes and their partners.
  • The SQL also provides the right results for a balanced diskgroup 😉
  • The SQL does not yet cover the case of “non external” redundancy where at least one disk has partners of different sizes.

OTech Magazine – Introduction and Winter 2014 issue

Just a quick post to introduce the OTech Magazine (In case you don’t already know about it).

This magazine is independent, free to download and, last but not least, not vendor-driven at all. It’s all about Oracle technologies, written by authors who want to contribute for the fun of it.

The Winter 2014 issue is live; you can reach it:

  • here for OTech Magazine.
  • here for the direct pdf version.

I also want to mention that I am honored to have been invited by Douwe Pieter van den Bos to write an article for this Winter 2014 issue: it’s about “ASM metrics”; you can read it here.

Enjoy reading !