SLOB Logical I/O testing: Check if your benchmark delivers the maximum

With this post I would like to share a tip related to SLOB Logical I/O testing, for when you need to find out the maximum “Logical read (blocks) per second” your “benchmark” is capable of delivering.

The tip is rule number two, but let’s first describe the well-known rule number one.

Rule number one: when testing Logical I/O you have to ensure that the AWR report generated by SLOB shows a “Buffer Hit %” of 100% and a “DB CPU” event that accounts for more than 99% of the DB time. That is to say, something like:

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer  Hit   %:  100.00    In-memory Sort %:  100.00
-----------------------------------------------------------------


Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            Total   Wait   % DB
Event                                 Waits Time Avg(ms)   time Wait Class
------------------------------ ------------ ---- ------- ------ ----------
DB CPU                                      184.           99.6
control file sequential read            176   .1       0     .0 System I/O
Disk file Mirror Read                    43    0       0     .0 User I/O
Disk file operations I/O                 45    0       0     .0 User I/O

Question: Is that enough to ensure that the “benchmark” delivered the maximum?

Answer: No, this is not enough.

For example, during 2 Logical I/O tests (each using only one reader) with those slob.conf parameters:

UPDATE_PCT=0
RUN_TIME=180
WORK_LOOP=0
SCALE=1000

I am able to generate (with rule number one satisfied, that is to say DB CPU = 99.6% and Buffer Hit = 100%) this load profile:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0               8.9      0.00      1.09
              DB CPU(s):               1.0               8.8      0.00      1.09
      Redo size (bytes):           7,098.4          61,501.9
  Logical read (blocks):          35,568.6         308,171.6

Or this one:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0               8.8      0.00      0.69
              DB CPU(s):               1.0               8.8      0.00      0.69
      Redo size (bytes):           6,684.2          57,877.5
  Logical read (blocks):       1,075,426.1       9,312,012.1

As you can see, the number of Logical I/O per second delivered during the 2 benchmarks is completely different, going from about 35,000 to about 1 million, even though:

  1. The database did not record any significant wait events (DB CPU > 99% of the DB time).
  2. The Buffer Hit is 100%.
  3. The number of readers is the same (one in this case: DB Time = 1 second per second).

So, how can I check that the “benchmark” delivered the maximum?

As SLOB launches a query to generate the I/O, we have to check that the elapsed time (in seconds) of that query is about “run time * number of readers”, that is to say 180 seconds in our case (180 * 1 reader).

Let’s check the AWR “SQL ordered by Elapsed Time” section.

For the 35,000 Logical I/O run:

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           180.0              1        180.02   96.9   99.6     .0 0c1s032jgyrd7
Module: SQL*Plus
DECLARE v_default_schema_number PLS_INTEGER := '1'; v_update_pct PLS_INTEGER :=
'0'; v_max_loop_iterations PLS_INTEGER := '0'; v_seconds_to_run PLS_INTEGER := '
180'; v_scale PLS_INTEGER := '1000'; v_work_unit PLS_INTEGER := '1' ; v_redo_str
ess VARCHAR2(12) := 'HEAVY'; v_shared_data_modulus PLS_INTEGER := '0'; v_nt_arg

            72.1      3,230,605          0.00   38.8   99.9     .0 bhdvtsvjhgvrh
Module: SQL*Plus
SELECT COUNT(C2) FROM CF1 WHERE ( CUSTID > ( :B1 - :B2 ) ) AND (CUSTID < :B1 )

For the 1,000,000 Logical I/O run:

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           180.0              1        180.03   96.9   99.6     .0 g41kknwap6na9
Module: SQL*Plus
DECLARE v_default_schema_number PLS_INTEGER := '1'; v_update_pct PLS_INTEGER :=
'0'; v_max_loop_iterations PLS_INTEGER := '0'; v_seconds_to_run PLS_INTEGER := '
180'; v_scale PLS_INTEGER := '1000'; v_work_unit PLS_INTEGER := '1000' ; v_redo_
stress VARCHAR2(12) := 'HEAVY'; v_shared_data_modulus PLS_INTEGER := '0'; v_nt_

           172.1        195,156          0.00   92.7   99.7     .0 bhdvtsvjhgvrh
Module: SQL*Plus
SELECT COUNT(C2) FROM CF1 WHERE ( CUSTID > ( :B1 - :B2 ) ) AND (CUSTID < :B1 )

As you can see, during the 1,000,000 Logical I/O run the elapsed time of the SQL responsible for the Logical I/O is about 172 seconds, while it is only about 72 seconds during the 35,000 Logical I/O run (notice also v_work_unit set to ‘1000’ versus ‘1’ in the two DECLARE blocks above).

Which leads me to rule number two: we have to check that the elapsed time (in seconds) of the SQL responsible for the Logical I/O is about “number of readers * run time”.

For example, for a run time of 180 seconds and 20 readers (./runit.sh 20) you should observe an elapsed time close to 20 * 180 = 3,600 seconds, something like:

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
         3,487.2      2,710,259          0.00   96.7   99.7     .0 bhdvtsvjhgvrh
Module: SQL*Plus
SELECT COUNT(C2) FROM CF1 WHERE ( CUSTID > ( :B1 - :B2 ) ) AND (CUSTID < :B1 )

Remarks:

  1. I had to play with the “WORK_UNIT” slob.conf parameter to reach 1,000,000 Logical I/O per second.
  2. WORK_UNIT=256 per default (which was not enough on my system to deliver the maximum Logical I/O per second): you may need to increase it as well if rule number two is not met (see the example below).
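
For example, the slob.conf used for the 1,000,000 Logical I/O per second run looked like this (WORK_UNIT=1000 being the value that worked on my system, as shown by v_work_unit in the DECLARE block above; treat it as a starting point, not a recommendation):

UPDATE_PCT=0
RUN_TIME=180
WORK_LOOP=0
SCALE=1000
WORK_UNIT=1000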

Conclusion:

To ensure that your SLOB benchmark delivers the maximum Logical I/O per second, you need to check:

  1. That the AWR report generated by SLOB shows a “Buffer Hit %” of 100% and a “DB CPU” event that accounts for more than 99% of the DB time.
  2. That the elapsed time (in seconds) of the SQL responsible for the Logical I/O is about “number of readers * run time”. If not, you may need to increase the WORK_UNIT slob.conf parameter. A quick way to check this outside the AWR report is sketched below.
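
As a quick way to apply rule number two without opening the AWR report, you can also query v$sql right after the run (a sketch: the sql_id is the one observed during my runs and may differ on your system; also note that v$sql figures are cumulative since the cursor was loaded, so compute deltas if the instance has already run SLOB):

select sql_id,
       round(elapsed_time / 1e6) as elapsed_seconds,
       executions
  from v$sql
 where sql_id = 'bhdvtsvjhgvrh';

The elapsed_seconds value should be close to “number of readers * run time”.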


ASM free and usable space in case of Imbalanced Diskgroup

In a “normal” situation (I mean with no imbalanced disks) you can easily find out how much space is available in an ASM diskgroup.

Harald van Breederode already explains it in detail in this blog post.

But now suppose the diskgroup has imbalanced disks. This can occur for several reasons:

  • A rebalance operation has been aborted/halted.
  • A rebalance operation is waiting.
  • Disks are not of the same size.

Let’s simulate an Imbalanced situation and compute the usable free space:

First I created a “balanced” diskgroup with normal redundancy.

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C52D06                      oracle   dba  
   3:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   4:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   5:      56320 Mb /dev/san/WIN7C52D06                      oracle   dba  
   6:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> create diskgroup BDT normal redundancy failgroup F1 disk '/dev/san/JMO7C52D06' failgroup F2 disk '/dev/san/WIN7C52D06';

Diskgroup created.

Let’s check its free space with asmcmd and with asm_free_usable_imbalance.sql (I’ll share the code later in this post), a script built to report the free/usable space of an imbalanced diskgroup:

SQL> !asmcmd lsdg BDT
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  1048576    112640   112538                0           56269              0             N  BDT/

SQL> @asm_free_usable_imbalance.sql

NAME                              FREE_MB
------------------------------ ----------
BDT                                 56269

Well, we can see from asmcmd that 56,269 MB (112,538 / 2, as the redundancy is normal) is free/usable, and that asm_free_usable_imbalance.sql reports the same value.

So the asm_free_usable_imbalance.sql script works at least with a “balanced” diskgroup ;-).

Now let’s produce an imbalanced situation this way:

  • Add datafiles into the diskgroup.
  • Add disks into the failgroups without triggering a rebalance operation (asm_power_limit = 0, see the sketch below).
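
Setting the rebalance power to 0 on the ASM instance beforehand is what prevents the rebalance from kicking in (a minimal sketch):

SQL> alter system set asm_power_limit=0;

System altered.

Now let’s add the datafiles: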
SQL> create tablespace BDT datafile '+BDT' size 30g;

Tablespace created.

SQL> alter tablespace BDT add datafile '+BDT' size 10g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 20g;
alter tablespace BDT add datafile '+BDT' size 20g
*
ERROR at line 1:
ORA-01119: error in creating database file '+BDT'
ORA-17502: ksfdcre:4 Failed to create file +BDT
ORA-15041: diskgroup "BDT" space exhausted

So, we can’t add a 20 GB datafile anymore (which sounds obvious, as the free space was about 55 GB and we already added 40 GB into the diskgroup).

Now add disks of different sizes without triggering a rebalance operation:

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   3:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   4:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> alter diskgroup BDT add failgroup F1 disk '/dev/san/JMO7C43D06','/dev/san/JMO7C61D06' failgroup F2 disk '/dev/san/WIN7C43D06','/dev/san/WIN7C61D06';

Diskgroup altered.

Verify that the rebalance is waiting:

SQL> select * from v$asm_operation where group_number = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT');

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
           8 REBAL WAIT          0

and the disks usage:

SQL> select failgroup,total_mb,FREE_MB from v$asm_disk where group_number = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT');

FAILGROUP                        TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
F1                                  56320      56318
F1                                 225280     225277
F1                                  56320      15304
F2                                 225280     225277
F2                                  56320      15304
F2                                  56320      56318

Here we are: how much usable/free space is “really” available in this “imbalanced” diskgroup?

Let’s see what asmcmd and asm_free_usable_imbalance.sql report:

SQL> !asmcmd lsdg BDT
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  Y         512   4096  1048576    675840   593798           225280          184259              0             N  BDT/

SQL> @asm_free_usable_imbalance.sql

NAME                              FREE_MB
------------------------------ ----------
BDT                                 91824

Now we have to remember that, for primary extents, ASM allocates new extents in such a way as to distribute each file equally and evenly across all disks and to fill all disks evenly. Thus every disk is maintained at the same percentage full, regardless of the size of the disk.

So it will write 4 times more primary extents into the 225,280 MB disks than into the 56,320 MB disks (this will not necessarily be the case for mirrored extents, as you’ll see later in this post).

So, asmcmd reports 593,798 / 2 MB of free space: this is “just” the sum of the free space of the disks that belong to the diskgroup. It is not fully “usable”, due to the imbalanced diskgroup and the allocation rule explained above.

My utility reports 91,824 MB of free/usable space (a bit less than 90 GB).
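
Note that this diskgroup still has symmetric failgroups (each failgroup holds one 225,280 MB disk and two 56,320 MB disks), so the second branch of the script applies: sum(total_mb) * min(free_mb / total_mb) / 2 = 675,840 * (15,304 / 56,320) / 2 ≈ 91,824 MB. Taken in isolation, that branch boils down to something like:

select g.name,
       sum(d.total_mb) * min(d.free_mb / d.total_mb) /
       decode(g.type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) as usable_free_mb
  from v$asm_disk d, v$asm_diskgroup g
 where d.group_number = g.group_number
   and g.name = 'BDT'
 group by g.name, g.type;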

Let’s verify how much space is “usable”:

SQL> alter tablespace BDT add datafile '+BDT' size 30g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 30g;

Tablespace altered.

SQL> alter tablespace BDT add datafile '+BDT' size 30g;
alter tablespace BDT add datafile '+BDT' size 30g
*
ERROR at line 1:
ORA-01119: error in creating database file '+BDT'
ORA-17502: ksfdcre:4 Failed to create file +BDT
ORA-15041: diskgroup "BDT" space exhausted

SQL> alter tablespace BDT add datafile '+BDT' size 29g;

Tablespace altered.

So, it was not possible to add 90 GB (30 + 30 + 30), while adding 89 GB (30 + 30 + 29) succeeded: my utility reported the right “free/usable” value.

Let’s now look at asm_free_usable_imbalance.sql:

SQL> !cat asm_free_usable_imbalance.sql
select /* EXTERNAL REDUNDANCY */
g.name,
sum(d.TOTAL_MB) * min(d.FREE_MB / d.total_mb) /
decode(g.type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
and g.type = 'EXTERN'
group by g.name, g.type
union
select /* NON EXTERNAL REDUNDANCY WITH SYMMETRIC FG */
g.name,
sum(d.TOTAL_MB) * min(d.FREE_MB / d.total_mb) /
decode(g.type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
and g.group_number not in /* KEEP SYMMETRIC*/
(select distinct (group_number)
from (select group_number,
failgroup,
TOTAL_MB,
count_dsk,
greatest(lag(count_dsk, 1, 0)
over(partition by TOTAL_MB,
group_number order by TOTAL_MB,
FAILGROUP),
lead(count_dsk, 1, 0)
over(partition by TOTAL_MB,
group_number order by TOTAL_MB,
FAILGROUP)) as max_lag_lead,
count(distinct(failgroup)) over(partition by group_number, TOTAL_MB) as nb_fg_per_size,
count_fg
from (select group_number,
failgroup,
TOTAL_MB,
count(*) over(partition by group_number, failgroup, TOTAL_MB) as count_dsk,
count(distinct(failgroup)) over(partition by group_number) as count_fg
from v$asm_disk))
where count_dsk <> max_lag_lead
or nb_fg_per_size <> count_fg)
and g.type <> 'EXTERN'
group by g.name, g.type
union
select /* NON EXTERNAL REDUNDANCY WITH NON SYMMETRIC FG
AND DOES EXIST AT LEAST ONE DISK WITH PARTNERS OF DIFFERENT SIZE*/
name,
min(free) / decode(type, 'EXTERN', 1, 'NORMAL', 2, 'HIGH', 3, 1) "USABLE_FREE_MB"
from (select name,
disk_number,
free_mb / (factor / sum(factor) over(partition by name)) as free,
type
from (select name,
disk_number,
avg(free_mb) as free_mb,
avg(total_mb) as total_mb,
sum(factor_disk + factor_partner) as factor,
type
from (SELECT g.name,
g.type,
d.group_number as group_number,
d.disk_number disk_number,
d.total_mb as total_mb,
d.free_mb as free_mb,
p.number_kfdpartner "Partner disk#",
f.factor as factor_disk,
fp.factor as factor_partner
FROM x$kfdpartner p,
v$asm_disk d,
v$asm_diskgroup g,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') f,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') fp
WHERE p.disk = d.disk_number
and p.grp = d.group_number
and f.disk_number = d.disk_number
and f.group_number = d.group_number
and fp.disk_number = p.number_kfdpartner
and fp.group_number = p.grp
and d.group_number = g.group_number
and g.type <> 'EXTERN'
and g.group_number in /* KEEP NON SYMMETRIC */
(select distinct (group_number)
from (select group_number,
failgroup,
TOTAL_MB,
count_dsk,
greatest(lag(count_dsk, 1, 0)
over(partition by
TOTAL_MB,
group_number order by
TOTAL_MB,
FAILGROUP),
lead(count_dsk, 1, 0)
over(partition by
TOTAL_MB,
group_number order by
TOTAL_MB,
FAILGROUP)) as max_lag_lead,
count(distinct(failgroup)) over(partition by group_number, TOTAL_MB) as nb_fg_per_size,
count_fg
from (select group_number,
failgroup,
TOTAL_MB,
count(*) over(partition by group_number, failgroup, TOTAL_MB) as count_dsk,
count(distinct(failgroup)) over(partition by group_number) as count_fg
from v$asm_disk))
where count_dsk <> max_lag_lead
or nb_fg_per_size <> count_fg)
and d.group_number not in /* KEEP DG THAT DOES NOT CONTAIN AT LEAST ONE DISK HAVING PARTNERS OF DIFFERENT SIZE*/
(select distinct (group_number)
from (select d.group_number as group_number,
d.disk_number disk_number,
p.number_kfdpartner "Partner disk#",
f.factor as factor_disk,
fp.factor as factor_partner,
greatest(lag(fp.factor, 1, 0)
over(partition by
d.group_number,
d.disk_number order by
d.group_number,
d.disk_number),
lead(fp.factor, 1, 0)
over(partition by
d.group_number,
d.disk_number order by
d.group_number,
d.disk_number)) as max_lag_lead,
count(p.number_kfdpartner) over(partition by d.group_number, d.disk_number) as nb_partner
FROM x$kfdpartner p,
v$asm_disk d,
v$asm_diskgroup g,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') f,
(select disk_number,
group_number,
TOTAL_MB / min(total_mb) over(partition by group_number) as factor
from v$asm_disk
where state = 'NORMAL'
and mount_status = 'CACHED') fp
WHERE p.disk = d.disk_number
and p.grp = d.group_number
and f.disk_number = d.disk_number
and f.group_number = d.group_number
and fp.disk_number =
p.number_kfdpartner
and fp.group_number = p.grp
and d.group_number = g.group_number
and g.type <> 'EXTERN')
where factor_partner <> max_lag_lead
and nb_partner > 1))
group by name, disk_number, type))
group by name, type;

Don’t be afraid ;-): The SQL is composed of 3 cases:

  1. Diskgroup with external redundancy: the calculation is based on the smallest free-space ratio for a disk within the diskgroup.
  2. Diskgroup with “non external” redundancy and symmetric failgroups (same number of disks, grouped by size, across failgroups): the calculation is also based on the smallest free-space ratio for a disk within the diskgroup.
  3. Diskgroup with “non external” redundancy and non-symmetric failgroups: the calculation is based on a weighting factor depending on the size of the disks and of their partners (see the extract below).
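
The weighting factor used in case 3 normalizes each disk against the smallest disk of its diskgroup; extracted from the script above, it is computed like this:

select disk_number,
       group_number,
       total_mb / min(total_mb) over (partition by group_number) as factor
  from v$asm_disk
 where state = 'NORMAL'
   and mount_status = 'CACHED';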

Warning: the SQL does not yet cover (it will not return any result for) the case of “non external” redundancy with non-symmetric failgroups where at least one disk has partners of different sizes.

This is due to the following remark.

Remark:

I said that the ASM allocator ensures balanced disk utilization for primary extents, and not necessarily for mirrored extents.

Proof:

Create a normal redundancy diskgroup with 4 failgroups whose disks are not all of the same size.

SQL> !kfod
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:     225280 Mb /dev/san/JMO7C43D06                      oracle   dba  
   2:      56320 Mb /dev/san/JMO7C52D06                      oracle   dba  
   3:      56320 Mb /dev/san/JMO7C61D06                      oracle   dba  
   4:     225280 Mb /dev/san/WIN7C43D06                      oracle   dba  
   5:      56320 Mb /dev/san/WIN7C52D06                      oracle   dba  
   6:      56320 Mb /dev/san/WIN7C61D06                      oracle   dba  
--------------------------------------------------------------------------------

SQL> create diskgroup BDT normal redundancy failgroup F1 disk '/dev/san/JMO7C43D06' failgroup F2 disk '/dev/san/JMO7C52D06' failgroup F3 disk '/dev/san/JMO7C61D06' failgroup F4 disk '/dev/san/WIN7C43D06';

Diskgroup created.

Now create a 30 GB tablespace in the database:

SQL> create tablespace BDT datafile '+BDT' size 30g;

Tablespace created.

And check the primary and mirrored extent distribution across the disks in ASM:

SQL> select d.group_number,d.disk_number,d.total_mb,d.free_mb,d.failgroup,decode(LXN_KFFXP,0,'P',1,'M','MM') ,count(*)
from  X$KFFXP, v$asm_disk d
where 
GROUP_KFFXP = (select GROUP_NUMBER from v$asm_diskgroup where NAME='BDT')
and d.group_number=GROUP_KFFXP
and d.disk_number=DISK_KFFXP
group by d.group_number,d.disk_number,d.total_mb,d.free_mb,d.failgroup,LXN_KFFXP
order by d.failgroup,d.total_mb; 

GROUP_NUMBER DISK_NUMBER   TOTAL_MB    FREE_MB FAILGROUP                      DE   COUNT(*)
------------ ----------- ---------- ---------- ------------------------------ -- ----------
           8           0     225280     202020 F1                             P       12310
           8           0     225280     202020 F1                             M       10945
           8           0     225280     202020 F1                             MM          2
           8           1      56320      48778 F2                             P        3076
           8           1      56320      48778 F2                             M        4442
           8           1      56320      48778 F2                             MM         22
           8           2      56320      48779 F3                             P        3077
           8           2      56320      48779 F3                             M        4443
           8           2      56320      48779 F3                             MM         19
           8           3     225280     202018 F4                             P       12309
           8           3     225280     202018 F4                             M       10942
           8           3     225280     202018 F4                             MM          8

P stands for primary extents and M for mirrored “2 way” extents (the decode maps any other LXN_KFFXP value to ‘MM’).

So, as you can see, the ASM allocator writes about 4 times more primary extents into the 225,280 MB disks than into the 56,320 MB disks (12,310 / 3,076), while this is not the case for the mirrored extents (10,945 / 4,442 ≈ 2.5 times).

This is the main reason why one case is still not covered by asm_free_usable_imbalance.sql. If you have any clues about the mirrored extent distribution, please let me know ;-)

Conclusion:

  • The SQL provides a way to calculate the free space in case of an imbalanced diskgroup.
  • The calculation is based on the smallest free-space ratio for a disk within the diskgroup, and on a weighting factor that depends on the size of the disks and of their partners.
  • The SQL also provides the right results for a balanced diskgroup ;-)
  • The SQL does not yet cover the case of “non external” redundancy where at least one disk has partners of different sizes.


Modify the Private Network Information in Oracle Clusterware with HAIP in place

The MOS note “How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)” explains how to change the private network information when the interconnect is made of a single interface (thus producing downtime).

So what about changing the interconnect configuration with Highly Available IP (HAIP) in place?

Let’s remember what HAIP is (from Oracle® Database High Availability Best Practices):

[Screenshot: HAIP description from the Oracle Database High Availability Best Practices documentation]

As HAIP provides a redundant interconnect, we should be able to change the interconnect configuration of one private interface without any downtime, right?

First let’s check the current interconnect configuration:

oifcfg getif       
eth4  172.16.0.128  global  cluster_interconnect
eth6  172.16.1.0  global  cluster_interconnect

and the associated Virtual IP:

oifcfg iflist -p -n
eth4  172.16.0.128  PRIVATE  255.255.255.128
eth4  169.254.0.0  UNKNOWN  255.255.128.0
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.128.0  UNKNOWN  255.255.128.0

I removed the public network from the output. As you can see, each private interface is hosting a virtual IP (169.254.x.x).

Now your sysadmin makes the change (for example, changing the subnet and the VLAN) on one of the private interfaces (let’s say eth4), so that the ohasd.log file reports something like:

2014-01-27 11:16:17.154: [GIPCHGEN][1837012736]gipchaInterfaceFail: marking interface failing 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x184d }
2014-01-27 11:16:17.334: [GIPCHGEN][1856595712]gipchaInterfaceDisable: disabling interface 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x19cd }
2014-01-27 11:16:17.339: [GIPCHDEM][1856595712]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x19ed }

So now let’s check the virtual IPs and the available interfaces and subnets again:

oifcfg iflist -p -n
eth4  172.17.3.0  PRIVATE  255.255.255.128
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.128.0  UNKNOWN  255.255.128.0
eth6  169.254.0.0  UNKNOWN  255.255.128.0
bond0  158.168.4.0  UNKNOWN  255.255.252.0

Well, we can see 2 things:

  • The first one is that eth6 is now “hosting” both virtual IPs (169.254.x.x).
  • The second one is the new “available” subnet for eth4 (172.17.3.0).

So we just have to remove the previous eth4 configuration:

oifcfg delif -global eth4/172.16.0.128

and put the new one that way:

oifcfg setif -global eth4/172.17.3.0:cluster_interconnect

Now, check again the Virtual IPs:

oifcfg iflist -p -n
eth4  172.17.3.0  PRIVATE  255.255.255.128
eth4  169.254.128.0  UNKNOWN  255.255.128.0
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.0.0  UNKNOWN  255.255.128.0

Perfect: now each private interface hosts a virtual IP again (we are back to a “normal” configuration).
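
As a side note, when a database instance is up, the HAIP addresses can also be checked from the database side through gv$cluster_interconnects (a quick sketch):

SQL> select inst_id, name, ip_address, is_public from gv$cluster_interconnects;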

Remarks:

No downtime will occur as long as:

  • You don’t change both interfaces’ configurations at the same time ;-)
  • You don’t remove by mistake the configuration of the interface that hosts both virtual IPs.
  • The interface hosting both VIPs doesn’t fail before you put back the updated interface.

There is nothing new in this post; I just had to do the exercise, so I am sharing it ;-)

Conclusion:

Thanks to HAIP, we have been able to change the interconnect network configuration of one interface without any downtime.


OTech Magazine – Introduction and Winter 2014 issue

Just a quick post to introduce OTech Magazine (in case you don’t already know about it).

This magazine is independent, free to download and, last but not least, not vendor-driven at all. It’s all about Oracle technologies, written by authors who contribute for the fun of it.

The Winter 2014 issue is live; you can reach it:

  • here for the OTech Magazine page.
  • here for the direct PDF version.

I also want to mention that I am honored to have been invited by Douwe Pieter van den Bos to write an article for this Winter 2014 issue: it’s about “ASM metrics”, and you can read it here.

Enjoy reading!


Recovering after the loss of Redo Log files thanks to the file descriptor

This morning when I came to work, I discovered that all the redo log files (current, active, inactive…) had been removed from one of our databases (needless to say, by mistake), resulting in:

Errors in file /ec/prod/server/oracle/orabdt/u000/admin/BDT/diag/rdbms/bdt/BDT/trace/BDT_m000_17002.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3

As the database is a large one, I tried to avoid restoring it:

  1. My first reflex was to check whether flashback database was set to on: unfortunately, it was not the case.
  2. Then I remembered this blog post by Jason Arneil: Recovering from rm -rf on a datafile.

And then I decided to give it a try for the redo log files, this way:

  • First I searched for the lgwr process ID associated with this database:
ps -ef | grep -i lgwr | grep -i BDT
  oracle 17321  8422   0   Sep 10 ?         291:07 ora_lgwr_BDT
  • Now let’s search for the file descriptors linked to the redo log files. As I don’t have access to lsof (nor pfiles) and as I want to know the Seq#, I did the search this way (redo logs contain the “Thread” and “Seq#” strings) (output truncated):
cd /proc/17321/fd
for i in *
> do
> echo "for fd: $i"
> strings $i | egrep "Thread.*Seq#"
> done
for fd: 258
Thread 0001, Seq# 0000005285, SCN
for fd: 259
Thread 0001, Seq# 0000005285, SCN
for fd: 260
Thread 0001, Seq# 0000005284, SCN
for fd: 261
Thread 0001, Seq# 0000005284, SCN
  • Then, based on the Seq# and on the v$log and v$logfile output (a sketch of the mapping query follows the cat commands below), I recreated the redo logs this way:
cat 258 > /ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT1a.log
cat 259 > /ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log
cat 260 > /ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT2a.log
cat 261 > /ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT2b.log
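  For reference, the Seq#-to-group/member mapping that drove those cat commands comes from a query like this one (a sketch) against v$log and v$logfile:

select l.group#, l.thread#, l.sequence#, l.status, f.member
  from v$log l, v$logfile f
 where l.group# = f.group#
 order by l.group#, f.member;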
  • At that time we received:
Tue Dec 10 08:06:43 2013
Archived Log entry 5285 added for thread 1 sequence 5285 ID 0xb2ffdbd dest 1:
Archiver process freed from errors. No longer stopped
Tue Dec 10 08:06:43 2013
Beginning log switch checkpoint up to RBA [0x14a7.2.10], SCN: 9310332806181
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT2b.log'
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT2a.log'
.....
.....
ORA-00314: log 1 of thread 1, expected sequence# 5287 doesn't match 5285
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u801/oraredo/BDT/redoBDT1b.log'
ORA-00314: log 1 of thread 1, expected sequence# 5287 doesn't match 5285
ORA-00312: online log 1 thread 1: '/ec/prod/server/oracle/orabdt/u800/oraredo/BDT/redoBDT1a.log'
  • So in our case the “restore” was not enough. We then created new redo log files, and we cleared and dropped the ones we had recovered from the file descriptors, this way:
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
    BEFORE 12/10/2013 08:10:00 (CHANGE 9310332823323) CANNOT BE USED FOR RECOVERY.
Clearing online log 1 of thread 1 sequence number 5287
Archived Log entry 5287 added for thread 1 sequence 5288 ID 0xb2ffdbd dest 1:
Archiver process freed from errors. No longer stopped
Tue Dec 10 08:14:44 2013
Completed: ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 1
ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2
Tue Dec 10 08:14:54 2013
Beginning global checkpoint up to RBA [0x14a9.5cea.10], SCN: 9310382261905
Completed checkpoint up to RBA [0x14a9.5cea.10], SCN: 9310382261905
Tue Dec 10 08:14:55 2013
WARNING! CLEARING REDO LOG WHICH HAS NOT BEEN ARCHIVED. BACKUPS TAKEN
    BEFORE 12/10/2013 08:06:43 (CHANGE 9310332806181) CANNOT BE USED FOR RECOVERY.
Clearing online log 2 of thread 1 sequence number 5286
Completed: ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 2
ALTER DATABASE DROP LOGFILE GROUP 1
Completed: ALTER DATABASE DROP LOGFILE GROUP 1
ALTER DATABASE DROP LOGFILE GROUP 2
Completed: ALTER DATABASE DROP LOGFILE GROUP 2

The database was then able to work as expected, and we launched a full backup of it.

Remarks:

  1. Thanks to Jason Arneil and Frits Hoogland for their initial findings/blog post related to the restore of a lost datafile thanks to the file descriptor.
  2. In our case, restoring the redo log files from the file descriptors was not enough, but it’s worth trying as a last chance.
  3. Adding new redo logs should not be necessary: clearing the “dropped/restored” redo logs with “ALTER DATABASE CLEAR (UNARCHIVED) LOGFILE GROUP <n>” could be enough, as stated here:

[Screenshot: documentation excerpt about clearing a lost online redo log with ALTER DATABASE CLEAR LOGFILE]

Conclusion:

  1. We have been able to restore the service without a database restore/recover (even though the “restore” from the file descriptors was not enough by itself).
  2. From my point of view, you should try this as a last chance before restoring/stopping the database: restore from the file descriptors and, if this is not enough, launch “alter database clear logfile group <n>” or “alter database clear unarchived logfile group <n>” on those restored redo logs.


Main pages for exadata_metrics.pl and asm_metrics.pl

A quick post, as I realized that I have written a lot of blog posts related to those 2 scripts (new features, examples, findings…).

For centralization/simplicity I just created 2 main pages for those utilities:


  • The one related to exadata_metrics.pl: here
  • The one related to asm_metrics.pl: here


UKOUG TECH13: Slides

A quick post to share the presentations I did during UKOUG TECH 13.

It was my first time presenting and I really enjoyed it (even if it was a little bit scary :-)). I will definitely try to repeat the experience.
