FAN callouts for In-Memory Instance distribution with RAC, PDB and preferred/available service

The context:

In my previous blog post, In-Memory Instance distribution with RAC and Multitenant Environment, I checked whether we can influence how In-Memory data is distributed across the Instances per PDB (that is, get rid of the default behaviour). I checked because:

  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • Christian Antognini shows us in this blog post that having the data not fully populated on an Instance (the default behaviour on non-Engineered Systems) could lead to bad performance.

Let’s summarize the findings:

  • If the PDB is open on one Instance, then this Instance contains all the data.
  • If the PDB is open on all the Instances, then we have to set "_inmemory_auto_distribute" to false to have at least one Instance that contains all the data (the query below is a quick way to check which Instance holds the full segment).
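In both cases the check is the same one used throughout these posts: query gv$im_user_segments on each instance and look at BYTES_NOT_POPULATED (0 means that instance holds the whole segment). A minimal version of that check, using the BDT test table from these posts:

-- Which instance(s) hold the full segment? (BYTES_NOT_POPULATED = 0 means fully populated there)
SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';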

The objective:

That said, let's try to reach an objective: with a service defined as preferred/available on a PDB, I want the PDB to be open on only one Instance at any time (that way, we don't need to set the hidden parameter).

In this blog post, Martin Bach has already explained the PDB open/close behaviour with a service defined as preferred/available on it. I don't want to repeat Martin's work; I just want to draw your attention to the following (with our objective in mind):

The default PDB status is:

SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 MOUNTED
         2 MOUNTED

As you can see, the PDB is mounted on both Instances.

Now, let's create and start a preferred/available service linked to our PDB as follows:

srvctl add service -db SRCCONT -pdb BDTPDB -service BDTPDB_TAF -e select -m BASIC -P BASIC -r SRCCONT1 -a SRCCONT2 -w 5 -z 12
srvctl start service -d SRCCONT

Let’s check the PDB status and the service status:

SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 MOUNTED
         1 READ WRITE
srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT1

So, the PDB is open on the Instance that is hosting the service and not open on the other Instance:  The Objective is reached.

Now, let's kill pmon for the SRCCONT1 Instance (the one that hosts the service), wait for SRCCONT1 to come back, and check the PDB and the service status:

kill -9 `ps -ef | grep -i pmon | grep -i SRCCONT1 | awk '{print $2}'`

--PDB Status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 READ WRITE
         1 MOUNTED

-- Service Status
srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT2

So, the service failed over, the PDB has been closed on the failing Instance and is open on the one that is now hosting the service: The Objective is reached.

Now, let's relocate (fail back) the service manually and check the PDB and the service status:

srvctl relocate service -d SRCCONT -s BDTPDB_TAF -oldinst SRCCONT2 -newinst SRCCONT1 -force

--PDB Status
SYS@CDB$ROOT>  select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 READ WRITE
         2 READ WRITE

-- Service Status
srvctl status service -d SRCCONT                                                                         
Service BDTPDB_TAF is running on instance(s) SRCCONT1

As you can see the service has been relocated and the PDB is now open on both Instances: The Objective is not reached.

To summarize:

  1. The objective has been reached when we created and started the service.
  2. The objective has been reached when the service failed over (Due to Instance crash).
  3. The objective has not been reached when the service has been relocated manually.

FAN callouts to the rescue:

FAN callouts are server-side scripts or executables that run whenever a FAN event is generated. I’ll use this feature to fix the third point where the objective is not reached.

To do so, I created the following script:

#!/bin/ksh
#
# Close the PDB on the old instance during manual service relocation
#
# Description:
# -  This script closes the PDB on the old instance during manual service relocation
#
# Requirement:
#  - Location of the script must be $ORA_CRS_HOME/racg/usrco/.
#
# Version:
#  - version 1.0
#  - Bertrand Drouvot
#
# Date: 2014/12/04
#
#
# Env settings

ME=`who | cut -d" " -f1`
PASSWDFILE="/etc/passwd"
HOMEDIR=`egrep "^${ME}" ${PASSWDFILE} | cut -d: -f 6`
DATE_LOG=$(date +"%y%m%d_%H%M")
LOGFILE=fan_close_pdb_${DATE_LOG}.log

VAR_EVENTTYPE=$1

for ARGS in $*;
 do
        PROPERTY=`echo $ARGS | cut -f1 -d"=" | tr '[:lower:]' '[:upper:]'`
        VALUE=`echo $ARGS | cut -f2 -d"=" | tr '[:lower:]' '[:upper:]'`
        case $PROPERTY in
                VERSION)      VAR_VERSION=$VALUE ;;
                SERVICE)      VAR_SERVICE=$VALUE ;;
                DATABASE)     VAR_DATABASE=$VALUE ;;
                INSTANCE)     VAR_INSTANCE=$VALUE ;;
                HOST)         VAR_HOST=$VALUE ;;
                STATUS)       VAR_STATUS=$VALUE ;;
                REASON)       VAR_REASON=$VALUE ;;
                CARD)         VAR_CARDINALITY=$VALUE ;;
                TIMESTAMP)    VAR_LOGDATE=$VALUE ;;
                ??:??:??)     VAR_LOGTIME=$VALUE ;;
        esac
 done 

# Close the PDB (PDB name extracted from the service name) if service relocated manually

if [[ "$VAR_EVENTTYPE" = "SERVICEMEMBER" && "$VAR_STATUS" = "DOWN" && "$VAR_REASON" = "USER" ]]
then

PATH=$PATH:/usr/local/bin
export PATH
ORAENV_ASK=NO
export ORAENV_ASK
ORACLE_SID=${VAR_INSTANCE}
export ORACLE_SID
. oraenv > /dev/null

# Extract PDB name based on our naming convention (You may need to change this)
PDB=`echo "${VAR_SERVICE}" | sed 's/_TAF.*//'`

# Close the PDB on the old instance
sqlplus /nolog <<EOF
connect / as sysdba;
alter pluggable database ${PDB} close instances=('${VAR_INSTANCE}');
EOF

# Log this in a logfile
echo ${*} >> ${HOMEDIR}/log/${LOGFILE}

fi

and put it under $GRID_HOME/racg/usrco/ (on both nodes). The script is then executed for every FAN event, but it does its processing only for a manual service relocation (thanks to the "if" condition in it).

The goal of the script is to close the PDB on the old Instance during manual service relocation.
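For completeness, deploying it boils down to something like this (the file name fan_close_pdb.ksh is just an example; adjust $GRID_HOME to your environment and repeat on both nodes):

# copy the callout into the Grid Infrastructure callout directory and make it executable
cp fan_close_pdb.ksh $GRID_HOME/racg/usrco/
chmod 755 $GRID_HOME/racg/usrco/fan_close_pdb.ksh
# the script appends its logfile under <home>/log, so make sure that directory exists
mkdir -p ~/log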

Now, let’s check if the objective is reached. Let’s relocate the service manually from SRCCONT1 to SRCCONT2:

-- Current PDB Status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 READ WRITE
         2 READ WRITE

-- Current service status
SYS@CDB$ROOT> !srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT1

-- Relocate the service
SYS@CDB$ROOT> !srvctl relocate service -d SRCCONT -s BDTPDB_TAF -oldinst SRCCONT1 -newinst SRCCONT2 -force

-- Check PDB status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 READ WRITE
         1 MOUNTED

-- Check service status
SYS@CDB$ROOT> !srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT2

So, as you can see, the service has been manually relocated and the PDB has been closed on the old Instance. That way, the PDB is open on only one Instance: The Objective is reached.

Remarks:

  • I used the "force" option during the manual relocation, which disconnects all the sessions during the relocate operation. If you are not using the "force" option, then I guess you don't want to close the PDB on the old Instance ;-).
  • The PDB open/close behaviour with a preferred/available service linked to it is the one described in this post, unless you have saved its state (the default is to discard the state); see the example right after these remarks.
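For reference, this is the "save state" command I am referring to (12.1.0.2 syntax); with it, the saved open mode is re-applied at the next CDB restart, which is not what we want here:

SYS@CDB$ROOT> alter pluggable database BDTPDB save state;
-- and to revert to the default behaviour:
SYS@CDB$ROOT> alter pluggable database BDTPDB discard state;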

Conclusion:

The objective is reached in any case: The PDB is open on only one Instance at any time (with a preferred/available service in place). That way we don’t need to set the hidden parameter “_inmemory_auto_distribute” to false to get the In-Memory data fully populated on one Instance.

Update 2015/03/27: At the PDB level, if you set the parallel_instance_group parameter to the service name on both Instances, then the In-Memory data is fully populated on the Instance the service is running on, even if the PDB is open on both Instances (after the manual service relocation). You can find more details here; the kind of setting I mean is sketched below.
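As a rough sketch (connected to the BDTPDB PDB, with the service and instance names used in this post; check in your version that the parameter is modifiable at the PDB level):

alter system set parallel_instance_group='BDTPDB_TAF' sid='SRCCONT1';
alter system set parallel_instance_group='BDTPDB_TAF' sid='SRCCONT2';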


In-Memory Instance distribution with RAC and Multitenant Environment

In a previous post (see In-Memory Instance distribution with RAC databases: I want at least one instance that contains all the data) I provided a way to get rid of the default In-Memory Instance distribution on a RAC (non CDB) database.

To summarize:

  • Thanks to the hidden parameter "_inmemory_auto_distribute", we are able to change the way the In-Memory Column Stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With "_inmemory_auto_distribute" set to false, we are able to populate one instance (or all the instances) with all the data.

Now, let's have a look at the In-Memory distribution on a Multitenant RAC database, and let's check whether we can influence how In-Memory data is distributed across the Instances per PDB (that is, get rid of the default behaviour).

By default the IMCS distribution for a PDB is the following:

First case: The PDB is open on one Instance only:

SYS@CDB$ROOT> alter pluggable database BDTPDB open instances=('BDT12c3_1');

Pluggable database altered.

-- Now connect to the BDTPDB PDB as bdt/bdt
SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.

-- Let's enable the inmemory attribute on the BDT table
BDT@BDTPDB> alter table bdt inmemory ;

Table altered.

-- Trigger the IMCS population
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)
----------
  44591000

-- check the distribution
BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id; 

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     695074816 1879048192                   0

As you can see, Instance 1 contains all the data for this PDB (as BYTES_NOT_POPULATED=0).

Second case: The PDB is open on all the Instances:

-- Open the BDTPDB PDB on all instances
SYS@CDB$ROOT> alter pluggable database BDTPDB open instances=all;

Pluggable database altered.

-- Now connect to the BDTPDB PDB as bdt/bdt
SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.  

-- Let's enable the inmemory attribute on the BDT table
BDT@BDTPDB> alter table bdt inmemory ;

Table altered.

-- Trigger the IMCS population
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)  
----------  
  44591000

-- check the distribution
BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id;


   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     376307712 1879048192           802889728
         2 BDTPDB                         COMPLETED     298909696 1879048192          1006559232

We can see that (for this PDB), by default, no instance contains all the data, as the BYTES_NOT_POPULATED column is greater than 0.

Now let's set the hidden parameter "_inmemory_auto_distribute" to false (at the PDB level):

BDT@BDTPDB> alter system set "_inmemory_auto_distribute"=false;
alter system set "_inmemory_auto_distribute"=false
*
ERROR at line 1:
ORA-65040: operation not allowed from within a pluggable database

So, we can’t set this hidden parameter at the PDB level.

Ok, let’s set it at the CDB level:

SYS@CDB$ROOT> alter system set "_inmemory_auto_distribute"=false;

System altered.

then re-trigger the population on instance 1 (for this PDB) and check the distribution:

SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.  
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)  
----------  
  44591000

BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id;

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         2 BDTPDB                         COMPLETED     297861120 1879048192          1006559232
         1 BDTPDB                         COMPLETED     675151872 1879048192                   0

BINGO! Look at Instance 1: it now contains all the data for this PDB in its IMCS (as BYTES_NOT_POPULATED=0).

If I connect to Instance 2 (for this PDB) and launch the query to trigger the population, I'll get:

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     675151872 1879048192                   0
         2 BDTPDB                         COMPLETED     672006144 1879048192                   0

So, both instances now contain all the data for this PDB.

Remarks:

  • As I said in the previous post: I used a hidden parameter, so you should get Oracle Support approval before using it.
  • I guess (because I can't test it) that in a Multitenant and Engineered Systems context it is also possible to have all the data appear in the IM column store on every node for a PDB (thanks to the duplicate attribute), as Christian Antognini shows us in this blog post for non-CDB; a sketch of that attribute is shown after these remarks.
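For reference, the duplicate attribute mentioned above is declared at the segment level, along these lines (Engineered Systems only):

-- populate the table into the IM column store of every instance
alter table bdt inmemory duplicate all;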

Conclusion:

The conclusion is the same as in my previous post (with a non-CDB):

  • Thanks to the hidden parameter "_inmemory_auto_distribute", we are able to change the way the In-Memory Column Stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With "_inmemory_auto_distribute" set to false, we are able to populate one instance (or all the instances) with all the data.

And we can add that:

  • If the PDB is open on one instance only then this instance contains all the data.
  • If the PDB is open on all the instances then we have to set “_inmemory_auto_distribute” to false to have at least one instance that contains all the data.
  • Each PDB population will behave the same way (distributed or not) and inherit the CDB setting (as we cannot set the hidden parameter at the PDB level).

 

In-Memory Instance distribution with RAC databases: I want at least one instance that contains all the data

As you may know, with a RAC database, by default all objects populated into In-Memory will be distributed across all of the IM column stores in the cluster. It is also possible to have all the data appear in the IM column store on every node (Engineered Systems only). See this white paper for more details.

There are two very interesting blog posts on this subject:

  1. Kerry Osborne shows us in this blog post how we can distribute the data across the nodes (using the DISTRIBUTE INMEMORY attribute).
  2. Christian Antognini shows us in this blog post that having the data not fully populated on each instance could lead to bad performance.

But wait: if my RAC service is an active/passive one (I mean the service is started on only one instance), then I would like to have all the data fully populated into the In-Memory column store of the "active" instance (and not distributed across the instances), right?
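(By active/passive I mean a preferred/available service, defined along these lines; the database, service and instance names below are just examples:)

# preferred on instance 1, available on instance 2
srvctl add service -db BDT12c -service BDT_TAF -r BDT12c1 -a BDT12c2
srvctl start service -db BDT12c -service BDT_TAF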

Let’s try to achieve this (all data fully populated into the In-Memory column store of the active instance):

So, by default, the IMCS distribution is the following:

SQL> alter table bdt inmemory ;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     313720832 1610612736           732684288
         2 COMPLETED     274530304 1610612736           826236928

As you can see, no instance contains all the data, as the BYTES_NOT_POPULATED column is greater than 0.

Now let's set the hidden parameter "_inmemory_auto_distribute" to false and re-trigger the population on instance 1:

SQL> alter system set "_inmemory_auto_distribute"=false;

System altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     274530304 1610612736           826236928

BINGO! Look at Instance 1: it now contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

If I connect to Instance 2 and launch the query to trigger the population, I'll get:

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';
   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     589234176 1610612736                   0

So, both instances now contain all the data (like the duplicate attribute would have done).

Remarks:

  1. In the case of an active/passive service (meaning a preferred/available service), after a service failover the second instance will also contain all the data (once populated).
  2. In the case of an active/active service, each database instance will contain all the data, but not necessarily at the same time (so the behaviour is a little bit different from the duplicate attribute).
  3. I used a hidden parameter, so you should get Oracle Support approval before using it.
  4. I'll check the behaviour with PDBs in another post.

Conclusion:

  • Thanks to the hidden parameter "_inmemory_auto_distribute", we are able to change the way the In-Memory Column Stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With "_inmemory_auto_distribute" set to false, we are able to populate one instance (or all the instances) with all the data.

Update 2015/03/27:

  • This "workaround", proposed to get rid of the default behaviour on non-Engineered Systems ("every row of the test table is stored in the IMCS of either one instance or the other"), works for any type of service.
  • In the case of a preferred/available service, there is no need to set the hidden parameter to get the IM column store fully populated on one node. The secret sauce is linked to the parallel_instance_group parameter. See this blog post for more details.

Watch out for optimizer_adaptive_features as it may have a huge negative impact

Let me describe how I discovered that optimizer_adaptive_features may have a huge negative impact (especially on RAC databases).

I upgraded an 11gR2 database to 12.1.0.2 and then converted this database to a PDB. To do so, I launched "@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql" (see MOS Doc ID 1564657.1 for more details).

This SQL script took about 75 minutes to complete on my 2-node RAC database. This is quite long, so I decided to try to reduce this duration.

To diagnose: I dropped the PDB, plugged it back in, re-launched noncdb_to_pdb.sql, and enabled a 10046 trace on the session.
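(For the record, enabling the 10046 trace on the session boils down to something like this, run just before launching the script; the tracefile identifier is only there to find the file easily:)

alter session set tracefile_identifier='noncdb_to_pdb';
alter session set events '10046 trace name context forever, level 12';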

The top SQL (sorted by elapsed time) for this session is the following:

SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    70502      4.02       4.25          0          0          0           0
Execute  70502    264.90     759.33          0          0          0           0
Fetch    70502    215.77    1848.46          0          0          0       70502
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   211506    484.70    2612.05          0          0          0       70502

Misses in library cache during parse: 17
Misses in library cache during execute: 17
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 70502

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=76 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us)
         0          0          1    PX COORDINATOR  (cr=0 pr=0 pw=0 time=76 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=0 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Join ACK                           281741        0.03        368.85
  PX Deq: reap credit                       1649736        0.00         25.22
  IPC send completion sync                   141000        0.02        135.07
  PX Deq: Parse Reply                        141004        0.04        170.64
  PX Deq: Execute Reply                      141004        0.08       1366.42
  reliable message                            70502        0.00        125.51
  PX Deq: Signal ACK EXT                     140996        0.03         13.99
  PX Deq: Slave Session Stats                140996        0.02         25.54
  enq: PS - contention                        70605        0.11        145.80
  KJC: Wait for msg sends to complete          2584        0.00          0.04
  latch free                                     16        0.00          0.00
  PX qref latch                                  14        0.00          0.00
  latch: shared pool                              4        0.00          0.00
  latch: active service list                      1        0.00          0.00
  row cache lock                                127        0.00          0.08
  library cache lock                             36        0.00          0.04
  library cache pin                              36        0.00          0.04
  gc cr grant 2-way                               1        0.00          0.00
  db file sequential read                         1        0.00          0.00
  oracle thread bootstrap                         1        0.03          0.03

As you can see, this SQL's elapsed time is about 2600 seconds in total (for about 70,000 executions), most of the wait time comes from "PX%" events, and this SQL queries the GV$SQL view.

But wait:

Why the hell is this SQL (which looks like a query that collects metrics for a particular sql_id) somehow part of the session that launched the noncdb_to_pdb.sql script? Does that make sense?

I really don't think so; this SQL (sql_id "frjd8zfy2jfdq") must have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql itself.

Let’s prove it with a simple test case (I am alone on the database):

alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

With the following result:

SQL> @test_case

System altered.


  COUNT(*)
----------
         0

Connected.

PL/SQL procedure successfully completed.


  COUNT(*)
----------
         2

Did you see that the sql_id "frjd8zfy2jfdq" has been produced by this simple test case? (If you trace the session, you will see this query in the trace file.)

I reproduced this behavior on:

  • 12.1.0.2 database with CDB/PDB.
  • 12.1.0.2 database (Non CDB).

and was not able to reproduce it on a 11gR2 database.

So, it looks like this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to "SQL metrics collection", I tried to disable some 12cR1 features (linked to this area) until I found the "culprit".

Nothing changed until I set the optimizer_adaptive_features parameter to false (true is the default).

Here is the result:

SQL> !cat test_case.sql
alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba
alter session set optimizer_adaptive_features=false;

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

SQL> @test_case

System altered.

  COUNT(*)
----------
         0

Connected.

Session altered.

PL/SQL procedure successfully completed.

  COUNT(*)
----------
         0

BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!

Well, now what is the impact on noncdb_to_pdb.sql on my 2-node RAC database?

I edited the script and added:

alter session set optimizer_adaptive_features=false;

just above this line:

exec dbms_pdb.noncdb_to_pdb(1);
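(If you prefer to script that edit, a sed one-liner along these lines would do it, assuming the call appears exactly as quoted above; back up the file first and adjust the path to your ORACLE_HOME:)

# keep a copy of the original script
cp $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql.orig
# insert the alter session just above the dbms_pdb call (GNU sed syntax)
sed -i '/exec dbms_pdb.noncdb_to_pdb(1);/i alter session set optimizer_adaptive_features=false;' \
  $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql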

Guess what?

  1. noncdb_to_pdb.sql took about 20 minutes to execute (compared to about 75 minutes without setting optimizer_adaptive_features to false).
  2. The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.

Remarks:

  • The impact of this SQL is much more "visible" with a RAC database, as it queries a global V$ view (GV$SQL) and therefore needs parallel query to run (if more than one instance is up). With only one instance up, the 10046 trace file produced:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    69204      2.58       2.78          0          0          0           0
Execute  69204     23.93      25.45          0          0          0           0
Fetch    69204      2.68       2.64          0          0          0       69204
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   207612     29.20      30.88          0          0          0       69204

Misses in library cache during parse: 18
Misses in library cache during execute: 18
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 62

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=120 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us)
         0          0          0    PX COORDINATOR  (cr=0 pr=0 pw=0 time=100 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=43 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                                  1        0.00          0.00

As you can see: an elapsed time of about 30 seconds (for about 70,000 executions) and no "PX%" wait events.

  • I checked which parameters (hidden or not) changed when setting optimizer_adaptive_features to false. Then I tried those parameters one by one with my test case and discovered that setting "_optimizer_dsdir_usage_control" to 0 is enough to get rid of the sql_id "frjd8zfy2jfdq".
  • During the run of noncdb_to_pdb.sql, it is the large number of executions (about 70,000) of the sql_id "frjd8zfy2jfdq" that produced this long "overall" duration and highlighted the fact that this query had to be removed somehow.
  • You could meet this SQL on 12cR1 databases (CDB/PDB or non-CDB).

Conclusion:

  • optimizer_adaptive_features (set to true) may produce a huge negative impact on performance (especially with RAC databases). I provided an example of such an impact when running the noncdb_to_pdb.sql script.
  • Should you meet the sql_id "frjd8zfy2jfdq" in your database and want to get rid of it (because you observe a negative impact), simply set optimizer_adaptive_features to false (or "_optimizer_dsdir_usage_control" to 0) at the session or system level, as shown below.
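For example, at the session level, either of the following does the trick (the second one being a hidden parameter, the usual Oracle Support approval caveat applies):

alter session set optimizer_adaptive_features=false;
-- or, more surgically:
alter session set "_optimizer_dsdir_usage_control"=0;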

Issue installing the 11gR2 database software after the 12.1.0.2 clusterware setup

If you don’t plan to install the 11gR2 database software after a 12.1.0.2 clusterware installation, I guess there is no need for you to read this post.

I just want to share the issue I hit and the way you can work around it. The purpose of this post is simply to save you time, just in case.

So, after a 12.1.0.2 clusterware installation (on a 2-node RAC cluster), I decided to install the 11.2.0.4 database software. I launched the runInstaller and followed the install process until Step 4:

[Screenshot: install11g_db]

As you can see, the nodes that are part of the cluster have not been proposed, and clicking Next would produce "CRS is not installed on any of the nodes".

The first question you may ask is: Is the 11gR2 database compatible with CRS 12.1? Yes it is, as you can see here:

[Screenshot: crs_supported_version]

As I am curious, I cancelled the 11gR2 database software installation and gave it a try with the 12.1.0.2 database software. Step 4 produced:

[Screenshot: install12c_db]

As you can see, the nodes that are part of the cluster have been proposed.

I cancelled the 12.1.0.2 database software installation and had a look at the Inventory. Then I discovered that the CRS="true" flag was not set for the 12.1.0.2 GI home:

$ cat /etc/oraInst.loc | grep inventory_loc
inventory_loc=/ec/poc/server/oracle/olrpoc1/u000/oraInventory

$ cat /ec/poc/server/oracle/olrpoc1/u000/oraInventory/ContentsXML/inventory.xml
..
HOME NAME="OraGI12Home1" LOC="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" TYPE="O" IDX="1"
..

Then I added the CRS="true" flag as follows:

$GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" CRS=true

So that:

$ cat /ec/poc/server/oracle/olrpoc1/u000/oraInventory/ContentsXML/inventory.xml
..
HOME NAME="OraGI12Home1" LOC="/ec/poc/server/oracle/olrpoc1/u000/product/GRID.12.1.0.2" TYPE="O" IDX="1" CRS="true"
..

Then I relaunched the 11.2.0.4 database software installation, and Step 4 produced:

[Screenshot: install11g_db_fixed]

As you can see, the nodes have been proposed, so I was able to complete the installation successfully.

Remarks:

  1. MOS 1053393.1 provides more details about the CRS=”true” flag.
  2. I guess the “issue” will be the same for database software >=10.1 and < 12.1 (But I did not test it).
  3. On another RAC, I upgraded the GI from 12.1.0.1 to 12.1.0.2, and the CRS="true" flag was set automatically for the 12.1.0.2 GI.

Conclusion: After a 12.1.0.2 CRS installation,

  1. You may need to set the CRS="true" flag to avoid this issue during an 11gR2 database software installation.
  2. You don't need to set the CRS="true" flag for a 12.1.0.2 database software installation, as the issue does not appear.
  3. It looks like you won’t hit the issue if the GI has been upgraded to 12.1.0.2 (See third remark).

Modify the Private Network Information in Oracle Clusterware with HAIP in place

The MOS note "How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)" explains how to change the private network information when the interconnect is made of a single interface (which produces downtime).

So what about changing the interconnect configuration with Highly Available IP (HAIP) in place?

Let’s remember what HAIP is (from Oracle® Database High Availability Best Practices):

[Screenshot: haip]

As HAIP provides a redundant interconnect, we should be able to change the interconnect configuration of one private interface without any downtime, right?

First let’s check the current interconnect configuration:

oifcfg getif       
eth4  172.16.0.128  global  cluster_interconnect
eth6  172.16.1.0  global  cluster_interconnect

and the associated Virtual IP:

oifcfg iflist -p -n
eth4  172.16.0.128  PRIVATE  255.255.255.128
eth4  169.254.0.0  UNKNOWN  255.255.128.0
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.128.0  UNKNOWN  255.255.128.0

I removed the public network from the output. As you can see, each private interface is hosting a Virtual IP (169.x.x.x).

Now the sysadmin makes the change (for example, changing the subnet and the VLAN) on one of the private interfaces (let's say eth4), so that the ohasd.log file reports something like:

2014-01-27 11:16:17.154: [GIPCHGEN][1837012736]gipchaInterfaceFail: marking interface failing 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x184d }
2014-01-27 11:16:17.334: [GIPCHGEN][1856595712]gipchaInterfaceDisable: disabling interface 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x19cd }
2014-01-27 11:16:17.339: [GIPCHDEM][1856595712]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f4c0c1b5f00 { host '', haName 'CLSFRAME_olrdev1', local (nil), ip '172.16.0.129:28029', subnet '172.16.0.128', mask '255.255.255.128', mac 'e8-39-35-12-77-7e', ifname 'eth4', numRef 0, numFail 0, idxBoot 0, flags 0x19ed }

So now let’s check the virtual IP and the available interfaces and subnet again:

oifcfg iflist -p -n
eth4  172.17.3.0  PRIVATE  255.255.255.128
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.128.0  UNKNOWN  255.255.128.0
eth6  169.254.0.0  UNKNOWN  255.255.128.0
bond0  158.168.4.0  UNKNOWN  255.255.252.0

Well, we can see 2 things:

  • The first one is that eth6 is now "hosting" both Virtual IPs (169.x.x.x).
  • The second one is the new "available" subnet for eth4 (172.17.3.0).

So we just have to remove the previous eth4 configuration:

oifcfg delif -global eth4/172.16.0.128

and add the new one as follows:

oifcfg setif -global eth4/172.17.3.0:cluster_interconnect

Now, let's check the Virtual IPs again:

oifcfg iflist -p -n
eth4  172.17.3.0  PRIVATE  255.255.255.128
eth4  169.254.128.0  UNKNOWN  255.255.128.0
eth6  172.16.1.0  PRIVATE  255.255.255.128
eth6  169.254.0.0  UNKNOWN  255.255.128.0

Perfect: now each private interface hosts a Virtual IP (we are back to a "normal" configuration).

Remarks:

No downtime will occur as long as:

  • You don't change both interfaces' configuration at the same time 😉
  • You don't remove by mistake the configuration of the interface that hosts both Virtual IPs.
  • The interconnect interface hosting both VIPs doesn't fail before you put the updated interface back (a quick way to keep an eye on the HAIP resource is shown below).
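A quick way to keep an eye on the HAIP resource while doing the change (assuming a standard 11.2 or later GI stack) is:

# the HAIP resource lives in the lower (init) stack
crsctl stat res ora.cluster_interconnect.haip -init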

There is nothing new in this post. I just had to do the exercise, so I am sharing it 😉

Conclusion:

Thanks to HAIP, we have been able to change the Interconnect Network configuration of one interface without any downtime.

RAC Attack at Oracle OpenWorld 2013: Don’t be afraid !

You’ll attend Oracle OpenWorld 2013 and:

  • You want to install a 12c RAC using Oracle VirtualBox on your laptop.
  • Or you just want to have a chat or share your experience with people around this technology.

So, you should attend RAC Attack at Oracle OpenWorld 2013:

[Image: RAC-attack]

Don't be afraid! We won't configure an extended RAC, so there is no need to break your laptop into two separate pieces to simulate a stretched cluster 🙂

You can find more information on the event here.

Join RAC Attack on social media:

  • RAC Attack is on Google+ here.
  • RAC Attack is on Twitter: use the #RacAttack hashtag.
  • RAC Attack is on Facebook here.

Hope to see you there!