In-Memory Instance distribution with RAC databases: Impact of the parallel_instance_group parameter

INTRODUCTION

As you may know, with a RAC database, by default all objects populated into In-Memory are distributed across all of the IM column stores in the cluster. It is also possible to have all the data appear in the IM column store on every node (Engineered Systems only). See this white paper for more details.
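On Engineered Systems, that per-node duplication is requested with the DUPLICATE attribute of the INMEMORY clause. A minimal sketch (using the bdt test table from the tests below):

```shell
# Engineered Systems only (a sketch): ask for a full copy of the segment
# in the IM column store of every node.
sqlplus -s / as sysdba <<EOF
ALTER TABLE bdt INMEMORY DUPLICATE ALL;
EOF
```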

In this blog post we are told that:

  • RAC services and the parallel_instance_group parameter can be used to control where data is populated, and the subsequent access of that data.
  • In a RAC environment, services enable the use of a subset of the nodes and still ensure that all In-Memory queries will be able to access all IM column stores.

My customer uses a RAC service defined as preferred/available on a two-node RAC. In this configuration, I would like to see how the data is populated into the In-Memory column store (IMCS) on both Instances when the parallel_instance_group parameter is used.

TESTS

I’ll do several tests and stop/start the 12.1.0.2 database between each test (to avoid confusion).

First let’s create a service BDT_PREF as preferred/available:

srvctl add service -db BDT12c1 -s BDT_PREF -preferred BDT12c1_1 -available BDT12c1_2

so that Instance 1 (BDT12c1_1) is the preferred one.
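The placement policy can be double-checked from srvctl; for example (the exact output layout varies with the version):

```shell
# Display the service definition to confirm the preferred/available setup.
srvctl config service -d BDT12c1 -s BDT_PREF
```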

Test 1:

  • parallel_instance_group not set.

Connect to the database through the BDT_PREF service and check the data distribution.
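For example, the connection through the service can use an EZConnect string (the SCAN name and the bdt credentials here are assumptions):

```shell
# Connect through the BDT_PREF service; "rac-scan" stands in for the
# cluster SCAN address (an assumption, adapt to your environment).
sqlplus bdt/bdt@//rac-scan:1521/BDT_PREF
```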

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     308477952 1610612736           732684288
         2 COMPLETED     274530304 1610612736           826236928

As you can see, no instance contains all of the data, as the BYTES_NOT_POPULATED column is greater than 0 on both. The data is distributed across all of the IM column stores in the cluster.
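As a cross-check, the size of each instance's In-Memory area can be queried too; a sketch:

```shell
# Per-instance In-Memory pool usage (1MB and 64KB pools).
# The backslash keeps the shell from expanding $inmemory in the heredoc.
sqlplus -s / as sysdba <<EOF
SELECT inst_id, pool, alloc_bytes, used_bytes, populate_status
FROM   gv\$inmemory_area
ORDER  BY inst_id, pool;
EOF
```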

Test 2:

  • parallel_instance_group set on Instance 1.
  • The service is running on Instance 1.

Set the parallel_instance_group parameter on the Instance BDT12c1_1 (INST_ID=1) to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

Check that the BDT_PREF service is running on Instance 1 (where parallel_instance_group has been set):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     575602688 1610612736                   0

As you can see, Instance 1 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Test 3:

  • parallel_instance_group set on Instance 1.
  • The service is running on Instance 2.

The service may end up not running on its preferred Instance after:

  • A failover.
  • The database startup: when you use automatic services in an administrator-managed database, during planned database startup services may start on the first instances that become available rather than on their preferred instances (see the Oracle documentation).
  • A manual relocation.
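In all three cases, a quick way to detect that the service has left its preferred Instance is to parse the line printed by srvctl status service. A minimal sketch (the current_inst helper name is mine), shown here against a sample of the output used below:

```shell
# Extract the running instance from the srvctl status line and compare
# it with the preferred one (BDT12c1_1 in our configuration).
current_inst() {
    # "Service BDT_PREF is running on instance(s) BDT12c1_2" -> "BDT12c1_2"
    echo "$1" | sed 's/^.*instance(s) //'
}

line="Service BDT_PREF is running on instance(s) BDT12c1_2"
inst=$(current_inst "$line")
if [ "$inst" != "BDT12c1_1" ]; then
    echo "BDT_PREF is not running on its preferred instance (but on $inst)"
fi
```

In a real check, line would of course come from `srvctl status service -d BDT12c1 -s BDT_PREF`.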

Set the parallel_instance_group parameter on the Instance BDT12c1_1 (INST_ID=1) to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

Relocate the BDT_PREF service to Instance 2 (if needed):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

SQL> host srvctl relocate service -d BDT12c1 -s BDT_PREF -newinst BDT12c1_2 -oldinst BDT12c1_1

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     277676032 1610612736           826236928
         1 COMPLETED     312672256 1610612736           732684288

As you can see, no instance contains all of the data, as the BYTES_NOT_POPULATED column is greater than 0 on both. The data is distributed across all of the IM column stores in the cluster.

Remarks:

  1. This distribution could lead to performance issues.
  2. If a failover occurred and Instance 1 is still down, then the data is fully populated on Instance 2.

Test 4:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 1.

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Check that the BDT_PREF service is running on Instance 1:

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     575602688 1610612736                   0

As you can see, Instance 1 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Test 5:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 2.

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Relocate the BDT_PREF service to Instance 2 (if needed):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

SQL> host srvctl relocate service -d BDT12c1 -s BDT_PREF -newinst BDT12c1_2 -oldinst BDT12c1_1

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     575602688 1610612736                   0

As you can see, Instance 2 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Out of curiosity, what about a service defined as preferred on both Instances and the parallel_instance_group set on both Instances?

Test 6:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 1 and 2.

First let’s create a service BDT_PREF as preferred on both Instances:

srvctl add service -db BDT12c1 -s BDT_PREF -preferred BDT12c1_1,BDT12c1_2

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Check that the BDT_PREF service is running on both Instances:

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1,BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     277676032 1610612736           826236928
         1 COMPLETED     312672256 1610612736           732684288

As you can see, no instance contains all of the data, as the BYTES_NOT_POPULATED column is greater than 0 on both. The data is distributed across all of the IM column stores in the cluster.

REMARKS

  • I am not discussing the impact of the distribution on the performance (with or without Auto DOP).
  • I just focus on how the data has been distributed.
  • In this previous blog post, the “workaround” proposed to get rid of the default behaviour on non-Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) works for any type of service.
  • In the current post I focus only on a service defined as preferred/available on a two-node RAC (the last test was out of curiosity).

CONCLUSION

With a 12.1.0.2 database and a service defined as preferred/available on a two-node RAC:

  • With the parallel_instance_group parameter set to the service on both Instances:

The data is fully populated on the Instance the service is running on.

  • With the parallel_instance_group parameter set to the service on the preferred Instance only:

If the service is running on the preferred Instance, the data is fully populated on the Instance.

If the service is running on the “available” Instance, the data is distributed across all of the IM column stores in the cluster.

With a 12.1.0.2 database, a service defined as preferred on both Instances of a two-node RAC, and the parallel_instance_group parameter set to the service on both Instances:

The data is distributed across all of the IM column stores in the cluster.


FAN callouts for In-Memory Instance distribution with RAC, PDB and preferred/available service

The context:

In my previous blog post, In-Memory Instance distribution with RAC and Multitenant Environment, I checked whether we can influence how the In-Memory data is distributed across the Instances per PDB (that is, get rid of the default behaviour). I checked because:

  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • Christian Antognini shows us in this blog post that having the data not fully populated on an Instance (the default behaviour on non-Engineered Systems) could lead to poor performance.

Let’s summarize the findings:

  • If the PDB is open on one Instance, then this Instance contains all the data.
  • If the PDB is open on all the Instances, then we have to set “_inmemory_auto_distribute” to false to have at least one Instance that contains all the data.

The objective:

That said, let’s try to reach an objective: with a service defined as preferred/available on a PDB, I want the PDB to be open on only one Instance at any time (that way, we don’t need to set the hidden parameter).

In this blog post, Martin Bach already explained the PDB open/close behaviour with a service defined as preferred/available on it. I don’t want to repeat Martin’s work; I just want to draw your attention to the following (with our objective in mind):

The default PDB status is:

SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 MOUNTED
         2 MOUNTED

As you can see, the PDB is mounted on both instances.

Now, let’s create and start a preferred/available service linked to our PDB that way:

srvctl add service -db SRCCONT -pdb BDTPDB -service BDTPDB_TAF -e select -m BASIC -P BASIC -r SRCCONT1 -a SRCCONT2 -w 5 -z 12
srvctl start service -d SRCCONT

Let’s check the PDB status and the service status:

SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 MOUNTED
         1 READ WRITE
srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT1

So, the PDB is open on the Instance that is hosting the service and not open on the other Instance: The Objective is reached.

Now, let’s kill pmon for the SRCCONT1 Instance (the one hosting the service), wait for SRCCONT1 to come back, and check the PDB and the service status:

kill -9 `ps -ef | grep -i pmon | grep -i SRCCONT1 | awk '{print $2}'`

--PDB Status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 READ WRITE
         1 MOUNTED

-- Service Status
srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT2

So, the service failed over, the PDB has been closed on the failing Instance and is open on the one that is now hosting the service: The Objective is reached.

Now, let’s (fail back) relocate the service manually and check the PDB and the service status:

srvctl relocate service -d SRCCONT -s BDTPDB_TAF -oldinst SRCCONT2 -newinst SRCCONT1 -force

--PDB Status
SYS@CDB$ROOT>  select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 READ WRITE
         2 READ WRITE

-- Service Status
srvctl status service -d SRCCONT                                                                         
Service BDTPDB_TAF is running on instance(s) SRCCONT1

As you can see the service has been relocated and the PDB is now open on both Instances: The Objective is not reached.

To summarize:

  1. The objective has been reached when we created and started the service.
  2. The objective has been reached when the service failed over (Due to Instance crash).
  3. The objective has not been reached when the service has been relocated manually.

FAN callouts to the rescue:

FAN callouts are server-side scripts or executables that run whenever a FAN event is generated. I’ll use this feature to fix the third point where the objective is not reached.

To do so, I created the following script:

#!/bin/ksh
#
# Close the PDB on the old instance during manual service relocation
#
# Description:
# -  This script closes the PDB on the old instance during manual service relocation
#
# Requirement:
#  - Location of the script must be $ORA_CRS_HOME/racg/usrco/.
#
# Version:
#  - version 1.0
#  - Bertrand Drouvot
#
# Date: 2014/12/04
#
#
# Env settings

ME=`who | cut -d" " -f1`
PASSWDFILE="/etc/passwd"
HOMEDIR=`egrep "^${ME}" ${PASSWDFILE} | cut -d: -f 6`
DATE_LOG=$(date +"%y%m%d_%H%M")
LOGFILE=fan_close_pdb_${DATE_LOG}.log

VAR_EVENTTYPE=$1

for ARGS in $*;
 do
        PROPERTY=`echo $ARGS | cut -f1 -d"=" | tr '[:lower:]' '[:upper:]'`
        VALUE=`echo $ARGS | cut -f2 -d"=" | tr '[:lower:]' '[:upper:]'`
        case $PROPERTY in
                VERSION)      VAR_VERSION=$VALUE ;;
                SERVICE)      VAR_SERVICE=$VALUE ;;
                DATABASE)     VAR_DATABASE=$VALUE ;;
                INSTANCE)     VAR_INSTANCE=$VALUE ;;
                HOST)         VAR_HOST=$VALUE ;;
                STATUS)       VAR_STATUS=$VALUE ;;
                REASON)       VAR_REASON=$VALUE ;;
                CARD)         VAR_CARDINALITY=$VALUE ;;
                TIMESTAMP)    VAR_LOGDATE=$VALUE ;;
                ??:??:??)     VAR_LOGTIME=$VALUE ;;
        esac
 done 

# Close the PDB (PDB name extracted from the service name) if service relocated manually

if [ "$VAR_EVENTTYPE" = "SERVICEMEMBER" ] && [ "$VAR_STATUS" = "DOWN" ] && [ "$VAR_REASON" = "USER" ]
then

PATH=$PATH:/usr/local/bin
export PATH
ORAENV_ASK=NO
export ORAENV_ASK
ORACLE_SID=${VAR_INSTANCE}
export ORACLE_SID
. oraenv > /dev/null

# Extract PDB name based on our naming convention (You may need to change this)
PDB=`echo "${VAR_SERVICE}" | sed 's/_TAF.*//'`

# Close the PDB on the old instance
sqlplus /nolog <<EOF
connect / as sysdba;
alter pluggable database ${PDB} close instances=('${VAR_INSTANCE}');
EOF

# Log this in a logfile
echo ${*} >> ${HOMEDIR}/log/${LOGFILE}

fi

under $GRID_HOME/racg/usrco/ (on both nodes). The script is then executed for every FAN event, but it starts processing only for a manual service relocation (thanks to the “if” condition in it).

The goal of the script is to close the PDB on the old Instance during manual service relocation.
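The naming-convention part of the script (service name to PDB name) can be checked on its own; for example (the service_to_pdb helper name is mine):

```shell
# Same sed expression as in the callout script: derive the PDB name from
# the service name by stripping "_TAF" and anything that follows it.
service_to_pdb() {
    echo "$1" | sed 's/_TAF.*//'
}

service_to_pdb "BDTPDB_TAF"    # prints BDTPDB
```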

Now, let’s check if the objective is reached. Let’s relocate the service manually from SRCCONT1 to SRCCONT2:

-- Current PDB Status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         1 READ WRITE
         2 READ WRITE

-- Current service status
SYS@CDB$ROOT> !srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT1

-- Relocate the service
SYS@CDB$ROOT> !srvctl relocate service -d SRCCONT -s BDTPDB_TAF -oldinst SRCCONT1 -newinst SRCCONT2 -force

-- Check PDB status
SYS@CDB$ROOT> select inst_id,open_mode from gv$pdbs where name='BDTPDB';

   INST_ID OPEN_MODE
---------- ----------
         2 READ WRITE
         1 MOUNTED

-- Check service status
SYS@CDB$ROOT> !srvctl status service -d SRCCONT
Service BDTPDB_TAF is running on instance(s) SRCCONT2

So, as you can see, the service has been manually relocated and the PDB has been closed on the old Instance. That way, the PDB is open on only one Instance: The Objective is reached.

Remarks:

  • I used the “force” option during the manual relocation, which disconnects all the sessions during the relocate operation. If you are not using the “force” option, then I guess you don’t want to close the PDB on the old Instance ;-).
  • The PDB open/close behaviour with a preferred/available service linked to it is the one described in this post, unless you have set save state on it (the default is discard state).

Conclusion:

The objective is reached in any case: The PDB is open on only one Instance at any time (with a preferred/available service in place). That way we don’t need to set the hidden parameter “_inmemory_auto_distribute” to false to get the In-Memory data fully populated on one Instance.

Update 2015/03/27: at the PDB level, if you set the parallel_instance_group parameter to the service on both Instances, then the In-Memory data is fully populated on the Instance the service is running on, even if the PDB is open on both Instances (after the manual service relocation). You can find more details here.

In-Memory Instance distribution with RAC and Multitenant Environment

In a previous post (see In-Memory Instance distribution with RAC databases: I want at least one instance that contains all the data) I provided a way to get rid of the default In-Memory Instance distribution on a RAC (non-CDB) database.

To summarize:

  • Thanks to the hidden parameter “_inmemory_auto_distribute” we are able to change the way the In-Memory column stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With “_inmemory_auto_distribute” set to false, we are able to populate one instance (or all the instances) with all the data.

Now, let’s have a look at the In-Memory distribution on a Multitenant RAC database, and let’s check whether we can influence how the In-Memory data is distributed across the Instances per PDB (that is, get rid of the default behaviour).

By default the IMCS distribution for a PDB is the following:

First case: The PDB is open on one Instance only:

SYS@CDB$ROOT> alter pluggable database BDTPDB open instances=('BDT12c3_1');

Pluggable database altered.

-- Now connect to the BDTPDB PDB as bdt/bdt
SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.

-- Let's enable the inmemory attribute on the BDT table
BDT@BDTPDB> alter table bdt inmemory ;

Table altered.

-- Trigger the IMCS population
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)
----------
  44591000

-- check the distribution
BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id; 

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     695074816 1879048192                   0

As you can see, Instance 1 contains all the data for this PDB (as BYTES_NOT_POPULATED=0).

Second case: The PDB is open on all the Instances:

-- Open the BDTPDB PDB on all instances
SYS@CDB$ROOT> alter pluggable database BDTPDB open instances=all;

Pluggable database altered.

-- Now connect to the BDTPDB PDB as bdt/bdt
SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.  

-- Let's enable the inmemory attribute on the BDT table
BDT@BDTPDB> alter table bdt inmemory ;

Table altered.

-- Trigger the IMCS population
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)  
----------  
  44591000

-- check the distribution
BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id;


   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     376307712 1879048192           802889728
         2 BDTPDB                         COMPLETED     298909696 1879048192          1006559232

We can see that (for this PDB), by default, no instance contains all of the data, as the BYTES_NOT_POPULATED column is greater than 0 on both.

Now let’s set the hidden parameter “_inmemory_auto_distribute” to false (at the PDB level):

BDT@BDTPDB> alter system set "_inmemory_auto_distribute"=false;
alter system set "_inmemory_auto_distribute"=false
*
ERROR at line 1:
ORA-65040: operation not allowed from within a pluggable database

So, we can’t set this hidden parameter at the PDB level.

Ok, let’s set it at the CDB level:

SYS@CDB$ROOT> alter system set "_inmemory_auto_distribute"=false;

System altered.

then re-trigger the population on instance 1 (for this PDB) and check the distribution:

SYS@CDB$ROOT> @connect_user_pdb.sql
Enter value for user: bdt
Enter value for pwd: bdt
Enter value for pdb: BDTPDB
Connected.  
BDT@BDTPDB> select count(*) from bdt;

  COUNT(*)  
----------  
  44591000

BDT@BDTPDB> SELECT s.inst_id, c.name,s.populate_status, s.inmemory_size, s.bytes, s.bytes_not_populated
FROM gv$im_user_segments s, v$pdbs c
WHERE s.segment_name = 'BDT'
and c.con_id=s.con_id;

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         2 BDTPDB                         COMPLETED     297861120 1879048192          1006559232
         1 BDTPDB                         COMPLETED     675151872 1879048192                   0

BINGO! Look at Instance 1: it now contains all the data in its IMCS (as BYTES_NOT_POPULATED=0) for this PDB.

If I connect to Instance 2 (for this PDB) and launch the query to trigger the population, I’ll get:

   INST_ID NAME                           POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- ------------------------------ --------- ------------- ---------- -------------------
         1 BDTPDB                         COMPLETED     675151872 1879048192                   0
         2 BDTPDB                         COMPLETED     672006144 1879048192                   0

So, both instances now contain all the data for this PDB.

Remarks:

  • As I said in the previous post: I used a hidden parameter, so you should get Oracle Support approval before using it.
  • I guess (because I can’t test it) that in a Multitenant and Engineered Systems context it is also possible to have all the data appear in the IM column store on every node for a PDB (thanks to the duplicate attribute), as Christian Antognini shows us in this blog post for non-CDB databases.

Conclusion:

The conclusion is the same as in my previous post (with a non-CDB database):

  • Thanks to the hidden parameter “_inmemory_auto_distribute” we are able to change the way the In-Memory column stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With “_inmemory_auto_distribute” set to false, we are able to populate one instance (or all the instances) with all the data.

And we can add that:

  • If the PDB is open on one instance only, then this instance contains all the data.
  • If the PDB is open on all the instances, then we have to set “_inmemory_auto_distribute” to false to have at least one instance that contains all the data.
  • Each PDB population will behave the same (distributed or not) and inherit from the CDB setting (as we cannot set the hidden parameter at the PDB level).

 

In-Memory Instance distribution with RAC databases: I want at least one instance that contains all the data

As you may know, with a RAC database, by default all objects populated into In-Memory are distributed across all of the IM column stores in the cluster. It is also possible to have all the data appear in the IM column store on every node (Engineered Systems only). See this white paper for more details.

There are two very interesting blog posts around this subject:

  1. Kerry Osborne shows us in this blog post how we can distribute the data across the nodes (using the DISTRIBUTE INMEMORY attribute).
  2. Christian Antognini shows us in this blog post that having the data not fully populated on each instance could lead to poor performance.

But wait: if my RAC service is an active/passive one (I mean the service is started on only one instance), then I would like to have all the data fully populated into the In-Memory column store of the “active” instance (and not distributed across the instances), right?

Let’s try to achieve this (all data fully populated into the In-Memory column store of the active instance):

So, by default, the IMCS distribution is the following:

SQL> alter table bdt inmemory ;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     313720832 1610612736           732684288
         2 COMPLETED     274530304 1610612736           826236928

As you can see, no instance contains all of the data, as the BYTES_NOT_POPULATED column is greater than 0 on both.

Now let’s set the hidden parameter “_inmemory_auto_distribute” to false and re-trigger the population on instance 1:

SQL> alter system set "_inmemory_auto_distribute"=false;

System altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     274530304 1610612736           826236928

BINGO! Look at Instance 1: it now contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

If I connect to Instance 2 and launch the query to trigger the population, I’ll get:

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';
   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     587137024 1610612736                   0
         2 COMPLETED     589234176 1610612736                   0

So, both instances now contain all the data (like the duplicate attribute would have done).

Remarks:

  1. In case of an active/passive service (meaning a preferred/available service), after a service failover the second instance will also contain all the data (once populated).
  2. In case of an active/active service, each database instance will contain all the data, but not necessarily at the same time (so the behaviour is a little different from the duplicate attribute).
  3. I used a hidden parameter, so you should get Oracle Support approval before using it.
  4. I’ll check the behaviour with PDBs in another post.

Conclusion:

  • Thanks to the hidden parameter “_inmemory_auto_distribute” we are able to change the way the In-Memory column stores are populated by default on an Oracle RAC database.
  • By default all objects populated into memory will be distributed across all of the IM column stores in the cluster.
  • With “_inmemory_auto_distribute” set to false, we are able to populate one instance (or all the instances) with all the data.

Update 2015/03/27:

  • This “workaround” proposed to get rid of the default behaviour on non-Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) works for any type of service.
  • In case of preferred/available service, there is no need to set the hidden parameter to get the IM fully populated on one node. The secret sauce is linked to the parallel_instance_group parameter. See this blog post for more details.