In-Memory Instance distribution with RAC databases: Impact of the parallel_instance_group parameter

INTRODUCTION

As you may know, with a RAC database, by default all objects populated into In-Memory are distributed across all of the IM column stores in the cluster. It is also possible to have all the data appear in the IM column store on every node (Engineered Systems only). See this white paper for more details.
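
For reference, the per-object behaviour is controlled by the DISTRIBUTE and DUPLICATE sub-clauses of the INMEMORY attribute. A minimal sketch on the bdt table used in the tests below (DUPLICATE ALL only has an effect on Engineered Systems):

SQL> alter table bdt inmemory distribute auto;

SQL> alter table bdt inmemory duplicate all;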

In this blog post we are told that:

  • RAC services and the parallel_instance_group parameter can be used to control where data is populated, and the subsequent access of that data.
  • In a RAC environment, services enable the use of a subset of nodes while still ensuring that all In-Memory queries are able to access all IM column stores.

My customer uses a RAC service defined as preferred/available on a two-node RAC. In this configuration, I would like to see how the data is populated into the In-Memory column store (IMCS) on both Instances when the parallel_instance_group parameter is used.

TESTS

I’ll run several tests and stop/start the 12.1.0.2 database between each test (to avoid confusion).
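
For completeness, the stop/start between each test can be done with the standard srvctl commands:

srvctl stop database -d BDT12c1
srvctl start database -d BDT12c1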

First let’s create a service BDT_PREF as preferred/available:

srvctl add service -db BDT12c1 -s BDT_PREF -preferred BDT12c1_1 -available BDT12c1_2

so that Instance 1 (BDT12c1_1) is the preferred one.
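
If needed, the preferred/available definition can be double-checked with the standard srvctl config command:

srvctl config service -db BDT12c1 -s BDT_PREF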

Test 1:

  • parallel_instance_group not set.

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     308477952 1610612736           732684288
         2 COMPLETED     274530304 1610612736           826236928

As you can see, no instance contains all the data, as the BYTES_NOT_POPULATED column is greater than 0. The data is distributed across all of the IM column stores in the cluster.
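
To make the split easier to read, the populated fraction per instance can be derived from the same view (a small helper query, not part of the original test; here each instance holds roughly half of the segment):

SQL> SELECT inst_id,
            ROUND(100 * (bytes - bytes_not_populated) / bytes, 1) AS pct_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';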

Test 2:

  • parallel_instance_group set on Instance 1.
  • The service is running on Instance 1.

Set the parallel_instance_group parameter on Instance BDT12c1_1 (INST_ID=1) to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.
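
The per-instance setting can be verified through gv$parameter (at this point only INST_ID=1 should report the BDT_PREF value):

SQL> SELECT inst_id, value
FROM gv$parameter
WHERE name = 'parallel_instance_group';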

Check that the BDT_PREF service is running on Instance 1 (where parallel_instance_group has been set):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     575602688 1610612736                   0

As you can see, Instance 1 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Test 3:

  • parallel_instance_group set on Instance 1.
  • The service is running on Instance 2.

The service may not be running on its preferred Instance after:

  • A failover.
  • The database startup: when you use automatic services in an administrator-managed database, during planned database startup, services may start on the first instances that become available rather than their preferred instances (see the Oracle documentation).
  • A manual relocation.

Set the parallel_instance_group parameter on Instance BDT12c1_1 (INST_ID=1) to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

Relocate the BDT_PREF service to Instance 2 (if needed):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

SQL> host srvctl relocate service -d BDT12c1 -s BDT_PREF -newinst BDT12c1_2 -oldinst BDT12c1_1

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     277676032 1610612736           826236928
         1 COMPLETED     312672256 1610612736           732684288

As you can see, no instance contains all the data, as the BYTES_NOT_POPULATED column is greater than 0. The data is distributed across all of the IM column stores in the cluster.

Remarks:

  1. This distribution could lead to performance issues.
  2. If a failover occurred and Instance 1 is still down, then the data is fully populated on Instance 2 (see the relocation sketch below).
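
Once Instance 1 is back up, the initial placement can be restored by relocating the service back to its preferred Instance (same srvctl syntax as used in this post). As the table has PRIORITY NONE, a subsequent scan through the service should then trigger the repopulation:

srvctl relocate service -d BDT12c1 -s BDT_PREF -newinst BDT12c1_1 -oldinst BDT12c1_2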

Test 4:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 1.

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Check that the BDT_PREF service is running on Instance 1:

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         1 COMPLETED     575602688 1610612736                   0

As you can see, Instance 1 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Test 5:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 2.

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Relocate the BDT_PREF service to Instance 2 (if needed):

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1

SQL> host srvctl relocate service -d BDT12c1 -s BDT_PREF -newinst BDT12c1_2 -oldinst BDT12c1_1

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     575602688 1610612736                   0

As you can see, Instance 2 contains all the data in its IMCS (as BYTES_NOT_POPULATED=0).

Out of curiosity, what about a service defined as preferred on both Instances, with the parallel_instance_group parameter set on both Instances?

Test 6:

  • parallel_instance_group set on Instance 1 and on Instance 2.
  • The service is running on Instance 1 and 2.

First let’s create a service BDT_PREF as preferred on both Instances:

srvctl add service -db BDT12c1 -s BDT_PREF -preferred BDT12c1_1,BDT12c1_2

Set the parallel_instance_group parameter on both Instances to the service:

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_1';

System altered.

SQL> alter system set parallel_instance_group=BDT_PREF scope=both sid='BDT12c1_2';

System altered.

Check that the BDT_PREF service is running on both Instances:

SQL> host srvctl status service -d BDT12c1 -s BDT_PREF
Service BDT_PREF is running on instance(s) BDT12c1_1,BDT12c1_2

Connect to the database through the BDT_PREF service and check the data distribution:

SQL> alter table bdt inmemory priority none;

Table altered.

SQL> select count(*) from bdt;

  COUNT(*)
----------
  38220000

SQL> SELECT inst_id, populate_status, inmemory_size, bytes, bytes_not_populated
FROM gv$im_user_segments
WHERE segment_name = 'BDT';

   INST_ID POPULATE_ INMEMORY_SIZE      BYTES BYTES_NOT_POPULATED
---------- --------- ------------- ---------- -------------------
         2 COMPLETED     277676032 1610612736           826236928
         1 COMPLETED     312672256 1610612736           732684288

As you can see, no instance contains all the data, as the BYTES_NOT_POPULATED column is greater than 0. The data is distributed across all of the IM column stores in the cluster.
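
Throughout these tests, the parameter setting and the population state can be checked per instance in one pass; a convenience query (not part of the original tests) combining the two views used above:

SQL> SELECT p.inst_id, p.value AS parallel_instance_group, s.bytes_not_populated
FROM gv$parameter p
LEFT JOIN gv$im_user_segments s
ON s.inst_id = p.inst_id AND s.segment_name = 'BDT'
WHERE p.name = 'parallel_instance_group';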

REMARKS

  • I am not discussing the impact of the distribution on performance (with or without Auto DOP).
  • I focus only on how the data has been distributed.
  • In this previous blog post, the “workaround” proposed to get rid of the default behaviour on non-Engineered Systems (“Every row of the test table is stored in the IMCS of either one instance or the other”) works for any type of service.
  • In the current post I focus only on a service defined as preferred/available on a two-node RAC (the last test is just out of curiosity).

CONCLUSION

With a 12.1.0.2 database and a service defined as preferred/available on a two-node RAC:

  • With the parallel_instance_group parameter set to the service on both Instances:

The data is fully populated on the Instance on which the service is running.

  • With the parallel_instance_group parameter set to the service on the preferred Instance only:

If the service is running on the preferred Instance, the data is fully populated on that Instance.

If the service is running on the “available” Instance, the data is distributed across all of the IM column stores in the cluster.

With a 12.1.0.2 database, a service defined as preferred on both Instances of a two-node RAC, and the parallel_instance_group parameter set to the service on both Instances:

The data is distributed across all of the IM column stores in the cluster.
