October 2014 – bdt's blog

Let me describe how I discovered that the optimizer_adaptive_features may have a huge negative impact (specially on RAC databases).

I upgraded a 11gR2 database to 12.1.0.2 and then I converted this database to a PDB. To do so I launched “@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql” (See MOS Doc ID 1564657.1 for more details).

This sql script took about 75 minutes to complete on my 2 nodes RAC database. This is quite long and then I decided to try to reduce this duration.

To diagnose: I dropped the PDB, plugged it back, re-launched noncdb_to_pdb.sql and enabled a 10046 trace on the session.

The top SQL (sort by elapsed time) for this session is the following:

SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    70502      4.02       4.25          0          0          0           0
Execute  70502    264.90     759.33          0          0          0           0
Fetch    70502    215.77    1848.46          0          0          0       70502
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   211506    484.70    2612.05          0          0          0       70502

Misses in library cache during parse: 17
Misses in library cache during execute: 17
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 70502

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=76 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us)
         0          0          1    PX COORDINATOR  (cr=0 pr=0 pw=0 time=76 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=0 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PX Deq: Join ACK                           281741        0.03        368.85
  PX Deq: reap credit                       1649736        0.00         25.22
  IPC send completion sync                   141000        0.02        135.07
  PX Deq: Parse Reply                        141004        0.04        170.64
  PX Deq: Execute Reply                      141004        0.08       1366.42
  reliable message                            70502        0.00        125.51
  PX Deq: Signal ACK EXT                     140996        0.03         13.99
  PX Deq: Slave Session Stats                140996        0.02         25.54
  enq: PS - contention                        70605        0.11        145.80
  KJC: Wait for msg sends to complete          2584        0.00          0.04
  latch free                                     16        0.00          0.00
  PX qref latch                                  14        0.00          0.00
  latch: shared pool                              4        0.00          0.00
  latch: active service list                      1        0.00          0.00
  row cache lock                                127        0.00          0.08
  library cache lock                             36        0.00          0.04
  library cache pin                              36        0.00          0.04
  gc cr grant 2-way                               1        0.00          0.00
  db file sequential read                         1        0.00          0.00
  oracle thread bootstrap                         1        0.03          0.03

As you can see this SQL elapsed time is about 2600 seconds in total (for about 70 000 executions), most of the wait time comes from “PX %” events and this SQL queries the GV$SQL view.

But wait:

Why the hell this SQL (which looks like an SQL that collects metrics for a particular sql_id) is somehow part of the session that launched the noncdb_to_pdb.sql script? Does it make sense?

I really don’t think so, this sql (sql_id “frjd8zfy2jfdq”) should have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql.

Let’s prove it with a simple test case (I am alone on the database):

alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

With the following result:

SQL> @test_case

System altered.


  COUNT(*)
----------
         0

Connected.

PL/SQL procedure successfully completed.


  COUNT(*)
----------
         2

Did you see that the sql_id “frjd8zfy2jfdq” has been produced by this simple test case? (If you trace the session you would see this query into the trace file).

I reproduced this behavior on:

12.1.0.2 database with CDB/PDB.
12.1.0.2 database (Non CDB).

and was not able to reproduce it on a 11gR2 database.

So, it looks like that this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to “SQL metrics collection”, I tried to disable some 12cR1 features (linked to this area) until I found the “culprit”.

It did not produce any change until I set the optimizer_adaptive_features parameter to false (true is the default).

Here is the result:

SQL> !cat test_case.sql
alter system flush shared_pool;

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

connect / as sysdba
alter session set optimizer_adaptive_features=false;

begin
execute immediate 'select object_name from dba_objects';
end;
/

select count(*) from v$sql where sql_id='frjd8zfy2jfdq';

SQL> @test_case

System altered.

  COUNT(*)
----------
         0

Connected.

Session altered.

PL/SQL procedure successfully completed.

  COUNT(*)
----------
         0

BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!

Well, now what is the impact on noncdb_to_pdb.sql on my 2 nodes RAC database?

I edited the script and added:

alter session set optimizer_adaptive_features=false;

just above this line:

exec dbms_pdb.noncdb_to_pdb(1);

Guess what?

noncdb_to_pdb.sql took about 20 minutes to execute (compare to about 75 minutes without setting optimizer_adaptive_features to false).
The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.

Remarks:

The impact of this SQL is much more “visible” with a RAC database, as it queries a GLOBAL V$ view (GV$SQL) and then needs parallel query to run (If more than one instance is up). With only one instance up, the 10046 trace file produced:

SQL ID: frjd8zfy2jfdq Plan Hash: 510421217

SELECT executions, end_of_fetch_count,              elapsed_time/px_servers
  elapsed_time,        cpu_time/px_servers     cpu_time,
  buffer_gets/executions  buffer_gets
FROM
 (SELECT sum(executions)   as executions,                            sum(case
  when px_servers_executions > 0                              then
  px_servers_executions                                  else executions end)
  as px_servers,                sum(end_of_fetch_count) as end_of_fetch_count,
                sum(elapsed_time) as elapsed_time,
  sum(cpu_time)     as cpu_time,                     sum(buffer_gets)  as
  buffer_gets            FROM   gv$sql
  WHERE executions > 0                                 AND sql_id = :1
                              AND parsing_schema_name = :2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    69204      2.58       2.78          0          0          0           0
Execute  69204     23.93      25.45          0          0          0           0
Fetch    69204      2.68       2.64          0          0          0       69204
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   207612     29.20      30.88          0          0          0       69204

Misses in library cache during parse: 18
Misses in library cache during execute: 18
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 2)
Number of plan statistics captured: 62

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  VIEW  (cr=0 pr=0 pw=0 time=120 us)
         1          1          1   SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us)
         0          0          0    PX COORDINATOR  (cr=0 pr=0 pw=0 time=100 us)
         0          0          0     PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us)
         0          0          0      VIEW  GV$SQL (cr=0 pr=0 pw=0 time=43 us)
         0          0          0       FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  row cache lock                                  1        0.00          0.00

As you can see: elapsed time of about 30 seconds (for about 70 000 executions) and no “PX%” wait events.

I checked which parameters (hidden or not) changed when setting optimizer_adaptive_features to false. Then I tried one by one those parameters with my test case and discovered that setting “_optimizer_dsdir_usage_control” to 0 is enough to get rid of the sql_id “frjd8zfy2jfdq”.

During the run of noncdb_to_pdb.sql, this is the large number of executions (about 70 000) of the sql_id “frjd8zfy2jfdq” that produced this long “overall” duration and highlighted the fact that this query has to be somehow removed.

You could meet this SQL on 12cR1 databases (CDB/PDB or non CDB).

Conclusion:

The optimizer_adaptive_features (set to true) may produce a huge negative impact on the performance (specially with RAC database). I provided an example of such an impact when running the noncdb_to_pdb.sql script.

Should you meet the sql_id “frjd8zfy2jfdq” in your database and would like to get rid of it (because you observe a negative impact): Then simply set optimizer_adaptive_features to false (or “_optimizer_dsdir_usage_control” to 0) at the session or system level.