Let me describe how I discovered that the optimizer_adaptive_features may have a huge negative impact (specially on RAC databases).
I upgraded a 11gR2 database to 12.1.0.2 and then I converted this database to a PDB. To do so I launched “@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql” (See MOS Doc ID 1564657.1 for more details).
This sql script took about 75 minutes to complete on my 2 nodes RAC database. This is quite long and then I decided to try to reduce this duration.
To diagnose: I dropped the PDB, plugged it back, re-launched noncdb_to_pdb.sql and enabled a 10046 trace on the session.
The top SQL (sort by elapsed time) for this session is the following:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217 SELECT executions, end_of_fetch_count, elapsed_time/px_servers elapsed_time, cpu_time/px_servers cpu_time, buffer_gets/executions buffer_gets FROM (SELECT sum(executions) as executions, sum(case when px_servers_executions > 0 then px_servers_executions else executions end) as px_servers, sum(end_of_fetch_count) as end_of_fetch_count, sum(elapsed_time) as elapsed_time, sum(cpu_time) as cpu_time, sum(buffer_gets) as buffer_gets FROM gv$sql WHERE executions > 0 AND sql_id = :1 AND parsing_schema_name = :2) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 70502 4.02 4.25 0 0 0 0 Execute 70502 264.90 759.33 0 0 0 0 Fetch 70502 215.77 1848.46 0 0 0 70502 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 211506 484.70 2612.05 0 0 0 70502 Misses in library cache during parse: 17 Misses in library cache during execute: 17 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 2) Number of plan statistics captured: 70502 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 1 1 1 VIEW (cr=0 pr=0 pw=0 time=76 us) 1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us) 0 0 1 PX COORDINATOR (cr=0 pr=0 pw=0 time=76 us) 0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us) 0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=0 us) 0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX Deq: Join ACK 281741 0.03 368.85 PX Deq: reap credit 1649736 0.00 25.22 IPC send completion sync 141000 0.02 135.07 PX Deq: Parse Reply 141004 0.04 170.64 PX Deq: Execute Reply 141004 0.08 1366.42 reliable message 70502 0.00 125.51 PX Deq: Signal ACK EXT 140996 0.03 13.99 PX Deq: Slave Session Stats 140996 0.02 25.54 enq: PS - contention 70605 0.11 145.80 KJC: Wait for msg sends to complete 2584 0.00 0.04 latch free 16 0.00 0.00 PX qref latch 14 0.00 0.00 latch: shared pool 4 0.00 0.00 latch: active service list 1 0.00 0.00 row cache lock 127 0.00 0.08 library cache lock 36 0.00 0.04 library cache pin 36 0.00 0.04 gc cr grant 2-way 1 0.00 0.00 db file sequential read 1 0.00 0.00 oracle thread bootstrap 1 0.03 0.03
As you can see this SQL elapsed time is about 2600 seconds in total (for about 70 000 executions), most of the wait time comes from “PX %” events and this SQL queries the GV$SQL view.
But wait:
Why the hell this SQL (which looks like an SQL that collects metrics for a particular sql_id) is somehow part of the session that launched the noncdb_to_pdb.sql script? Does it make sense?
I really don’t think so, this sql (sql_id “frjd8zfy2jfdq”) should have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql.
Let’s prove it with a simple test case (I am alone on the database):
alter system flush shared_pool; select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; connect / as sysdba begin execute immediate 'select object_name from dba_objects'; end; / select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
With the following result:
SQL> @test_case System altered. COUNT(*) ---------- 0 Connected. PL/SQL procedure successfully completed. COUNT(*) ---------- 2
Did you see that the sql_id “frjd8zfy2jfdq” has been produced by this simple test case? (If you trace the session you would see this query into the trace file).
I reproduced this behavior on:
- 12.1.0.2 database with CDB/PDB.
- 12.1.0.2 database (Non CDB).
and was not able to reproduce it on a 11gR2 database.
So, it looks like that this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to “SQL metrics collection”, I tried to disable some 12cR1 features (linked to this area) until I found the “culprit”.
It did not produce any change until I set the optimizer_adaptive_features parameter to false (true is the default).
Here is the result:
SQL> !cat test_case.sql alter system flush shared_pool; select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; connect / as sysdba alter session set optimizer_adaptive_features=false; begin execute immediate 'select object_name from dba_objects'; end; / select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; SQL> @test_case System altered. COUNT(*) ---------- 0 Connected. Session altered. PL/SQL procedure successfully completed. COUNT(*) ---------- 0
BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!
Well, now what is the impact on noncdb_to_pdb.sql on my 2 nodes RAC database?
I edited the script and added:
alter session set optimizer_adaptive_features=false;
just above this line:
exec dbms_pdb.noncdb_to_pdb(1);
Guess what?
- noncdb_to_pdb.sql took about 20 minutes to execute (compare to about 75 minutes without setting optimizer_adaptive_features to false).
- The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.
Remarks:
- The impact of this SQL is much more “visible” with a RAC database, as it queries a GLOBAL V$ view (GV$SQL) and then needs parallel query to run (If more than one instance is up). With only one instance up, the 10046 trace file produced:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217 SELECT executions, end_of_fetch_count, elapsed_time/px_servers elapsed_time, cpu_time/px_servers cpu_time, buffer_gets/executions buffer_gets FROM (SELECT sum(executions) as executions, sum(case when px_servers_executions > 0 then px_servers_executions else executions end) as px_servers, sum(end_of_fetch_count) as end_of_fetch_count, sum(elapsed_time) as elapsed_time, sum(cpu_time) as cpu_time, sum(buffer_gets) as buffer_gets FROM gv$sql WHERE executions > 0 AND sql_id = :1 AND parsing_schema_name = :2) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 69204 2.58 2.78 0 0 0 0 Execute 69204 23.93 25.45 0 0 0 0 Fetch 69204 2.68 2.64 0 0 0 69204 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 207612 29.20 30.88 0 0 0 69204 Misses in library cache during parse: 18 Misses in library cache during execute: 18 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 2) Number of plan statistics captured: 62 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 1 1 1 VIEW (cr=0 pr=0 pw=0 time=120 us) 1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us) 0 0 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=100 us) 0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us) 0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=43 us) 0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ row cache lock 1 0.00 0.00
As you can see: elapsed time of about 30 seconds (for about 70 000 executions) and no “PX%” wait events.
- I checked which parameters (hidden or not) changed when setting optimizer_adaptive_features to false. Then I tried one by one those parameters with my test case and discovered that setting “_optimizer_dsdir_usage_control” to 0 is enough to get rid of the sql_id “frjd8zfy2jfdq”.
- During the run of noncdb_to_pdb.sql, this is the large number of executions (about 70 000) of the sql_id “frjd8zfy2jfdq” that produced this long “overall” duration and highlighted the fact that this query has to be somehow removed.
- You could meet this SQL on 12cR1 databases (CDB/PDB or non CDB).
Conclusion:
- The optimizer_adaptive_features (set to true) may produce a huge negative impact on the performance (specially with RAC database). I provided an example of such an impact when running the noncdb_to_pdb.sql script.
- Should you meet the sql_id “frjd8zfy2jfdq” in your database and would like to get rid of it (because you observe a negative impact): Then simply set optimizer_adaptive_features to false (or “_optimizer_dsdir_usage_control” to 0) at the session or system level.