Let me describe how I discovered that the optimizer_adaptive_features may have a huge negative impact (specially on RAC databases).
I upgraded a 11gR2 database to 12.1.0.2 and then I converted this database to a PDB. To do so I launched “@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql” (See MOS Doc ID 1564657.1 for more details).
This sql script took about 75 minutes to complete on my 2 nodes RAC database. This is quite long and then I decided to try to reduce this duration.
To diagnose: I dropped the PDB, plugged it back, re-launched noncdb_to_pdb.sql and enabled a 10046 trace on the session.
The top SQL (sort by elapsed time) for this session is the following:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217 SELECT executions, end_of_fetch_count, elapsed_time/px_servers elapsed_time, cpu_time/px_servers cpu_time, buffer_gets/executions buffer_gets FROM (SELECT sum(executions) as executions, sum(case when px_servers_executions > 0 then px_servers_executions else executions end) as px_servers, sum(end_of_fetch_count) as end_of_fetch_count, sum(elapsed_time) as elapsed_time, sum(cpu_time) as cpu_time, sum(buffer_gets) as buffer_gets FROM gv$sql WHERE executions > 0 AND sql_id = :1 AND parsing_schema_name = :2) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 70502 4.02 4.25 0 0 0 0 Execute 70502 264.90 759.33 0 0 0 0 Fetch 70502 215.77 1848.46 0 0 0 70502 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 211506 484.70 2612.05 0 0 0 70502 Misses in library cache during parse: 17 Misses in library cache during execute: 17 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 2) Number of plan statistics captured: 70502 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 1 1 1 VIEW (cr=0 pr=0 pw=0 time=76 us) 1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=76 us) 0 0 1 PX COORDINATOR (cr=0 pr=0 pw=0 time=76 us) 0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us) 0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=0 us) 0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=0 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ PX Deq: Join ACK 281741 0.03 368.85 PX Deq: reap credit 1649736 0.00 25.22 IPC send completion sync 141000 0.02 135.07 PX Deq: Parse Reply 141004 0.04 170.64 PX Deq: Execute Reply 141004 0.08 1366.42 reliable message 70502 0.00 125.51 PX Deq: Signal ACK EXT 140996 0.03 13.99 PX Deq: Slave Session Stats 140996 0.02 25.54 enq: PS - contention 70605 0.11 145.80 KJC: Wait for msg sends to complete 2584 0.00 0.04 latch free 16 0.00 0.00 PX qref latch 14 0.00 0.00 latch: shared pool 4 0.00 0.00 latch: active service list 1 0.00 0.00 row cache lock 127 0.00 0.08 library cache lock 36 0.00 0.04 library cache pin 36 0.00 0.04 gc cr grant 2-way 1 0.00 0.00 db file sequential read 1 0.00 0.00 oracle thread bootstrap 1 0.03 0.03
As you can see this SQL elapsed time is about 2600 seconds in total (for about 70 000 executions), most of the wait time comes from “PX %” events and this SQL queries the GV$SQL view.
But wait:
Why the hell this SQL (which looks like an SQL that collects metrics for a particular sql_id) is somehow part of the session that launched the noncdb_to_pdb.sql script? Does it make sense?
I really don’t think so, this sql (sql_id “frjd8zfy2jfdq”) should have been triggered by something else (parsing or whatever) than noncdb_to_pdb.sql.
Let’s prove it with a simple test case (I am alone on the database):
alter system flush shared_pool; select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; connect / as sysdba begin execute immediate 'select object_name from dba_objects'; end; / select count(*) from v$sql where sql_id='frjd8zfy2jfdq';
With the following result:
SQL> @test_case System altered. COUNT(*) ---------- 0 Connected. PL/SQL procedure successfully completed. COUNT(*) ---------- 2
Did you see that the sql_id “frjd8zfy2jfdq” has been produced by this simple test case? (If you trace the session you would see this query into the trace file).
I reproduced this behavior on:
- 12.1.0.2 database with CDB/PDB.
- 12.1.0.2 database (Non CDB).
and was not able to reproduce it on a 11gR2 database.
So, it looks like that this SQL is introduced by a 12.1.0.2 (12cR1) new feature. As it is somehow linked to “SQL metrics collection”, I tried to disable some 12cR1 features (linked to this area) until I found the “culprit”.
It did not produce any change until I set the optimizer_adaptive_features parameter to false (true is the default).
Here is the result:
SQL> !cat test_case.sql alter system flush shared_pool; select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; connect / as sysdba alter session set optimizer_adaptive_features=false; begin execute immediate 'select object_name from dba_objects'; end; / select count(*) from v$sql where sql_id='frjd8zfy2jfdq'; SQL> @test_case System altered. COUNT(*) ---------- 0 Connected. Session altered. PL/SQL procedure successfully completed. COUNT(*) ---------- 0
BINGO!!! The sql_id “frjd8zfy2jfdq” has not been executed!
Well, now what is the impact on noncdb_to_pdb.sql on my 2 nodes RAC database?
I edited the script and added:
alter session set optimizer_adaptive_features=false;
just above this line:
exec dbms_pdb.noncdb_to_pdb(1);
Guess what?
- noncdb_to_pdb.sql took about 20 minutes to execute (compare to about 75 minutes without setting optimizer_adaptive_features to false).
- The sql_id “frjd8zfy2jfdq” is not part of the trace file anymore.
Remarks:
- The impact of this SQL is much more “visible” with a RAC database, as it queries a GLOBAL V$ view (GV$SQL) and then needs parallel query to run (If more than one instance is up). With only one instance up, the 10046 trace file produced:
SQL ID: frjd8zfy2jfdq Plan Hash: 510421217 SELECT executions, end_of_fetch_count, elapsed_time/px_servers elapsed_time, cpu_time/px_servers cpu_time, buffer_gets/executions buffer_gets FROM (SELECT sum(executions) as executions, sum(case when px_servers_executions > 0 then px_servers_executions else executions end) as px_servers, sum(end_of_fetch_count) as end_of_fetch_count, sum(elapsed_time) as elapsed_time, sum(cpu_time) as cpu_time, sum(buffer_gets) as buffer_gets FROM gv$sql WHERE executions > 0 AND sql_id = :1 AND parsing_schema_name = :2) call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 69204 2.58 2.78 0 0 0 0 Execute 69204 23.93 25.45 0 0 0 0 Fetch 69204 2.68 2.64 0 0 0 69204 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 207612 29.20 30.88 0 0 0 69204 Misses in library cache during parse: 18 Misses in library cache during execute: 18 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 2) Number of plan statistics captured: 62 Rows (1st) Rows (avg) Rows (max) Row Source Operation ---------- ---------- ---------- --------------------------------------------------- 1 1 1 VIEW (cr=0 pr=0 pw=0 time=120 us) 1 1 1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=113 us) 0 0 0 PX COORDINATOR (cr=0 pr=0 pw=0 time=100 us) 0 0 0 PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=50 us) 0 0 0 VIEW GV$SQL (cr=0 pr=0 pw=0 time=43 us) 0 0 0 FIXED TABLE FIXED INDEX X$KGLCURSOR_CHILD (ind:2) (cr=0 pr=0 pw=0 time=39 us) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ row cache lock 1 0.00 0.00
As you can see: elapsed time of about 30 seconds (for about 70 000 executions) and no “PX%” wait events.
- I checked which parameters (hidden or not) changed when setting optimizer_adaptive_features to false. Then I tried one by one those parameters with my test case and discovered that setting “_optimizer_dsdir_usage_control” to 0 is enough to get rid of the sql_id “frjd8zfy2jfdq”.
- During the run of noncdb_to_pdb.sql, this is the large number of executions (about 70 000) of the sql_id “frjd8zfy2jfdq” that produced this long “overall” duration and highlighted the fact that this query has to be somehow removed.
- You could meet this SQL on 12cR1 databases (CDB/PDB or non CDB).
Conclusion:
- The optimizer_adaptive_features (set to true) may produce a huge negative impact on the performance (specially with RAC database). I provided an example of such an impact when running the noncdb_to_pdb.sql script.
- Should you meet the sql_id “frjd8zfy2jfdq” in your database and would like to get rid of it (because you observe a negative impact): Then simply set optimizer_adaptive_features to false (or “_optimizer_dsdir_usage_control” to 0) at the session or system level.
Hi Bertrand
The query is also used for adaptive dynamic sampling.
Therefore I have a question… Could you please let us know whether optimizer_dynamic_sampling was set to 11 or not?
Thank you,
Chris
Hello Christian,
Thanks for passing by!
optimizer_dynamic_sampling was set to 2 on my 11gR2 and 12.1.0.2 databases mentioned into the blog post.
Your remark is very interesting, then I just launched my test_case with:
1)
”
optimizer_adaptive_features=false;
“_optimizer_dsdir_usage_control”=0;
optimizer_dynamic_sampling=11;
”
=> the query “frjd8zfy2jfdq” is still there on the 12.1.0.2 database.
2)
”
optimizer_adaptive_features=false;
“_optimizer_dsdir_usage_control”=0;
optimizer_dynamic_sampling=10;
”
=> the query “frjd8zfy2jfdq” is not there on the 12.1.0.2 database.
So, on the 12.1.0.2 database, if optimizer_dynamic_sampling=11 then setting “optimizer_adaptive_features=false” or “”_optimizer_dsdir_usage_control”=0” is NOT ENOUGH to get rid of “frjd8zfy2jfdq”.
Now, What’s about 11.2.0.4?
On my 11.2.0.4 “2 nodes RAC” database:
With optimizer_dynamic_sampling = 10 then I do not see the query into the session’s trace (running my test_case).
With optimizer_dynamic_sampling=11, then I can see this sql “5pk3xph8kxd3u” into the trace:
SQL ID: 5pk3xph8kxd3u Plan Hash: 987782979
SELECT executions, end_of_fetch_count, elapsed_time/executions
elapsed_time, cpu_time/executions cpu_time,
buffer_gets/executions buffer_gets
FROM
(SELECT sum(executions) as executions,
sum(end_of_fetch_count) as end_of_fetch_count,
sum(elapsed_time) as elapsed_time, sum(cpu_time) as
cpu_time, sum(buffer_gets) as buffer_gets
FROM v$sql WHERE executions > 0
AND sql_id = :1
AND parsing_schema_name = :2)
Which:
– Is not exactly the same as “frjd8zfy2jfdq” observed in 12.1.0.2.
– Queries “V$SQL”, while “frjd8zfy2jfdq” queries the global view GV$SQL (Then less impact is expected from “5pk3xph8kxd3u” compare to “frjd8zfy2jfdq”).
Cheers,
Bertrand
This is a very interesting article. We’re seeing something similar in 12c which has been reported to Oracle as a bug, where a CTAS is initiated with DOP>1, and it does some evaluation at start of the query then downgrades to serial execution. We have changed the parameter you suggest, plus some others such as changing parallel policy to manual, but cannot avoid this behavior, unless we add a dummy where clause e.g. where rownum < 100000000000.