Flush a single SQL statement and capture a 10053 trace for it

Just a quick one, to share a simple sql script in case you need to trace the optimizer computations for a single SQL statement.

As you know, Oracle Database 11g, introduced a new diagnostic events infrastructure, which greatly simplifies the task of generating a 10053 trace for a specific SQL statement.

We can capture a 10053 trace for a specific sql_id that way:

alter system set events ‘trace[RDBMS.SQL_Optimizer.*][sql:<YOUR_SQL_ID>]’;

But the trace will be triggered after a hard parse. So, if the sql is already in the shared pool we have 2 choices:

  1. Wait until the statement is hard parsed again (because it has been aged out of the shared pool, because of a new child cursor creation..).
  2. Flush the sql from the shared pool (See Kerry Osborne’s post) so that the next execution will generate a hard parse.

So, If you can’t be patient, then you can use this script to flush the sql and enable the 10053 trace:

SQL> !cat enable_10053_sql_id.sql
set serveroutput on
set pagesize 9999
set linesize 155
var name varchar2(50)

prompt WARNING SQL_ID WILL BE PURGED FROM THE SHARED POOL
accept sql_id prompt 'Enter value for sql_id: '

BEGIN

select address||','||hash_value into :name
from v$sqlarea
where sql_id like '&&sql_id';

dbms_shared_pool.purge(:name,'C',1);

END;
/

alter system set events 'trace[RDBMS.SQL_Optimizer.*][sql:&&sql_id]';

undef sql_id
undef name

Then just wait for the next execution and you’ll get the trace file.

To disable the trace:

SQL> !cat disable_10053_sql_id.sql
prompt DISABLING 10053 trace

accept sql_id prompt 'Enter value for sql_id: '

alter system set events 'trace[RDBMS.SQL_Optimizer.*][sql:&&sql_id] off';

Remark:

  • Starting in 11gR2 you can use DBMS_SQLDIAG.DUMP_TRACE as it doesn’t require you to re-execute the statement to get the trace (It will automatically trigger a hard parse, see MOS 225598.1).
Advertisements

RAC Attack at Oracle OpenWorld 2013: Don’t be afraid !

You’ll attend Oracle OpenWorld 2013 and:

  • You want to install a 12c RAC using oracle Virtual Box on your laptop.
  • Or you just want to have a talk or share your experience with some people around this technology.

So, you should attend to RAC Attack at Oracle OpenWorld 2013:

RAC-attack

Don’t be afraid ! we won’t configure an extended RAC and then no need to break your laptop into 2 separate pieces to simulate a stretched cluster 🙂

You can find more information on the event here.

Join Rac Attack on the social media:

  • Rac Attack is on Google + here.
  • Rac Attack is on twitter: Use the #RacAttack hashtag.
  • Rac Attack is on facebook here.

Hope to see you there !

Exadata Cell metrics: collectionTime attribute, something that matters

Exadata provides a lot of useful metrics to monitor the Cells.

The Metrics can be of various types:

  • Cumulative: Cumulative statistics since the metric was created.
  • Instantaneous: Value at the time that the metric is collected.
  • Rate: Rates computed by averaging statistics over observation periods.
  • Transition: Are collected at the time when the value of the metrics has changed, and typically captures important transitions in hardware status.

One attribute of the cumulative metric is the collectionTime.

For example, let’s have a look to one of them:

CellCLI> list METRICCURRENT DB_IO_WT_SM detail
         name:                   DB_IO_WT_SM
         alertState:             normal
         collectionTime:         2013-09-12T23:46:14+02:00
         metricObjectName:       EXABDT
         metricType:             Cumulative
         metricValue:            120 ms
         objectType:             IORM_DATABASE

The collectionTime attribute is the time at which the metric was collected.

Why does it matter ?

Based on it, we can compute the delta in second between 2 collections.

Let’s see two use cases.

First use case: Suppose, you decided to extract real-time metrics from the cumulative ones. To do so, you created a script that takes a snapshot of the cumulative metrics each second (default interval) and computes the delta with the previous snapshot (yes, I am describing my exadata_metrics.pl script introduced into this post 🙂 ).

Then, if the delta value of the metric is 0, you need to know why (two explanations are possible as we’ll see).

Let’s see an example: I’ll take a snapshot with a 40 seconds interval of 2 IORM cumulative metrics:

./exadata_metrics.pl 40 cell=exacell1  name='DB_IO_WT_.*' objectname='EXABDT'
--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

00:19:21   CELL                    NAME                         OBJECTNAME                                                  VALUE
00:19:21   ----                    ----                         ----------                                                  -----
00:19:21   exacell1                DB_IO_WT_LG                  EXABDT                                                      0.00 ms
00:19:21   exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms

Well, as you can see the computed (delta) value is 0.00 ms but:

  • does it mean that no IO has been queued by the IORM ?
  • or does it mean that the 2 snaps are based on the same collectionTime? (could be the case if the collection interval is greater than the interval you are using with my script).

To answer those questions, I modified the script so that it takes care of the collectionTime: It computes the delta in seconds of the collectionTime recorded into the snapshots.

Let’s see it in action:

Enable the IORM plan:

CellCLI> alter iormplan objective=auto;
IORMPLAN successfully altered

and launch the script with a 40 seconds interval:

./exadata_metrics.pl 40 cell=exacell1  name='DB_IO_WT_.*' objectname='EXABDT'

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
61         exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms
61         exacell1                DB_IO_WT_LG                  EXABDT                                                      1444922.00 ms

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
60         exacell1                DB_IO_WT_SM                  EXABDT                                                      1.00 ms
60         exacell1                DB_IO_WT_LG                  EXABDT                                                      2573515.00 ms

--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE
--------   ----                    ----                         ----------                                                  -----
0          exacell1                DB_IO_WT_LG                  EXABDT                                                      0.00 ms
0          exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms

Look at the DELTA(s) column: It indicates the delta in seconds for the collectionTime attribute.

So that:

  • DELTA(s) > 0: Means you can check the metric value as the snaps are from 2 distinct collectionTime.
  • DELTA(s) = 0: Means the snaps come from the same collectionTime and then a metric value of 0 is obvious.

Second use case:

As we now have the DELTA(s) value we can compute by our own the associated (_SEC) rate metrics.

For example, from:

./exadata_metrics_orig_new.pl 10 cell=exacell1 name='DB_IO_.*' objectname='EXABDT'
--------------------------------------
----------COLLECTING DATA-------------
--------------------------------------

DELTA(s)   CELL                    NAME                         OBJECTNAME                                                  VALUE                
--------   ----                    ----                         ----------                                                  -----                
60         exacell1                DB_IO_WT_SM                  EXABDT                                                      0.00 ms        
60         exacell1                DB_IO_RQ_SM                  EXABDT                                                      153.00 IO requests
60         exacell1                DB_IO_RQ_LG                  EXABDT                                                      292.00 IO requests
60         exacell1                DB_IO_WT_LG                  EXABDT                                                      830399.00 ms

We can conclude, that:

  • the number of large IO request per second is 292/60=4.87.
  • the number of small IO request per second is 153/60=2.55.

Let’s verify those numbers with their associated rate metrics (DB_IO_RQ_LG_SEC and DB_IO_RQ_SM_SEC):

cellcli -e "list metriccurrent attributes name,metrictype,metricobjectname,metricvalue,collectionTime where name like 'DB_IO_.*' and metricobjectname='EXABDT' and metrictype='Rate'"
         DB_IO_RQ_LG_SEC         Rate    EXABDT  4.9 IO/sec              2013-09-13T16:13:40+02:00
         DB_IO_RQ_SM_SEC         Rate    EXABDT  2.6 IO/sec              2013-09-13T16:13:40+02:00
         DB_IO_WT_LG_RQ          Rate    EXABDT  2,844 ms/request        2013-09-13T16:13:40+02:00
         DB_IO_WT_SM_RQ          Rate    EXABDT  0.0 ms/request          2013-09-13T16:13:40+02:00

Great, that’s the same numbers.

Conclusion:

The collectionTime metric attribute can be very useful when you extract real-time metrics from the cumulative ones as:

  • It provides a way to interpret the results.
  • it provides a way to extract the rate metrics (_SEC) from their cumulatives ones.

Regarding the script:

  • You are able to collect real-time metrics based on cumulative metrics.
  • You can choose the number of snapshots to display and the time to wait between snapshots.
  • You can choose to filter on name and objectname based on predicates (see the help).
  • You can work on all the cells or a subset thanks to the CELL or the GROUPFILE parameter.
  • You can decide the way to compute the metrics with no aggregation, aggregation on cell, objectname or both.

You can download the exadata_metrics.pl script from this repository.

Find out the most physical IO consumers through ASM in real time

Well, suppose you are using ASM and it is servicing a lot of databases per machine. Suddenly the number of IOPS increased in such a way that your sysadmin/storage guy warn you up (I know this is not the real life, we are supposing 😉 ).

So I would like to find out which database(s) are responsible for this load. Of course I could connect on each database and check the oracle statistics but there is a simpler/faster way:  Extract this information from ASM.

Into a previous post ASM I/O Statistics Utility V2, I introduced a new feature of my asmiostat utility: ability to sort on reads or writes.

I just added a new sort option: iops (which is simply Reads/s+Writes/s).

Let’s see the help:

./real_time.pl -type=asmiostat -h

Usage: ./real_time.pl -type=asmiostat [-interval] [-count] [-inst] [-dbinst] [-dg] [-fg] [-ip] [-show] [-sort_field] [-help]
 Default Interval : 1 second.
 Default Count    : Unlimited

  Parameter         Comment                                                           Default
  ---------         -------                                                           -------
  -INST=            ALL - Show all Instance(s)                                        ALL
                    CURRENT - Show Current Instance
                    INSTANCE_NAME,... - choose Instance(s) to display

  -DBINST=          Database Instance to collect (Wildcard allowed)                   ALL
  -DG=              Diskgroup to collect (comma separated list)                       ALL
  -FG=              Failgroup to collect (comma separated list)                       ALL
  -IP=              IP (Exadata Cells) to collect (Wildcard allowed)                  ALL
  -SHOW=            What to show: inst,dbinst,fg|ip,dg,dsk (comma separated list)     DG
  -SORT_FIELD=      reads|writes|iops                                                 NONE

Example: ./real_time.pl -type=asmiostat
Example: ./real_time.pl -type=asmiostat -inst=+ASM1
Example: ./real_time.pl -type=asmiostat -dg=DATA -show=dg
Example: ./real_time.pl -type=asmiostat -dg=data -show=inst,dg,fg
Example: ./real_time.pl -type=asmiostat -show=dg,dsk
Example: ./real_time.pl -type=asmiostat -show=inst,dg,fg,dsk
Example: ./real_time.pl -type=asmiostat -interval=5 -count=3 -sort_field=iops

As you can see, we can now sort on iops.

Let’s now launch the script to retrieve the databases ordered by iops in real time:

  • Using -show=dbinst option as I want to see the databases.
  • Using -sort_field=iops option as I want to order by iops.
./real_time.pl -type=asmiostat -show=dbinst -sort_field=iops
............................
Collecting 1 sec....
............................
17:11:51                                                                             Kby      Avg       AvgBy/               Kby       Avg        AvgBy/
17:11:51   INST     DBINST        DG            FG            DSK          Reads/s   Read/s   ms/Read   Read      Writes/s   Write/s   ms/Write   Write
17:11:51   ------   -----------   -----------   -----------   ----------   -------   ------   -------   ------    ------     -------   --------   ------
17:11:51            IATEBDTO_1                                             575       63696    2.2       113434    56         896       1.9        16384
17:11:51            SMTBDTO_2                                              577       28816    1.1       51140     0          0         0.0        0
17:11:51            BDTO_1                                                 59        920      0.3       15967     30         464       1.7        15838  
17:11:51            BDTO_2                                                 3         48       0.6       16384     0          0         0.0        0      
17:11:51            BKP10GR2_1                                             2         32       0.0       16384     0          0         0.0        0      
17:11:51            JCAASM_1                                               2         32       0.9       16384     0          0         0.0        0      
17:11:51            MILASM_1                                               2         32       0.4       16384     0          0         0.0        0

As you can see the IATEBDTO_1 instance is the one that recorded the most physical IO activity (Reads/s + Writes/s) during the last second.

So we are able to find out quickly which databases are the most physical IO consumers in real time thanks to the ASM metrics.

Remarks:

  • This is real time information: the script takes a snapshot each second (default interval) from the gv$asm_disk_iostat (or gv$asm_disk_stat depending of the version) cumulative view and computes the delta with the previous snapshot.
  • You can display the database instances (show=dbinst) as of 11gr1 (as it is based on the gv$asm_disk_iostat view).
  • You can also find out which host is the most responsible for the physical IO thanks to the show=inst option (as it will display the ASM instances) (if not a 12c Flex ASM).
  • You can also find out which diskgroup is the most responsible for the physical IO thanks to the show=dg option.
  • You can also find out…. (I let you finish the sentence following your needs: failgroups, disks, databases per diskgroup… as the asmiostat utility output is customizable (see this post).
  • You can download the asmiostat utility (which is part of the real_time.pl script) from this repository.

UPDATE: The asmiostat utility is not part of the real_time.pl script anymore. A new utility called asm_metrics.pl has been created. See “ASM metrics are a gold mine. Welcome to asm_metrics.pl, a new utility to extract and to manipulate them in real time” for more information.