Direct Path Read and “enq: KO – fast object checkpoint”

Introduction

When using direct path read, oracle reads the blocks from the disk. The buffer cache is not being used.

Then, when oracle begins to read a segment using direct path read, it flush its dirty blocks to disk (A dirty block is a block that does not have the same image in memory and on disk). During that time, the session that triggered the direct path read is waiting for the “enq: KO – fast object checkpoint” wait event.

In this post, I just want to see with my own eyes that the dirty blocks being flushed to disk belong only to the segment being read.

Tests

For the test, I used two tables on a 11.2.0.4 database. One table is made of 2 partitions and one is not partitioned (then 3 segments):

BDT:BDT1> select SEGMENT_NAME,PARTITION_NAME from dba_segments where segment_name in ('BDT1','BDT2');

SEGMENT_NAME PARTITION_NAME
------------ --------------
BDT2
BDT1         P2
BDT1         P1

The rows distribution across segments, blocks and file relative numbers is the following:

BDT:BDT1> l
  1  select u.object_name,u.subobject_name,dbms_rowid.rowid_block_number(BDT1.rowid) BLOCK_NUMBER, dbms_rowid.rowid_relative_fno(BDT1.rowid) FNO
  2	 from BDT1, user_objects u
  3	 where dbms_rowid.rowid_object(BDT1.rowid) = u.object_id
  4	 union all
  5	 select u.object_name,u.subobject_name,dbms_rowid.rowid_block_number(BDT2.rowid) BLOCK_NUMBER, dbms_rowid.rowid_relative_fno(BDT2.rowid) FNO
  6	 from BDT2, user_objects u
  7	 where dbms_rowid.rowid_object(BDT2.rowid) = u.object_id
  8	 order by 1,2
  9*
BDT:BDT1> /

OBJECT_NAME  SUBOBJECT_NAME  BLOCK_NUMBER FNO
------------ --------------- ------------ -----------
BDT1	     P1              2509842	  5
BDT1	     P1              2509842      5
BDT1	     P1              2509842      5
BDT1	     P1              2509842      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT1	     P2              2510866      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5
BDT2                         1359091      5

18 rows selected.

so that all the rows of the segment:

  • BDT1:P1 are in datafile 5, block 2509842
  • BDT1:P2 are in datafile 5, block 2510866
  • BDT2 are in datafile 5, block 1359091

Remark:

BDT1:P<N> stands for table BDT1 partition <N>

Now, let’s update both tables (without committing):

BDT:BDT1> update BDT1 set ID2=100;

9 rows updated.

BDT:BDT1> update BDT2 set ID2=100;

9 rows updated.

As I do not commit (nor rollback), then those 3 blocks in memory contain an active transaction.

In another session, let’s force serial direct path read and read partition P1 from BDT1 only:

BDT:BDT1> alter session set "_serial_direct_read"=always;

Session altered.

BDT:BDT1> select count(*) from BDT1 partition (P1);

COUNT(*)
-----------
4

Now, check its wait events and related statistics with snapper. A cool thing with snapper is that I can get wait events and statistics from one tool.

SYS:BDT1> @snapper all,end 5 1 20
Sampling SID 20 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.22 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 🙂

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    20  @1, BDT       , STAT, physical reads direct                                     ,             1,        .02,         ,             ,          ,           ,         .5 per execution
    20  @1, BDT       , WAIT, enq: KO - fast object checkpoint                          ,          1117,    18.12us,      .0%, [          ],         1,        .02,     1.12ms average wait

--  End of Stats snap 1, end=2015-04-29 08:25:20, seconds=61.7

--  End of ASH snap 1, end=, seconds=, samples_taken=0, AAS=(No ASH sampling in begin/end snapshot mode)

PL/SQL procedure successfully completed.

For brevity and clarity, I keep only the wait events and statistics of interest.

As we can see, the direct path read has been used and the “fast object checkpoint” occurred (as the session waited for it).

Now let’s check which block(s) has/have been flushed to disk: to do so, I’ll dump those 3 blocks from disk and check if they contain an active transaction (as the ones in memory).

BDT:BDT1> !cat dump_blocks.sql
-- BDT2
alter system dump datafile 5 block 1359091;
-- BDT1:P2
alter system dump datafile 5 block 2510866;
-- BDT1:P1
alter system dump datafile 5 block 2509842;
select value from v$diag_info where name like 'Default%';

BDT:BDT1> @dump_blocks.sql

System altered.

System altered.

System altered.

VALUE
------------------------------
/u01/app/oracle/diag/rdbms/bdt/BDT1/trace/BDT1_ora_91815.trc

BDT:BDT1> !view /u01/app/oracle/diag/rdbms/bdt/BDT1/trace/BDT1_ora_91815.trc

The Interested Transaction List (ITL) for the first block that has been dumped from disk (BDT2 Table) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f9c7
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does not contain an active transaction in it (There is no rows locked as Lck=0). We can conclude that the block that belongs to the BDT2 table has not been flushed to disk (The block is still dirty).

The Interested Transaction List (ITL) for the second block that has been dumped from disk (BDT1:P2) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f99f
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does not contain an active transaction in it (There is no rows locked as Lck=0). We can conclude that the block that belongs to the partition P2 of the BDT1 table has not been flushed to disk (The block is still dirty).

The Interested Transaction List (ITL) for the third block that has been dumped from disk (BDT1:P1) looks like:

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0016f998
0x02   0x000a.005.00001025  0x00c00211.038e.23  ----    4  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

We can see that the block on disk does contain an active transaction in it (4 rows are locked as Lck=4). This is the block that has been read in direct path read during the select on BDT1:P1.

We can conclude that the block that belongs to BDT1:P1 has been flushed to disk (The block is not dirty anymore).

Conclusion

When oracle begins to read a segment using direct path read:

  • It does not flush to disk the dirty blocks from other segments.
  • Only the dirty blocks of the segment being read are flushed to disk.
Advertisements

ASM Bytes Read and Smart Scan

Introduction

I like to query the v$asm_disk_iostat cumulative view (at the ASM instance level) as this is a centralized location where you can find metrics for all the databases the ASM instance is servicing. One of its metric is:

BYTES_READ: Total number of bytes read from the disk (see the oracle documentation).

I would like to see which value is recorded into this metric in case of smart scan. To do so, I’ll launch some tests and check the output of the asm_metrics utility: It basically takes a snapshot each second (default interval) from the gv$asm_disk_iostat cumulative view and computes the delta with the previous snapshot.

Environment

ASM and database versions are 11.2.0.4. A segment of about 19.5 gb has been created without any indexes, so that a full table scan is triggered during the tests.

Tests

During the tests:

  • This simple query is launched:
select id from bdt where id=1;
  • Then, the asm metrics will be recorded that way:
./asm_metrics.pl -show=dbinst,fg,inst -dbinst=BDT2 -inst=+ASM2 -display=avg
  • We’ll look at the output field “Kby Read/s” which is based on the BYTES_READ column coming from the v$asm_disk_iostat cumulative view.
  • The sql elapsed time and the percentage of IO saved by the offload (if any) will be checked with a query that looks like fsx.sql. By “percentage of IO saved” I mean the ratio of data received from the storage cells to the actual amount of data that would have had to be received on non-Exadata storage.

Test 1: Launch the query without offload

BDT:BDT2> alter session set cell_offload_processing=false;

Session altered.

BDT:BDT2> select /* NO_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
7y7xa34jjab2q	   0   627556429      1      23.11	0 No		    0 19413.42969	    0	     .00 select /* NO_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1

The elapsed time of the sql is 23.11 seconds and obviously no IO have been saved by offload. As you can see about 19.5 gb has been exchanged between the Oracle Database and the storage system (INTERCO_MB is based on IO_INTERCONNECT_BYTES column from v$sql).

The asm_metrics ouput is the following:

06:39:47                                                                            Kby       Avg       AvgBy/               Kby       Avg        AvgBy/
06:39:47   INST     DBINST        DG            FG           DSK          Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:39:47   ------   -----------   -----------   ----------   ----------   -------   -------   -------   ------    ------     -------   --------   ------
06:39:47   +ASM2                                                          869       885423    3.7       1042916   2          24        0.9        15477
06:39:47   +ASM2    BDT2                                                  869       885423    3.7       1042916   2          24        0.9        15477
06:39:47   +ASM2    BDT2                        ENKCEL01                  300       305865    3.7       1044812   1          8         1.2        15061
06:39:47   +ASM2    BDT2                        ENKCEL02                  330       335901    3.7       1041451   1          8         0.8        15061
06:39:47   +ASM2    BDT2                        ENKCEL03                  239       243657    3.6       1042564   0          8         0.6        16384

So the average Kby Read per second is 885423.

Then we can conclude that the BYTES_READ column records that about 885423 * 23.11 = 19.5 gb has been read from disk.

Does it make sense? Yes.

Test 2: Offload and no storage indexes

BDT:BDT2> alter session set cell_offload_processing=true;

Session altered.

BDT:BDT2> alter session set "_kcfis_storageidx_disabled"=true;

Session altered.

BDT:BDT2> select /* WITH_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
5zzvpyn94b05g	   0   627556429      1       4.11	0 Yes	  19413.42969 2.860450745 2.860450745	   99.99 select /* WITH_SMART_SCAN_NO_STORAGEIDX */ id from bdt where id=1

The elapsed time of the sql is 4.11 seconds and 99.99 % of IO has been saved by offload. About 2.8 mb have been exchanged between the Oracle Database and the storage system.

The asm_metrics ouput is the following:

06:41:54                                                                            Kby       Avg       AvgBy/               Kby       Avg        AvgBy/
06:41:54   INST     DBINST        DG            FG           DSK          Reads/s   Read/s    ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:41:54   ------   -----------   -----------   ----------   ----------   -------   -------   -------   ------    ------     -------   --------   ------
06:41:54   +ASM2                                                          4862      4969898   0.0       1046671   1          12        6.4        16384
06:41:54   +ASM2    BDT2                                                  4862      4969898   0.0       1046671   1          12        6.4        16384
06:41:54   +ASM2    BDT2                        ENKCEL01                  1678      1715380   0.0       1047123   0          4         17.2       16384
06:41:54   +ASM2    BDT2                        ENKCEL02                  1844      1883738   0.0       1046067   0          4         1.0        16384
06:41:54   +ASM2    BDT2                        ENKCEL03                  1341      1370780   0.0       1046935   0          4         1.0        16384

So the average Kby Read per second is 4969898.

Then we can conclude that the BYTES_READ column records that about 4969898 *4.11 = 19.5 gb has been read from disk.

Does it make sense? I would say yes, because the storage indexes haven’t been used. Then, during the smart scan all the datas blocks have been opened in the cells in order to extract and send back to the database layer the requested column (column projection) on the selected rows (row filtering).

Test 3: Offload and storage indexes

BDT:BDT2> alter session set "_kcfis_storageidx_disabled"=false;

Session altered.

BDT:BDT2> select /* WITH_SMART_SCAN_AND_STORAGEIDX */ id from bdt where id=1;

The elapsed time and the percentage of IO saved are:

SQL_ID	       CHILD   PLAN_HASH  EXECS  AVG_ETIME AVG_PX OFFLOAD ELIGIBLE_MB  INTERCO_MB OFF_RETU_MB IO_SAVED_% SQL_TEXT
------------- ------ ----------- ------ ---------- ------ ------- ----------- ----------- ----------- ---------- ----------------------------------------------------------------------
3jdpqa2s4bb0v	   0   627556429      1        .09	0 Yes	  19413.42969  .062171936  .062171936	  100.00 select /* WITH_SMART_SCAN_AND_STORAGEIDX */ id from bdt where id=1

The elapsed time of the sql is 0.09 seconds and about 100 % of IO has been saved by offload. About 0.06 mb have been exchanged between the Oracle Database and the storage system.

The storage indexes saved a lot of reads (almost the whole table):

BDT:BDT2> l
  1* select n.name, s.value from v$statname n, v$mystat s where n.name='cell physical IO bytes saved by storage index' and n.STATISTIC#=s.STATISTIC#
BDT:BDT2> /
NAME							     VALUE
-------------------------------------------------- ---------------
cell physical IO bytes saved by storage index	       20215947264

The asm_metrics ouput is the following:

06:43:58                                                                            Kby        Avg       AvgBy/               Kby       Avg        AvgBy/
06:43:58   INST     DBINST        DG            FG           DSK          Reads/s   Read/s     ms/Read   Read      Writes/s   Write/s   ms/Write   Write
06:43:58   ------   -----------   -----------   ----------   ----------   -------   -------    -------   ------    ------     -------   --------   ------
06:43:58   +ASM2                                                          19436     19879384   0.0       1047360   0          0         0.0        0
06:43:58   +ASM2    BDT2                                                  19436     19879384   0.0       1047360   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL01                  6708      6861488    0.0       1047430   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL02                  7369      7534840    0.0       1047045   0          0         0.0        0
06:43:58   +ASM2    BDT2                        ENKCEL03                  5359      5483056    0.0       1047705   0          0         0.0        0

So the average Kby Read per second is 19879384.

As the asm_metrics utility’s granularity to collect the data is one second and as the elapsed time is < 1s then we can conclude that the BYTES_READ column records that about 19.5 gb has been read from disk.

Does it make sense? I would say no, because during the smart scan, thanks to the Storage indexes, not all the datas blocks have been opened in the cells in order to extract the requested column (column projection) on the selected rows (row filtering).

Remark

You could also query the v$asm_disk_iostat view and measure the delta for the BYTES_READ column by your own (means without the asm_metrics utility). The results would be the same.

Conclusion

The BYTES_READ column displays:

  • The Total number of bytes read from the disk (and also transferred to the database) without smart scan.
  • The Total number of bytes read from the disk (but not the bytes transferred to the database) with smart scan and no storage indexes being used.
  • Neither the Total number of bytes read from the disk nor the bytes transferred to the database with smart scan and storage indexes being used.