Total Pageviews

Thursday 17 December 2015

Oracle Clustering factor

Clustering factor statistic is used by the optimizer during query optimization to determine the relative efficiency of an index. In short, the index clustering factor is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order. 

If the rows of a table on disk are sorted in about the same order as the index keys, the database will perform a minimum number of I/Os on the table to read the entire table via the index. That is because the next row needed from an index key would likely be the next row in the table block as well. The query would not be skipping all over the table to find row after row—they are naturally next to each other on the block.

Conversely, if the rows in the table are not in the same order on disk as the index keys—if the data is scattered—the query will tend to perform the maximum number of I/Os on the table, as many as one I/O for every row in the table. That is because as the database scans through the index, the next row needed will probably not be on the same block as the last row. The database will have to discard that block and get another block from the buffer cache. The query will end up reading every block from the buffer as many times as it has rows on it.

So if a table and an index key are in about the same order, the clustering factor will be near the number of blocks in the table and the index will be useful for very large index range scans and for retrieving numerous rows from the table. 

On the other hand, if the data is randomly scattered, the clustering factor will be near the number of rows in the table, and given that the number of rows in a table is usually at least an order of magnitude more than the number of blocks, the index will be less efficient for returning numerous rows.

Example of clustering factor and query cost :
================================

---> Create a table with data sorted on column object_name :

SQL> create table GoodCF as select x.* from ( select * from all_objects order by object_name)x ;

Table created.

---> Create a table with disorganized/scattered data . Used dbms_random to sort it randomly :

SQL> create table Scattered as select x.* from ( select * from all_objects order by dbms_random.random)x ;

Table created.

---> Created index on sorted column :

SQL> create index GoodCF_IDX on GoodCF(object_name) ;

Index created.

---> Created index on scattered data column :

SQL> create index Scattered_IDX on Scattered(object_name) ;

Index created.

---> Gather stat on table GOODCF :

SQL> begin dbms_stats.gather_table_stats ( 'SYS', 'GOODCF',estimate_percent => 100,method_opt=>'for all indexed columns size 254'); end ;
  /
PL/SQL procedure successfully completed.

----> Gather stat on table scattered :

SQL> begin dbms_stats.gather_table_stats ( 'SYS', 'SCATTERED',estimate_percent => 100,method_opt=>'for all indexed columns size 254'); end ;
    /
PL/SQL procedure successfully completed.

----> Let's check the number of block and rows for both the tables . Both tables are having same data and same number of BLOCKS :

SQL> select table_name, blocks, num_rows from dba_tables where table_name in ('GOODCF','SCATTERED') order by 1;

TABLE_NAME   BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
GOODCF     1020      71713
SCATTERED     1020      71714

----> Let's check the CLUSTERING_FACTOR for both the indexes . I see a large difference between the two. The GOODCF_IDX index has a clustering factor very near the number of blocks in the table, whereas the SCATTERED_IDX index has a clustering factor near the number of rows in the table. 

Again, this clustering factor metric is a measure of how many I/Os the database will perform against the table in order to read every row via the index.We can verify this with the below example :

SQL> select table_name, index_name, clustering_factor from dba_indexes where table_name in ('GOODCF','SCATTERED') order by 1;

TABLE_NAME       INDEX_NAME         CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------------------------------------- -----------------
GOODCF       GOODCF_IDX      1020
SCATTERED               SCATTERED_IDX       71640

SQL> show parameter user_dump_dest

NAME             TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest   string /data01/app/oracle/diag/rdbms/ggtrg/ggtrg/trace

SQL> ALTER SESSION SET EVENTS '10046 trace name context forever,level 12';

Session altered.

SQL> ALTER SESSION SET tracefile_identifier='MYTRACE_GoodCF' ;

Session altered.

SQL> select /*+ index( GOODCF GOODCF_IDX) */ count(subobject_name) from GOODCF;

COUNT(SUBOBJECT_NAME)
---------------------
 457

SQL> ALTER SESSION SET EVENTS '10046 trace name context off';

Session altered.

========================================================================
I/O performed against the index   = 357 I/Os 
Total I/Os performed by the query = 1377 I/Os
Note : 1377 - 357 = 1020  ( Equal to the Clusterfing factor of the index GOODCF_IDX ) 
========================================================================

PARSING IN CURSOR #1 len=73 dep=0 uid=0 oct=3 lid=0 tim=1450372523290892 hv=966910787 ad='a266a890' sqlid='9d45484wu3su3'
select /*+ index( GOODCF GOODCF_IDX) */ count(subobject_name) from GOODCF
END OF STMT
PARSE #1:c=0,e=69,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2836254517,tim=1450372523290891
EXEC #1:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2836254517,tim=1450372523290955
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523291015
*** 2015-12-17 12:15:23.334
FETCH #1:c=38994,e=43776,p=0,cr=1377,cu=0,mis=0,r=1,dep=0,og=1,plh=2836254517,tim=1450372523334808
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1377 pr=0 pw=0 time=0 us)'
STAT #1 id=2 cnt=71713 pid=1 pos=1 obj=74669 op='TABLE ACCESS BY INDEX ROWID GOODCF (cr=1377 pr=0 pw=0 time=37284 us cost=1378 size=1219121 card=71713)'
STAT #1 id=3 cnt=71713 pid=2 pos=1 obj=74671 op='INDEX FULL SCAN GOODCF_IDX (cr=357 pr=0 pw=0 time=11189 us cost=358 size=0 card=71713)'
WAIT #1: nam='SQL*Net message from client' ela= 193 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523335123
FETCH #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=2836254517,tim=1450372523335144
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523335155
*** 2015-12-17 12:15:33.973
WAIT #1: nam='SQL*Net message from client' ela= 10637912 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372533973100
CLOSE #1:c=1000,e=763,dep=0,type=0,tim=1450372533973915
========================================================================

SQL> show parameter user_dump_dest

NAME             TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest   string /data01/app/oracle/diag/rdbms/ggtrg/ggtrg/trace


SQL> ALTER SESSION SET EVENTS '10046 trace name context forever,level 12';

Session altered.

SQL> ALTER SESSION SET tracefile_identifier='MYTRACE_SCATTERED' ;

Session altered.

SQL> select /*+ index( GOODCF GOODCF_IDX) */ count(subobject_name) from GOODCF;

COUNT(SUBOBJECT_NAME)
---------------------
 457

SQL> ALTER SESSION SET EVENTS '10046 trace name context off';

Session altered.

========================================================================
I/O performed against the index   = 357 I/Os 
Total I/Os performed by the query = 71997 I/Os
Note : 71997 - 357 = 71640  ( Equal to the Clusterfing factor of the index SCATTERED_IDX ) 
========================================================================

PARSING IN CURSOR #1 len=81 dep=0 uid=0 oct=3 lid=0 tim=1450373201636876 hv=15917473 ad='a24955d0' sqlid='8fzruxc0g5sd1'
select /*+ index(SCATTERED SCATTERED_IDX) */ count(subobject_name) from SCATTERED
END OF STMT
FETCH #1:c=117983,e=124417,p=1377,cr=71997,cu=0,mis=0,r=1,dep=0,og=1,plh=300164232,tim=1450373201761488
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=71997 pr=1377 pw=0 time=0 us)'
STAT #1 id=2 cnt=71714 pid=1 pos=1 obj=74670 op='TABLE ACCESS BY INDEX ROWID SCATTERED (cr=71997 pr=1377 pw=0 time=116666 us cost=72014 size=1219138 card=71714)'
STAT #1 id=3 cnt=71714 pid=2 pos=1 obj=74672 op='INDEX FULL SCAN SCATTERED_IDX (cr=357 pr=357 pw=0 time=19344 us cost=358 size=0 card=71714)'
WAIT #1: nam='SQL*Net message from client' ela= 154 driver id=1650815232 #bytes=1 p3=0 obj#=74672 tim=1450373201761751
FETCH #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=300164232,tim=1450373201761780
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=74672 tim=1450373201761790
========================================================================

No comments:

Post a Comment