Total Pageviews

Sunday 20 December 2015

METHOD_OPT : 'FOR ALL INDEXED COLUMNS' Vs 'FOR ALL COLUMNS SIZE AUTO'

Create a table with 3 columns with randomly generated numbers . Below is the some of the attributes of the table columns :

SQL> desc TEST ;
 Name                Type
  ----------------------------
 PRIMKEY         NUMBER
 COL2                 NUMBER
 COL3                 NUMBER


  • Total rows in table = 800000
  • Column PRIMKEY is having 800000 distinct values
  • Column COL2 has only 4 distinct values
  • Column COL3 has only 3 distinct values


begin
      dbms_stats.gather_table_stats( ownname => 'SYS' ,tabname => 'test' ,  estimate_percent => 100 ,
       cascade => true);
   end;
   /

Check Column statistics  :

SQL > select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME        NDV        HISTOGRAM                  NUM_BUCKETS     LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------
PRIMKEY              800000   NONE                                    1                         2015-19-12 16:12:22
COL2                         4           FREQUENCY                      4                         2015-19-12 16:12:22
COL3                         3           NONE                                    1                        2015-19-12 16:12:22


Create a unique index on primkey :

 create unique index PK_T1 on TEST(PRIMKEY);

Gather stat : method_opt => 'for all indexed columns' 

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be.   "FOR ALL INDEXED COLUMNS"  tells DBMS_STATS to only gather stats on columns that participate in an index . No stats are calculated for non indexed columns of the table .

SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
   dbms_stats.gather_table_stats( ownname => user ,tabname => 'TEST' ,estimate_percent => 100 ,        method_opt => 'for all indexed columns'
  ,cascade => true);
  end;
  /

PL/SQL procedure successfully completed.


SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME        NDV  HISTOGRAM                  NUM_BUCKETS     LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------------
PRIMKEY         800000   HEIGHT BALANCED      75                    2015-19-12 16:13:36

Notice that in above cases only column PRIMKEY has stats on it. Columns COL2  and COL3 do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

It is wasteful to create a histogram on a column that is never referenced by a SQL statement.  It is the SQL workload that determines if a column needs histograms.

Gather stat : method_opt => 'for all indexed columns size auto' 

Using MENTOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' tells DBMS_STATS: “only gather stats on columns that participate in an index and based on data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain”.

The default value of  "for all columns size auto" is the Oracle default and this option will analyze histograms on what the optimizer considers to be "important columns".  The optimizer does this by examining your current workload when you run the dbms_stats command, interrogating all current SQL in the library cache to see which columns might benefit from a histogram.

SQL> begin
   dbms_stats.delete_table_stats(user,'TEST');

   dbms_stats.gather_table_stats(
     ownname => user ,
     tabname => 'TEST' ,
     estimate_percent => 100 ,
     method_opt => 'for all indexed columns size auto' ,
     cascade => true);
 end;
 /

PL/SQL procedure successfully completed.


SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME        NDV     HISTOGRAM    NUM_BUCKETS     LAST_ANALYZED
------------------------------ ------------ --------------- ----------- -----------------------------------------
PRIMKEY            800000        NONE                        1                     2015-19-12 16:14:39
 

Impact of METHOD_OPT on cardinality  estimates and access path 


CASE1 : method_opt => 'for all indexed columns size auto'  with only single predicate in query

SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;

  COUNT(*)
----------
    660000

SQL>  select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id  | Operation    | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |   |  1 |     |    1 |00:00:00.27 |    1856 |
|   1 |  SORT AGGREGATE    |   |  1 |   1 |    1 |00:00:00.27 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |  1 |   8000 |  660K|00:00:00.18 |    1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - filter("COL2"=4000)


Notice the E-Rows estimate for TEST. The Optimizer is estimating 8000 rows when in reality there is 660000. The estimate is off by more than an order of magnitude (80x).

Normally the calculation for the cardinality would be (for a one table single equality predicate): number of rows in TEST * 1/NDV = 800000 * 1/4 = 200K but in this case 8K is the estimate. Because there are no column stats for TEST.COL2, the Optimizer is forced to make a guess and that guess is 8k/200k =0.04 (4%)

CASE 2: Now that we have seen  how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates gives us for a cardinality estimate.

SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

  COUNT(*)
----------
     30000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation    | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |   |  1 |     |    1 |00:00:00.03 |    1856 |
|   1 |  SORT AGGREGATE    |   |  1 |   1 |    1 |00:00:00.03 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |  1 |  80 |  30000 |00:00:00.03 |    1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL2"=4000 AND "COL3"=3000))


Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

Non representative (or missing) statistics lead to :

---> Poor cardinality estimates which leads to
--->Poor access path selection which leads to
---> Poor join method selection which leads to
---> Poor join order selection which leads to
---> Poor SQL execution times

CASE 3 :  method_opt => 'for all columns size auto'   with a single equality predicate

 In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of single predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :

  SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
  end;
  /

PL/SQL procedure successfully completed.

SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME        NDV     HISTOGRAM         NUM_BUCKETS     LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY         800000                   NONE                       1                2015-19-12 16:34:58
COL2                  4                            FREQUENCY          4                2015-19-12 16:34:58
COL3                  3                            FREQUENCY          3                2015-19-12 16:34:58


SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;

  COUNT(*)
----------
    660000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 1
-----------------------------------------------------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000

Plan hash value: 1950795681
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |  1        |          |    1 |00:00:00.14 |    1856 |
|   1 |  SORT AGGREGATE      |           |  1        |   1     |    1 |00:00:00.14 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST | 1        | 660K   | 660K |00:00:00.09 |    1856 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL2"=4000)

CASE 4 :  method_opt => 'for all columns size auto'   with two equality predicates

In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of multiple predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :

  SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
  end;
  /

PL/SQL procedure successfully completed.

SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME        NDV     HISTOGRAM         NUM_BUCKETS     LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY         800000                   NONE                       1                2015-19-12 16:34:58
COL2                  4                            FREQUENCY          4                2015-19-12 16:34:58
COL3                  3                            FREQUENCY          3                2015-19-12 16:34:58


SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

  COUNT(*)
----------
     30000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000

Plan hash value: 1950795681

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |  1 |              |    1        |00:00:00.03 |    1856 |
|   1 |  SORT AGGREGATE        |            |  1 |   1         |    1        |00:00:00.03 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |  1 |  30000     |  30000 |00:00:00.03 |    1856 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL3"=3000 AND "COL2"=4000))

No comments:

Post a Comment