Create a table with 3 columns with randomly generated numbers . Below is the some of the attributes of the table columns :
SQL> desc TEST ;
Name Type
----------------------------
PRIMKEY NUMBER
COL2 NUMBER
COL3 NUMBER
begin
dbms_stats.gather_table_stats( ownname => 'SYS' ,tabname => 'test' , estimate_percent => 100 ,
cascade => true);
end;
/
Check Column statistics :
SQL > select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:12:22
COL2 4 FREQUENCY 4 2015-19-12 16:12:22
COL3 3 NONE 1 2015-19-12 16:12:22
Create a unique index on primkey :
create unique index PK_T1 on TEST(PRIMKEY);
Gather stat : method_opt => 'for all indexed columns'
If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. "FOR ALL INDEXED COLUMNS" tells DBMS_STATS to only gather stats on columns that participate in an index . No stats are calculated for non indexed columns of the table .
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats( ownname => user ,tabname => 'TEST' ,estimate_percent => 100 , method_opt => 'for all indexed columns'
,cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------------
PRIMKEY 800000 HEIGHT BALANCED 75 2015-19-12 16:13:36
Notice that in above cases only column PRIMKEY has stats on it. Columns COL2 and COL3 do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.
It is wasteful to create a histogram on a column that is never referenced by a SQL statement. It is the SQL workload that determines if a column needs histograms.
Gather stat : method_opt => 'for all indexed columns size auto'
Using MENTOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' tells DBMS_STATS: “only gather stats on columns that participate in an index and based on data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain”.
The default value of "for all columns size auto" is the Oracle default and this option will analyze histograms on what the optimizer considers to be "important columns". The optimizer does this by examining your current workload when you run the dbms_stats command, interrogating all current SQL in the library cache to see which columns might benefit from a histogram.
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(
ownname => user ,
tabname => 'TEST' ,
estimate_percent => 100 ,
method_opt => 'for all indexed columns size auto' ,
cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- -----------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:14:39
CASE1 : method_opt => 'for all indexed columns size auto' with only single predicate in query
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;
COUNT(*)
----------
660000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.27 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.27 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 8000 | 660K|00:00:00.18 | 1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=4000)
Notice the E-Rows estimate for TEST. The Optimizer is estimating 8000 rows when in reality there is 660000. The estimate is off by more than an order of magnitude (80x).
Normally the calculation for the cardinality would be (for a one table single equality predicate): number of rows in TEST * 1/NDV = 800000 * 1/4 = 200K but in this case 8K is the estimate. Because there are no column stats for TEST.COL2, the Optimizer is forced to make a guess and that guess is 8k/200k =0.04 (4%)
CASE 2: Now that we have seen how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates gives us for a cardinality estimate.
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
COUNT(*)
----------
30000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.03 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.03 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 80 | 30000 |00:00:00.03 | 1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("COL2"=4000 AND "COL3"=3000))
Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:
Non representative (or missing) statistics lead to :
---> Poor cardinality estimates which leads to
--->Poor access path selection which leads to
---> Poor join method selection which leads to
---> Poor join order selection which leads to
---> Poor SQL execution times
CASE 3 : method_opt => 'for all columns size auto' with a single equality predicate
In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of single predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:34:58
COL2 4 FREQUENCY 4 2015-19-12 16:34:58
COL3 3 FREQUENCY 3 2015-19-12 16:34:58
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;
COUNT(*)
----------
660000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 1
-----------------------------------------------------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.14 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.14 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 660K | 660K |00:00:00.09 | 1856 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=4000)
CASE 4 : method_opt => 'for all columns size auto' with two equality predicates
In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of multiple predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:34:58
COL2 4 FREQUENCY 4 2015-19-12 16:34:58
COL3 3 FREQUENCY 3 2015-19-12 16:34:58
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
COUNT(*)
----------
30000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000
Plan hash value: 1950795681
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.03 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.03 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 30000 | 30000 |00:00:00.03 | 1856 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("COL3"=3000 AND "COL2"=4000))
SQL> desc TEST ;
Name Type
----------------------------
PRIMKEY NUMBER
COL2 NUMBER
COL3 NUMBER
- Total rows in table = 800000
- Column PRIMKEY is having 800000 distinct values
- Column COL2 has only 4 distinct values
- Column COL3 has only 3 distinct values
begin
dbms_stats.gather_table_stats( ownname => 'SYS' ,tabname => 'test' , estimate_percent => 100 ,
cascade => true);
end;
/
Check Column statistics :
SQL > select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:12:22
COL2 4 FREQUENCY 4 2015-19-12 16:12:22
COL3 3 NONE 1 2015-19-12 16:12:22
Create a unique index on primkey :
create unique index PK_T1 on TEST(PRIMKEY);
Gather stat : method_opt => 'for all indexed columns'
If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. "FOR ALL INDEXED COLUMNS" tells DBMS_STATS to only gather stats on columns that participate in an index . No stats are calculated for non indexed columns of the table .
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats( ownname => user ,tabname => 'TEST' ,estimate_percent => 100 , method_opt => 'for all indexed columns'
,cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- ---------------------------------------------------
PRIMKEY 800000 HEIGHT BALANCED 75 2015-19-12 16:13:36
Notice that in above cases only column PRIMKEY has stats on it. Columns COL2 and COL3 do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.
It is wasteful to create a histogram on a column that is never referenced by a SQL statement. It is the SQL workload that determines if a column needs histograms.
Gather stat : method_opt => 'for all indexed columns size auto'
Using MENTOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' tells DBMS_STATS: “only gather stats on columns that participate in an index and based on data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain”.
The default value of "for all columns size auto" is the Oracle default and this option will analyze histograms on what the optimizer considers to be "important columns". The optimizer does this by examining your current workload when you run the dbms_stats command, interrogating all current SQL in the library cache to see which columns might benefit from a histogram.
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(
ownname => user ,
tabname => 'TEST' ,
estimate_percent => 100 ,
method_opt => 'for all indexed columns size auto' ,
cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- -----------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:14:39
Impact of METHOD_OPT on cardinality estimates and access path
CASE1 : method_opt => 'for all indexed columns size auto' with only single predicate in query
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;
COUNT(*)
----------
660000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.27 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.27 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 8000 | 660K|00:00:00.18 | 1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=4000)
Notice the E-Rows estimate for TEST. The Optimizer is estimating 8000 rows when in reality there is 660000. The estimate is off by more than an order of magnitude (80x).
Normally the calculation for the cardinality would be (for a one table single equality predicate): number of rows in TEST * 1/NDV = 800000 * 1/4 = 200K but in this case 8K is the estimate. Because there are no column stats for TEST.COL2, the Optimizer is forced to make a guess and that guess is 8k/200k =0.04 (4%)
CASE 2: Now that we have seen how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates gives us for a cardinality estimate.
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
COUNT(*)
----------
30000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.03 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.03 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 80 | 30000 |00:00:00.03 | 1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("COL2"=4000 AND "COL3"=3000))
Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:
Non representative (or missing) statistics lead to :
---> Poor cardinality estimates which leads to
--->Poor access path selection which leads to
---> Poor join method selection which leads to
---> Poor join order selection which leads to
---> Poor SQL execution times
CASE 3 : method_opt => 'for all columns size auto' with a single equality predicate
In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of single predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:34:58
COL2 4 FREQUENCY 4 2015-19-12 16:34:58
COL3 3 FREQUENCY 3 2015-19-12 16:34:58
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;
COUNT(*)
----------
660000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 1
-----------------------------------------------------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.14 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.14 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 660K | 660K |00:00:00.09 | 1856 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL2"=4000)
CASE 4 : method_opt => 'for all columns size auto' with two equality predicates
In this case we have statistics calculated not only for indexed columns but for all table columns :
Now let us check the cardinality estimates in case of multiple predicates in query .
Query will have spot on cardinality estimates,in this case see the details below :
SQL> begin
dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
end;
/
PL/SQL procedure successfully completed.
SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics where table_name='TEST'
/
COL_NAME NDV HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------------------------ ------------ --------------- ----------- --------------------------------------------------
PRIMKEY 800000 NONE 1 2015-19-12 16:34:58
COL2 4 FREQUENCY 4 2015-19-12 16:34:58
COL3 3 FREQUENCY 3 2015-19-12 16:34:58
SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;
COUNT(*)
----------
30000
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000
Plan hash value: 1950795681
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.03 | 1856 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.03 | 1856 |
|* 2 | TABLE ACCESS FULL| TEST | 1 | 30000 | 30000 |00:00:00.03 | 1856 |
------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("COL3"=3000 AND "COL2"=4000))