
Saturday, 5 March 2016

Predicate Selectivity


Selectivity
~~~~~~~~~~~
Selectivity is a measure of the proportion of a row source retrieved by  application of a particular predicate or combination of predicates.

Within the Oracle kernel it is expressed as a value between 0 and 1.  The closer the value is to 0 the more selective the predicate is.
Selectivity is only used by the CBO.

Basic Selectivity formula:
~~~~~~~~~~~~~~~~~~~~~~~~~~

              Number of records satisfying a condition
Selectivity = ----------------------------------------
                    Total number of records

In the optimizer, selectivity is used to compare the usefulness of various  predicates in combination with base object costs.

Knowing the proportion of the total data set that a column predicate defines  is very helpful in defining actual access costs.

By default, column selectivity is based on the high and low values and the  number of values in the column with an assumption of even distribution of  data between these two points.

Histogram data can give better selectivity estimates for unevenly distributed  column data. There is more discussion regarding Histograms later.

Selectivity is also used to define the cardinality of a particular row source once predicates have been applied.

Cardinality is the expected number of rows  that will be retrieved from a row source. Cardinality is useful in determining  nested loop join and sort costs. Application of selectivity to the original  cardinality of the row source will produce the expected (computed) cardinality
for the row source.
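As a quick illustration of how the two definitions combine, here is a short Python sketch (not from the original note; the row counts are invented):

```python
# Illustrative only: predicate selectivity and the resulting computed
# cardinality, as defined above. All figures are made up.

def selectivity(matching_rows, total_rows):
    """Proportion of rows satisfying a condition, between 0 and 1."""
    return matching_rows / total_rows

def computed_cardinality(base_cardinality, sel):
    """Expected number of rows after applying a predicate to a row source."""
    return round(base_cardinality * sel)

sel = selectivity(2_000, 100_000)          # 2,000 of 100,000 rows match
print(sel)                                 # 0.02 -> close to 0, fairly selective
print(computed_cardinality(100_000, sel))  # 2000 expected rows
```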


Glossary of Terms:
~~~~~~~~~~~~~~~~~~

NDV          Number of Distinct Values
Cardinality  Number of rows
Selectivity  Proportion of a dataset returned by a particular predicate (or group of predicates)

In the following illustrations there are 2 tables (T1 & T2) with columns (c1) and (c2) respectively.

Selectivities:
~~~~~~~~~~~~~~
Without histograms
~~~~~~~~~~~~~~~~~~
c1 = '4076'              1/NDV
c1 > '4076'              (High - Value) / (High - Low)
c1 >= '4076'             (High - Value) / (High - Low) + 1/NDV
c1 like '4076'           1/NDV
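The default formulas can be sketched in Python (illustrative only; the Low/High values and NDV below are invented, and the range formula assumes an even spread of data between Low and High):

```python
# Sketch of the default (no-histogram) selectivity formulas, assuming a
# uniform distribution of column values between Low and High.

def sel_equals(ndv):
    # c1 = value
    return 1 / ndv

def sel_greater(value, low, high):
    # c1 > value
    return (high - value) / (high - low)

def sel_greater_eq(value, low, high, ndv):
    # c1 >= value : the range part plus the equality part
    return sel_greater(value, low, high) + 1 / ndv

# Hypothetical column: values spread between 0 and 10000, 100 distinct values
print(sel_equals(100))                               # 0.01
print(sel_greater(4076, 0, 10000))                   # 0.5924
print(round(sel_greater_eq(4076, 0, 10000, 100), 4)) # 0.6024
```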

Join selectivity
~~~~~~~~~~~~~~~~

The selectivity of a join is defined as the selectivity of the most selective
join column adjusted by the proportion of not null values in each join column.


 Sel = 1/max[NDV(t1.c1),NDV(t2.c2)] *
( (Card t1 - # t1.c1 NULLs) / Card t1) *
( (Card t2 - # t2.c2 NULLs) / Card t2)
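The same formula, written as a Python sketch (all figures invented for illustration):

```python
# Join selectivity: 1/max(NDV) scaled by the non-null fraction of each
# join column, as in the formula above.

def join_selectivity(ndv1, ndv2, card1, nulls1, card2, nulls2):
    return (1 / max(ndv1, ndv2)) \
        * ((card1 - nulls1) / card1) \
        * ((card2 - nulls2) / card2)

# t1: 10,000 rows, 500 distinct c1 values, 1,000 NULLs in c1
# t2: 50,000 rows, 2,000 distinct c2 values, no NULLs in c2
sel = join_selectivity(500, 2000, 10_000, 1_000, 50_000, 0)
print(round(sel, 6))   # 0.00045 = (1/2000) * 0.9 * 1.0
```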

Bind Variable selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~

Bind variables present a special case because the optimizer has no idea what
the bind variable value is prior to query optimization. This does not present
a problem with equality predicates since a uniform distribution of data is
assumed and the selectivity is taken as 1/NDV for the column. However for
range predicates it presents a major issue because the optimizer does not
know where the range starts or stops. Because of this the optimizer has to
make some assumptions as follows:

c1 =    :bind1           1/NDV
c1 >    :bind1           Default of 5%
c1 >=   :bind1           Default of 5%
c1 like :bind1           Default of 25%

For more information on bind variables see Note:70075.1

Selectivity With Histograms
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Histograms provide additional information about column selectivity for columns whose distribution is non-uniform. Histograms store information about column data value ranges. Each range is stored in a single row and is often called a 'bucket'. There are 2 different methods for storing histograms in Oracle. If there is a small number of distinct column values (i.e. fewer than the number of buckets), the column value and the count of that value are stored. If not, a series of endpoints is stored to enable more accurate selectivity to be determined.

The first method allows accurate figures to be used. However, with inexact histograms the terms popular and non-popular value are introduced and are used to help determine selectivity. A popular value is a value that spans multiple endpoints, whereas a non-popular value does not.
See Note:72539.1 for more information on histograms.

Exact histograms
~~~~~~~~~~~~~~~~
c1 = '4076'         count of value '4076' / Total Number of Rows
c1 > '4076'         count of values > '4076' / Total Number of Rows

Inexact histograms
~~~~~~~~~~~~~~~~~~
col = pop value         # popular buckets / # buckets
col = non pop           (Density)
col > value             # buckets > value / # buckets
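For the exact (frequency) histogram case, with one bucket per distinct value, the stored counts give the selectivity directly. A Python sketch with invented data:

```python
# Frequency ("exact") histogram sketch: one bucket per distinct value,
# each bucket storing the count of that value.

from collections import Counter

def freq_histogram_selectivity(histogram, total_rows, value):
    """histogram: mapping of column value -> row count."""
    return histogram.get(value, 0) / total_rows

data = ['A'] * 700 + ['B'] * 200 + ['C'] * 100   # skewed column data
hist = Counter(data)                             # builds the buckets

print(freq_histogram_selectivity(hist, len(data), 'A'))  # 0.7
print(freq_histogram_selectivity(hist, len(data), 'C'))  # 0.1
```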


Rules for combining selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let P1 and P2 be 2 distinct predicates of query Q

 P1 AND P2
       S(P1&P2) = S(P1) * S(P2)
 P1 OR P2
       S(P1|P2) = S(P1) + S(P2) -[S(P1) * S(P2)]
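These combination rules assume the predicates are independent; as a Python sketch (selectivity values invented):

```python
# Combining selectivities of two independent predicates, per the rules above.

def sel_and(s1, s2):
    # S(P1 & P2) = S(P1) * S(P2)
    return s1 * s2

def sel_or(s1, s2):
    # S(P1 | P2) = S(P1) + S(P2) - S(P1) * S(P2)
    return s1 + s2 - s1 * s2

s_p1, s_p2 = 0.5, 0.25
print(sel_and(s_p1, s_p2))  # 0.125
print(sel_or(s_p1, s_p2))   # 0.625
```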

Index Selectivity for concatenated indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting with 10.2, when a concatenated index, with all its columns having equality predicates, is used as an access path, the optimizer uses 1/NDK as the selectivity (where NDK is the number of distinct keys in the index).

On 9.2.0.7 and 9.2.0.8 this may be enabled with event 38060 level 1.
On 10.1.0.4 and 10.1.0.5 this may be enabled with event 38059 level 1.
On 10.2 adjustments will be made to the selectivity to account for nulls
in the index keys. This also occurs on 10.1.0.5 (with event 38059.)


Join cardinality
~~~~~~~~~~~~~~~~

 Card(Pj) = Card(T1) * Card(T2) * Sel(Pj)
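In other words, the join cardinality is the cross-product cardinality scaled by the join selectivity. A Python sketch with invented figures:

```python
# Join cardinality: Card(Pj) = Card(T1) * Card(T2) * Sel(Pj)

def join_cardinality(card_t1, card_t2, sel_pj):
    return round(card_t1 * card_t2 * sel_pj)

# 10,000-row and 50,000-row tables joined with selectivity 0.00045
print(join_cardinality(10_000, 50_000, 0.00045))  # 225000
```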

+++ Reference : Oracle support Metalink +++

Friday, 4 March 2016

Why Text Search Works for Old Indexed Data but Not for all New Added Data

APPLIES TO:

Oracle Text - Version 10.1.0.2 to 12.1.0.2 [Release 10.1 to 12.1]
Information in this document applies to any platform.
GOAL

How to implement a Text index maintenance strategy for non-transactional CONTEXT indexes?

SOLUTION

When documents in the base table are inserted, updated, or deleted, their ROWIDs are held in a DML queue until you synchronize the index. You can view this queue with the CTX_USER_PENDING view.

For example, to view pending DML on all your indexes, issue the following statement as index owner:

SELECT pnd_index_name, pnd_partition_name, pnd_rowid, to_char(pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp FROM ctx_user_pending;

1. Synchronizing the index involves processing all pending updates, inserts, and deletes to the base table. You can do this in PL/SQL with the CTX_DDL.SYNC_INDEX procedure.

The following example synchronizes the index with 50 megabytes of memory:

begin
 ctx_ddl.sync_index('<index_name>', '50M');
end;
/

2. Or use the automatic sync feature and specify that the index be synchronized at regular intervals. To do so, create the index with the SYNC (EVERY "interval-string") parameter.

To use job scheduling, you must log in as a user with DBA privileges and grant the CREATE JOB privilege to the index owner.

CONNECT system/passwd@tns_alias
GRANT CREATE JOB TO <index_owner>;

CONNECT <index_owner>/passwd@tns_alias
alter index <index_name> rebuild parameters ('replace metadata sync (every "SYSDATE+5/1440")');


Or create a job through Oracle Scheduler (the Scheduler) that synchronizes the text index every 5 minutes.
Example:

BEGIN
 DBMS_SCHEDULER.CREATE_JOB(
     job_name => 'sync_myindex1',
     job_type => 'PLSQL_BLOCK',
     job_action => 'ctx_output.start_log(''sync_myindex1.log''); ctx_ddl.sync_index(''MYINDEX1'', ''50M''); ctx_output.end_log();',
     start_date => SYSDATE,
     repeat_interval => 'FREQ=MINUTELY; INTERVAL=5',
     comments => 'syncing index MYINDEX1');
END;
/

SELECT job_name, enabled FROM user_scheduler_jobs;

JOB_NAME                      ENABLED
----------------------------- -------
SYNC_MYINDEX1                 FALSE


BEGIN
 DBMS_SCHEDULER.ENABLE ('sync_myindex1');
END;
/

SELECT job_name, enabled FROM user_scheduler_jobs;

JOB_NAME                      ENABLED
----------------------------- -------
SYNC_MYINDEX1                 TRUE

SELECT JOB_NAME, STATUS FROM USER_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME = 'SYNC_MYINDEX1';

JOB_NAME                                STATUS
--------------------------------------- ------------------------------
SYNC_MYINDEX1                           SUCCEEDED


Frequent index synchronization can fragment your CONTEXT index. Index fragmentation can adversely affect query response time. You can optimize your CONTEXT index to reduce fragmentation and index size and so improve query performance.

With the CTX_REPORT.INDEX_STATS procedure, you can create a statistical report on your index. The report includes information on optimal row fragmentation, a list of the most fragmented tokens, and the amount of garbage data in your index. Although this report might take a long time to run for large indexes, it can help you decide whether to optimize your index.

To optimize an index, Oracle recommends that you create a job through the Scheduler and run CTX_DDL.OPTIMIZE_INDEX.

Example:

BEGIN
DBMS_SCHEDULER.CREATE_JOB(
   job_name          =>  'opt_rebuild_myindex1',
   job_type          =>  'PLSQL_BLOCK',
   job_action        =>  'ctx_output.start_log(''opt_myindex1.log''); ctx_ddl.optimize_index(''MYINDEX1'',''REBUILD''); ctx_output.end_log();',
   start_date        =>  SYSDATE,
   repeat_interval   =>  'FREQ=WEEKLY; BYDAY=SUN;', /* every sunday */
   comments             => 'optimize index MYINDEX1 in rebuild mode');
END;
/

SELECT job_name, enabled FROM user_scheduler_jobs;

JOB_NAME                       ENABLED
------------------------------ -------
OPT_REBUILD_MYINDEX1           FALSE


BEGIN
 DBMS_SCHEDULER.ENABLE ('opt_rebuild_myindex1');
END;
/

SELECT job_name, enabled FROM user_scheduler_jobs;

JOB_NAME                       ENABLED
------------------------------ -------
OPT_REBUILD_MYINDEX1           TRUE

SELECT JOB_NAME, STATUS FROM USER_SCHEDULER_JOB_RUN_DETAILS WHERE JOB_NAME = 'OPT_REBUILD_MYINDEX1';

JOB_NAME                       STATUS
------------------------------ --------------------
OPT_REBUILD_MYINDEX1           SUCCEEDED

Wednesday, 2 March 2016

Flashback Interview Questions

Q: Is there a separate background process for writing flashback logs?

A: Yes. RVWR (Recovery Writer, a.k.a Flashback Writer) was introduced in Oracle 10g to write flashback data from the Flashback Buffer in the SGA to the flashback database logs on disk.

Q: Do I need to shutdown and mount the database to turn flashback on?

A: ALTER DATABASE FLASHBACK ON is an online operation in 11g Release 2. In 10g Release 2 the database must be mounted, but not open.

Once flashback is turned on, Oracle starts to save the before-images of database blocks before they are changed. This guarantees that the database can be flashed back to the exact point in time at which flashback mode was turned on.

Q: What happens if RVWR cannot write to disk?

A: It depends on the context where the write error occurs:

If there’s a Guaranteed Restore Point, the database crashes to ensure the restore point guarantee is not voided.

If there isn’t a Guaranteed Restore Point and it’s a primary database, Flashback Mode will be automatically turned off for the database, which will continue to operate normally.

If there isn’t a Guaranteed Restore Point and it’s a standby database, the database will hang until the cause of the write failure is fixed.

Q: Is it possible to specify the size of the Flashback Buffer in the SGA?

A: Yes, but indirectly. The size of the Flashback Buffer is set to 2 * LOG_BUFFER.

For performance reasons, it’s recommended to set LOG_BUFFER to at least 8MB for

databases running in Flashback Mode.

UPDATE: For large 11.1.0.7+ databases with more than a 4GB SGA, you may consider setting LOG_BUFFER to values in the range of 32-64 MB.

Q: Can RMAN be used to backup flashback logs?

A: No. Flashback logs are not backed up. Even if the command BACKUP RECOVERY AREA is used to back up the contents of the FRA to tape, only the following file types are backed up: full and incremental backup sets, control file autobackups, datafile copies, and archived redo logs.

Flashback Logs are considered to be transient files and cannot be backed up by RMAN. They are not needed for media recovery.

Q: When are the flashback logs deleted?

A: Flashback logs are managed by Oracle only. Oracle will try to keep as many flashback logs as needed to satisfy the DB_FLASHBACK_RETENTION_TARGET parameter. However, if there’s space pressure in the Flash Recovery Area (FRA), flashback logs may be deleted to make room for other things, such as backups and archived logs.

If the fast recovery area has enough space, then a flashback log is created whenever necessary to satisfy the flashback retention target.

If a flashback log is old enough that it is no longer needed to satisfy the flashback retention target, then a flashback log is reused.

If the database must create a new flashback log and the fast recovery area is full or there is no disk space, then the oldest flashback log is reused instead.

If the fast recovery area is full, then an archived redo log that is reclaimable according to the FRA rules may be automatically deleted by the fast recovery area to make space for other files. In this case, any flashback logs that would require the use of that redo log file for the use of FLASHBACK DATABASE are also deleted.

No file in the fast recovery area is eligible for deletion if it is required to satisfy a guaranteed restore point. Thus, retention of flashback logs and other files required to satisfy the guaranteed restore point, in addition to files required to satisfy the backup retention policy, can cause the fast recovery area to fill completely.

Other than that, flashback logs are deleted according to the following rules:

When flashback mode is turned off all flashback logs are deleted ONLY if there’s no guaranteed restore points. If there’s at least one guaranteed restore point, no flashback logs are deleted.

When the oldest guaranteed restore point is deleted and flashback mode is off, all flashback logs older than the second oldest guaranteed restore point are deleted. If flashback mode is on for the database OR the guaranteed restore point is not the oldest no flashback logs are deleted.

Q: How to list restore points in RMAN?

A: In RMAN you can use the LIST RESTORE POINT [ALL|restore_point_name] command. If you use a recovery catalog, you can query the view RC_RESTORE_POINT in the recovery catalog repository, or query V$RESTORE_POINT in the target database.

Q: After flashing back to a point in time before a RESETLOGS operation, is it possible to flash forward to the incarnation after the RESETLOGS?

A: Yes, it’s perfectly possible.

Q: Can you see the progress of a FLASHBACK DATABASE operation?

A: Yes, you can. During a FLASHBACK DATABASE operation you can query V$SESSION_LONGOPS from another session to see the progress of the flashback.

The FLASHBACK DATABASE operation has two distinct phases: the actual flashback and the media recovery that happens afterwards to bring the database to a consistent state.

While the actual flashback is running you’ll see the following message in V$SESSION_LONGOPS, on Oracle 11gR2:

Flashback Database: Flashback Data Applied : 238 out of 282 Megabytes done

During the media recovery, the following messages will be seen:

Media Recovery: Redo Applied : 263 out of 0 Megabytes done

Media Recovery: Average Apply Rate : 1164 out of 0 KB/sec done

Media Recovery: Last Applied Redo : 626540 out of 0 SCN+Time done

Media Recovery: Elapsed Time : 232 out of 0 Seconds done

Media Recovery: Active Time : 116 out of 0 Seconds done

Media Recovery: Active Apply Rate : 1859 out of 0 KB/sec done

Media Recovery: Maximum Apply Rate : 1859 out of 0 KB/sec done

Media Recovery: Log Files : 15 out of 0 Files done

Media Recovery: Apply Time per Log : 7 out of 0 Seconds done

Q: How should I set the database to improve Flashback performance?

A: Oracle’s recommendations are:

Use a fast file system for your flash recovery area, preferably one that avoids operating system file caching, such as ASM.

Configure enough disk spindles for the file system that will hold the flash recovery area. For large production databases, multiple disk spindles may be needed to support the required disk throughput for the database to write the flashback logs effectively.

If the storage system used to hold the flash recovery area does not have non-volatile RAM, try to configure the file system on top of striped storage volumes, with a relatively small stripe size such as 128K. This will allow each write to the flashback logs to be spread across multiple spindles, improving performance.

For large, production databases, set the init.ora parameter LOG_BUFFER to be at least 8MB. This makes sure the database allocates maximum memory (typically 16MB) for writing flashback database logs.


Reference : https://www.pythian.com/blog/questions-you-always-wanted-to-ask-about-flashback-database/


Thursday, 21 January 2016

RDBMS vs MongoDB



RDBMS              MongoDB
~~~~~              ~~~~~~~
Database           Database
Table              Collection
Tuple/Row          Document
Column             Field
Table Join         Embedded Documents
Primary Key        Primary Key (_id)

mysqld / oracle    mongod
mysql / sqlplus    mongo

Sunday, 10 January 2016

Checkpoint information


 
SQL> select NAME , CHECKPOINT_CHANGE# , CURRENT_SCN from v$database ;

CHECKPOINT_CHANGE#  --->  Last SCN checkpointed
CURRENT_SCN         --->  Current SCN of database

NAME  CHECKPOINT_CHANGE# CURRENT_SCN
--------- ------------------ -----------
SUMIT     1902688 1903608

SQL> select first_time,first_change#,sequence# , status from v$log ;

FIRST_TIME    --> Time of the first SCN in the log
first_change# --> Lowest system change number (SCN) in the log

FIRST_TIME           FIRST_CHANGE#  SEQUENCE#  STATUS
-------------------  -------------  ---------  --------
01-2016-10 07:18:16        1902580         13  ACTIVE
01-2016-10 07:25:24        1903604         14  CURRENT
01-2016-10 06:02:51        1898715         12  INACTIVE

SQL> select CHECKPOINT_CHANGE# , CHECKPOINT_TIME  from V$DATAFILE ;

CHECKPOINT_CHANGE#  -----> SCN at last checkpoint
CHECKPOINT_TIME     -----> Timestamp of the checkpoint#

CHECKPOINT_CHANGE# CHECKPOINT_TIME
------------------ -------------------
  1902688 01-2016-10 07:21:02
  1902688 01-2016-10 07:21:02
  1902688 01-2016-10 07:21:02
  1902688 01-2016-10 07:21:02
  1902688 01-2016-10 07:21:02
  1902688 01-2016-10 07:21:02

6 rows selected.

SQL> select CHECKPOINT_CHANGE# ,CHECKPOINT_TIME,CHECKPOINT_COUNT from V$DATAFILE_HEADER ;

CHECKPOINT_CHANGE# -----> Datafile checkpoint change#
CHECKPOINT_TIME -----> Datafile checkpoint timestamp

CHECKPOINT_CHANGE# CHECKPOINT_TIME     CHECKPOINT_COUNT
------------------ ------------------- ----------------
  1902688 01-2016-10 07:21:02    190
  1902688 01-2016-10 07:21:02     15
  1902688 01-2016-10 07:21:02    190
  1902688 01-2016-10 07:21:02     30
  1902688 01-2016-10 07:21:02     27
  1902688 01-2016-10 07:21:02    188

6 rows selected.

Sunday, 20 December 2015

METHOD_OPT : 'FOR ALL INDEXED COLUMNS' Vs 'FOR ALL COLUMNS SIZE AUTO'

Create a table with 3 columns filled with randomly generated numbers. Below are some of the attributes of the table columns:

SQL> desc TEST ;
 Name                Type
 ------------------- --------
 PRIMKEY             NUMBER
 COL2                NUMBER
 COL3                NUMBER


  • Total rows in table = 800000
  • Column PRIMKEY is having 800000 distinct values
  • Column COL2 has only 4 distinct values
  • Column COL3 has only 3 distinct values


begin
      dbms_stats.gather_table_stats( ownname => 'SYS' ,tabname => 'test' ,  estimate_percent => 100 ,
       cascade => true);
   end;
   /

Check Column statistics  :

SQL > select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME      NDV  HISTOGRAM        NUM_BUCKETS  LAST_ANALYZED
--------- -------  ---------------  -----------  -------------------
PRIMKEY    800000  NONE                       1  2015-19-12 16:12:22
COL2            4  FREQUENCY                  4  2015-19-12 16:12:22
COL3            3  NONE                       1  2015-19-12 16:12:22


Create a unique index on primkey :

 create unique index PK_T1 on TEST(PRIMKEY);

Gather stat : method_opt => 'for all indexed columns' 

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT, you probably should not be. FOR ALL INDEXED COLUMNS tells DBMS_STATS to gather stats only on columns that participate in an index; no stats are calculated for the non-indexed columns of the table.

SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
   dbms_stats.gather_table_stats( ownname => user ,tabname => 'TEST' ,estimate_percent => 100 ,        method_opt => 'for all indexed columns'
  ,cascade => true);
  end;
  /

PL/SQL procedure successfully completed.


SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME      NDV  HISTOGRAM        NUM_BUCKETS  LAST_ANALYZED
--------- -------  ---------------  -----------  -------------------
PRIMKEY    800000  HEIGHT BALANCED           75  2015-19-12 16:13:36

Notice that now only column PRIMKEY has stats on it; columns COL2 and COL3 do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

It is wasteful to create a histogram on a column that is never referenced by a SQL statement.  It is the SQL workload that determines if a column needs histograms.

Gather stat : method_opt => 'for all indexed columns size auto' 

Using METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' tells DBMS_STATS: “only gather stats on columns that participate in an index and, based on the data distribution and the workload of those indexed columns, decide if a histogram should be created and how many buckets it should contain”.

"FOR ALL COLUMNS SIZE AUTO" is the Oracle default, and this option creates histograms on what the optimizer considers to be "important columns". The optimizer does this by examining your current workload when you run the DBMS_STATS command, interrogating the SQL currently in the library cache to see which columns might benefit from a histogram.

SQL> begin
   dbms_stats.delete_table_stats(user,'TEST');

   dbms_stats.gather_table_stats(
     ownname => user ,
     tabname => 'TEST' ,
     estimate_percent => 100 ,
     method_opt => 'for all indexed columns size auto' ,
     cascade => true);
 end;
 /

PL/SQL procedure successfully completed.


SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME      NDV  HISTOGRAM        NUM_BUCKETS  LAST_ANALYZED
--------- -------  ---------------  -----------  -------------------
PRIMKEY    800000  NONE                       1  2015-19-12 16:14:39
 

Impact of METHOD_OPT on cardinality  estimates and access path 


CASE1 : method_opt => 'for all indexed columns size auto'  with only single predicate in query

SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;

  COUNT(*)
----------
    660000

SQL>  select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000
Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id  | Operation    | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |   |  1 |     |    1 |00:00:00.27 |    1856 |
|   1 |  SORT AGGREGATE    |   |  1 |   1 |    1 |00:00:00.27 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |  1 |   8000 |  660K|00:00:00.18 |    1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  2 - filter("COL2"=4000)


Notice the E-Rows estimate for TEST. The optimizer is estimating 8000 rows when in reality there are 660000. The estimate is off by more than an order of magnitude (about 80x).

Normally, for a single-table equality predicate, the cardinality calculation would be: number of rows in TEST * 1/NDV = 800000 * 1/4 = 200K, but here the estimate is 8K. Because there are no column stats for TEST.COL2, the optimizer is forced to make a guess, and that guess comes out at 8K/200K = 0.04 (4%) of the stats-based figure.
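The arithmetic can be checked with a short Python sketch. Note that reading the 8000 figure as a fixed 1% fallback guess is an inference from the numbers shown, not something stated in the original note:

```python
# The arithmetic behind the estimates above: with column stats the
# optimizer would use 1/NDV; without them, the 8000 figure shown is
# consistent with a fixed 1% guess (an assumption drawn from the numbers).

total_rows = 800_000
ndv_col2 = 4

with_stats_estimate = total_rows * (1 / ndv_col2)
print(with_stats_estimate)                     # 200000.0

guessed_estimate = 8_000
print(guessed_estimate / with_stats_estimate)  # 0.04 (the 4% in the text)
print(guessed_estimate / total_rows)           # 0.01 (i.e. a 1% guess)
```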

CASE 2: Now that we have seen how poor the cardinality estimate was with a single equality predicate, let’s see what two equality predicates give us for a cardinality estimate.

SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

  COUNT(*)
----------
     30000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation    | Name | Starts | E-Rows | A-Rows |  A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |   |  1 |     |    1 |00:00:00.03 |    1856 |
|   1 |  SORT AGGREGATE    |   |  1 |   1 |    1 |00:00:00.03 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |  1 |  80 |  30000 |00:00:00.03 |    1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL2"=4000 AND "COL3"=3000))


Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

Non representative (or missing) statistics lead to :

---> Poor cardinality estimates which leads to
--->Poor access path selection which leads to
---> Poor join method selection which leads to
---> Poor join order selection which leads to
---> Poor SQL execution times

CASE 3 :  method_opt => 'for all columns size auto'   with a single equality predicate

In this case statistics are calculated not only for the indexed columns but for all table columns. Let us check the cardinality estimate with a single equality predicate in the query; this time the estimate is spot on, as the details below show:

  SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
  end;
  /

PL/SQL procedure successfully completed.

SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME      NDV  HISTOGRAM        NUM_BUCKETS  LAST_ANALYZED
--------- -------  ---------------  -----------  -------------------
PRIMKEY    800000  NONE                       1  2015-19-12 16:34:58
COL2            4  FREQUENCY                  4  2015-19-12 16:34:58
COL3            3  FREQUENCY                  3  2015-19-12 16:34:58


SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 ;

  COUNT(*)
----------
    660000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 288h7750wj3z1, child number 1
-----------------------------------------------------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000

Plan hash value: 1950795681
-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |  A-Time    | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.14 |    1856 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.14 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |      1 |   660K |   660K |00:00:00.09 |    1856 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL2"=4000)

CASE 4 :  method_opt => 'for all columns size auto'   with two equality predicates

In this case statistics are again calculated for all table columns. Let us check the cardinality estimate with two equality predicates in the query; again the estimate is spot on, as the details below show:

  SQL> begin
    dbms_stats.delete_table_stats(user,'TEST');
dbms_stats.gather_table_stats(ownname => user , tabname => 'TEST' , estimate_percent => 100 ,degree => 8, method_opt => 'for all columns size auto' , cascade => true);
  end;
  /

PL/SQL procedure successfully completed.

SQL>select COLUMN_NAME COL_NAME, NUM_DISTINCT NDV, HISTOGRAM, NUM_BUCKETS, to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics where table_name='TEST'
    /

COL_NAME      NDV  HISTOGRAM        NUM_BUCKETS  LAST_ANALYZED
--------- -------  ---------------  -----------  -------------------
PRIMKEY    800000  NONE                       1  2015-19-12 16:34:58
COL2            4  FREQUENCY                  4  2015-19-12 16:34:58
COL3            3  FREQUENCY                  3  2015-19-12 16:34:58


SQL> select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000 ;

  COUNT(*)
----------
     30000

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
SQL_ID 216fcwk49zkn2, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test where COL2=4000 and col3=3000

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |  A-Time    | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.03 |    1856 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.03 |    1856 |
|*  2 |   TABLE ACCESS FULL| TEST |      1 |  30000 |  30000 |00:00:00.03 |    1856 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("COL3"=3000 AND "COL2"=4000))
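The arithmetic behind that E-Rows figure can be sketched in a few lines. The per-value frequencies below are hypothetical (the real ones come from the FREQUENCY histogram buckets), but they show how the optimizer multiplies the selectivities of ANDed predicates, which it assumes are independent:

```python
num_rows = 800_000  # NUM_ROWS of TEST

# With a frequency histogram, selectivity of "col = value" is simply
# rows_with_value / num_rows. These per-value counts are hypothetical,
# chosen only to be consistent with the 30000-row result above.
sel_col2 = 120_000 / num_rows   # hypothetical: COL2 = 4000 in 120,000 rows
sel_col3 = 200_000 / num_rows   # hypothetical: COL3 = 3000 in 200,000 rows

# ANDed predicates are treated as independent: selectivities multiply.
combined_sel = sel_col2 * sel_col3
estimated_cardinality = round(num_rows * combined_sel)
print(estimated_cardinality)  # 30000, matching E-Rows and A-Rows in the plan
```

With accurate per-value frequencies the multiplication lands exactly on the actual row count, which is why E-Rows equals A-Rows here.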

Thursday, 17 December 2015

Oracle Clustering factor

The clustering factor statistic is used by the optimizer during query optimization to determine the relative efficiency of an index. In short, the index clustering factor is a measure of how many I/Os the database would perform if it were to read every row in the table via the index, in index order.

If the rows of a table on disk are sorted in about the same order as the index keys, the database will perform a minimum number of I/Os on the table to read the entire table via the index. That is because the next row needed from an index key would likely be the next row in the table block as well. The query would not be skipping all over the table to find row after row—they are naturally next to each other on the block.

Conversely, if the rows in the table are not in the same order on disk as the index keys—if the data is scattered—the query will tend to perform the maximum number of I/Os on the table, as many as one I/O for every row in the table. That is because as the database scans through the index, the next row needed will probably not be on the same block as the last row. The database will have to discard that block and get another block from the buffer cache. The query will end up reading every block from the buffer as many times as it has rows on it.

So if a table and an index key are in about the same order, the clustering factor will be near the number of blocks in the table and the index will be useful for very large index range scans and for retrieving numerous rows from the table. 

On the other hand, if the data is randomly scattered, the clustering factor will be near the number of rows in the table, and given that the number of rows in a table is usually at least an order of magnitude more than the number of blocks, the index will be less efficient for returning numerous rows.
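The way this statistic is tallied can be sketched as a simple walk: visit the table blocks in index-key order and count how many times the block changes from one row to the next. This is a simplified model of the computation, not Oracle's internal code:

```python
def clustering_factor(blocks_in_index_order):
    """Count block transitions while walking rows in index-key order --
    a simplified model of how CLUSTERING_FACTOR is tallied."""
    cf = 0
    prev = None
    for blk in blocks_in_index_order:
        if blk != prev:      # next row lives in a different block: one more I/O
            cf += 1
            prev = blk
    return cf

# Well-clustered: rows in index order sit in consecutive table blocks.
good = [1, 1, 1, 2, 2, 2, 3, 3, 3]
# Scattered: nearly every index entry points to a different block.
bad  = [1, 3, 2, 1, 3, 2, 1, 3, 2]

print(clustering_factor(good))  # 3 -- near the number of blocks
print(clustering_factor(bad))   # 9 -- near the number of rows
```

The same nine rows in three blocks yield a clustering factor of 3 when sorted and 9 when scattered, mirroring the GOODCF/SCATTERED comparison below.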

Example of clustering factor and query cost :
================================

---> Create a table with data sorted on column object_name :

SQL> create table GoodCF as select x.* from ( select * from all_objects order by object_name)x ;

Table created.

---> Create a table with disorganized/scattered data . Used dbms_random to sort it randomly :

SQL> create table Scattered as select x.* from ( select * from all_objects order by dbms_random.random)x ;

Table created.

---> Created index on sorted column :

SQL> create index GoodCF_IDX on GoodCF(object_name) ;

Index created.

---> Created index on scattered data column :

SQL> create index Scattered_IDX on Scattered(object_name) ;

Index created.

---> Gather stat on table GOODCF :

SQL> begin
       dbms_stats.gather_table_stats('SYS', 'GOODCF', estimate_percent => 100,
         method_opt => 'for all indexed columns size 254');
     end;
     /

PL/SQL procedure successfully completed.

----> Gather stat on table scattered :

SQL> begin
       dbms_stats.gather_table_stats('SYS', 'SCATTERED', estimate_percent => 100,
         method_opt => 'for all indexed columns size 254');
     end;
     /

PL/SQL procedure successfully completed.

----> Let's check the number of blocks and rows for both tables. Both tables contain the same data and occupy the same number of blocks :

SQL> select table_name, blocks, num_rows from dba_tables where table_name in ('GOODCF','SCATTERED') order by 1;

TABLE_NAME      BLOCKS   NUM_ROWS
------------  --------  ---------
GOODCF            1020      71713
SCATTERED         1020      71714

----> Let's check the CLUSTERING_FACTOR for both indexes. There is a large difference between the two: the GOODCF_IDX index has a clustering factor very near the number of blocks in the table, whereas the SCATTERED_IDX index has a clustering factor near the number of rows in the table.

Again, this clustering factor metric is a measure of how many I/Os the database will perform against the table in order to read every row via the index. We can verify this with the example below :

SQL> select table_name, index_name, clustering_factor from dba_indexes where table_name in ('GOODCF','SCATTERED') order by 1;

TABLE_NAME   INDEX_NAME      CLUSTERING_FACTOR
-----------  --------------  -----------------
GOODCF       GOODCF_IDX                   1020
SCATTERED    SCATTERED_IDX               71640

SQL> show parameter user_dump_dest

NAME             TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest   string /data01/app/oracle/diag/rdbms/ggtrg/ggtrg/trace

SQL> ALTER SESSION SET EVENTS '10046 trace name context forever,level 12';

Session altered.

SQL> ALTER SESSION SET tracefile_identifier='MYTRACE_GoodCF' ;

Session altered.

SQL> select /*+ index( GOODCF GOODCF_IDX) */ count(subobject_name) from GOODCF;

COUNT(SUBOBJECT_NAME)
---------------------
 457

SQL> ALTER SESSION SET EVENTS '10046 trace name context off';

Session altered.

========================================================================
I/O performed against the index   = 357 I/Os 
Total I/Os performed by the query = 1377 I/Os
Note : 1377 - 357 = 1020  ( Equal to the clustering factor of the index GOODCF_IDX )
========================================================================

PARSING IN CURSOR #1 len=73 dep=0 uid=0 oct=3 lid=0 tim=1450372523290892 hv=966910787 ad='a266a890' sqlid='9d45484wu3su3'
select /*+ index( GOODCF GOODCF_IDX) */ count(subobject_name) from GOODCF
END OF STMT
PARSE #1:c=0,e=69,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2836254517,tim=1450372523290891
EXEC #1:c=0,e=39,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2836254517,tim=1450372523290955
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523291015
*** 2015-12-17 12:15:23.334
FETCH #1:c=38994,e=43776,p=0,cr=1377,cu=0,mis=0,r=1,dep=0,og=1,plh=2836254517,tim=1450372523334808
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1377 pr=0 pw=0 time=0 us)'
STAT #1 id=2 cnt=71713 pid=1 pos=1 obj=74669 op='TABLE ACCESS BY INDEX ROWID GOODCF (cr=1377 pr=0 pw=0 time=37284 us cost=1378 size=1219121 card=71713)'
STAT #1 id=3 cnt=71713 pid=2 pos=1 obj=74671 op='INDEX FULL SCAN GOODCF_IDX (cr=357 pr=0 pw=0 time=11189 us cost=358 size=0 card=71713)'
WAIT #1: nam='SQL*Net message from client' ela= 193 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523335123
FETCH #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=2836254517,tim=1450372523335144
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372523335155
*** 2015-12-17 12:15:33.973
WAIT #1: nam='SQL*Net message from client' ela= 10637912 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1450372533973100
CLOSE #1:c=1000,e=763,dep=0,type=0,tim=1450372533973915
========================================================================

SQL> show parameter user_dump_dest

NAME             TYPE VALUE
------------------------------------ ----------- ------------------------------
user_dump_dest   string /data01/app/oracle/diag/rdbms/ggtrg/ggtrg/trace


SQL> ALTER SESSION SET EVENTS '10046 trace name context forever,level 12';

Session altered.

SQL> ALTER SESSION SET tracefile_identifier='MYTRACE_SCATTERED' ;

Session altered.

SQL> select /*+ index(SCATTERED SCATTERED_IDX) */ count(subobject_name) from SCATTERED;

COUNT(SUBOBJECT_NAME)
---------------------
 457

SQL> ALTER SESSION SET EVENTS '10046 trace name context off';

Session altered.

========================================================================
I/O performed against the index   = 357 I/Os 
Total I/Os performed by the query = 71997 I/Os
Note : 71997 - 357 = 71640  ( Equal to the clustering factor of the index SCATTERED_IDX )
========================================================================

PARSING IN CURSOR #1 len=81 dep=0 uid=0 oct=3 lid=0 tim=1450373201636876 hv=15917473 ad='a24955d0' sqlid='8fzruxc0g5sd1'
select /*+ index(SCATTERED SCATTERED_IDX) */ count(subobject_name) from SCATTERED
END OF STMT
FETCH #1:c=117983,e=124417,p=1377,cr=71997,cu=0,mis=0,r=1,dep=0,og=1,plh=300164232,tim=1450373201761488
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=71997 pr=1377 pw=0 time=0 us)'
STAT #1 id=2 cnt=71714 pid=1 pos=1 obj=74670 op='TABLE ACCESS BY INDEX ROWID SCATTERED (cr=71997 pr=1377 pw=0 time=116666 us cost=72014 size=1219138 card=71714)'
STAT #1 id=3 cnt=71714 pid=2 pos=1 obj=74672 op='INDEX FULL SCAN SCATTERED_IDX (cr=357 pr=357 pw=0 time=19344 us cost=358 size=0 card=71714)'
WAIT #1: nam='SQL*Net message from client' ela= 154 driver id=1650815232 #bytes=1 p3=0 obj#=74672 tim=1450373201761751
FETCH #1:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,plh=300164232,tim=1450373201761780
WAIT #1: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=74672 tim=1450373201761790
========================================================================

Saturday, 12 December 2015

AWR : Response Time Per Txn

select min(begin_time), max(end_time),
       sum(case metric_name when 'Response Time Per Txn' then average end) ResponseTimePerTxn,
       snap_id
from dba_hist_sysmetric_summary
group by snap_id
order by snap_id;


TPS Historical

WITH hist_snaps
     AS (SELECT instance_number,
                snap_id,
                round(begin_interval_time,'MI') datetime,
                  (  begin_interval_time
                   + 0
                   - LAG (
                        begin_interval_time + 0)
                     OVER (PARTITION BY dbid, instance_number
                           ORDER BY snap_id))
                * 86400      -- seconds per day
                   diff_time -- calculate difference in time from
                             -- previous snapshot
                   FROM dba_hist_snapshot )
   , hist_stats
     AS (SELECT dbid,
                instance_number,
                snap_id,
                stat_name,
                  VALUE
                - LAG (
                     VALUE)
                  OVER (PARTITION BY dbid,instance_number,stat_name
                        ORDER BY snap_id)
                  delta_value -- difference in value from previous
                              -- snapshot for each stat_name
           FROM dba_hist_sysstat
          WHERE stat_name IN ('user commits', 'user rollbacks'))
  SELECT to_char(cast(datetime as date ), 'YYYY/MM/DD HH24:MI'),
         ROUND (SUM (delta_value) / 3600, 2) "Transactions/s"
     FROM hist_snaps sn, hist_stats st
   WHERE     st.instance_number = sn.instance_number
         AND st.snap_id = sn.snap_id
         AND diff_time IS NOT NULL
GROUP BY datetime
ORDER BY 1 desc;
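The per-snapshot arithmetic the query performs can be illustrated with hypothetical deltas. Note that the hard-coded division by 3600 assumes hourly AWR snapshots; the diff_time column the query already computes could be used instead for other snapshot intervals:

```python
# Hypothetical deltas between two consecutive AWR snapshots.
commits_delta   = 6_480   # delta of 'user commits' (made-up figure)
rollbacks_delta = 720     # delta of 'user rollbacks' (made-up figure)
interval_seconds = 3600   # the query hard-codes 3600, i.e. hourly snapshots

# Transactions per second = (commits + rollbacks) / interval length.
tps = round((commits_delta + rollbacks_delta) / interval_seconds, 2)
print(tps)  # 2.0
```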

Tuesday, 1 December 2015

How Do Indexes Become Unusable

Maintenance operations causing index partitions to become INDEX UNUSABLE (IU):

   1. IMPORT PARTITION or conventional path SQL*Loader.

   2. Direct-path SQL*Loader leaves affected local index partitions and  global indexes in an IU state if it does not complete successfully.

   3. Partition maintenance operations like ALTER TABLE MOVE PARTITION.

   4. Partition maintenance operations like ALTER TABLE TRUNCATE PARTITION.

   5. Partition maintenance operations like ALTER TABLE SPLIT PARTITION.

   6. Index maintenance operations like ALTER INDEX SPLIT PARTITION.

   7. Adding a Partition to a Hash-Partitioned Table

Adding a Partition to a Hash-Partitioned Table

When you add a partition to a hash-partitioned table, the database populates the new partition with rows rehashed from an existing partition (selected by the database) as determined by the hash function. Consequently, if the table contains data, then it may take some time to add a hash partition.

The following statements show two ways of adding a hash partition to table scubagear. Choosing the first statement adds a new hash partition whose partition name is system generated, and which is placed in the default tablespace. The second statement also adds a new hash partition, but that partition is explicitly named p_named and is created in tablespace gear5.

ALTER TABLE scubagear ADD PARTITION;

ALTER TABLE scubagear ADD PARTITION p_named TABLESPACE gear5;
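The "rehashed from an existing partition" behavior follows a linear-hashing scheme: adding one partition splits exactly one existing partition, and only its rows are rehashed. The sketch below is an illustrative model of linear hashing, not Oracle's internal hash function, but it shows why only one partition loses rows:

```python
import math

def partition_for(key_hash, num_parts):
    """Linear hashing sketch: hash against the next power of two,
    folding back when the target partition does not exist yet."""
    bits = max(1, math.ceil(math.log2(num_parts)))
    p = key_hash % (2 ** bits)
    if p >= num_parts:          # target partition not created yet:
        p -= 2 ** (bits - 1)    # fold back onto an existing one
    return p

# Going from 4 to 5 partitions, the only rows that change partition
# are rows that were in partition 0 -- the partition being split.
moved_from = {partition_for(h, 4)
              for h in range(10_000)
              if partition_for(h, 4) != partition_for(h, 5)}
print(moved_from)  # {0}
```

This is consistent with the text above: rows for the new partition come from a single existing partition selected by the hash function, which is also why adding a hash partition on a populated table takes time.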

Indexes may be marked UNUSABLE as explained in the following table:

Table Type          Index Behavior
------------------  ----------------------------------------------------------
Regular (heap)      Unless you specify UPDATE INDEXES as part of the ALTER
                    TABLE statement:
                    *  The local indexes for the new partition, and for the
                       existing partition from which rows were redistributed,
                       are marked UNUSABLE and must be rebuilt.
                    *  All global indexes, or all partitions of partitioned
                       global indexes, are marked UNUSABLE and must be rebuilt.

Index-organized     For local indexes, the behavior is identical to heap
                    tables. All global indexes remain usable.

Each of these operations may cause index partitions to be marked IU, which requires the affected index partitions to be rebuilt.

Version 10.2
=============
By default, many table maintenance operations on partitioned tables invalidate (mark UNUSABLE) the corresponding indexes or index partitions. You must then
rebuild the entire index or, in the case of a global index, each of its partitions. The database lets you override this default behavior if you specify
UPDATE INDEXES in your ALTER TABLE statement for the maintenance operation. Specifying this clause tells the database to update the index at the time it
executes the maintenance operation DDL statement.

The following operations support the UPDATE INDEXES clause:
    *  ADD PARTITION | SUBPARTITION
    *  COALESCE PARTITION | SUBPARTITION
    *  DROP PARTITION | SUBPARTITION
    *  EXCHANGE PARTITION | SUBPARTITION
    *  MERGE PARTITION | SUBPARTITION
    *  MOVE PARTITION | SUBPARTITION
    *  SPLIT PARTITION | SUBPARTITION
    *  TRUNCATE PARTITION | SUBPARTITION

Reference : Oracle support metalink