Total Pageviews

Saturday 5 March 2016

Predicate Selectivity


Selectivity
~~~~~~~~~~~
Selectivity is a measure of the proportion of a row source retrieved by  application of a particular predicate or combination of predicates.

Within the Oracle kernel it is expressed as a value between 0 and 1.  The closer the value is to 0 the more selective the predicate is.
Selectivity is only used by the CBO.

Basic Selectivity formula:
~~~~~~~~~~~~~~~~~~~~~~~~~~

       Number of records satisfying a condition
Selectivity = -----------------------------------------
             Total Number of records

In the optimizer, selectivity is used to compare the usefulness of various  predicates in combination with base object costs.

Knowing the proportion of the total data set that a column predicate defines  is very helpful in defining actual access costs.

By default, column selectivity is based on the high and low values and the  number of values in the column with an assumption of even distribution of  data between these two points.

Histogram data can give better selectivity estimates for unevenly distributed  column data. There is more discussion regarding Histograms later.

Selectivity is also used to define the cardinality of a particular row source once predicates have been applied.

Cardinality is the expected number of rows  that will be retrieved from a row source. Cardinality is useful in determining  nested loop join and sort costs. Application of selectivity to the original  cardinality of the row source will produce the expected (computed) cardinality
for the row source.


Glossary of Terms:
~~~~~~~~~~~~~~~~~~

NDV Number of Distinct Values
Cardinality Number of rows
Selectivity Proportion of a dataset returned by a particular predicate(or group of predicates)

In the following illustrations there are 2 tables (T1 & T2) with columns (c1) and (c2) respectively.

Selectivities:
~~~~~~~~~~~~~~
Without histograms
~~~~~~~~~~~~~~~~~~
c1 = '4076'              1/NDV
c1 > '4076'              1 - (High - Value / High - Low)
c1 >= '4076'             1 - (High - Value / High - Low) + 1/NDV
c1 like '4076'           1/NDV

Join selectivity
~~~~~~~~~~~~~~~~

The selectivity of a join is defined as the selectivity of the most selective
join column adjusted by the proportion of not null values in each join column.


 Sel = 1/max[NDV(t1.c1),NDV(t2.c2)] *
( (Card t1 - # t1.c1 NULLs) / Card t1) *
( (Card t2 - # t2.c2 NULLs) / Card t2)

Bind Variable selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~

Bind variables present a special case because the optimizer has no idea what
the bind variable value is prior to query optimization. This does not present
a problem with equality predicates since a uniform distribution of data is
assumed and the selectivity is taken as 1/NDV for the column. However for
range predicates it presents a major issue because the optimizer does not
know where the range starts or stops. Because of this the optimizer has to
make some assumptions as follows:

c1 =    :bind1           1/NDV
c1 >    :bind1           Default of 5%
c1 >=   :bind1           Default of 5%
c1 like :bind1           Default of 25%

For more information on bind variables see Note:70075.1

Selectivity With Histograms
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Histograms provide additional information about column selectivity for  columns whose distribution is non uniform. Histograms store information about column data value ranges. Each range is stored in a single row and is often  called a 'bucket'. There are 2 different methods for storing histograms in
Oracle. If there are a small number of distinct column values (i.e. less than the number of buckets), the column value  and the count of that value is stored. If not then a series of endpoints are stored to enable more accurate selectivity to be determined.

The first method allows the accurate figures to be used. However with  inexact histograms the terms popular and non-popular value are introduced and are used to help determine selectivity. A popular value is a value that spans multiple endpoints whereas a non-popular value does not.
See Note:72539.1 for more information on histograms.

Exact histograms
~~~~~~~~~~~~~~~~
c1 = '4706'         count of value '4076' / Total Number of Rows
c1 > value          count of values > '4076' / Total Number of Rows

In Exact Histograms
~~~~~~~~~~~~~~~~~~
col = pop value         # popular buckets / # buckets
col = non pop           (Density)
col > value             # buckets > value / # buckets


Rules for combining selectivity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let P1 and P2 be 2 distinct predicates of query Q

 P1 AND P2
       S(P1&P2) = S(P1) * S(P2)
 P1 OR P2
       S(P1|P2) = S(P1) + S(P2) -[S(P1) * S(P2)]

Index Selectivity for concatenated indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Starting with 10.2, when a concatenated index, with all its columns having equality predicates, is used as an access path, the optimizer uses 1/NDK as the selectivity (where NDK is the number of distinct keys in the index).

On 9.2.0.7 and 9.2.0.8 this may be enabled with event 38060 level 1.
On 10.1.0.4 and 10.1.0.5 this may be enabled with event 38059 level 1.
On 10.2 adjustments will be made to the selectivity to account for nulls
in the index keys. This also occurs on 10.1.0.5 (with event 38059.)


Join cardinality
~~~~~~~~~~~~~~~~

 Card(Pj) = Card(T1) * Card(T2) * Sel(Pj)

+++ Reference : Oracle support Metalink +++

1 comment:

  1. Jadwal Pertandingan Ayam SV388 7 Maret 2019 - Jumat, Lombok 8 Maret 2019 – Pada Hari Tersebut Akan Di Laksanakan Berbagai Pertandingan Sabung Ayam Secara Live di Arena Sabung Ayam Thailand.

    Situs Judi Sabung Ayam Online SV388 Merupakan Situs Judi Asal Thailand Yang Sangat Terkenal Dengan Permainan Sabung Ayam Yang Fair dan Menghibur Para Penonton Judi Sabung Ayam.

    Untuk Info Lebih Lanjut Bisa Hub kami Di :
    wechat : bolavita
    line : cs_bolavita
    whatsapp : +628122222995
    BBM: BOLAVITA

    ReplyDelete