Oracle DBA & Data Science Enthusiast: Direct and Asynchronous I/O

I/O operations in UNIX and Linux systems typically go through the file system cache. Although this doesn't represent a problem in itself, this extra processing does require resources. Bypassing the file system cache reduces CPU requirements, and frees up the file system cache for other non-database file operations. Operations against raw devices automatically bypass the file system cache.

When a synchronous I/O request is submitted to the operating system, the writing process blocks until the write is complete before continuing processing. With asynchronous I/O, processing continues while the I/O request is submitted and processed. This allows asynchronous I/O to bypass some of the performance bottlenecks associated with I/O operations.

Oracle can take advantage of direct I/O and asynchronous I/O on supported platforms using the FILESYSTEMIO_OPTIONS parameter, whose possible values are listed below.

ASYNCH - Enabled asynchronous I/O where possible.
DIRECTIO- Enabled direct I/O where possible.
SETALL- Enabled both direct I/O and asynchronous I/O where possible.
NONE - Disabled both direct I/O and asynchronous I/O.

The following example shows how the parameter is set.

SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

filesystemio_options string none

SQL> ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;

System altered.

SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

filesystemio_options string SETALL

SQL>

Direct path read wait

You see direct path read waits only when you are doing a parallel full-scan. When Oracle performs a parallel full-table scan, the database blocks are read directly into the program global area (PGA), bypassing the data buffer RAM. If the I/O subsystem doesn't support asynchronous I/Os, then each wait corresponds to a physical read request.

The direct path read wait occurs during Direct Path operations when the data is asynchronously read directly from the database files into the PGA, bypassing the SGA.

If the I/O subsystem supports asynchronous I/O, then the process overlaps read requests with processing the blocks already in the PGA. With asynchronous I/O, processing continues while the I/O request is submitted and processed. This allows asynchronous I/O to bypass some of the performance bottlenecks associated with I/O operations.

When the process attempts to access a block in the PGA that has not yet been read from disk, it issues a wait call and updates the statistics for this event. So, the number of waits is not always the same as the number of read requests.

Direct path write temp wait

The direct path write temp wait event and direct path write wait event can occur when a process is writing buffers directly from PGA. The process will wait on this event for the write call to complete. Operations that could perform direct path writes include when a sort goes to disk, parallel DML operations, direct-path INSERTs, parallel create table as select, and some LOB operations. Like direct path reads, the number of waits is not the same as number of write calls issued if the I/O subsystem supports asynchronous writes. The session will wait if it has processed the buffers in the PGA and can't continue work until an I/O request completes.

Causes for the direct path read temp wait

High disk sorts - If the sorts are too large to fit in memory and get sent to disk, this wait can occur.
Parallel slaves - Parallel slaves are used for scanning data or parallel DML may be used to create and populate objects. These may lead to direct path read wait and direct path write wait respectively.
Direct path loads - The direct path API is used to pass data to the load engine in the server and can cause the related direct path write wait.
Server process ahead of I/O - The server process is processing buffers faster than the I/O system can return the buffers. This can indicate an overloaded I/O system
Data Warehouse - Sorts in a data warehouse environment may always go to disk leading to high waits on direct path read temp and/or direct path write temp.
Hash area size - For query plans that call for a hash join, excessive I/O could result from having HASH_AREA_SIZE too small.

PGA Memory Management

The Program Global Area (PGA) is a private memory region containing data and control information for a server process. Access to it is exclusive to that server process and is read and written only by the Oracle code acting on behalf of it. An example of such information is the runtime area of a cursor. Each time a cursor is executed, a new runtime area is created for that cursor in the PGA memory region of the server process executing that cursor.

For complex queries (for example, decision support queries), a big portion of the runtime area is dedicated to work areas allocated by memory intensive operators, such as the following:

Sort-based operators, such as ORDER BY, GROUP BY, ROLLUP, and window functions
Hash-join
Bitmap merge
Bitmap create
Write buffers used by bulk load operations

A sort operator uses a work area (the sort area) to perform the in-memory sort of a set of rows. Similarly, a hash-join operator uses a work area (the hash area) to build a hash table from its left input.

The size of a work area can be controlled and tuned. Generally, bigger work areas can significantly improve the performance of a particular operator at the cost of higher memory consumption. Ideally, the size of a work area is big enough that it can accommodate the input data and auxiliary memory structures allocated by its associated SQL operator. This is known as the optimal size of a work area. When the size of the work area is smaller than optimal, the response time increases, because an extra pass is performed over part of the input data. This is known as the one-pass size of the work area. Under the one-pass threshold, when the size of a work area is far too small compared to the input data size, multiple passes over the input data are needed. This could dramatically increase the response time of the operator. This is known as the multi-pass size of the work area. For example, a serial sort operation that needs to sort 10GB of data needs a little more than 10GB to run optimal and at least 40MB to run one-pass. If this sort gets less that 40MB, then it must perform several passes over the input data.

The goal is to have most work areas running with an optimal size (for example, more than 90% or even 100% for pure OLTP systems), while a smaller fraction of them are running with a one-pass size (for example, less than 10%). Multi-pass execution should be avoided. Even for DSS systems running large sorts and hash-joins, the memory requirement for the one-pass executions is relatively small. A system configured with a reasonable amount of PGA memory should not need to perform multiple passes over the input data.

Automatic PGA memory management simplifies and improves the way PGA memory is allocated. By default, PGA memory management is enabled. In this mode, Oracle dynamically adjusts the size of the portion of the PGA memory dedicated to work areas, based on 20% of the SGA memory size. The minimum value is 10MB.

10.3.2 buffer busy waits

This wait indicates that there are some buffers in the buffer cache that multiple processes are attempting to access concurrently. Query V$WAITSTAT for the wait statistics for each class of buffer. Common buffer classes that have buffer busy waits include data block, segment header, undo header, and undo block.

Check the following V$SESSION_WAIT parameter columns:

P1 - File ID
P2 - Block ID
P3 - Class ID

10.3.2.1 Causes

To determine the possible causes, first query V$SESSION to identify the value of ROW_WAIT_OBJ# when the session waits for buffer busy waits. For example:

SELECT row_wait_obj# FROM V$SESSION

WHERE EVENT = 'buffer busy waits';

To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example:

SELECT owner, object_name, subobject_name, object_type FROM DBA_OBJECTS

WHERE data_object_id = &row_wait_obj;

10.3.2.2 Actions

The action required depends on the class of block contended for and the actual segment.

10.3.2.2.1 segment header

If the contention is on the segment header, then this is most likely free list contention.

Automatic segment-space management in locally managed tablespaces eliminates the need to specify the PCTUSED, FREELISTS, and FREELIST GROUPS parameters. If possible, switch from manual space management to automatic segment-space management (ASSM).

The following information is relevant if you are unable to use automatic segment-space management (for example, because the tablespace uses dictionary space management).

A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED. Specify the number of process free lists with the FREELISTS parameter. The default value of FREELISTS is one. The maximum value depends on the data block size.

To find the current setting for free lists for that segment, run the following:

SELECT SEGMENT_NAME, FREELISTS

FROM DBA_SEGMENTS

WHERE SEGMENT_NAME = segment name

AND SEGMENT_TYPE = segment type;

Set free lists, or increase the number of free lists. If adding more free lists does not alleviate the problem, then use free list groups (even in single instance this can make a difference). If using Oracle Real Application Clusters, then ensure that each instance has its own free list group(s).

10.3.2.2.2 data block

If the contention is on tables or indexes (not the segment header):

Check for right-hand indexes. These are indexes that are inserted into at the same point by many processes. For example, those that use sequence number generators for the key values.

Consider using automatic segment-space management (ASSM), global hash partitioned indexes, or increasing free lists to avoid multiple processes attempting to insert into the same block.

10.3.2.2.3 undo header

For contention on rollback segment header:

If you are not using automatic undo management, then add more rollback segments.

10.3.2.2.4 undo block

For contention on rollback segment block:

If you are not using automatic undo management, then consider making rollback segment sizes larger.

10.3.3 db file scattered read

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered read issues a scattered read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.

The db file scattered read wait event identifies that a full scan is occurring. When performing a full scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other. Such reads are called scattered read calls, because the blocks are scattered throughout memory. This is why the corresponding wait event is called 'db file scattered read'. multiblock (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) reads due to full scans into the buffer cache show up as waits for 'db file scattered read'.

Check the following V$SESSION_WAIT parameter columns:

P1 - The absolute file number
P2 - The block being read
P3 - The number of blocks (should be greater than 1)

Oracle DBA & Data Science Enthusiast

Total Pageviews

Thursday, 23 August 2012

Direct and Asynchronous I/O

No comments:

Post a Comment