Oracle7 Server Concepts

Contents Index Home Previous Next

How Oracle Optimizes SQL Statements

This section explains how Oracle optimizes SQL statements. For any SQL statement processed by Oracle, the optimizer does the following:

evaluation of expressions and conditions The optimizer first evaluates expressions and conditions containing constants as fully as possible.
statement transformation For a complex statement involving, for example, correlated subqueries, the optimizer may transform the original statement into an equivalent join statement.
view merging For a SQL statement that accesses a view, the optimizer often merges the query in the statement with that in the view and then optimizes the result.
choice of optimization approaches The optimizer chooses either a rule-based or cost-based approach to optimization.
choice of access paths For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data.
choice of join orders For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, on so on.
choice of join operations For any join statement, the optimizer chooses an operation to use to perform the join.

Types of SQL Statements

Oracle optimizes these different types of SQL statements:

simple statements A simple statement is an INSERT, UPDATE, DELETE, or SELECT statement that only involves a single table.
simple queries A query is another name for a SELECT statement.
joins A join is a query that selects data from more than one table. A join is characterized by multiple tables in the FROM clause. Oracle pairs the rows from these tables using the condition specified in the WHERE clause and returns the resulting rows. This condition is called the join condition and usually compares columns of all the joined tables.
equijoins An equijoin is characterized by a join condition containing an equality operator.
nonequijoins A nonequijoin is characterized by a join condition containing something other than an equality operator.
outer joins An outer join is characterized by a join condition that uses the outer join operator (+) with one or more of the columns of one of the tables. Oracle returns all rows that meet the join condition. Oracle also returns all rows from the table without the outer join operator for which there are no matching rows in the table with the outer join operator.
cartesian products A join with no join condition results in a cartesian product, or a cross product. A cartesian product is the set of all possible combinations of rows drawn one from each table. In other words, for a join of two tables, each row in one table is matched in turn with every row in the other. A cartesian product for more than two tables is the result of pairing each row of one table with every row of the cartesian product of the remaining tables. All other kinds of joins are subsets of cartesian products effectively created by deriving the cartesian product and then excluding rows that fail the join condition.
complex statements A complex statement is an INSERT, UPDATE, DELETE, or SELECT statement that contains a form of the SELECT statement called a subquery. This is a query within another statement that produces a set of values for further processing within the statement. The outer portion of the complex statement that contains a subquery is called the parent statement.
compound queries A compound query is a query that uses set operators (UNION, UNION ALL, INTERSECT, or MINUS) to combine two or more simple or complex statements. Each simple or complex statement in a compound query is called a component query.
statements accessing views You can also write simple, join, complex, and compound statements that access views as well as tables.
distributed statements A distributed statement is one that accesses data on a remote database.

Choosing Access Paths

One of the most important choices the optimizer makes when formulating an execution plan is how to retrieve the data from the database. For any row in any table accessed by a SQL statement, there may be many access paths by which that row can be located and retrieved. The optimizer chooses one of them.

This section discusses these topics:

Access Methods

This section describes basic methods by which Oracle can access data.

Full Table Scans A full table scan retrieves rows from a table. To perform a full table scan, Oracle reads all rows in the table, examining each row to determine whether it satisfies the statement's WHERE clause. Oracle reads every data block allocated to the table sequentially, so a full table scan can be performed very efficiently using multiblock reads. Oracle reads each data block only once.

Table Access by ROWID A table access by ROWID also retrieves rows from a table. The ROWID of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its ROWID is the fastest way for Oracle to find a single row.

To access a table by ROWID, Oracle first obtains the ROWIDs of the selected rows, either from the statement's WHERE clause or through an index scan of one of more of the table's indexes. Oracle then locates each selected row in the table based on its ROWID.

Cluster Scans From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the ROWID of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this ROWID.

Hash Scans Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.

Index Scans An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, Oracle reads the indexed column values directly from the index, rather than from the table.

In addition to each indexed value, an index also contains the ROWIDs of rows in the table having that value. If the statement accesses other columns in addition to the indexed columns, Oracle then finds the rows in the table with a table access by ROWID or a cluster scan.

An index scan can be one of these types:

Unique A unique scan of an index returns only a single ROWID. Oracle can only perform a unique scan in cases in which only a single ROWID, rather than many ROWIDs, is required. For example, Oracle performs a unique scan if there is a UNIQUE or a PRIMARY KEY constraint that guarantees that the statement accesses only a single row.
Range A range scan of an index can return one or more ROWIDs depending on how many rows are accessed by the statement.

Access Paths

Table 13 - 1 lists the access paths. The rank of each path is used by the rule-based approach to choose a path when more than one path is available.

Rank Access Path
1 Single row by ROWID
2 Single row by cluster join
3 Single row by hash cluster key with unique or primary key
4 Single row by unique or primary key
5 Cluster join
6 Hash cluster key
7 Indexed cluster key
8 Composite key
9 Single-column indexes
10 Bounded range search on indexed columns
11 Unbounded range search on indexed columns
12 Sort-merge join
13 MAX or MIN of indexed column
14 ORDER BY on indexed columns
15 Full table scan
Table 13 - 1. Access Paths

The optimizer can only choose to use a particular access path for a table if the statement contains a WHERE clause condition or other construct that makes that access path available. Each of the following sections describes an access path and discusses

Choosing Among Access Paths

This section describes how the optimizer chooses among available access paths:

Choosing Among Access Paths with the Rule-Based Approach With the rule-based approach, the optimizer chooses whether to use an access path based on these factors:

To choose an access path, the optimizer first examines the conditions in the statement's WHERE clause to determine which access paths are available. The optimizer then chooses the most highly ranked available access path. Note that the full table scan is the lowest ranked access path on the list. This means that the rule-based approach always chooses an access path that uses an index if one is available, even if a full table scan might execute faster.

The order of the conditions in the WHERE clause does not normally affect the optimizer's choice among access paths.

Example

Consider this SQL statement that selects the employee numbers of all employees in the EMP table with an ENAME value of 'CHUNG' and with a SAL value greater than 2000:

SELECT empno 
	FROM emp 
	WHERE ename = 'CHUNG'  
	  AND sal > 2000;

Consider also that the EMP table has these integrity constraints and indexes:

Based on the conditions in the WHERE clause of the SQL statement, the integrity constraints, and the indexes, these access paths are available:

Note that the PK_EMPNO index does not make the single row by primary key access path available because the indexed column does not appear in a condition in the WHERE clause.

Using the rule-based approach, the optimizer chooses the access path that uses the ENAME_IND index to execute this statement. The optimizer chooses this path because it is the most highly ranked path available.

Choosing Among Access Paths with the Cost-Based Approach With the cost-based approach, the optimizer chooses an access path based on these factors:

To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE clause. The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan using the statistics for the index, columns, and tables accessible to the statement. The optimizer then chooses the execution plan with the lowest estimated cost.

The optimizer's choice among available access paths can be overridden with hints. For information on hints, see Oracle7 Server Tuning.

To choose among available access paths, the optimizer considers these factors:

Example

Consider this query that uses an equality condition in its WHERE clause to select all employees name Jackson:

SELECT * 
	FROM emp 	
	WHERE ename = 'JACKSON';

If the ENAME column is a unique or primary key, the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key (access path 4).

Example

Consider again the query in the previous example. If the ENAME column is not a unique or primary key, the optimizer can use these statistics to estimate the query's selectivity:

USER_TAB_ COLUMNS.NUM_DISTINCT This statistic is the number of values for each column in the table.
USER_TABLES. NUM_ROWS This statistic is the number of rows in each table.
By dividing the number of rows in the EMP table by the number of distinct values in the ENAME column, the optimizer estimates what percentage of employees have the same name. By assuming that the ENAME values are uniformly distributed, the optimizer uses this percentage as the estimated selectivity of the query.

Example

Consider this query that selects all employees with employee ID numbers less than 7500:

SELECT * 
	FROM emp 	
	WHERE empno < 7500;

To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE clause condition and the values of the HIGH_VALUE and LOW_VALUE statistics for the EMPNO column if available. These statistics can be found in the USER_TAB_COLUMNS view. The optimizer assumes that EMPNO values are evenly distributed in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.

Optimizing Distributed Statements

The optimizer chooses execution plans for SQL statements that access data on remote databases in much the same way it chooses executions for statements that access only local data:

When choosing the execution plan for a distributed statement, the optimizer considers the available indexes on remote databases just as it does indexes on the local database. If the statement uses the cost-based approach, the optimizer also considers statistics on remote databases. Furthermore, the optimizer considers the location of data when estimating the cost of accessing it. For example, a full scan of a remote table has a greater estimated cost than a full scan of an identical local table.


Contents Index Home Previous Next