
Jul 4, 2011

UNDERSTANDING EXPLAIN PLAN 2


Join Operations
OUTER
An outer join that returns rows from one of the tables, even if there is no matching row in the other table. This is achieved in Oracle using the "(+)" operator in the WHERE clause.
OUTER JOINS are used with CONNECT BY, MERGE JOIN, NESTED LOOPS and HASH JOIN operations. OUTER JOIN enables rows from the driving table to be returned to the calling query even though no matching rows were found in the joined table. The following example is based on the same query illustrated in the NESTED LOOPS topic, using an OUTER JOIN, instead.

Example

select c.Name from COMPANY c, SALES s where c.Company_ID (+) = s.Company_ID and s.Period_ID = 3 and s.Sales_Total > 1000;

Execution Plan

NESTED LOOPS OUTER
  TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

Interpreting the Execution Plan

The Execution Plan shows that the SALES table is used as the driving table for the query. For each COMPANY_ID value in SALES, the COMPANY_ID index on the COMPANY table will be checked to see if a matching value exists. Even if a match does not exist, that record is returned to the user via the NESTED LOOPS OUTER join operation.
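Note the difference between the Explain Plans of the following two queries. They differ only in the last predicate: the first filters on D.NN_EMPCODE (the driving table), while the second filters on DD.NN_EMPCODE (the outer-joined table) without the (+) operator. That filter rejects the NULL-extended rows, so the second query behaves like an inner join.
select D.NN_EMPCODE, VC_EMPNAME, D.NN_MGRCODE from hr_emp_mast d, hr_emp_personal dd where D.NN_EMPCODE = DD.NN_EMPCODE (+) and D.NN_EMPCODE = 269

select D.NN_EMPCODE, VC_EMPNAME, D.NN_MGRCODE from hr_emp_mast d, hr_emp_personal dd where D.NN_EMPCODE = DD.NN_EMPCODE (+) and DD.NN_EMPCODE = 269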
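The effect of the filter placement in the two queries above can be sketched outside the database. The following Python is only an illustration under stated assumptions: the rows are invented, VC_EMPNAME is assumed to live in hr_emp_personal, and the helper name left_outer_join is hypothetical. It is not how Oracle executes the join.

```python
# Toy stand-ins for hr_emp_mast (d) and hr_emp_personal (dd); all rows are invented.
emp_mast = [
    {"nn_empcode": 269, "nn_mgrcode": 100},
    {"nn_empcode": 270, "nn_mgrcode": 100},
]
emp_personal = [
    {"nn_empcode": 270, "vc_empname": "Asha"},  # employee 269 has no personal row
]


def left_outer_join(outer_rows, inner_rows, key):
    """Return every outer row, padded with None when no inner row matches."""
    joined = []
    for o in outer_rows:
        matches = [i for i in inner_rows if i[key] == o[key]]
        if matches:
            for i in matches:
                joined.append({"d": o, "dd": i})
        else:
            joined.append({"d": o, "dd": None})  # NULL-extended row
    return joined


joined = left_outer_join(emp_mast, emp_personal, "nn_empcode")

# Query 1: filter on the preserved table (D.NN_EMPCODE = 269).
query1 = [r for r in joined if r["d"]["nn_empcode"] == 269]

# Query 2: filter on the optional table (DD.NN_EMPCODE = 269) without (+).
# A NULL-extended row has no dd value, so the comparison can never be true.
query2 = [
    r for r in joined
    if r["dd"] is not None and r["dd"]["nn_empcode"] == 269
]
```

With this data the first filter keeps the NULL-extended row for employee 269, while the second returns nothing, which is why an optimizer is free to treat the second query as an inner join.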


ANTI
An anti-join that returns all rows in one table that do not have a matching row in the other table. It is typically implemented using a NOT IN sub-query.
An ANTI-JOIN is a query that returns rows in one table that do not match some set of rows from another table. Since this is effectively the opposite of normal join behavior, the term ANTI-JOIN has been used to describe this operation. ANTI-JOINs are usually expressed using a sub-query, although there are alternative formulations.
One method of using an ANTI-JOIN query is to combine the IN operator with the NOT operator. This method works well when using the cost-based optimizer.
The rule-based optimizer method of using an ANTI-JOIN query is to use it with the NOT EXISTS operator in place of NOT IN. This method uses the WHERE clause in the sub-query.
Note: When Oracle is using rule-based optimization, avoid using NOT IN to perform an anti-join. Use NOT EXISTS instead.
You can also implement the ANTI-JOIN operation as an OUTER JOIN. An OUTER JOIN returns NULLs for the inner table's columns whenever a row in the outer table has no match. Testing for those NULLs keeps only the outer rows that have no match in the inner table. However, the most efficient implementation is to use HASH JOIN and its hints.
To take advantage of Oracle’s ANTI-JOIN optimizations, the following must be true:
·         Cost-based optimization must be enabled.
·         The ANTI-JOIN columns used must not be NULL. This either means that they are not NULL in the table definition, or an IS NOT NULL clause appears in the query for all the relevant columns.
·         The subquery is not correlated.
·         The parent query does not contain an OR clause.
·         The database parameter ALWAYS_ANTI_JOIN is set to either MERGE or HASH or a MERGE_AJ or HASH_AJ hint appears within the sub-query.
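The requirement that the anti-join columns must not be NULL follows from SQL's three-valued logic. The sketch below is an illustration only: hash_anti_join and sql_not_in are hypothetical helper names, the data is invented, and None stands in for SQL NULL and UNKNOWN.

```python
def hash_anti_join(outer_rows, inner_rows, key):
    """Return outer rows whose key has no match among the inner rows."""
    inner_keys = {row[key] for row in inner_rows}                      # build
    return [row for row in outer_rows if row[key] not in inner_keys]   # probe


def sql_not_in(value, candidates):
    """Three-valued NOT IN: True, False, or None (standing in for UNKNOWN)."""
    if not candidates:
        return True          # NOT IN over an empty set is always true
    if value is None:
        return None          # NULL compared with anything is UNKNOWN
    if value in [c for c in candidates if c is not None]:
        return False         # a definite match was found
    # No definite match: a NULL among the candidates makes the answer UNKNOWN.
    return None if any(c is None for c in candidates) else True


company = [{"company_id": 1}, {"company_id": 2}, {"company_id": 3}]
sales = [{"company_id": 1}, {"company_id": 3}]

companies_without_sales = hash_anti_join(company, sales, "company_id")
```

Here company 2 is the only row returned by the anti-join. If the SALES side contained a NULL Company_ID, sql_not_in(2, [1, None]) would evaluate to UNKNOWN rather than TRUE, so a NOT IN query would drop the row; a plain hash lookup cannot reproduce that behavior unless NULLs are ruled out first.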
SEMI
A semi-join that returns rows from a table which have matching rows in a second table but which does not return multiple rows if there are multiple matches. This is usually expressed using a WHERE EXISTS sub-query.
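The difference between a semi-join and an ordinary join can be sketched in a few lines of Python. The table contents and the function names inner_join_names and semi_join_names are invented for illustration.

```python
company = [
    {"company_id": 1, "name": "Acme"},
    {"company_id": 2, "name": "Beta"},
    {"company_id": 3, "name": "Gamma"},
]
sales = [
    {"company_id": 1, "sales_total": 2000},
    {"company_id": 1, "sales_total": 3000},  # second match for Acme
    {"company_id": 3, "sales_total": 1500},
]


def inner_join_names(outer_rows, inner_rows):
    """Ordinary join: one output row per matching pair."""
    return [
        o["name"]
        for o in outer_rows
        for i in inner_rows
        if o["company_id"] == i["company_id"]
    ]


def semi_join_names(outer_rows, inner_rows):
    """Semi-join: each outer row appears at most once, however many matches exist."""
    inner_keys = {i["company_id"] for i in inner_rows}
    return [o["name"] for o in outer_rows if o["company_id"] in inner_keys]
```

With this data the ordinary join returns Acme twice, whereas the semi-join returns Acme and Gamma once each, which is the behavior a WHERE EXISTS sub-query asks for.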


CARTESIAN
Every row in one result set is joined to every row in the other result set.
A join with no join condition results in a CARTESIAN product, or a cross product. A CARTESIAN product is the set of all possible combinations of rows drawn from each table. In other words, for a join of two tables, each row in one table is matched with every row in the other. A CARTESIAN product for more than two tables is the result of pairing each row of one table with every row of the Cartesian product of the remaining tables.
All other kinds of joins are subsets of CARTESIAN products effectively created by deriving the CARTESIAN product and then excluding rows that fail the join condition.
Note: When using the ORDERED hint, it is important that the tables in the FROM clause are listed in the correct order to prevent CARTESIAN joins.
Hint: Consider using Oracle’s STAR query optimization when joining a very large "fact" table to smaller, unrelated "dimension" tables. You will need a concatenated index on the fact table and may need to specify the STAR hint.
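The statement that other joins are effectively subsets of the CARTESIAN product can be checked on a small example. This Python sketch uses invented rows and the standard itertools.product function; it illustrates the definition only and says nothing about how Oracle evaluates a join.

```python
from itertools import product

company = [
    {"company_id": 1, "name": "Acme"},
    {"company_id": 2, "name": "Beta"},
    {"company_id": 3, "name": "Gamma"},
]
sales = [
    {"company_id": 1, "sales_total": 2000},
    {"company_id": 1, "sales_total": 3000},
    {"company_id": 3, "sales_total": 1500},
    {"company_id": 9, "sales_total": 700},   # no matching company
]

# CARTESIAN product: every company row paired with every sales row.
cartesian = list(product(company, sales))

# An equi-join is the subset of those pairs that passes the join condition.
equi_join = [
    (c, s) for c, s in cartesian if c["company_id"] == s["company_id"]
]
```

Three company rows and four sales rows give twelve pairs, of which only three survive the join condition. The row count of a Cartesian product grows multiplicatively, which is why an unintended one is usually worth investigating.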

OUTER, ANTI, SEMI and CARTESIAN appear as options of the CONNECT BY, MERGE JOIN, NESTED LOOPS and HASH JOIN operations (for example, NESTED LOOPS OUTER or MERGE JOIN CARTESIAN).


CONNECT BY
A hierarchical self-join is performed on the output of the preceding steps.
CONNECT BY does a recursive join of a table to itself, in a hierarchical fashion.

Example

select Company_ID, Name from COMPANY where State = 'VA' connect by Parent_Company_ID = prior Company_ID start with Company_ID = 1;
The query shown in the preceding statement selects companies from the COMPANY table in a hierarchical fashion; that is, it returns the rows based on each company's parent company. If there are multiple levels of company parentage, those levels display in the report.

Execution Plan

FILTER
  CONNECT BY
    INDEX UNIQUE SCAN COMPANY_PK
    TABLE ACCESS BY ROWID COMPANY
    TABLE ACCESS BY ROWID COMPANY
      INDEX RANGE SCAN COMPANY$PARENT

Interpreting the Execution Plan

The plan shows that the COMPANY_PK index is first used to find the root node (Company_ID = 1); then the index on the Parent_Company_ID column is used to provide values for queries against the Company_ID column in an iterative fashion. After the hierarchy of Company_IDs is complete, the FILTER operation (the WHERE clause condition on the STATE value) is applied. Notice that the query does not use the index on the STATE column, although it is available and the column is used in the WHERE clause.
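The order described above (find the root, walk the hierarchy, then filter) can be sketched in Python. The rows, the two dictionaries standing in for the COMPANY_PK and COMPANY$PARENT indexes, and the function name connect_by are all assumptions made for illustration; the sketch also assumes the data contains no cycles.

```python
def connect_by(rows, start_id, state):
    """Walk the hierarchy from start_id, then apply the State filter."""
    by_id = {r["company_id"]: r for r in rows}          # stands in for COMPANY_PK
    by_parent = {}                                      # stands in for COMPANY$PARENT
    for r in rows:
        by_parent.setdefault(r["parent_company_id"], []).append(r)

    hierarchy = []
    stack = [by_id[start_id]]                           # START WITH Company_ID = 1
    while stack:
        row = stack.pop()
        hierarchy.append(row)
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(by_parent.get(row["company_id"], [])))

    # FILTER: the WHERE condition is applied only after the walk is complete.
    return [r for r in hierarchy if r["state"] == state]


company = [
    {"company_id": 1, "parent_company_id": None, "state": "VA"},
    {"company_id": 2, "parent_company_id": 1, "state": "MD"},
    {"company_id": 3, "parent_company_id": 2, "state": "VA"},
    {"company_id": 4, "parent_company_id": 1, "state": "VA"},
    {"company_id": 5, "parent_company_id": None, "state": "VA"},  # separate tree
]

va_companies = connect_by(company, start_id=1, state="VA")
```

Company 3 is still returned even though its parent (company 2) fails the State filter, because the filter runs after the hierarchy has been built. Company 5 is excluded because it is not reachable from the starting row.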


MERGE JOIN
A MERGE JOIN is performed on the output of the preceding steps.
MERGE JOIN joins tables by merging sorted lists of records from each table. It is effective for large batch operations, but may be ineffective for joins used by transaction-processing applications. MERGE JOIN is typically considered when Oracle cannot use an index while conducting a join. In the following example, all of the tables are fully indexed, so the example deliberately disables the indexes by adding 0 to the numeric keys during the join to force a merge join to occur.

Example

select COMPANY.Name from COMPANY, SALES where COMPANY.Company_ID+0 = SALES.Company_ID+0 and SALES.Period_ID =3 and SALES.Sales_Total>1000;

Execution Plan

MERGE JOIN
  SORT JOIN
    TABLE ACCESS FULL SALES
  SORT JOIN
    TABLE ACCESS FULL COMPANY

Interpreting the Execution Plan

There are two potential indexes that could be used by a query joining the COMPANY table to the SALES table. First, there is an index on COMPANY.COMPANY_ID - but that index cannot be used because of the +0 value added to it (disabling indexes is described in detail in the Top SQL Tuning Tips topic). Second, there is an index whose first column is SALES.COMPANY_ID - but that index cannot be used, for the same reason.
As shown in the plan, Oracle will perform a full table scan (TABLE ACCESS FULL) on each table, sort the results (using the SORT JOIN operation), and merge the result sets. The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.
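The scan, sort and merge sequence described above can be sketched in Python. This is a textbook sort-merge join under stated assumptions, not Oracle's implementation; the rows and the function name sort_merge_join are invented.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left_sorted = sorted(left, key=lambda r: r[key])     # SORT JOIN
    right_sorted = sorted(right, key=lambda r: r[key])   # SORT JOIN
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][key], right_sorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            # Pair every left row having this key with the whole right run.
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                for r in right_sorted[j:j_end]:
                    result.append((left_sorted[i], r))
                i += 1
            j = j_end
    return result


company = [
    {"company_id": 3, "name": "Gamma"},
    {"company_id": 1, "name": "Acme"},
    {"company_id": 2, "name": "Beta"},
]
sales = [
    {"company_id": 2, "period_id": 3, "sales_total": 1500},
    {"company_id": 1, "period_id": 3, "sales_total": 2000},
    {"company_id": 1, "period_id": 2, "sales_total": 5000},
    {"company_id": 3, "period_id": 3, "sales_total": 500},
    {"company_id": 2, "period_id": 3, "sales_total": 1200},
]

# TABLE ACCESS FULL SALES, with the Period_ID and Sales_Total conditions applied.
filtered_sales = [
    s for s in sales if s["period_id"] == 3 and s["sales_total"] > 1000
]
names = [c["name"] for s, c in sort_merge_join(filtered_sales, company, "company_id")]
```

Neither input needs an index: each is read in full, sorted once, and then consumed in key order, which is why the approach suits large batch joins better than single-row lookups.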

NESTED LOOPS
A nested loops join is performed on the preceding steps. For each row in the upper result set, the lower result set is scanned to find a matching row.
NESTED LOOPS joins table access operations when at least one of the joined columns is indexed.

Example

select COMPANY.Name  from COMPANY, SALES where COMPANY.Company_ID = SALES.Company_ID and SALES.Period_ID =3 and SALES.Sales_Total>1000;

Execution Plan

NESTED LOOPS
  TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK

Interpreting the Execution Plan

The Execution Plan shows that the SALES table is used as the driving table for the query. During NESTED LOOPS joins, one table is always used to drive the query. The Implications of the Driving Table in a NESTED LOOPS Join topic provides tuning guidance on the selection of a driving table for a NESTED LOOPS operation.
For each COMPANY_ID value in the SALES table, the COMPANY_ID index on the COMPANY table will be checked to see if a matching value exists. If a match exists, the record is returned to the user via the NESTED LOOPS operation.
There are several important things to note about this query:
·         Although all of the primary key columns in the SALES table were specified in the query, the SALES_PK index was not used. The SALES_PK index was not used because there was not a limiting condition on the leading column (the COMPANY_ID column) of the SALES_PK index. The only condition on SALES.COMPANY_ID is a join condition.
·         The optimizer could have selected either table as the driving table. When the COMPANY table is the driving table, Oracle performs a full table scan.
·         In rule-based optimization, when there is equal chance of using an index regardless of the choice of the driving table, the driving table will be the one that is listed last in the FROM clause.
·         In cost-based optimization, the optimizer will consider the size of the tables and the selectivity of the indexes while selecting a driving table.

Interpreting the Order of Operations within NESTED LOOPS

NESTED LOOPS operations pose a special challenge when reading the output from PLAN_TABLE. Given the Explain path shown in the following listing, it appears that the first step in the Explain path is the scan of the COMPANY_PK index, since that is the innermost step of the Explain path.
NESTED LOOPS
  TABLE ACCESS FULL SALES
  TABLE ACCESS BY ROWID COMPANY
    INDEX UNIQUE SCAN COMPANY_PK
Despite its placement as the innermost step, the scan of the COMPANY_PK index is not the first step in the Explain path. A NESTED LOOPS join needs to be driven by a row source (such as a full table scan or an index scan) - so to determine the first step within a NESTED LOOPS join, you need to determine which operations directly provide data to the NESTED LOOPS operation. In this example, two operations provide data directly to the NESTED LOOPS operation - the full table scan of SALES, and the ROWID access of the COMPANY table.
Of the two operations that provide data to the NESTED LOOPS operation, the full table scan of SALES is listed first. Therefore, within the NESTED LOOPS operation, the order of operations is:
1.      The full table scan of SALES.
2.      For each record in SALES, access COMPANY by Company_ID. Since an index (COMPANY_PK) is available on COMPANY.Company_ID, use that index via a unique scan.
3.      For each ROWID returned from the COMPANY_PK index, access the COMPANY table (to get the NAME value, as requested by the query).
When reading the Explain path for a NESTED LOOPS operation, you need to look first at the operations that directly provide data to it, and determine their order.
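The three steps listed above can be written out as a loop. In this Python sketch the dictionary company_pk stands in for the COMPANY_PK index (key to ROWID) and company_table stands in for the table itself (ROWID to row); the ROWID strings, the rows and the function name nested_loops are invented for illustration.

```python
company_table = {                      # ROWID -> row
    "rowid_a": {"company_id": 1, "name": "Acme"},
    "rowid_b": {"company_id": 2, "name": "Beta"},
    "rowid_c": {"company_id": 3, "name": "Gamma"},
}
company_pk = {1: "rowid_a", 2: "rowid_b", 3: "rowid_c"}   # Company_ID -> ROWID

sales = [
    {"company_id": 2, "period_id": 3, "sales_total": 1500},
    {"company_id": 1, "period_id": 3, "sales_total": 2000},
    {"company_id": 1, "period_id": 2, "sales_total": 5000},
    {"company_id": 3, "period_id": 3, "sales_total": 500},
    {"company_id": 2, "period_id": 3, "sales_total": 1200},
]


def nested_loops(sales_rows, pk_index, table):
    names = []
    for s in sales_rows:                               # 1. TABLE ACCESS FULL SALES
        if s["period_id"] != 3 or s["sales_total"] <= 1000:
            continue                                   # row fails the WHERE clause
        rowid = pk_index.get(s["company_id"])          # 2. INDEX UNIQUE SCAN COMPANY_PK
        if rowid is None:
            continue                                   # no matching company
        names.append(table[rowid]["name"])             # 3. TABLE ACCESS BY ROWID COMPANY
    return names


names = nested_loops(sales, company_pk, company_table)
```

The output follows the order in which SALES is scanned, which shows that the driving table is read first even though the index scan is the innermost line of the plan.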
HASH JOIN
A hash join is performed on two row sources.
HASH JOIN is one of the algorithms that Oracle can use to join two tables.
In a HASH JOIN, a hash table (an on-the-fly index) is built in memory on the smaller of the two row sources. The larger row source is then scanned, and the same hashing function is applied to its join column to find matching rows in the hash table.
In the following query, the COMPANY and SALES tables are joined based on their common COMPANY_ID column.

Example

select COMPANY.Name from COMPANY, SALES where COMPANY.Company_ID = SALES.Company_ID and SALES.Period_ID =3 and SALES.Sales_Total>1000;

Execution Plan

HASH JOIN
  TABLE ACCESS FULL SALES
  TABLE ACCESS FULL COMPANY

Interpreting the Execution Plan

The Execution Plan shows that the SALES table is used as the first table in the hash join. The SALES table will be read into memory and hashed on its COMPANY_ID values. Oracle will then apply the same hashing function to the COMPANY_ID values in the COMPANY table to compare them with the records that have been read into memory.
When one of the tables is significantly smaller than the other in the join, and the smaller table fits into the available memory area, then the optimizer will generally use a hash join instead of a traditional NESTED LOOPS join. Even if an index is available for the join, a hash join may be preferable to a NESTED LOOPS join.
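The build and probe phases of a hash join can be sketched with a Python dictionary acting as the in-memory hash table. The rows and the function name hash_join are invented, and the sketch assumes the build input fits in memory; it is a textbook illustration rather than Oracle's implementation.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for b in build_rows:                     # build phase: first row source
        buckets[b[key]].append(b)

    result = []
    for p in probe_rows:                     # probe phase: second row source
        for b in buckets.get(p[key], []):    # .get() avoids creating empty buckets
            result.append((b, p))
    return result


company = [
    {"company_id": 3, "name": "Gamma"},
    {"company_id": 1, "name": "Acme"},
    {"company_id": 2, "name": "Beta"},
]
sales = [
    {"company_id": 2, "period_id": 3, "sales_total": 1500},
    {"company_id": 1, "period_id": 3, "sales_total": 2000},
    {"company_id": 3, "period_id": 3, "sales_total": 500},
    {"company_id": 2, "period_id": 3, "sales_total": 1200},
]

# As in the plan above, the filtered SALES rows are the build input.
filtered_sales = [
    s for s in sales if s["period_id"] == 3 and s["sales_total"] > 1000
]
names = [c["name"] for s, c in hash_join(filtered_sales, company, "company_id")]
```

Each input is read exactly once and no index is needed, which is why a hash join can beat NESTED LOOPS even when an index on the join column exists.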

Jun 29, 2011

UNDERSTANDING EXPLAIN PLAN


Explain Plan: a small overview


To interpret an execution plan and correctly evaluate your SQL optimization options, you need to first understand the differences between the available database operations. The following topics describe each database access operation identified by the name given to it by the Oracle EXPLAIN PLAN command, along with its characteristics.
For each type of operation, an example is provided. In some cases, the example for an operation will use operations described later in the topic (such as MERGE JOIN, which typically uses a SORT JOIN operation).
The operations are classified as row operations or set operations. The following table describes the differences between row and set operations.


Row Operations | Set Operations
Executed on one row at a time | Executed on a result set of rows
Executed at the FETCH stage, if there is no set operation involved | Executed at the EXECUTE stage when the cursor is opened
Ability to view the first result before the last row is fetched | Inability to view the first result until all rows are fetched and processed
Example: A full-table scan | Example: A full-table scan with a GROUP BY
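The practical difference between the two classes can be sketched with a Python generator. The names full_table_scan and group_by_state, and the rows, are invented; the sketch only illustrates when rows are fetched and does not model the database's FETCH and EXECUTE stages.

```python
def full_table_scan(table, fetch_log):
    """Row operation: hand rows to the caller one at a time."""
    for row in table:
        fetch_log.append(row["company_id"])   # record each fetch as it happens
        yield row


def group_by_state(row_source):
    """Set operation: must consume every row before returning anything."""
    counts = {}
    for row in row_source:
        counts[row["state"]] = counts.get(row["state"], 0) + 1
    return sorted(counts.items())


company = [
    {"company_id": 1, "state": "VA"},
    {"company_id": 2, "state": "MD"},
    {"company_id": 3, "state": "VA"},
]

scan_log = []
first_row = next(full_table_scan(company, scan_log))            # one row fetched

group_log = []
groups = group_by_state(full_table_scan(company, group_log))    # all rows fetched
```

After the first call only one row has been read, while the grouped result cannot be produced until the scan has delivered every row.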



Aggregation Operations

The following Aggregation operations are available.

COUNT
Counts the rows in the result set to satisfy the COUNT() function

COUNT is executed when the RowNum pseudo-column is used without specifying a maximum value for RowNum. COUNT receives rows from its child operations and increments the RowNum counter. If a limiting counter is used on the RowNum pseudo-column, then the COUNT STOPKEY operation is used instead of COUNT.

Example

select Name, City, State, RowNum from COMPANY where City > 'Roanoke' order by Zip;

The query shown in the preceding listing selects rows from the COMPANY table. Each row is returned with the row number it was originally assigned.

Execution Plan

SORT ORDER BY
  COUNT
    TABLE ACCESS BY ROWID COMPANY
      INDEX RANGE SCAN COMPANY$CITY
 
Interpreting the Execution Plan
The Execution Plan shows that the index on the City column is used to find ROWIDs in the COMPANY table that satisfy the WHERE clause condition (where City > ‘Roanoke’). The ROWIDs from the City index scan are used to query the COMPANY table for the Name and State column values. For each row returned, the counter is incremented. Because of the use of the index, the rows that are returned will be the "lowest" city names that are greater than the value ‘Roanoke’. The rows will be returned from the COMPANY$CITY index in ascending order of the City column’s value. The RowNum pseudo-column will then be calculated and put into the row. The SORT ORDER BY operation will order the rows by Zip, as requested in the ORDER BY clause.  The RowNum values are assigned before the ordering takes place.
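The point that RowNum values are assigned before the ordering takes place can be checked with a small Python sketch. The rows and the function name count_then_sort are invented, and the sorted-by-City step only imitates the order in which an index range scan would return rows.

```python
company = [
    {"name": "Delta", "city": "Winchester", "zip": "22601"},
    {"name": "Acme", "city": "Salem", "zip": "24153"},
    {"name": "Beta", "city": "Norfolk", "zip": "23510"},   # fails City > 'Roanoke'
    {"name": "Gamma", "city": "Vienna", "zip": "22180"},
]


def count_then_sort(rows):
    # INDEX RANGE SCAN COMPANY$CITY: qualifying rows arrive in City order.
    in_index_order = sorted(
        (r for r in rows if r["city"] > "Roanoke"), key=lambda r: r["city"]
    )
    # COUNT: number the rows as they arrive.
    numbered = [
        dict(r, rownum=n) for n, r in enumerate(in_index_order, start=1)
    ]
    # SORT ORDER BY: reorder by Zip; the rownum values travel with the rows.
    return sorted(numbered, key=lambda r: r["zip"])


result = count_then_sort(company)
```

With this data the rows come back in Zip order (Vienna, Winchester, Salem) but carry the RowNum values 2, 3 and 1, so the numbers reflect the City order of the index scan rather than the final sort order.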
COUNT STOPKEY
Counts the number of rows returned by a result set and stops processing when a certain number of rows is reached. This is usually the result of a WHERE clause which specifies a maximum ROWNUM (for example, WHERE ROWNUM <= 10).

Example

select Name, City, State from COMPANY where City > 'Roanoke' and RowNum <= 100;

Execution Plan

COUNT STOPKEY
  TABLE ACCESS BY ROWID COMPANY
    INDEX RANGE SCAN COMPANY$CITY
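The early stop that distinguishes COUNT STOPKEY from COUNT can be sketched with Python's itertools.islice. The rows, the limit of 2 and the function names index_range_scan and count_stopkey are invented for illustration.

```python
from itertools import islice


def index_range_scan(rows, fetch_log):
    """Yield rows with City > 'Roanoke' in City order, logging each fetch."""
    for row in sorted(rows, key=lambda r: r["city"]):
        if row["city"] > "Roanoke":
            fetch_log.append(row["city"])
            yield row


def count_stopkey(row_source, limit):
    """Take rows from the child operation and stop once the limit is reached."""
    return list(islice(row_source, limit))


company = [
    {"name": "Delta", "city": "Winchester"},
    {"name": "Acme", "city": "Salem"},
    {"name": "Beta", "city": "Norfolk"},
    {"name": "Gamma", "city": "Vienna"},
]

fetched = []
first_two = count_stopkey(index_range_scan(company, fetched), limit=2)
```

Only two rows are pulled from the scan even though three qualify, so the remaining row is never fetched. That saving is the reason a ROWNUM limit can make a query return quickly.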



SORT
Performs a sort.

ORDER BY
Sorts a result set to satisfy an ORDER BY clause.

AGGREGATE
Occurs when a group function is used on data, which is already grouped.




SORT AGGREGATE is used to sort and aggregate result sets whenever a grouping function appears in a SQL statement without a GROUP BY clause. Grouping functions include MAX, MIN, COUNT, SUM, and AVG.

Example

select SUM(Total) from SALES;

Execution Plan

SORT AGGREGATE
  TABLE ACCESS FULL SALES

Interpreting the Execution Plan

The Execution Plan shows that after the SALES table is scanned (via the TABLE ACCESS FULL operation), the records are passed to the SORT AGGREGATE operation. SORT AGGREGATE sums the Total values and returns the output to the user.



JOIN
Sorts the rows in preparation for a merge join.
SORT JOIN sorts a set of records that is to be used in a MERGE JOIN operation.
Example
select C.Name  from COMPANY c, SALES s where C.Company_ID+0 = S.Company_ID+0
and S.Period_ID =3 and S.Sales_Total>1000;

Execution Plan

MERGE JOIN
  SORT JOIN
    TABLE ACCESS FULL SALES
  SORT JOIN
    TABLE ACCESS FULL COMPANY

Interpreting the Execution Plan

The Execution Plan shows that the COMPANY table and SALES table will be accessed using TABLE ACCESS FULL operations. Before the records from those tables are passed to the MERGE JOIN operation, they will first be processed by SORT JOIN operations that sort the records. The SORT JOIN output is used as input to the MERGE JOIN operation.