Join Operations
OUTER | An outer join that returns rows from one of the tables, even if there is no matching row in the other table. This is achieved in Oracle using the "(+)" operator in the WHERE clause. |
OUTER JOINS are used with CONNECT BY, MERGE JOIN, NESTED LOOPS and HASH JOIN operations. OUTER JOIN enables rows from the driving table to be returned to the calling query even though no matching rows were found in the joined table. The following example is based on the same query illustrated in the NESTED LOOPS topic, using an OUTER JOIN, instead.
Example
select c.Name from COMPANY c, SALES s where c.Company_ID = s.Company_ID (+) and s.Period_ID = 3 and s.Sales_Total >1000;
Execution Plan
NESTED LOOPS OUTER
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
Interpreting the Execution Plan
The Execution Plan shows that the SALES table is used as the driving table for the query. For each COMPANY_ID value in SALES, the COMPANY_ID index on the COMPANY table will be checked to see if a matching value exists. Even if a match does not exist, that record is returned to the user via the NESTED LOOPS OUTER join operation.
Note the difference in Explain Plan of below 2 queries .
select D.NN_EMPCODE,VC_EMPNAME,D.NN_MGRCODE from hr_emp_mast d ,hr_emp_personal dd where D.NN_EMPCODE=DD.NN_EMPCODE(+) and D.NN_EMPCODE=269
select D.NN_EMPCODE,VC_EMPNAME,D.NN_MGRCODE from hr_emp_mast d ,hr_emp_personal dd where D.NN_EMPCODE=DD.NN_EMPCODE(+) and dD.NN_EMPCODE=269
ANTI | An anti-join that returns all rows in one table that do not have a matching row in the other table. It is typically implemented using a NOT IN sub-query. |
An ANTI-JOIN is a query that returns rows in one table that do not match some set of rows from another table. Since this is effectively the opposite of normal join behavior, the term ANTI-JOIN has been used to describe this operation. ANTI-JOINs are usually expressed using a sub-query, although there are alternative formulations.
One method of using an ANTI-JOIN query is to combine the IN operator with the NOT operator. This method works well when using the cost-based optimizer.
The rule-based optimizer method of using an ANTI-JOIN query is to use it with the NOT EXISTS operator in place of NOT IN. This method uses the WHERE clause in the sub-query.
Note: When Oracle is using rule-based optimization, avoid using NOT IN to perform an anti-join. Use NOT EXISTS instead.
You can also implement the ANTI-JOIN operation as an OUTER JOIN. An OUTER JOIN includes NULL for rows in the inner table, which have no match in the outer table. This feature can be used to include only rows that have no match in the inner table. However, the most efficient implementation is to use HASH JOIN and its hints.
To take advantage of Oracle’s ANTI-JOIN optimizations, the following must be true:
· Cost-based optimization must be enabled.
· The ANTI-JOIN columns used must not be NULL. This either means that they are not NULL in the table definition, or an IS NOT NULL clause appears in the query for all the relevant columns.
· The subquery is not correlated.
· The parent query does not contain an Or clause.
· The database parameter ALWAYS_ANTI_JOIN is set to either MERGE or HASH or a MERGE_AJ or HASH_AJ hint appears within the sub-query.
SEMI | A semi-join that returns rows from a table which have matching rows in a second table but which does not return multiple rows if there are multiple matches. This is usually expressed using a WHERE EXISTS sub-query. |
A SEMI JOIN is a join which returns rows from a table which have matching rows in a second table but which does not return multiple rows if there are multiple matches. This is usually expressed in Oracle using a WHERE EXISTS sub-query.
CARTESIAN | Every row in one result set is joined to every row in the other result set. |
A join with no join condition results in a CARTESIAN product, or a cross product. A CARTESIAN product is the set of all possible combinations of rows drawn from each table. In other words, for a join of two tables, each row in one table is matched with every row in the other. A CARTESIAN product for more than two tables is the result of pairing each row of one table with every row of the Cartesian product of the remaining tables.
All other kinds of joins are subsets of CARTESIAN products effectively created by deriving the CARTESIAN product and then excluding rows that fail the join condition.
Note: When using the ORDERED hint, it is important that the tables in the FROM clause are listed in the correct order to prevent CARTESIAN joins.
Hint: Consider using Oracle’s STAR query optimization when joining a very large "fact" table to smaller, unrelated "dimension" tables. You will need a concatenated index on the fact table and may need to specify the STAR hint.
OUTER, ANTI, SEMI, CARTESIAN WILL COME WITH CONNECT BY, MERGE JOIN, NESTED LOOP, HASH JOIN.
CONNECT BY | A hierarchical self-join is performed on the output of the preceding steps. |
CONNECT BY does a recursive join of a table to itself, in a hierarchical fashion.
Example
select Company_ID, Name from COMPANY where State = ‘VA’ connect by Parent_Company_ID = prior Company_ID start with Company_ID = 1;
The query shown in the preceding statement selects companies from the COMPANY in a hierarchical fashion; that is, it returns the rows based on each Company’s parent company. If there are multiple levels of company parentage, those levels display in the report.
Execution Plan
FILTER
CONNECT BY
INDEX UNIQUE SCAN COMPANY_PK
TABLE ACCESS BY ROWID COMPANY
TABLE ACCESS BY ROWID COMPANY
INDEX RANGE SCAN COMPANY$PARENT
CONNECT BY
INDEX UNIQUE SCAN COMPANY_PK
TABLE ACCESS BY ROWID COMPANY
TABLE ACCESS BY ROWID COMPANY
INDEX RANGE SCAN COMPANY$PARENT
Interpreting the Execution Plan
The plan shows that first the COMPANY_PK index is used to find the root node (Company_ID = 1), then index on the Parent_Company_ID column is used to provide values for queries against the Company_ID column in an iterative fashion. After the hierarchy of Company_IDs is complete, the FILTER operation—the WHERE clause related to the STATE value—is applied. Notice that the query does not use the index on the STATE column, although it is available and the column is used in the WHERE clause.
MERGE
MERGE JOIN | A MERGE JOIN performed on the output of the preceding steps |
MERGE JOIN joins tables by merging sorted lists of records from each table. It is effective for large batch operations, but may be ineffective for joins used by transaction-processing applications. MERGE JOIN is used whenever Oracle cannot use an index while conducting a join. In the following example, all of the tables are fully indexed. So the example deliberately disables the indexes by adding 0 to the numeric keys during the join to force a merge join to occur.
Example
select COMPANY.Name from COMPANY, SALES where COMPANY.Company_ID+0 = SALES.Company_ID+0 and SALES.Period_ID =3 and SALES.Sales_Total>1000;
Execution Plan
MERGE JOIN
SORT JOIN
TABLE ACCESS FULL SALES
SORT JOIN
TABLE ACCESS FULL COMPANY
SORT JOIN
TABLE ACCESS FULL SALES
SORT JOIN
TABLE ACCESS FULL COMPANY
Interpreting the Execution Plan
There are two potential indexes that could be used by a query joining the COMPANY table to the SALES table. First, there is an index on COMPANY.COMPANY_ID - but that index cannot be used because of the +0 value added to it (disabling indexes is described in detail in the Top SQL Tuning Tips topic). Second, there is an index whose first column is SALES.COMPANY_ID - but that index cannot be used, for the same reason.
As shown in the plan, Oracle will perform a full table scan (TABLE ACCESS FULL) on each table, sort the results (using the SORT JOIN operation), and merge the result sets. The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.
NESTED LOOPS | A nested loops join is performed on the preceding steps. For each row in the upper result set, the lower result set is scanned to find a matching row. |
NESTED LOOPS joins table access operations when at least one of the joined columns is indexed.
Example
select COMPANY.Name from COMPANY, SALES where COMPANY.Company_ID = SALES.Company_ID and SALES.Period_ID =3 and SALES.Sales_Total>1000;
Execution Plan
NESTED LOOPS
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
Interpreting the Execution Plan
The Execution Plan shows that the SALES table is used as the driving table for the query. During NESTED LOOPS joins, one table is always used to drive the query. The Implications of the Driving Table in a NESTED LOOPS Join topic provides tuning guidance on the selection of a driving table for a NESTED LOOPS operation.
For each COMPANY_ID value in the SALES table, the COMPANY_ID index on the COMPANY table will be checked to see if a matching value exists. If a match exists, the record is returned to the user via the NESTED LOOPS operation.
There are several important things to note about this query:
· Although all of the primary key columns in the SALES table were specified in the query, the SALES_PK index was not used. The SALES_PK index was not used because there was not a limiting condition on the leading column (the COMPANY_ID column) of the SALES_PK index. The only condition on SALES.COMPANY_ID is a join condition.
· The optimizer could have selected either table as the driving table. When the COMPANY table is the driving table, Oracle performs a full table scan.
· In rule-based optimization, when there is equal chance of using an index regardless of the choice of the driving table, the driving table will be the one that is listed last in the FROM clause.
· In cost-based optimization, the optimizer will consider the size of the tables and the selectivity of the indexes while selecting a driving table.
Interpreting the Order of Operations within NESTED LOOPS
NESTED LOOPS operations pose a special challenge when reading the output from PLAN_TABLE. Given the Explain path shown in the following listing, it appears that the first step in the Explain path is the scan of the COMPANY_PK index, since that is the innermost step of the Explain path.
NESTED LOOPS
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK
Despite its placement as the innermost step, the scan of the COMPANY_PK index is not the first step in the Explain path. A NESTED LOOPS join needs to be driven by a row source (such as a full table scan or an index scan) - so to determine the first step within a NESTED LOOPS join, you need to determine which operations directly provide data to the NESTED LOOPS operation. In this example, two operations provide data directly to the NESTED LOOPS operation - the full table scan of SALES, and the ROWID access of the COMPANY table.
Of the two operations that provide data to the NESTED LOOPS operation, the full table scan of SALES is listed first. Therefore, within the NESTED LOOPS operation, the order of operations is:
1. The full table scan of SALES.
2. For each record in SALES, access COMPANY by Company_ID. Since an index (COMPANY_PK) is available on COMPANY.Company_ID, use that index via a unique scan.
3. For each ROWID returned from the COMPANY_PK index, access the COMPANY table (to get the NAME value, as requested by the query).
When reading the Explain path for a NESTED LOOPS operation, you need to look first at the order of the operations that directly provide data to it, and determine their order.
.
HASH JOIN | A HASH JOIN is performed of two row sources. |
HASH JOIN is one of the algorithms that Oracle can use to join two tables.
In a HASH JOIN a hash table, an on-the-fly index, is constructed for the larger of the two tables. The smaller table is then scanned, and the hash table used to find matching rows in the larger table.
HASH JOIN joins tables by creating an in-memory bitmap of one of the tables and then using a hashing function to locate the join rows in the second table.
In the following query, the COMPANY and SALES are joined based on their common COMPANY_ID column.
Example
select COMPANY.Name from COMPANY, SALES where COMPANY.Company_ID = SALES.Company_ID and SALES.Period_ID =3 and SALES.Sales_Total>1000;
Execution Plan
HASH JOIN
TABLE ACCESS FULL SALES
TABLE ACCESS FULL COMPANY
TABLE ACCESS FULL SALES
TABLE ACCESS FULL COMPANY
Interpreting the Execution Plan
The Execution Plan shows that the SALES table is used as the first table in the hash join. SALES table will be read into memory. Oracle will use a hashing function to compare the values in COMPANY table to the records that have been read into memory.
When one of the tables is significantly smaller than the other in the join, and the smaller table fits into the available memory area, then the optimizer will generally use a hash join instead of a traditional NESTED LOOPS join. Even if an index is available for the join, a hash join may be preferable to a NESTED LOOPS join.