Feb 27, 2009

Oracle Text - Part 2

Oracle Text offers the CTX_CLS.CLUSTERING package for building clusters.

Optimizer Hints

We can also "hint" the database optimizer to improve query performance if we
know ahead of time what plan is best:
SELECT /*+ index product_information description_idx */
score(1), product_id FROM product_information
WHERE CONTAINS (product_description, 'monitor NEAR "high resolution"', 1) > 0
AND list_price < 500;

Parallel Indexing

Parallel indexing can take advantage of hardware when you have multiple CPUs.
Parallel index creation is useful for
• Performance improvement
• Data Staging
• Rapid initial deployment of applications based on large data collections
• Application testing, when users need to test different index parameters
and schemas while developing an application
The following example creates a text index with degree 3:

CREATE INDEX myindex ON docs(tk) INDEXTYPE IS ctxsys.context PARALLEL 3;


SDATA Sections and Composite Domain Indexes

SDATA New in Oracle Text 11g They are designed to improve the performance of “mixed queries” – queries which have
a text search part and a structured part.

For example, the query:

SELECT item_id FROM items WHERE
CONTAINS (description, 'madonna') > 0
AND itmtype = 'BOOK' AND price < 10
ORDER BY price DESC

These queries are never likely to be as fast
as a simple text-only query.

Composite Domain Indexes
Composite Domain Indexes use the same underlying technology as SDATA
sections, but in an easier-to-use and more standard fashion.
A ‘domain index’ is a type of index for use with a particular type of data (in our case, textual data). A composite index in normal Oracle terms is an index that covers more than one column. So a Composite Domain Index (CDI, for short) is a extension of the usual domain index to cover multiple columns.

Original query again:

SELECT item_id FROM items WHERE
CONTAINS (description, 'madonna') > 0
AND itmtype = 'BOOK' AND price < 10
ORDER BY price DESC

To create appropriate indexes for this query in previous versions we may have run
the following SQL commands:
CREATE INDEX typeind ON items (itmtype)
CREATE INDEX priceind ON items (price)
CREATE INDEX descind ON items (description) INDEXTYPE IS
ctxsys.context

In Oracle 11g Release 1 it can do with a single call:

CREATE INDEX compind ON items (description)
INDEXTYPE IS ctxsys.context
FILTER BY itmtype, price SORT BY price

In Oracle Database 10g Release 1 MDATA (for MetaDATA)sections .
These were designed for short character fields which would be indexed
“as a whole” inside the text index. This would allow us to rewrite the query above
as something like:

SELECT item_id FROM items WHERE
CONTAINS (description, 'madonna and MDATA(itmtype, BOOK') > 0
AND price < 10 ORDER BY price DESC

It will remove many unnecessary docid to rowid resolutions, and the base table access to
evaluate “itmtype=’BOOK’” predicate, since we can get itmtype=BOOK from the text
index. However, it doesn’t solve the problem completely:
• We can only do equality searches, we can’t do “price < 10” with MDATA
• We can’t use it for sorting.

(Structured DATA) sections. These sections are embedded in the text of a document – like field or zone sections – but unlike previous sections they may contain character, numeric or date information and may be searched using operators such as “greater than”, “less than” and “between” as well as equality searches. Here’s an example of a query which makes use of SDATA query operators:

SELECT item_id FROM items WHERE
CONTAINS (description, 'racing and SDATA(itemtype=''BOOK'') and SDATA(price<10)') > 0
ORDER BY price DESC


XML Support
XML features include the operator WITHIN, nested section search, search within
attribute values, mapping multiple tags to the same name, path searching using
INPATH and HASPATH operators.

XML example to demonstrate Oracle Texts features.(Replace + with following operators ie: < or >

+?xml version="1.0"?+
+FAQ OWNER="Billy Text"+
+TITLE+"Oracle Text FAQ"+/TITLE+
+DESCRIPTION+
Everything you always wanted to know about Text"+/DESCRIPTION+"
+QUESTION+"What is Oracle Text?
+/QUESTION++ANSWER+"
Oracle Text uses standard SQL to index search and analyze text and
documents stored in the database files or websites.
"+/ANSWER++/FAQ+"

SELECT title description FROM FAQTable
WHERE CONTAINS(text'Oracle WITHIN QUESTION')> 0;

SELECT title description FROM FAQTable
WHERE CONTAINS(text'Billy WITHIN FAQ0OWNER')> 0;

SELECT title description FROM FAQTable
WHERE CONTAINS(text'Oracle INPATH(FAQ/TITLE)')> 0;

SELECT title description FROM FAQTable
WHERE CONTAINS(text'HASPATH(FAQ/TITLE/DESCRIPTION)')> 0;

• existsNode() : given an XPath expression, checks if the XPath applied
over the document can return any valid nodes.
• extract() : given an XPath expression, applies the XPath to the
document and returns the fragment as a XMLType.

select f.faq.extract('/FAQ/QUESTION/text()').getStringVal()
from faq f where contains(faq, 'standard or SQL INPATH(FAQ/ANSWER)')>0

No comments: