Mar 2, 2009

Oracle Text - Part 5

Generating XML Output

The INDEX_STATS procedure supports both formatted text and XML output.The following code
creates the INDEX_STATS report in text format:

-- Table to store our report
CREATE TABLE index_report (id NUMBER(10), report CLOB);

v_report CLOB := null;
report => v_report,part_name => NULL,frag_stats => NULL,
list_size => 20,report_format => NULL);
INSERT INTO index_report (id, report) VALUES (1, v_report);

Then output can be viewed:

SELECT report FROM index_report WHERE id = 1;

To find out the size of the index, we can use the function CTX_REPORT.INDEX_SIZE. This is a function returning a CLOB.

select ctx_report.index_size('SONG_INDEX') from dual;

for a XML report :
SQL> select ctx_report.index_size('SONG_INDEX', null, 'XML') from dual;

This will give us a heap of XML output, including sizes for each object.

Here's a section of the output:

<SIZE_OBJECT_NAME> text_user.SYS_IL0000051565C00006$$

selecting a CLOB, force it into an XMLTYPE value:
select xmltype(ctx_report.index_size('SONG_INDEX', null, 'XML')) from dual;

select extract(xmltype(ctx_report.index_size('SONG_INDEX', null, 'XML')), '//SIZE_OBJECT') from dual;

We're now using the XML DB "extract" function to fetch all of the XML within tags. The syntax '//SIZE_OBJECT' is an XPATH expression meaning "all the XML within a SIZE_OBJECT element anywhere below the root element".the reader is encouraged to look at the tutorial at w3Schools.

select extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT') from dual;

Now specify that only information from the table $I is want . this is done with
within a predicate the xpath expression:

select extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT[SIZE_OBJECT_NAME="text_user.DR$TI$I"]') from dual;

The size in bytes can be get by adding the XPATH expression:

select extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT[SIZE_OBJECT_NAME="text_user.DR$TI$I"]/SIZE_BYTES_USED') from dual;

To avoid tags. There's two ways of doing this 1) the text() function within the XPATH, 2) use extractValue rather than extract.

select extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT[SIZE_OBJECT_NAME="text_user.DR$TI$I"]/SIZE_BYTES_USED/text()') as "Table Size" from dual;

select extractValue(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')), '//SIZE_OBJECT[SIZE_OBJECT_NAME="text_user.DR$TI$I"]/SIZE_BYTES_USED')as "Table Size" from dual;

For fetching ALL the sizes, perhaps to add them together to get a summary XMLSequence can use,which returns a collection of XMLType values. This function can use in a TABLE clause to unnest the collection values into multiple rows. Now that we're generating a "table", we no longer need to reference DUAL.

To get just the sizes:

select * from table(xmlsequence( extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT/SIZE_BYTES_USED')));

To get all the object info, one object per row:

select * from table(xmlsequence( extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')), '//SIZE_OBJECT')));

Processing information into individual values:

select extractValue(Column_Value, '/SIZE_OBJECT/SIZE_OBJECT_NAME') as "Name",
extractValue(Column_Value, '/SIZE_OBJECT/SIZE_TABLESPACE_NAME') as "Tablespace",
extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED') as "Bytes"
from table(xmlsequence(extract(xmltype(replace(ctx_report.index_size('SONG_INDEX', null, 'XML'),chr(10),'')),'//SIZE_OBJECT')));

To get sum of all the sizes to get an aggragate total size of all objects
used in the index.

select sum(extractValue(Column_Value, '/SIZE_OBJECT/SIZE_BYTES_USED')) "Total"
from table(xmlsequence(extract(xmltype(replace(ctx_report.index_size('ti', null, 'XML'),chr(10),'')),'//SIZE_OBJECT')));


Most of the CTX_REPORT procedures have two forms - a function which returns a CLOB (as done above)and an alternate form which requires to pass in a CLOB to be populated. One procedure - INDEX_STATS - does not have a function version. This is because it needs to do a full table scan of the main index table, which is likely to take some time.Therefore when calling INDEX_STATS it is needed to go about things a little differently. First create a table with an XMLType column. Then call a PL/SQL block which passes a temporary CLOB to CTX_REPORT, and then inserts that CLOB into the XMLType column. Then various XML operations can perform on that table.

create table output(report XMLType);

tlob clob;
ctx_report.index_stats(index_name=>'ti', report=>tlob,
list_size=>20, report_format=>'XML');
insert into output values (xmlType (replace(tlob,chr(10),'')) );

There is a single row in table OUTPUT, which contains XML report. First well get the
estimated row fragmentation:

select extractValue(report,'//STAT_STATISTIC[@NAME="estimated row fragmentation"]')
as "Fragmentation" from output;

Now get the top three most frequent tokens.

using the position function:

select extract(value(d), '//STAT_TOKEN_TEXT')
from output, table(xmlsequence(extract(report,'/CTXREPORT/INDEX_STATS/STAT_TOKEN_STATS/STAT_TOKEN_LIST[@NAME="most frequent tokens"]/
STAT_TOKEN[position()<4]'))) d; Conclusion
The XML output mode of CTX_REPORT allows powerful manipulation of index information. In order to make full use of these,a good understanding of the XML features of the Oracle database, such as extract, extractValue, XMLSequence, and of XPATH syntax in general is needed.With the use of XML exploding within the data processing world, such an understanding is likely to be very useful in the future.

1 comment:

Manoj Vilayil said...

Hi i read this article.Are you working in this one?Where is the DTD file stored?