Feb 5, 2015

Integration of Hadoop with Oracle


Oracle Database provides the flexibility to leverage programming language functionality within the database without having to write complex SQL statements by using user defined functions known as Table Functions.

The Map Reduce programming model can be implemented within the Oracle Database using Parallel Pipelined Table Functions and parallel operations. It is possible to write parallel processing tasks as database queries using user defined Table Functions to aggregate or filter the data. Pipelined Table Functions were introduced in Oracle 9i as a way of embedding procedural logic within a data flow. At a logical level, a Table Function is a user defined function that appears in the FROM clause of a SQL statement and operates like a table returning a stream of rows.

This mechanism provides an alternative for SQL developers to perform very complex processing in a procedural way, not easily expressed with SQL. It also follows the Map Reduce paradigm, enabling massively parallel processing within the realm of the database.

Feeding Hadoop Data to the Database for Further Analysis

External tables present data stored in a file system in a table format, and can be used in SQL queries transparently. Hadoop data stored in HDFS can be accessed from inside the Oracle Database by using External Tables through the use of FUSE (File system in User Space) project driver to provide the application programming interface between HDFS and the External Table infrastructure. Using the External Table makes it easier for non-programmers to work with Hadoop data from inside an Oracle Database.

Leveraging Hadoop Processing From the Database

In the event that you need to process some data from Hadoop before it can be correlated with the data from your database, you can control the execution of the Map Reduce programs through a table function using the DBMS_SCHEDULER framework to asynchronously launch an external shell script that submit a Hadoop Map Reduce job. The table function and the Map Reduce program communicate using Oracle’s Advanced Queuing feature.

By leveraging Hadoop processing from the Database you can take advantage of the power of Oracle RDBMS at the same time to simplify the analysis of your data stored in a Hadoop Cluster by streaming data directly from Hadoop with Oracle Queue and Table Function

Nov 23, 2014

Improving Performance

Improving Performance of Dynamic SQL with Bind Variables

When you code INSERT, UPDATE, DELETE, and SELECT statements directly in PL/SQL,
PL/SQL turns the variables into bind variables automatically, to make the statements work efficiently with SQL. When you build up such statements in dynamic SQL, you need to specify the bind variables yourself to get the same performance.

 In the example below, Oracle opens a different cursor for each distinct value of emp_id. This can lead to resource contention and poor performance as each statement is parsed and cached.  

CREATE PROCEDURE del_employee (emp_id NUMBER) AS 
BEGIN 
EXECUTE IMMEDIATE 'DELETE FROM emp WHERE empno = ' || TO_CHAR(emp_id); 
END;

You can improve performance by using a bind variable, which allows Oracle to reuse the same cursor for different values of emp_id  

CREATE PROCEDURE delt_employee (emp_id NUMBER) AS 
BEGIN 
EXECUTE IMMEDIATE 'DELETE FROM emp WHERE empno = :num' USING emp_id; 
END; 

---- 

Oct 29, 2014

Some Good SQL

Query to convert comma separated values into rows 

 with test as (select '1,2,3' col1 from dual) select regexp_substr(col1, '[^,]+', 1, rownum) result1 from test connect by level <= length(regexp_replace(col1, '[^,]+')) + 1 ;

WITH test AS (SELECT 'a,b,c,d,e' temp, 1 slno FROM DUAL UNION SELECT 'f,g' temp, 2 slno FROM DUAL UNION SELECT 'h' temp, 3 slno FROM DUAL) SELECT TRIM(REGEXP_SUBSTR(temp, '[^,]+', 1, level)), slno FROM test CONNECT BY level <= REGEXP_COUNT(temp, '[^,]+') AND PRIOR slno = slno AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL ;
select REGEXP_SUBSTR('A,B,C,D,E','[^,]',1,level) from dual connect by level <= (select regexp_count('A,B,C,D,E',',') from dual);

SELECT EXTRACTVALUE(xt.column_value,'list') AS listitem FROM TABLE(XMLSEQUENCE(EXTRACT(XMLTYPE('' || REPLACE('2,1',',','') || '') ,'/lists/list'))) xt ;


SELECT EXTRACT(VALUE(d), '//row/text()').getstringval() AS DATA FROM (SELECT XMLTYPE('' || REPLACE('1,2', ',', '') || '') AS xmlval FROM DUAL) x, TABLE(XMLSEQUENCE(EXTRACT(x.xmlval, '/rows/row'))) d;

Query to split number and character into two different columns 

with src as (select 'CGJ0000617' col_1 from dual
union all select 'CG0,001442' from dual
union all select 'CGJ0001444' from dual
union all select 'CMOV000GH0200' from dual
union all select 'CXAR00000001' from dual
union all select 'CXAR00000002' from dual)
 select regexp_substr(col_1,'^[A-Za-z]+') part_1 ,regexp_substr(col_1,'[0-9]+$') part_2 from src ;

Second highest salary SQL

select max(sal) from employees where sal not in (select max(sal) from employees) ;
select max(sal) from employees where sal < (select max(sal) from employees ) ;
select sal from (select sal,rownum rn from (select sal from employees order by sal desc)) where rn=2 ;
select sal from (select sal,row_number() over(order by sal desc) rn from employees) where rn=2 ;

Pivot , Unpivot Queries


In the FOR IN clause of PIVOT keyword, we cannot use column names , can only use values 

ORA-56901 : non -constant expression is not allowed in pivot|unpivot values 

 In PIVOT keyword, it is required to provide an aggregate function like COUNT, SUM, AVG,MAX,MIN etc 

ORA-56902 : expect aggregate functions inside pivot operation  

In PIVOT keyword, instead of aggregate function window keyword cannot be used 

ORA-30483 : window functions are not allowed here

Query to Pivot Rows to columns even if aggregate function is not be used (use Max/min with row_number)

SELECT * FROM (
     SELECT column_name, row_number() over(partition by table_name order by table_name) rr FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) pivot (MAX(column_name) FOR  rr IN (1 C_FIRST, 2 AS C_SECOND, 3 C_THIRD, 4 C_FOURTH, 5 C_FIVE, 6 C_SIX, 7 C_SEV, 8 C_EIGHT, 9 C_NINE, 100 C_HUND));


In HR Schema 

SELECT * FROM ( SELECT column_name,TABLE_NAME,column_id FROM user_tab_columns WHERE table_name ='EMPLOYEES' ) unpivot (column_value FOR column_name IN (table_name,column_name,column_id));

 ORA-01790 expression must have same datatype as corresponding expression 

In unpivot function FOR IN clause data-type with same columns are only allowed . In the above query table_name,column_name are varchar2 but column_id is number .So this results in the above error


SELECT * FROM ( SELECT column_name,TABLE_NAME,column_id FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) unpivot (column_value FOR column_name IN (table_name,column_name));

SELECT * FROM ( SELECT column_name,table_name,column_id FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) unpivot (column_value FOR column_name IN (column_name));


SELECT * FROM ( SELECT column_name,TABLE_NAME FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) unpivot (column_value FOR column_name IN (table_name,column_name));

Try this query for getting more detail about pivot function

SELECT * FROM ( SELECT column_name, row_number() over(partition by table_name order by table_name) rr FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) pivot (MAX(rr) FOR  rr IN (1 C_FIRST, 2 AS C_SECOND, 3 C_THIRD, 4 C_FOURTH, 5 C_FIVE, 6 C_SIX, 7 C_SEV, 8 C_EIGHT, 9 C_NINE, 100 C_HUND));
    
 SELECT * FROM ( SELECT column_name, row_number() over(partition by table_name order by table_name) rr FROM user_tab_columns WHERE table_name = 'EMPLOYEES' ) pivot (MAX(column_name) FOR  column_name IN (1 C_FIRST, 2 AS C_SECOND, 3 C_THIRD, 4 C_FOURTH, 5 C_FIVE, 6 C_SIX, 7 C_SEV, 8 C_EIGHT, 9 C_NINE, 100 C_HUND));

If there are duplicate records in a query then usage of pivot will removes the duplicate records .For example refer below query


SELECT * FROM (     SELECT column_name, row_number() over(partition by table_name order by table_name) rr FROM user_tab_columns WHERE table_name = 'EMPLOYEES'
     union all
      SELECT column_name, row_number() over(partition by table_name order by table_name) rr FROM user_tab_columns WHERE table_name = 'EMPLOYEES'   ) pivot (MAX(column_name) FOR  rr IN (1 C_FIRST, 2 AS C_SECOND, 3 C_THIRD, 4 C_FOURTH, 5 C_FIVE, 6 C_SIX, 7 C_SEV, 8 C_EIGHT, 9 C_NINE, 100 C_HUND));



 For more examples
 

Oct 7, 2014

SQL Injection Part 2



---------------------------------------------------------------------------------------------------  
Inband: Retrieve the results of the SQL Injection in the same input (e.g. in the browser). Data can be display in the normal output or in an error message.
Most common techniques for Inband  are
* UNION based attacks
* Error Based

Now we must find out how many columns are used in the first SELECT statement. The most common techniques are the usage of  ORDER BY  or adding NULL values to the second query.

 SELECT * FROM table 
UNION 
SELECT null,null FROM table  

SELECT * FROM table ORDER BY 8 

Out-of-Band: A different channel (e.g. HTTP, DNS) is used to transfer the data from the SQL query. If this is working it is the easiest way to retrieve a large amount of data from the database

Blind: Different timings / results are used to retrieve data from the database.
Oracle offers 2 possibilities to run blind injection. • DECODE • CASE

--------