Technical Blog

Feb 12, 2013

Big Data

Big Data is all about finding a needle of value in a haystack of unstructured information. Big data refers to large datasets that are challenging to store, search, share, visualize, and analyze.

BigTable is a compressed, high-performance, proprietary data storage system built on Google File System. BigTable is Google's database, used for Google Reader, Google Maps, Google Book Search, My Search History, Google Earth, Blogger.com, Google Code hosting, Orkut, YouTube, and Gmail.

BigTable maps two arbitrary string values (a row key and a column key) and a timestamp (hence a three-dimensional mapping) to an associated arbitrary byte array. It is not a relational database.
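The three-dimensional mapping can be pictured with a small in-memory toy. This is only a sketch of the data model, not of how BigTable is implemented: the class name `ToyBigTable`, its methods, and the sample keys are all made up for illustration.

```python
from typing import Dict, Optional, Tuple


class ToyBigTable:
    """Hypothetical toy model: (row key, column key, timestamp) -> bytes."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str, int], bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        # Each (row, column, timestamp) triple addresses exactly one byte array.
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if omitted."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]


table = ToyBigTable()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 2, b"<html>v2</html>")
latest = table.get("com.example/index", "contents:html")
```

Because the timestamp is part of the key, older versions of a cell stay addressable next to the newest one, which is something a plain two-column key/value table would not give you.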

BigTable belongs with the newer non-relational databases (such as MongoDB, CouchDB, and others), as opposed to relational databases (such as Oracle, MySQL, and SQL Server).

Big Data is not just about volume; the approach to analysis must also contend with data content and structure that cannot be anticipated or predicted.

Feb 6, 2013

Connection Pool

A connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required. Connection pools are used to enhance the performance of executing commands on a database.

Connection pooling is generally the practice of a middle tier (application server) getting N connections to a database (say, 20 connections).

These connections are stored in a pool in the middle tier, an "array". Each connection is initially marked "not in use". When a user submits a web page to the application server, it runs a piece of your code. Your code says "I need to get to the database"; instead of connecting right there and then (which takes time), it goes to the pool and says "give me a connection, please". The connection pool software marks the connection as "in use" and gives it to you. When your code is finished, the connection goes back to the pool and is marked "not in use" again.
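A minimal sketch of that bookkeeping, assuming a fixed-size pool and using in-memory `sqlite3` connections as a stand-in for a real database server. The class `ToyConnectionPool` and its method names are hypothetical, and the sketch is not thread-safe; a production pool would add locking, timeouts, and connection health checks.

```python
import sqlite3
from typing import List


class ToyConnectionPool:
    """Hypothetical fixed-size pool: connections plus an 'in use' flag each."""

    def __init__(self, size: int) -> None:
        # Pay the connection cost once, up front, for all N connections.
        self._connections: List[sqlite3.Connection] = [
            sqlite3.connect(":memory:") for _ in range(size)
        ]
        self._in_use: List[bool] = [False] * size

    def acquire(self) -> sqlite3.Connection:
        # Hand out the first free connection and mark it "in use".
        for i, busy in enumerate(self._in_use):
            if not busy:
                self._in_use[i] = True
                return self._connections[i]
        raise RuntimeError("no free connections in the pool")

    def release(self, conn: sqlite3.Connection) -> None:
        # Mark the connection "not in use" so a later request can reuse it.
        for i, pooled in enumerate(self._connections):
            if pooled is conn:
                self._in_use[i] = False
                return
        raise ValueError("connection does not belong to this pool")

    def free_count(self) -> int:
        return self._in_use.count(False)


pool = ToyConnectionPool(size=3)
conn = pool.acquire()
result = conn.execute("SELECT 1 + 1").fetchone()[0]
pool.release(conn)
```

The point of the design is that `acquire` never opens a new connection; it only flips a flag, which is why a request served from the pool avoids the connect-time cost.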

Jan 17, 2013

Modify Datatype

The size of a NUMBER(5) or CHAR(5) column cannot be reduced while the table contains data, even if every value in that column has length 1.

The size of a VARCHAR/VARCHAR2 column, however, can be reduced even if the table is not empty, provided the existing values fit in the new length.

The size of NUMBER(5), VARCHAR2(5), and CHAR(5) columns can be increased.

Oct 8, 2012

Oracle GLOBALIZATION SUPPORT 2

Oracle GLOBALIZATION SUPPORT 1

The N prefix (as in N'...') marks a string literal as belonging to the national character set.

In 9.2, Oracle has to convert your entire SQL statement to the database character set before executing it. In 10.2, you can set ORA_NCHAR_LITERAL_REPLACE to TRUE, which avoids converting N'...' literals to the database character set.

If you are using 10g and you care about Chinese you should seriously consider using AL32UTF8.

AL32UTF8 contains a large number of additional Chinese characters. The most important ones are a slew of Hong Kong specific characters that are used most frequently in names. All HK systems for HK government are required to support these characters.

But if you do this, be aware that Dev 6i does not support AL32UTF8.

Chinese characters are multibyte, so the database needs a multibyte character set such as UTF8 or AL32UTF8 to handle them.

When the database is created with the UTF8 character set, Chinese characters can be stored in and retrieved from VARCHAR2 columns. (The NCHAR character set is AL16UTF16.)

SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET';

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               AL32UTF8
NLS_NCHAR_CHARACTERSET         AL16UTF16

UTF8 and AL32UTF8 are capable of storing Japanese characters; WE8ISO8859P1 is not.
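The difference is easy to see outside the database. The sketch below assumes that WE8ISO8859P1 corresponds to ISO 8859-1 and uses Python's own codecs as a stand-in; it shows only which encodings can represent the text, not how Oracle reacts when a conversion fails.

```python
text = "日本語"  # three Japanese characters

# UTF-8 can encode any Unicode character; these three take 3 bytes each.
utf8_bytes = text.encode("utf-8")

# ISO 8859-1 is a single-byte Western European character set with no
# code points for Japanese, so the encode call fails.
try:
    text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
```

Python raises an error here; a database converting between character sets may instead substitute a replacement character, so the data loss can be silent.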

Any new system being built for anything more important than a school project should be UTF-8.

Use LENGTHB (length in bytes) rather than LENGTH (length in characters) in these cases.

The 'N' Variant

So, of what use are the NVARCHAR2 and NCHAR (for completeness)? They are used in systems where the need to manage and store multiple character sets arises. This typically happens in a database where the predominant character set is a single-byte fixed-width one (such as WE8ISO8859P1), but the need arises to maintain and store some multibyte data. There are many systems that have legacy data but need to support multibyte data for some new applications, or systems that want the efficiency of a single-byte character set for most operations (string operations on a fixed-width string are more efficient than on a string where each character may store a different number of bytes), but need the flexibility of multibyte data at some points.

The NVARCHAR2 and NCHAR datatypes support this need. They are generally the same as their VARCHAR2 and CHAR counterparts, with the following exceptions:

* Their text is stored and managed in the database's national character set, not the default character set.

* Their lengths are always provided in characters, whereas a CHAR/VARCHAR2 may specify either bytes or characters.

In Oracle9i and above, the database's national character set may take one of two values: UTF8 or AL16UTF16 (UTF-16 in 9i; AL16UTF16 in 10g). This makes the NCHAR and NVARCHAR types suitable for storing only multibyte data, which is a change from earlier releases of the database (Oracle8i and earlier allowed you to choose any character set for the national character set).

select dump('xo'),ascii('x'),ascii('o') from dual;

select unistr('\8349') from dual;

NLS_LENGTH_SEMANTICS = BYTE

All the tables and PL/SQL packages are defined with the default byte semantics, e.g. Customer_Name VARCHAR2(80). The issue is that the new incoming Chinese characters cause "value too large" errors, even though the values are shorter than 80 characters (in the case of Customer_Name), because each Chinese character takes more than one byte.

To resolve this, change the column definition from VARCHAR2(80) to VARCHAR2(80 CHAR).
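The arithmetic behind the error can be checked with a short sketch. It assumes the database character set is a UTF-8 encoding, and the two helper functions are hypothetical; they only mimic the length check, nothing else about how Oracle stores the value.

```python
def fits_byte_semantics(value: str, limit: int, encoding: str = "utf-8") -> bool:
    """Mimic VARCHAR2(limit) with byte semantics: count encoded bytes."""
    return len(value.encode(encoding)) <= limit


def fits_char_semantics(value: str, limit: int) -> bool:
    """Mimic VARCHAR2(limit CHAR): count characters."""
    return len(value) <= limit


# A hypothetical 30-character Chinese customer name; each character
# takes 3 bytes in UTF-8, so the value needs 90 bytes.
customer_name = "客" * 30
byte_length = len(customer_name.encode("utf-8"))

too_big_for_bytes = not fits_byte_semantics(customer_name, 80)
ok_for_chars = fits_char_semantics(customer_name, 80)
```

So a 30-character name is rejected by VARCHAR2(80) under byte semantics but accepted by VARCHAR2(80 CHAR), which matches the behaviour described above.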


Sep 28, 2012

Oracle Histogram

Histograms are used to predict cardinality, the number of rows returned by a query. The Oracle query optimizer uses histograms to choose better query plans. The ANALYZE command or the DBMS_STATS package can be used to compute these histograms. A histogram is frequency-distribution metadata that describes the distribution of data values within a column of a table.

In some cases, the distribution of values within a column of a table will affect the optimizer's decision to use an index vs. perform a full-table scan. This scenario occurs when a value in a WHERE clause matches a disproportionate share of the rows, making a full-table scan cheaper than index access.

Histograms are also important for determining the optimal table join order.
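The index-versus-scan decision can be sketched with a toy frequency histogram. This is an illustration of the idea only: the function names, the sample column, and the 5% cutoff are assumptions, and the real optimizer compares estimated costs rather than applying a fixed threshold.

```python
from collections import Counter
from typing import Dict, Iterable


def frequency_histogram(values: Iterable[str]) -> Dict[str, int]:
    """Count how many rows hold each distinct value of a column."""
    return dict(Counter(values))


def estimated_selectivity(histogram: Dict[str, int], value: str) -> float:
    """Fraction of rows expected to match `column = value`."""
    total = sum(histogram.values())
    return histogram.get(value, 0) / total if total else 0.0


def choose_access_path(histogram: Dict[str, int], value: str,
                       threshold: float = 0.05) -> str:
    # Hypothetical rule: use the index only for values that match few rows.
    if estimated_selectivity(histogram, value) <= threshold:
        return "index"
    return "full table scan"


# A skewed column: 980 of 1000 rows are CLOSED, only 20 are OPEN.
status_column = ["CLOSED"] * 980 + ["OPEN"] * 20
hist = frequency_histogram(status_column)

path_for_open = choose_access_path(hist, "OPEN")
path_for_closed = choose_access_path(hist, "CLOSED")
```

Without the histogram, both values would look equally selective (two distinct values, so half the rows each), and the same plan would be chosen for both predicates.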
