6/19/2014 unloading data using external tables in 10g

http://www.oracle-developer.net/display.php?id=325 1/7
unloading data using external tables in 10g
This is an introduction to a new feature of external tables in 10g, namely the ability to unload data from the database using
SQL. Introduced in 9i, external tables provided a new means of reading flat file data without first loading it into a staging
table. They offered additional benefits over traditional SQL*Loader methods in that loading data became as simple as an
INSERT..SELECT, with all the SQL processing and manipulation options that such a statement implies. It also meant that
more of the ETL application could be self-contained in PL/SQL packages, as call-outs to SQL*Loader or batch scripts were
no longer necessary to load data.
Oracle 10g has taken external tables a stage further by enabling an external table to be created as a CTAS (Create Table As
Select) operation, which enables a one-time unloading of data. When I first saw the headlines about "writing" to an external
table, I must admit to being excited by the possibilities of a native data unloader (no more sqlplus spools, UTL_FILE writes,
Pro*C unloaders etc). Alas, this is not the case. Oracle has in fact supplied a new access driver known as
"ORACLE_DATAPUMP", which means that Oracle will "unload" an external table to a proprietary format usable by the Data
Pump set of utilities (a significant new 10g feature that replaces imp and exp). So generating an ASCII or CSV file from
external tables is not possible, but this new feature is exciting nonetheless.
a simple demonstration
We'll start with a simple demonstration of how to dump the data from the ALL_OBJECTS view to a file. I am assuming that
readers are familiar with external table concepts and DIRECTORY objects.
SQL> CREATE DIRECTORY xt_dir AS 'c:\temp\ora';
Directory created.
Now that we have a directory to dump the data to, we can create an "unloading" external table.
SQL> CREATE TABLE all_objects_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY xt_dir
6 LOCATION ( 'all_objects_xt.dmp' )
7 )
8 AS
9 SELECT *
10 FROM all_objects;
Table created.
We can verify that the file was created with a simple file lookup.
SQL> host dir c:\temp\ora\ALL*
Volume in drive C is DRIVE-C
Volume Serial Number is 6845-F732
Directory of c:\temp\ora
11/05/2005 20:16 4,648,960 ALL_OBJECTS_XT.DMP
11/05/2005 20:16 312 ALL_OBJECTS_XT_2024_3452.log
2 File(s) 4,649,272 bytes
0 Dir(s) 4,308,662,272 bytes free
It is probably worth mentioning at this point that these examples were created on a Windows XP PC, so the file names are
case-insensitive. This is apparent from the above output, where the file name quoted in the LOCATION access parameter is
lower-case and the subsequent directory listing shows an upper-case file name. On UNIX, more care will have to be taken
over the case used to name and reference the file.
Also note the log file. It contains information such as the time the file was opened and any error messages that occur
during the creation or read-back of the external table; it can be suppressed with the NOLOGFILE keyword.
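As a sketch, the NOLOGFILE keyword goes in the ACCESS PARAMETERS clause of the table definition. The statement below repeats the earlier example with the log file suppressed (assuming the same xt_dir directory object).

```sql
CREATE TABLE all_objects_xt
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_DATAPUMP
   DEFAULT DIRECTORY xt_dir
   ACCESS PARAMETERS ( NOLOGFILE )  -- suppress the .log file
   LOCATION ( 'all_objects_xt.dmp' )
)
AS
SELECT *
FROM   all_objects;
```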
read-back
Note that I mentioned reading in the previous paragraph. An interesting fact about these tables is that they can continue to
be used to read the dumped data back in. Because the table was created as a database object, it persists, and as long as
an underlying file of the same format exists, it can continue to be queried, as follows.
SQL> desc all_objects_xt
Name Null? Type
----------------------------------------- -------- ------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> SELECT COUNT(*) FROM all_objects_xt;
COUNT(*)
----------
47374
As long as the underlying file is in the Data Pump format and matches the structure of the external table, the table can
continue to be used to read the data (the underlying file(s) do not have to be the actual files created when the external
table was originally built).
For the following example, I dumped ALL_OBJECTS on my Solaris database (version 10.1.0.3) to a file named ALL.DMP and
transferred it to my Windows XP server (Oracle 10.1.0.2) using DBMS_FILE_TRANSFER. If we rename the
ALL_OBJECTS_XT.DMP file to ALL.DMP and issue the following, the SELECT still works.
SQL> ALTER TABLE all_objects_xt LOCATION ( 'ALL.DMP' );
Table altered.
SQL> SELECT COUNT(*) FROM all_objects_xt;
COUNT(*)
----------
47374
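For reference, the file transfer mentioned above can be driven entirely from SQL*Plus with the DBMS_FILE_TRANSFER package. The sketch below assumes a database link named solaris_db to the source database and an XT_DIR directory object on both sides; note that DBMS_FILE_TRANSFER requires file sizes to be a multiple of 512 bytes.

```sql
-- Pull ALL.DMP from the remote database over a database link.
-- "solaris_db" and the directory names are assumptions for illustration.
BEGIN
   DBMS_FILE_TRANSFER.GET_FILE(
      source_directory_object      => 'XT_DIR',     -- directory on remote DB
      source_file_name             => 'ALL.DMP',
      source_database              => 'solaris_db', -- database link name
      destination_directory_object => 'XT_DIR',     -- local directory
      destination_file_name        => 'ALL.DMP' );
END;
/
```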
file protection
Oracle will protect the dump files we create by preventing any further attempts to write to them. For example, if we try to
re-create the ALL_OBJECTS_XT table but forget to remove the original file (remember this is now named 'ALL.DMP'), we
will get an error.
SQL> DROP TABLE all_objects_xt;
Table dropped.
SQL> CREATE TABLE all_objects_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY xt_dir
6 LOCATION ( 'ALL.DMP' )
7 )
8 AS
9 SELECT *
10 FROM all_objects;
CREATE TABLE all_objects_xt
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-11012: file ALL.DMP in c:\temp\ora already exists
ORA-06512: at "SYS.ORACLE_DATAPUMP", line 19
ORA-06512: at line 1
This means, however, that any attempts by the more UNIX-indoctrinated amongst us to compress the output file through a
named pipe will fail, because the "FIFO" will already exist before the CREATE TABLE operation.
Similarly, note that I previously mentioned that the write is a one-time-only operation. This means that data is only
unloaded at the time of table creation and that statements such as the following will fail.
SQL> INSERT INTO all_objects_xt
2 SELECT *
3 FROM all_objects;
INSERT INTO all_objects_xt
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
parallelism for performance
One of the key performance benefits of the original external table implementation is that it makes parallel processing much
simpler than the SQL*Loader alternative. The same is also true of the new unloading external tables. Data can be written to
multiple files in parallel to add a significant performance boost to unloading processes.
SQL> CREATE TABLE all_objects_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY xt_dir
6 LOCATION ( 'allobj1.dmp', 'allobj2.dmp', 'allobj3.dmp', 'allobj4.dmp' )
7 )
8 PARALLEL 4
9 AS
10 SELECT *
11 FROM all_objects;
Table created.
SQL> host dir c:\temp\ora\allobj*
Volume in drive C is DRIVE-C
Volume Serial Number is 6845-F732
Directory of c:\temp\ora
11/05/2005 20:22 1,171,456 ALLOBJ1.DMP
11/05/2005 20:22 1,175,552 ALLOBJ2.DMP
11/05/2005 20:22 1,175,552 ALLOBJ3.DMP
11/05/2005 20:22 1,171,456 ALLOBJ4.DMP
4 File(s) 4,694,016 bytes
0 Dir(s) 4,302,034,944 bytes free
Oracle has now created four similarly sized files, and when we read the data back in, it will read from each of the files to
give us the complete picture.
SQL> SELECT COUNT(*) FROM all_objects_xt;
COUNT(*)
----------
47374
Interestingly, each of these files is self-contained, which means that it is possible to read any one or more of them, as the
following demonstrates.
SQL> ALTER TABLE all_objects_xt LOCATION ( 'ALLOBJ1.DMP' );
Table altered.
SQL> SELECT COUNT(*) FROM all_objects_xt;
COUNT(*)
----------
11843
Incidentally, we must create one file per degree of parallelism. If we request a degree of parallelism greater than the number
of files in the LOCATION clause, then Oracle will reduce the parallelism accordingly. If we request any available degree of
parallelism (we can use the PARALLEL keyword without specifying a degree) but only include one file, then Oracle will not
invoke parallel query at all. Using the V$PQ_SESSTAT dynamic view, we can verify this quite simply.
SQL> host del c:\temp\ora\all*
SQL> DROP TABLE all_objects_xt;
Table dropped.
SQL> CREATE TABLE all_objects_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY xt_dir
6 LOCATION ( 'allobj.dmp' )
7 )
8 PARALLEL
9 AS
10 SELECT *
11 FROM all_objects;
Table created.
SQL> SELECT last_query
2 FROM v$pq_sesstat
3 WHERE statistic = 'DDL Parallelized';
LAST_QUERY
----------
0
Note that the same restriction does not apply when reading the data back. In most cases, the ORACLE_DATAPUMP access
driver will be able to read the data from a single file in parallel.
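As a hedged sketch of the read-back side, a degree of parallelism can be requested on the existing external table and verified with the same V$PQ_SESSTAT technique used above (this assumes the single-file table from the earlier example still exists).

```sql
-- Request parallel read-back on the existing external table.
ALTER TABLE all_objects_xt PARALLEL 4;

SELECT COUNT(*)
FROM   all_objects_xt;

-- Verify whether the query was parallelised.
SELECT last_query
FROM   v$pq_sesstat
WHERE  statistic = 'Queries Parallelized';
```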
The following shows how Oracle will use parallel processing to unload data when we provide more than a single file.
SQL> host del c:\temp\ora\all*
SQL> DROP TABLE all_objects_xt;
Table dropped.
SQL> CREATE TABLE all_objects_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY xt_dir
6 LOCATION ( 'allobj1.dmp', 'allobj2.dmp' )
7 )
8 PARALLEL
9 AS
10 SELECT *
11 FROM all_objects;
Table created.
SQL> SELECT last_query
2 FROM v$pq_sesstat
3 WHERE statistic = 'DDL Parallelized';
LAST_QUERY
----------
1
direct path read/write for performance
In addition to the performance benefits of writing data in parallel, Oracle also treats external data unloads as unlogged direct
path operations (which means they won't interfere with your buffer cache). For the following demonstration, I have taken a
heap copy of ALL_OBJECTS to simplify the explain plan output (ALL_OBJECTS is a view with a reasonably complicated
execution plan - taking a copy ensures we get a simple, single-table access plan).
SQL> EXPLAIN PLAN FOR
2 CREATE TABLE my_all_objects_xt
3 ORGANIZATION EXTERNAL
4 (
5 TYPE ORACLE_DATAPUMP
6 DEFAULT DIRECTORY xt_dir
7 LOCATION ( 'whatever.dmp' )
8 )
9 AS
10 SELECT *
11 FROM my_all_objects;
Explained.
SQL> SELECT *
2 FROM TABLE( DBMS_XPLAN.DISPLAY );
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 3302953930
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 48693 | 6086K| 344 (1)| 00:00:05 |
| 1 | LOAD AS SELECT | | | | | |
| 2 | TABLE ACCESS FULL | MY_ALL_OBJECTS | 48693 | 6086K| 149 (1)| 00:00:02 |
-----------------------------------------------------------------------------------------
Note
- dynamic sampling used for this statement
13 rows selected.
shared data across oracle versions
It is possible to share data between different versions of Oracle using the ORACLE_DATAPUMP driver. Oracle includes an
optional VERSION clause to use in the ACCESS PARAMETERS when unloading data that is to be read in a database of a
different version. Valid values are "COMPATIBLE", "LATEST" or an actual version string. It is difficult to test this at present with
my Solaris 10.1.0.3 and XP 10.1.0.2 databases as the dump files produced by both are interchangeable without any
reference to VERSION. In fact, I am currently able to share data between the two databases despite including complete
garbage in the VERSION clause. Perhaps we will need a major release (e.g. 10gR2 in Summer 2005) to test backwards
compatibility.
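For completeness, a sketch of where the VERSION clause sits: like NOLOGFILE, it belongs in the ACCESS PARAMETERS clause of the table definition. The example below is untested for the reasons given above, but follows the documented syntax.

```sql
CREATE TABLE all_objects_xt
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_DATAPUMP
   DEFAULT DIRECTORY xt_dir
   ACCESS PARAMETERS ( VERSION 'COMPATIBLE' )  -- or 'LATEST', or e.g. '10.1.0'
   LOCATION ( 'all_objects_xt.dmp' )
)
AS
SELECT *
FROM   all_objects;
```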
uses for writeable external tables
All of which is very interesting and exciting, but given that the unload feature of external tables is proprietary (i.e. the Data
Pump format), why would we use it?
Ultimately, Oracle has provided writeable external tables as a means of backing up data or sharing data between schemas
or databases, regardless of platform or version (thanks to Data Pump). At its most simple, it is a very useful mechanism for
backing up data in a development environment (for example, to save a "before" image of data for regression testing). It is
also feasible to use external tables for archiving data.
For example, it is quite common for data warehouses to have "rolling" histories, whereby historical data is purged after a
certain period. The simplest and most efficient method is to use partitioning and "age out" the partitions that contain data
beyond the threshold. Partition exchange has always been the preferred mechanism for removing the data from
the table, but how that data is archived has not always been so simple. With the requirement being to get the data onto a
storage device for archiving, options have often been restricted to Oracle's export utility or even a "hand-rolled" data unloader
using UTL_FILE, for example. The former can be fast but requires a switch to a shell environment whereas the latter can be
painfully slow across large data volumes. It is not uncommon for people to rely on spooling sqlplus scripts, which offers very
little application "visibility" and can make some very well respected application developers shudder at the thought!
To my mind, writeable external tables in 10g excel in this task. Not only are they an excellent mechanism for fast data
unloading, they are also fully controllable within the perimeter of a PL/SQL application. An added benefit is that Oracle
creates the table that can be used to read the data back should the users ever require it. With the addition of a sensible
naming convention, the archiving requirement of a rolling history application becomes faster and considerably easier. Swap
out the partition, unload the data for archiving and leave the external table behind to read the data back should it ever be
required and as a persistent reminder of what was archived. It is even possible to skip the partition exchange step and
simply create the external table as a SELECT from the oldest partition before it is removed from the history table.
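That last suggestion can be sketched in two statements. The sales_history table and its sales_2004q1 partition are hypothetical names used purely for illustration; a real implementation would generate the table and file names from the partition being aged out.

```sql
-- Archive the oldest partition to a Data Pump file, leaving a readable
-- external table behind as a record of what was archived.
CREATE TABLE sales_2004q1_arc
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_DATAPUMP
   DEFAULT DIRECTORY xt_dir
   LOCATION ( 'sales_2004q1.dmp' )
)
AS
SELECT *
FROM   sales_history PARTITION (sales_2004q1);

-- Then remove the partition from the history table.
ALTER TABLE sales_history DROP PARTITION sales_2004q1;
```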
It is also feasible that writeable external tables could become a mechanism for sharing data, especially in environments that
do not use database links or where the processing requirement is for features that are not supported over links (such as
direct path, parallel query, bulk PL/SQL operations). The Data Pump driver makes it possible to share data across database
versions and operating systems, even those of different endianness. To continue with the implications for data archiving as
described above, what if the data was to be archived to another database rather than storage device? We now have
significantly more control of our archiving requirement, all from within the same PL/SQL application. Using features such as
DBMS_FILE_TRANSFER (10g) and DBMS_AQ, we can now share data across databases and have the receiving database
act on the availability of the new data.
metadata
In addition to the XXX_TABLES family of views, Oracle provides two further view families specific to external tables. These are
XXX_EXTERNAL_TABLES and XXX_EXTERNAL_LOCATIONS. The former provides us with information such as the external
table type and access parameters, while the latter view family provides information on the underlying flat files and their
locations. The XXX_TABLES view family is still of importance, of course, with information critical to the CBO (statistics for
external tables can be computed using DBMS_STATS).
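The dictionary views and statistics gathering mentioned above can be exercised as follows (using the ALL_OBJECTS_XT table from the earlier examples).

```sql
-- External table type and access parameters.
SELECT table_name, type_name, default_directory_name
FROM   user_external_tables;

-- Underlying files and their directories.
SELECT table_name, location, directory_name
FROM   user_external_locations;

-- Gather optimiser statistics for the external table.
BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(user, 'ALL_OBJECTS_XT');
END;
/
```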
conclusions
Writeable external tables are a useful new addition to Oracle 10g's set of utilities. They provide a fast and simple
mechanism to back up or share critical and non-critical data in a variety of applications. This article hopefully outlines
some of the interesting
new opportunities and benefits and stimulates developers into thinking of alternative approaches to explore when building
database applications.
further reading
More information on external tables can be found in the Utilities book in the online Oracle documentation. I always start with
the documentation search engine and locate the correct set of manuals from there.
source code
The source code for the examples in this article can be downloaded from here.
Adrian Billington, May 2005