Thread: Best practice to load a huge table from ORACLE to PG
Hi All,
I'd like to know the best practice to load a 70 million row, 101 column table
from Oracle to PostgreSQL.
The current approach is to dump the data to CSV and then COPY it into PostgreSQL.
Does anyone have a better idea?
Regards
Adonias Malosso
Adonias Malosso wrote:
> Hi All,
>
> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.
>
> The current approach is to dump the data to CSV and then COPY it into
> PostgreSQL.
>
> Does anyone have a better idea?

Write a Java trigger in Oracle that notes when a row has been
added/deleted/updated and does the exact same thing in PostgreSQL.

Joshua D. Drake

> Regards
> Adonias Malosso
But how do we link the Oracle trigger to a Postgres trigger?

I mean: the Oracle trigger will take note of what has been changed,
but then how do we pass those changes to the Postgres trigger?

Can you suggest any logic or algorithm?
Regards,
Srikanth k Potluri
+63 9177444783(philippines)
On Sat 26/04/08 8:40 PM, "Joshua D. Drake" <jd@commandprompt.com> sent:
Adonias Malosso wrote:
> Hi All,
>
> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.
>
> The current approach is to dump the data to CSV and then COPY it into
> PostgreSQL.
>
> Does anyone have a better idea?
Write a Java trigger in Oracle that notes when a row has been
added/deleted/updated and does the exact same thing in PostgreSQL.
Joshua D. Drake
>
>
> Regards
> Adonias Malosso
Potluri Srikanth wrote:
> But how do we link the Oracle trigger to a Postgres trigger?
>
> I mean: the Oracle trigger will take note of what has been changed,
> but then how do we pass those changes to the Postgres trigger?

I am assuming you can use the Java trigger from Oracle to load the
PostgreSQL JDBC driver, make a connection to PostgreSQL, and perform
whatever statement needs to be done.

Sincerely,

Joshua D. Drake

P.S. It is possible that Oracle can't do this (I don't know).
Yep – just do something like this within sqlplus (from http://www.dbforums.com/showthread.php?t=350614):
set termout off
set hea off
set pagesize 0
spool c:\whatever.csv
select a.a||','||a.b||','||a.c
from a
where a.a="whatever";
spool off
COPY is the fastest approach to get it into PG.
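For instance, a minimal sketch of the PG side of that, assuming the spool
file above was moved to the database server and a hypothetical target table
"whatever" with matching columns a, b, c (these names are illustrative, not
from the original schema); run it in psql, or use psql's client-side \copy
if the file stays on the client:

-- Load the headerless CSV produced by the spool above.
COPY whatever (a, b, c)
FROM '/tmp/whatever.csv'
WITH CSV;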
- Luke
On 4/26/08 6:25 AM, "Adonias Malosso" <malosso@gmail.com> wrote:
> Hi All,
>
> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.
>
> The current approach is to dump the data to CSV and then COPY it into
> PostgreSQL.
>
> Does anyone have a better idea?
>
> Regards
> Adonias Malosso
Joshua D. Drake wrote:
> Potluri Srikanth wrote:
>> But how do we link the Oracle trigger to a Postgres trigger?
>>
>> I mean: the Oracle trigger will take note of what has been changed,
>> but then how do we pass those changes to the Postgres trigger?
>
> I am assuming you can use the Java trigger from Oracle to load the
> PostgreSQL JDBC driver, make a connection to PostgreSQL, and perform
> whatever statement needs to be done.

Note that this will be rather inefficient if you're obtaining a new
connection every time. It looks like Oracle's Java stored procedures and
triggers run in an appserver-like environment, though, so you should be
able to use a connection pool, JNDI, or similar.

Some Java stored procedure examples:

http://www.oracle.com/technology/sample_code/tech/java/jsp/oracle9ijsp.html

You could also use a Java trigger to send a simpler change message, with a
serialized row if required, to an external app that's responsible for
updating the PostgreSQL database. That might cause less load on the DB
server. The trouble with this approach, though, is that it might be hard
to get right when transactions roll back.

An alternative is to use an Oracle trigger that inserts records into a
change tracking / audit table. You can then periodically read and clear
the audit table, using that change history to update the PostgreSQL
database. This method has the advantage of being transaction safe, as data
will never become visible in the audit table until the transaction making
the changes has committed.

--
Craig Ringer
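For the audit-table route, a rough sketch of the Oracle side, assuming a
hypothetical "orders" table with a numeric order_id primary key (all names
here are illustrative, not from the original schema):

-- Audit table plus sequence; rows only become visible to the sync job
-- once the writing transaction commits.
CREATE SEQUENCE orders_audit_seq;

CREATE TABLE orders_audit (
  change_id  NUMBER PRIMARY KEY,
  change_op  CHAR(1) NOT NULL,              -- 'I', 'U' or 'D'
  order_id   NUMBER NOT NULL,
  changed_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

-- Row-level trigger that records every change.
CREATE OR REPLACE TRIGGER orders_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW
DECLARE
  v_op CHAR(1);
  v_id NUMBER;
BEGIN
  IF INSERTING THEN
    v_op := 'I'; v_id := :NEW.order_id;
  ELSIF UPDATING THEN
    v_op := 'U'; v_id := :NEW.order_id;
  ELSE
    v_op := 'D'; v_id := :OLD.order_id;
  END IF;
  INSERT INTO orders_audit (change_id, change_op, order_id)
  VALUES (orders_audit_seq.NEXTVAL, v_op, v_id);
END;
/

A periodic job can then read orders_audit in change_id order, apply the
changes to the PostgreSQL copy, and delete the rows it has processed.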
On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso <malosso@gmail.com> wrote:
> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.

The fastest and easiest method would be to dump the data from Oracle into
CSV/delimited format using something like ociuldr
(http://www.anysql.net/en/ociuldr.html) and load it back into PG using
pg_bulkload (which is a helluva lot faster than COPY). Of course, you
could try other things as well... such as setting up generic connectivity
to PG and inserting the data into a PG table over the database link.

Similarly, while I hate to see shameless self-plugs in the community, the
*fastest* method you could use is dblink_ora_copy, contained in
EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI connection to
COPY the data directly from Oracle into Postgres, which also saves you the
intermediate step of dumping the data.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
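For the database-link option Jonah mentions, a rough sketch, assuming
generic connectivity (Heterogeneous Services) has already been configured
and a hypothetical link named "pglink" points at the PostgreSQL database
(table and link names are illustrative):

-- Push the rows straight across the link, no intermediate file.
-- Simple, but typically far slower than a bulk dump-and-load.
INSERT INTO "bigtable"@pglink
SELECT * FROM bigtable;
COMMIT;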
On Sat, 26 Apr 2008, Adonias Malosso wrote:

> The current approach is to dump the data to CSV and then COPY it into
> PostgreSQL.

You would have to comment on what you don't like about what you're doing
now, and what parts need to be improved for your priorities, to get a
properly targeted answer here.

> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.

There is no one best practice. There's a wide variety of techniques on
both the Oracle and PostgreSQL sides in this area that might be used,
depending on what trade-offs are important to you.

For example, if the goal was to accelerate a dump of a single table to run
as fast as possible, you'd want to look into techniques that dumped that
table with multiple sessions going at once, each handling a section of
that table. Typically you'd use one session per CPU on the server, and
you'd use something with a much more direct path into the data than
SQL*Plus. Then on the PostgreSQL side, you could run multiple COPY
sessions importing at once to read this data all back in, because COPY
will bottleneck at the CPU level before the disks will if you've got
reasonable storage hardware. There's a list of utilities in this area at
http://www.orafaq.com/wiki/SQL*Loader_FAQ#Is_there_a_SQL.2AUnloader_to_download_data_to_a_flat_file.3F
that you might look to for inspiration; I know the WisdomForce FastReader
handles simultaneous multi-section dumps via a very direct path to the
data.

...but that's just one example based on one set of priorities, and it will
be expensive in terms of dollars and complexity. As another example of
something that changes things considerably, if there's any data with
errors that will cause COPY to abort, you might consider a different
approach on the PG side.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
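A minimal sketch of that multi-session idea, assuming a hypothetical
"bigtable" with a numeric primary key "id" and four dump sessions (one per
CPU); every name and the column list here is illustrative:

-- Session 0 of 4: spool a disjoint quarter of the rows. The other
-- sessions run the same script with mod(t.id, 4) = 1, 2 and 3 and
-- their own spool files.
set termout off
set hea off
set pagesize 0
spool /tmp/bigtable_part0.csv
select t.id||','||t.col1||','||t.col2
from bigtable t
where mod(t.id, 4) = 0;
spool off

On the PostgreSQL side, each part file then gets its own concurrent
session running COPY bigtable FROM '/tmp/bigtable_part0.csv' WITH CSV;
and so on, one per file.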
Hi,

On Sunday 27 April 2008, Greg Smith wrote:
> than SQL*Plus. Then on the PostgreSQL side, you could run multiple COPY
> sessions importing at once to read this data all back in, because COPY
> will bottleneck at the CPU level before the disks will if you've got
> reasonable storage hardware.

The latest pgloader version has been made to handle this exact case, so if
you want to take this route, please consider pgloader 2.3.0:

http://pgloader.projects.postgresql.org/#_parallel_loading
http://pgfoundry.org/projects/pgloader/

Another good reason to consider using pgloader is when the data file
contains erroneous input lines and you don't want the COPY transaction to
abort. Those error lines will get rejected by pgloader while the correct
ones get COPYied in.

Regards,
--
dim
Adonias Malosso wrote:
> Hi All,
>
> I'd like to know the best practice to load a 70 million row, 101 column
> table from Oracle to PostgreSQL.
>
> The current approach is to dump the data to CSV and then COPY it into
> PostgreSQL.

Uhm. 101 columns, you say? Sounds interesting. There are data loaders
like:

http://pgfoundry.org/projects/pgloader/

which could speed up loading the data over just COPY with CSV. I wonder
how much normalizing could help.

Tino
Jonah,
Thank you for the answer. Good to know about this EnterpriseDB feature.
I'll proceed using pgloader.
Regards.
Adonias Malosso
On Sat, Apr 26, 2008 at 10:14 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso <malosso@gmail.com> wrote:
>> I'd like to know the best practice to load a 70 million row, 101 column
>> table from Oracle to PostgreSQL.
>
> The fastest and easiest method would be to dump the data from Oracle into
> CSV/delimited format using something like ociuldr
> (http://www.anysql.net/en/ociuldr.html) and load it back into PG using
> pg_bulkload (which is a helluva lot faster than COPY). Of course, you
> could try other things as well... such as setting up generic connectivity
> to PG and inserting the data into a PG table over the database link.
>
> Similarly, while I hate to see shameless self-plugs in the community, the
> *fastest* method you could use is dblink_ora_copy, contained in
> EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI connection to
> COPY the data directly from Oracle into Postgres, which also saves you the
> intermediate step of dumping the data.
On Mon, Apr 28, 2008 at 5:37 PM, Adonias Malosso <malosso@gmail.com> wrote:
> Thank you for the answer. Good to know about this EnterpriseDB feature.

No problem.

> I'll proceed using pgloader.

That's fine. Though, I'd really suggest pg_bulkload; it's quite a bit
faster.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/