Thread: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Niels Jespersen
Date:

Hello all

 

We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.

 

However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.

 

insert into t_ora (a,b,c)   

select a,b,c from t_pg;

 

This is driven from a plpgsql stored procedure, if that matters.

 

I want to optimize the running time of this, but I am unsure which possibilities, if any, there actually are.

 

Reducing the number of network roundtrips is usually a good way to increase throughput. But how do I do that?

 

If I could make the Oracle insert a direct load, that would usually also increase throughput. But is that possible here? There are no constraints defined on the destination tables.

 

Regards Niels Jespersen

Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Mladen Gogala
Date:


On 9/19/21 06:28, Niels Jespersen wrote:

> Hello all
>
> We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
>
> However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
>
> insert into t_ora (a,b,c)
> select a,b,c from t_pg;
>
> This is driven from a plpgsql stored procedure, if that matters.
>
> I want to optimize the running time of this, but I am unsure which possibilities, if any, there actually are.
>
> Reducing the number of network roundtrips is usually a good way to increase throughput. But how do I do that?
>
> If I could make the Oracle insert a direct load, that would usually also increase throughput. But is that possible here? There are no constraints defined on the destination tables.
>
> Regards Niels Jespersen


The problem with oracle_fdw is that the SQL is parsed on the Postgres side, not on the Oracle side. If it were parsed on the Oracle side, you could use the /*+ APPEND */ hint, which is essentially a direct-path insert. You will have to write a script in one of the scripting languages that uses the array insert interface available with the Instant Client. Even the Oracle ODBC driver uses array inserts, as can be seen in the following article:

https://dbwhisperer.wordpress.com/2020/11/21/pyodbc-fast_executemany-and-oracle-rdbms/
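
Just as a rough sketch (not taken from the article; the DSN, credentials, table and batch contents are hypothetical), an array insert through pyodbc could look something like this:

import pyodbc

# Connect through an Oracle ODBC DSN; autocommit off so we control transaction size.
conn = pyodbc.connect("DSN=oracle_dsn;UID=scott;PWD=tiger", autocommit=False)
cur = conn.cursor()

# Bind the whole batch as arrays instead of doing one round trip per row.
cur.fast_executemany = True

rows = [(1, 'x', 'y'), (2, 'p', 'q')]   # in practice, a large batch fetched from Postgres
cur.executemany("INSERT INTO t_ora (a, b, c) VALUES (?, ?, ?)", rows)
conn.commit()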

Unfortunately, the Postgres side of the equation is not particularly good at array fetches and does not do as well at cutting down the number of network round trips:

https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany

I would use a script on the Postgres side and then use the superior options provided by SQL*Net. You will need some fancy programming to avoid waiting on each operation. I would actually write two scripts: one reading data from Postgres and converting it to CSV, piping that into a second script that inserts the data into Oracle. That would make the scripts work in parallel, at least partially. Situations like this are the reason why a DBA needs to know how to script. So, this is where you start:

https://python.swaroopch.com/
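
To make the two-script idea concrete (a sketch under assumptions only; the connect string, batch size and column types are made up), the Oracle-side script could read CSV from standard input and insert in batches, fed by something like psql -c "\copy (select a, b, c from t_pg) to stdout with csv":

import csv
import sys
import cx_Oracle

BATCH_SIZE = 50000
INSERT_SQL = "INSERT INTO t_ora (a, b, c) VALUES (:1, :2, :3)"

# Hypothetical credentials and connect string.
conn = cx_Oracle.connect("scott", "tiger", "dbhost/orclpdb1")
cur = conn.cursor()

batch = []
for row in csv.reader(sys.stdin):
    # Values arrive as text from the CSV pipe; convert here if the Oracle columns are not VARCHAR2.
    batch.append(row)
    if len(batch) >= BATCH_SIZE:
        cur.executemany(INSERT_SQL, batch)   # one round trip per batch, not per row
        conn.commit()
        batch = []

if batch:
    cur.executemany(INSERT_SQL, batch)
conn.commit()
conn.close()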

Regards

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Laurenz Albe
Date:
On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
> We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
>  
> However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
>  
> insert into t_ora (a,b,c)   
> select a,b,c from t_pg;
>  
> This is driven from a plpgsql stored procedure, if that matters.
>  
> I want to optimize the running time of this, but I am unsure which possibilities, if any, there actually are.
>  
> Reducing the number of network roundtrips is usually a good way to increase throughput. But how do I do that?
>  
> If I could make the Oracle insert a direct load, that would usually also increase throughput.
> But is that possible here? There are no constraints defined on the destination tables.

The cause of the bad performance for bulk data modifications is that the FDW API is built
that way: each row INSERTed means a round trip between PostgreSQL and Oracle.

That could be improved by collecting rows and inserting them in bulk on the Oracle
side, but I don't feel like implementing that and complicating the code.

From my point of view, oracle_fdw is good for reading, but not for bulk writes.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Magnus Hagander
Date:
On Mon, Sep 20, 2021 at 3:18 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
> > We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
 
> >
> > However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
> >
> > insert into t_ora (a,b,c)
> > select a,b,c from t_pg;
> >
> > This is driven from a plpgsql stored procedure, if that matters.
> >
> > I want to optimize the running time of this, but I am unsure which possibilities, if any, there actually are.
> >
> > Reducing the number of network roundtrips is usually a good way to increase throughput. But how do I do that?
> >
> > If I could make the Oracle insert a direct load, that would usually also increase throughput.
> > But is that possible here? There are no constraints defined on the destination tables.
>
> The cause of the bad performance for bulk data modifications is that the FDW API is built
> that way: each row INSERTed means a round trip between PostgreSQL and Oracle.
>

Just as a note, with PostgreSQL 14 the FDW APIs allow batch insert. It
should be possible to update oracle_fdw to take advantage of that as
well, right?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Niels Jespersen
Date:
-----Original message-----
>From: Laurenz Albe <laurenz.albe@cybertec.at>
>Sent: 20 September 2021 15:18
>To: Niels Jespersen <NJN@dst.dk>; pgsql-general@postgresql.org
>Subject: Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper
>
>On Sun, 2021-09-19 at 10:28 +0000, Niels Jespersen wrote:
>> We are often using the oracle_fdw to transfer data between Postgres (version 11+) and Oracle (version 18+). It works great.
>>  
>> However I have a task at hand that requires inserting a few billion rows in an Oracle table from a Postgres query.
>> If I could make the Oracle insert a direct load, that would usually also increase throughput.
>> But is that possible here? There are no constraints defined on the destination tables.
>
>The cause of the bad performance for bulk data modifications is that the FDW API is built that way: each row INSERTed means a round trip between PostgreSQL and Oracle.
 
>
>That could be improved by collecting rows and inserting them in bulk on the Oracle side, but I don't feel like implementing that and complicating the code.
 
>
>From my point of view, oracle_fdw is good for reading, but not for bulk writes.
>
>Yours,
>Laurenz Albe

Thank you for clarifying. I will use a Python script for doing this, then: read large chunks of data from Postgres and insert equally large chunks into Oracle, possibly using direct load. I know this can work.
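
Roughly what I have in mind, as an untested sketch (connection strings, chunk size and the APPEND_VALUES hint are assumptions on my part):

import psycopg2
import cx_Oracle

CHUNK = 100000

pg = psycopg2.connect("dbname=mydb")                      # placeholder connection strings
ora = cx_Oracle.connect("scott", "tiger", "dbhost/orclpdb1")

# Named (server-side) cursor so Postgres streams the result instead of materializing it all.
pg_cur = pg.cursor(name="t_pg_stream")
pg_cur.itersize = CHUNK
pg_cur.execute("SELECT a, b, c FROM t_pg")

ora_cur = ora.cursor()
# APPEND_VALUES asks Oracle for a direct-path insert of each array batch; it takes an
# exclusive table lock and requires a commit before the next batch can touch the table.
insert_sql = "INSERT /*+ APPEND_VALUES */ INTO t_ora (a, b, c) VALUES (:1, :2, :3)"

while True:
    rows = pg_cur.fetchmany(CHUNK)
    if not rows:
        break
    ora_cur.executemany(insert_sql, rows)
    ora.commit()

pg_cur.close()
pg.close()
ora.close()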
 

Regards Niels

Re: Possibilities for optimizing inserts across oracle_fdw foreign data wrapper

From
Laurenz Albe
Date:
On Mon, 2021-09-20 at 15:41 +0200, Magnus Hagander wrote:
> Just as a note, with PostgreSQL 14 the FDW APIs allow batch insert. It
> should be possible to update oracle_fdw to take advantage of that as
> well, right?

Yes. The exercise is more complicated with Oracle, because it doesn't support multi-row INSERTs (a single INSERT ... VALUES with several row lists).

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com