Thread: dblink bulk operations

dblink bulk operations

From
Andrew Dunstan
Date:
Last night I needed to move a bunch of data from an OLTP database to an 
archive database, and used dblink with a bunch of insert statements. 
Since I was moving about 4m records this was distressingly but not 
surprisingly slow. It set me wondering why we don't build more support 
for libpq operations into dblink, like transactions and prepared 
queries, and maybe COPY too. It would be nice to be able to do something 
like:
   select dblink_connect('dbh','dbname=foo');
   select dblink_begin('dbh');
   select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
   select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can we do this?
   select dblink_commit('dbh');
   select dblink_disconnect('dbh');
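For the COPY route mentioned above: in the default text format, the data sent to `COPY ... FROM STDIN` is one line per row, with tab-separated columns, `\N` for NULL, and backslash escapes for backslash, tab, newline and carriage return. Below is a minimal Python sketch of that encoding. `copy_text_line` is a hypothetical helper name. It assumes the default delimiter and NULL string, and that values have already been rendered as text.

```python
def copy_text_line(row):
    """Encode one row as a line of COPY text format (sketch).

    Assumes default settings: tab delimiter, \\N as the NULL marker.
    Each value is a str, or None for SQL NULL.
    """
    fields = []
    for value in row:
        if value is None:
            fields.append("\\N")  # default NULL marker
            continue
        escaped = (
            value.replace("\\", "\\\\")  # escape backslashes first
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        fields.append(escaped)
    return "\t".join(fields) + "\n"
```

A real dblink COPY API would presumably stream lines like these over the remote connection with libpq's `PQputCopyData`. That wiring is not shown here.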
 


Does this seem worthwhile and doable, or am I smoking crack?

cheers

andrew


Re: dblink bulk operations

From
Merlin Moncure
Date:
On Thu, Aug 6, 2009 at 11:11 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an
> archive database, and used dblink with a bunch of insert statements. Since I
> was moving about 4m records this was distressingly but not surprisingly
> slow. It set me wondering why we don't build more support for libpq
> operations into dblink, like transactions and prepared queries, and maybe
> COPY too. It would be nice to be able to do something like:
>
>   select dblink_connect('dbh','dbname=foo');
>   select dblink_begin('dbh');

you can always exec a sql 'begin', e.g. select dblink_exec('dbh', 'begin');

>   select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
>   select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can we do this?

The answer to this I think is yes, but not quite that way. Much
better, I think, is to use 8.4 variadic (variable argument) functions,
always use the parametrized features of libpq, and use the binary
protocol when possible. This ends up running much faster and is easier
to use (we've done exactly that for our in-house stuff). IIRC you can
parameterize 'execute', so the above should work for prepared queries
as well.
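Using libpq's parametrized calls means the values travel separately from the SQL text, so nothing has to be quoted into the statement. `PQexecParams` takes parallel arrays of parameter values, lengths and formats (0 = text, 1 = binary). In binary format an int4 is sent as 4 bytes in network byte order instead of decimal text. Below is a rough Python illustration of those three arrays for int4 parameters only. `encode_int4_params` is a hypothetical name, not part of libpq or dblink.

```python
import struct


def encode_int4_params(values, binary=True):
    """Build (paramValues, paramLengths, paramFormats) lists shaped like
    the arrays PQexecParams takes, for int4 parameters only (sketch)."""
    vals, lengths, formats = [], [], []
    for v in values:
        if binary:
            data = struct.pack("!i", v)  # int4 binary: 4 bytes, big-endian
            fmt = 1
        else:
            data = str(v).encode("ascii")  # text: decimal digits
            fmt = 0
        vals.append(data)
        lengths.append(len(data))  # libpq only uses lengths for binary params
        formats.append(fmt)
    return vals, lengths, formats
```

For the `select $1 + $2` example with 3 and 4, binary mode yields two 4-byte values with formats `[1, 1]`. The server then does not have to parse any parameter text.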

If we get the ability to set specific OIDs for types, I can remove
some of the hacks we have to send text for composites and arrays of
composites.

Example:
select * from pqlink_exec(connstr, 'select $1 + $2', 3, 4) as R(v int);
 v
---
 7
(1 row)


merlin


Re: dblink bulk operations

From
David Fetter
Date:
On Thu, Aug 06, 2009 at 11:11:58AM -0400, Andrew Dunstan wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an  
> archive database, and used dblink with a bunch of insert statements.  
> Since I was moving about 4m records this was distressingly but not  
> surprisingly slow. It set me wondering why we don't build more support  
> for libpq operations into dblink, like transactions and prepared  
> queries, and maybe COPY too. It would be nice to be able to do something  
> like:
>
>    select dblink_connect('dbh','dbname=foo');
>    select dblink_begin('dbh');
>    select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
>    select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can we do this?
>    select dblink_commit('dbh');
>    select dblink_disconnect('dbh');
>
>
> Does this seem worthwhile and doable, or am I smoking crack?

For what it's worth, DBI-Link provides a lot of this.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: dblink bulk operations

From
Andrew Dunstan
Date:

David Fetter wrote:
>
> For what it's worth, DBI-Link provides a lot of this.

Indeed, but that assumes that perl+DBI+DBD::Pg is available, which is by 
no means always the case. If we're going to have a dblink module ISTM it 
should be capable of reasonable bulk operations.

cheers

andrew


Re: dblink bulk operations

From
David Fetter
Date:
On Thu, Aug 06, 2009 at 12:28:15PM -0400, Andrew Dunstan wrote:
> David Fetter wrote:
>>
>> For what it's worth, DBI-Link provides a lot of this.
>
> Indeed, but that assumes that perl+DBI+DBD::Pg is available, which
> is by no means always the case. If we're going to have a dblink
> module ISTM it should be capable of reasonable bulk operations.

I didn't mean to suggest that you should use DBI-Link, just that it's
a requirement that's come up in very similar contexts to that of
dblink.

Cheers,
David.


Re: dblink bulk operations

From
Merlin Moncure
Date:
On Thu, Aug 6, 2009 at 11:11 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
>
> Last night I needed to move a bunch of data from an OLTP database to an
> archive database, and used dblink with a bunch of insert statements. Since I
> was moving about 4m records this was distressingly but not surprisingly
> slow. It set me wondering why we don't build more support for libpq
> operations into dblink, like transactions and prepared queries, and maybe
> COPY too. It would be nice to be able to do something like:
>
>   select dblink_connect('dbh','dbname=foo');
>   select dblink_begin('dbh');
>   select dblink_prepare('dbh','sth','insert into bar values ($1,$2,$3)');
>   select dblink_exec_prepared('dbh','sth',row(a,b,c)) from bar; -- can we do this?
>   select dblink_commit('dbh');
>   select dblink_disconnect('dbh');

thinking about this some more, you can get pretty close with vanilla
dblink using something like this (I didn't test it):

select dblink_exec('dbh', 'prepare xyz as insert into foo select ($1::foo).*');
select dblink_exec('dbh', 'execute xyz(' || quote_literal(my_foo::text) || ')')
  from foo my_foo;

This maybe defeats a little of what you are trying to achieve
(especially performance), but it is much easier to craft for basically
any table, as long as the fields match. The above can run into problems
with quoting (e.g. a composite with bytea in it), but works OK most of
the time.
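Here is where the quoting in the workaround above comes from. `my_foo::text` yields the composite's text form. In that form a field is double-quoted when it is empty or contains `"`, `\`, parentheses, commas or whitespace, and `"` and `\` are doubled inside the quotes. That whole string must then be embedded in the EXECUTE statement as an SQL string literal, which is what SQL's `quote_literal()` does. The Python sketch below mimics both steps. `record_text`, `quote_literal` and `execute_stmt` are illustrative re-implementations that assume PostgreSQL's rules as just described. They are not calls into the server. Each layer doubles backslashes again, which is one reason composites containing bytea get awkward.

```python
# Characters that force a composite field to be double-quoted
# (the C-locale isspace() set is listed explicitly).
_SPECIAL = set('"\\(),') | set(" \t\n\r\v\f")


def record_text(fields):
    """Approximate the text output of a composite value (e.g. my_foo::text).

    Each field is a str, or None for NULL (written as nothing).
    """
    out = []
    for f in fields:
        if f is None:
            out.append("")
            continue
        needs_quotes = f == "" or any(c in _SPECIAL for c in f)
        # Inside quotes, '"' and '\' are doubled; this is a no-op for
        # unquoted fields, since those characters always force quoting.
        body = "".join(c * 2 if c in '"\\' else c for c in f)
        out.append('"' + body + '"' if needs_quotes else body)
    return "(" + ",".join(out) + ")"


def quote_literal(s):
    """Mimic SQL quote_literal(): double single quotes; if the string has a
    backslash, also double backslashes and use the E'...' form."""
    body = s.replace("'", "''")
    if "\\" in s:
        return "E'" + body.replace("\\", "\\\\") + "'"
    return "'" + body + "'"


def execute_stmt(name, fields):
    """Build the text passed to dblink_exec: execute <name>('<row text>')."""
    return "execute " + name + "(" + quote_literal(record_text(fields)) + ")"
```

For example, `execute_stmt("xyz", ["1", "abc"])` gives `execute xyz('(1,abc)')`. A field such as `O'Brien` comes out as `execute xyz('(O''Brien)')`.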

If you want faster/better, dblink needs to be refactored to parametrize
queries and, if possible, use binary.

merlin