Thread: dblink performance

dblink performance

From

"Alex Bible"

Date:

07 December 2011, 12:14:38

Hello All,

I’m currently on a development team utilizing PostgreSQL and we are looking into the possibility of using dblink to reference an external database (read only). Our system administrator and DBA were concerned about the performance impact that cross-database queries would have on a production application. Are there any known performance issues or anything of the like that I would need to know before pushing this issue further? I have been using PostgreSQL for the past couple months but this is my first time using dblink. I really just need an opinion from someone who has used this technology before. Thanks!

Alexander E. Bible

700 Delaware Ave.
Buffalo, NY 14226

Tel: (716)-888-3687
Mobile: (716)-472-9655
alex.bible@ctg.com

The information transmitted is intended only for the person or entity to which
it is addressed and may contain confidential and/or privileged material. Any
review, retransmission, dissemination or other use of, or taking of any action
in reliance upon, this information by persons or entities other than the
intended recipient is prohibited. If you are not the intended recipient of this
message, please contact the sender and delete this material from this computer.

Attachment

image001.gif

Re: dblink performance

From

Merlin Moncure

Date:

07 December 2011, 15:49:53

On Wed, Dec 7, 2011 at 10:14 AM, Alex Bible <Alex.Bible@ctg.com> wrote:
>
> Hello All,
>
> I’m currently on a development team utilizing PostgreSQL and we are looking into the possibility of using dblink to
referencean external database (read only). Our system administrator and DBA were concerned about the performance impact
thatcross-database queries would have on a production application. Are there any known performance issues or anything
ofthe like that I would need to know before pushing this issue further? I have been using PostgreSQL for the past
couplemonths but this is my first time using dblink. I really just need an opinion from someone who has used this
technologybefore. Thanks! 

dblink is a very thin wrapper for libpq.  From the querying database,
the overhead is pretty light -- basically the query is fired and the
results are interpreted from text into whatever the database has in
the receiving result via the various typein functions.   For all
intents and purposes, this is pretty similar to sending in queries
over the regular sql interface.  One gotcha of course is that libpq
buffers the entire result in memory which can be dangerous, so be
advised.

To the receiving database, dblink queries are no different from any
other query, except that they are not parameterized.  Lack of
parameterization and access to the binary protocol are the major
downsides when using dblink.  IMNSHO, dblink needs a variable argument
call that uses the paramterized interface.  Also support for binary
transfer of data would be nice.

merlin

Re: dblink performance

From

"Marc Mamin"

Date:

07 December 2011, 16:48:18

> On Wed, Dec 7, 2011 at 10:14 AM, Alex Bible <Alex.Bible@ctg.com>
wrote:
> >
> > Hello All,
> >
> > I'm currently on a development team utilizing PostgreSQL and we are
> looking into the possibility of using dblink to reference an external
> database (read only). Our system administrator and DBA were concerned
> about the performance impact that cross-database queries would have on
> a production application. Are there any known performance issues or
> anything of the like that I would need to know before pushing this
> issue further? I have been using PostgreSQL for the past couple months
> but this is my first time using dblink. I really just need an opinion
> from someone who has used this technology before. Thanks!
>
> dblink is a very thin wrapper for libpq.  From the querying database,
> the overhead is pretty light -- basically the query is fired and the
> results are interpreted from text into whatever the database has in
> the receiving result via the various typein functions.   For all
> intents and purposes, this is pretty similar to sending in queries
> over the regular sql interface.  One gotcha of course is that libpq
> buffers the entire result in memory which can be dangerous, so be
> advised.
>
> To the receiving database, dblink queries are no different from any
> other query, except that they are not parameterized.  Lack of
> parameterization and access to the binary protocol are the major
> downsides when using dblink.  IMNSHO, dblink needs a variable argument
> call that uses the paramterized interface.  Also support for binary
> transfer of data would be nice.
>
> merlin

I find dblink being a nice tool as long as the data volume to transfer
remains low.
I've evaluated it to implement a clustered Postgres environment, but
gave it up due to the poor performances.
Still waiting for the binary transfer before the next try ;-)

reagrds,

Marc Mamin

Re: dblink performance

From

Bruno Lavoie

Date:

07 December 2011, 16:53:06

Le 2011-12-07 11:14, Alex Bible a écrit :

Hello All,
I’m currently on a development team utilizing PostgreSQL and we are looking into the possibility of using dblink to reference an external database (read only). Our system administrator and DBA were concerned about the performance impact that cross-database queries would have on a production application. Are there any known performance issues or anything of the like that I would need to know before pushing this issue further? I have been using PostgreSQL for the past couple months but this is my first time using dblink. I really just need an opinion from someone who has used this technology before. Thanks!

Alexander E. Bible

Hello,

one problem is when you join local data with remote data. The optimizer falls short on this when finding good plans and executing. If you frequently need to join with non-volatile remote data, it's generally better to make a snapshot of remote tables. It all depends on volume and usage patterns of databases.

Interesting link:
http://www.postgresonline.com/journal/archives/44-Using-DbLink-to-access-other-PostgreSQL-Databases-and-Servers.html

hope that's help
Bruno Lavoie
bl@brunol.com
bruno.lavoie@gmail.com

Re: dblink performance

From

Merlin Moncure

Date:

07 December 2011, 17:17:53

On Wed, Dec 7, 2011 at 2:47 PM, Marc Mamin <M.Mamin@intershop.de> wrote:
>> On Wed, Dec 7, 2011 at 10:14 AM, Alex Bible <Alex.Bible@ctg.com>
> wrote:
>> >
>> > Hello All,
>> >
>> > I'm currently on a development team utilizing PostgreSQL and we are
>> looking into the possibility of using dblink to reference an external
>> database (read only). Our system administrator and DBA were concerned
>> about the performance impact that cross-database queries would have on
>> a production application. Are there any known performance issues or
>> anything of the like that I would need to know before pushing this
>> issue further? I have been using PostgreSQL for the past couple months
>> but this is my first time using dblink. I really just need an opinion
>> from someone who has used this technology before. Thanks!
>>
>> dblink is a very thin wrapper for libpq.  From the querying database,
>> the overhead is pretty light -- basically the query is fired and the
>> results are interpreted from text into whatever the database has in
>> the receiving result via the various typein functions.   For all
>> intents and purposes, this is pretty similar to sending in queries
>> over the regular sql interface.  One gotcha of course is that libpq
>> buffers the entire result in memory which can be dangerous, so be
>> advised.
>>
>> To the receiving database, dblink queries are no different from any
>> other query, except that they are not parameterized.  Lack of
>> parameterization and access to the binary protocol are the major
>> downsides when using dblink.  IMNSHO, dblink needs a variable argument
>> call that uses the paramterized interface.  Also support for binary
>> transfer of data would be nice.
>>
>> merlin
>
> I find dblink being a nice tool as long as the data volume to transfer
> remains low.
> I've evaluated it to implement a clustered Postgres environment, but
> gave it up due to the poor performances.
> Still waiting for the binary transfer before the next try ;-)

Binary transfer is not a super big deal in terms of performance
actually in the general case. It's only substantially faster in a few
cases like timestamp, geo types, and of course bytea.  Lack of
parameterization I find to be a bigger deal actually -- it's more of a
usability headache than a performance thing.

Also FYI binary dblink between databases is going to be problematic
for any non built in type unless the type oids are synchronized across
databases.

merlin

Re: dblink performance

From

"Marc Mamin"

Date:

07 December 2011, 17:38:15

> >
> > I find dblink being a nice tool as long as the data volume to
> transfer
> > remains low.
> > I've evaluated it to implement a clustered Postgres environment, but
> > gave it up due to the poor performances.
> > Still waiting for the binary transfer before the next try ;-)
>
> Binary transfer is not a super big deal in terms of performance
> actually in the general case. It's only substantially faster in a few
> cases like timestamp, geo types, and of course bytea.  Lack of
> parameterization I find to be a bigger deal actually -- it's more of a
> usability headache than a performance thing.
>
> Also FYI binary dblink between databases is going to be problematic
> for any non built in type unless the type oids are synchronized across
> databases.
>
> merlin

Thanks,
... so I don't really understand where all the time get lost in the
example I posted a few weeks ago:

http://archives.postgresql.org/pgsql-general/2011-09/msg00436.php

Marc Mamin

Re: dblink performance

From

Merlin Moncure

Date:

07 December 2011, 17:49:42

On Wed, Dec 7, 2011 at 3:37 PM, Marc Mamin <M.Mamin@intershop.de> wrote:
>> >
>> > I find dblink being a nice tool as long as the data volume to
>> transfer
>> > remains low.
>> > I've evaluated it to implement a clustered Postgres environment, but
>> > gave it up due to the poor performances.
>> > Still waiting for the binary transfer before the next try ;-)
>>
>> Binary transfer is not a super big deal in terms of performance
>> actually in the general case. It's only substantially faster in a few
>> cases like timestamp, geo types, and of course bytea.  Lack of
>> parameterization I find to be a bigger deal actually -- it's more of a
>> usability headache than a performance thing.
>>
>> Also FYI binary dblink between databases is going to be problematic
>> for any non built in type unless the type oids are synchronized across
>> databases.
>>
>> merlin
>
> Thanks,
> ... so I don't really understand where all the time get lost in the
> example I posted a few weeks ago:
>
> http://archives.postgresql.org/pgsql-general/2011-09/msg00436.php

you wrote:
"select count(*) from
(
select * from dblink('a','select * from test_cluster')as t1(a int)
union all
select * from dblink('a','select * from test_cluster')as t1(a int)
union all
select * from dblink('a','select * from test_cluster')as t1(a int)
)foo

is about 5 times slower than an equivalent query run locally.

working with asynchron. queries (dblink_send_query) does not bring much
benefit
so that much time seems to be spent for transfer and merge"

it's not exactly fair to compare dblink to local query -- dblink
method requires having to marshal all the data over the protoocl and
un-marshal on the other end.  I was seeing 3-5x times difference vs
local query but this is to be expected.  note the 'union all' had
nothing to so with your performance problems.  Also the querying
server can do a very special trick for count(*) since it only needs to
check tuple visibility that can't be done when doing select count(*)
from (<dblink_queries>).

My point up thread is that dblink is going to be comparable to other
methods that involve querying the data off the server and doing the
processing on the client side.

merlin