Thread: Re: Trying to create multi db query in one large querie

Re: Trying to create multi db query in one large querie

From
"Spiegelberg, Greg"
Date:
Hello,

My experience with dblink() is that each dblink() is executed serially, in
part I would guess, due to the plan for the query.  To have each query run
in parallel you would need to execute both dblink()'s simultaneously saving
each result into a table.  I'm not sure if the same table could be
specified.  Would depend on the constaint's I suppose.

#!/bin/sh
# Query 1
psql -d mydb -c "select * into mytable from dblink('db1','select * from
customer_data where timestamp between timestamp \'01-01-2004\' and timestamp
\'06-30-2004\'') as t1(c1 int, c2 text, ...);" & PID1=$!
# Query 2
psql -d mydb -c "select * into mytable from dblink('db2','select * from
customer_data where timestamp between timestamp \'01-07-2004\' and timestamp
\'12-31-2004\'') as t2(c1 int, c2 text, ...);" & PID2=$!
# wait
wait $PID1
wait $PID2
# Do more on mydb.mytable
...

Something like that so no guaranties.  I do remember testing with this a
while back and it is useful for JOIN's.

Greg


-----Original Message-----
From: Hasnul Fadhly bin Hasan
To: pgsql-performance@postgresql.org
Sent: 12/13/04 8:44 PM
Subject: [PERFORM] Trying to create multi db query in one large queries

Hi,

I am not sure if this is the place to ask this question, but since the
question is trying to improve the performance.. i guess i am not that
far off.

My question is if there is a query design that would query multiple
server simultaneously.. would that improve the performance?

To make it clear.. let's say we have 3 db servers.  1 server is just
designed to take the queries while the other 2 server is the ones that
actually
holds the data.  let's say we have a query of 'select * from
customer_data' and we change it to
select * from
(
dblink('db1','select * from customer_data where timestamp between
timestamp \'01-01-2004\' and timestamp \'06-30-2004\'')
union
dblink('db2','select * from customer_data where timestamp between
timestamp \'01-07-2004\' and timestamp \'12-31-2004\'')
)

Would the subquery above be done simultaneously by postgres before doing

the end query? or would it just execute one at a time?

If it does execute simultaneously.. it's possible to create code to
convert normal queries to distributed queries and requesting data from
multiple
database to improve performance.  This would be advantageous for large
amount of data.

Thanks,

Hasnul



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re: Trying to create multi db query in one large querie

From
Joe Conway
Date:
Spiegelberg, Greg wrote:
>
> My experience with dblink() is that each dblink() is executed serially

Correct.

If you really want to do multiple queries simultaneously, you would need
to write a function very similar to dblink_record, but using asynchonous
libpq calls to both remote hosts. See:
   http://www.postgresql.org/docs/current/static/libpq-async.html

HTH,

Joe