Re: Trying to create multi db query in one large querie - Mailing list pgsql-performance

From Spiegelberg, Greg
Subject Re: Trying to create multi db query in one large querie
Date
Msg-id 387C22290D3FD71195D300508BF7DB5238AFB6@colmail01.cranel.local
Whole thread Raw
Responses Re: Trying to create multi db query in one large querie
List pgsql-performance
Hello,

My experience with dblink() is that each dblink() is executed serially, in
part I would guess, due to the plan for the query.  To have each query run
in parallel you would need to execute both dblink()'s simultaneously saving
each result into a table.  I'm not sure if the same table could be
specified.  Would depend on the constaint's I suppose.

#!/bin/sh
# Query 1
psql -d mydb -c "select * into mytable from dblink('db1','select * from
customer_data where timestamp between timestamp \'01-01-2004\' and timestamp
\'06-30-2004\'') as t1(c1 int, c2 text, ...);" & PID1=$!
# Query 2
psql -d mydb -c "select * into mytable from dblink('db2','select * from
customer_data where timestamp between timestamp \'01-07-2004\' and timestamp
\'12-31-2004\'') as t2(c1 int, c2 text, ...);" & PID2=$!
# wait
wait $PID1
wait $PID2
# Do more on mydb.mytable
...

Something like that so no guaranties.  I do remember testing with this a
while back and it is useful for JOIN's.

Greg


-----Original Message-----
From: Hasnul Fadhly bin Hasan
To: pgsql-performance@postgresql.org
Sent: 12/13/04 8:44 PM
Subject: [PERFORM] Trying to create multi db query in one large queries

Hi,

I am not sure if this is the place to ask this question, but since the
question is trying to improve the performance.. i guess i am not that
far off.

My question is if there is a query design that would query multiple
server simultaneously.. would that improve the performance?

To make it clear.. let's say we have 3 db servers.  1 server is just
designed to take the queries while the other 2 server is the ones that
actually
holds the data.  let's say we have a query of 'select * from
customer_data' and we change it to
select * from
(
dblink('db1','select * from customer_data where timestamp between
timestamp \'01-01-2004\' and timestamp \'06-30-2004\'')
union
dblink('db2','select * from customer_data where timestamp between
timestamp \'01-07-2004\' and timestamp \'12-31-2004\'')
)

Would the subquery above be done simultaneously by postgres before doing

the end query? or would it just execute one at a time?

If it does execute simultaneously.. it's possible to create code to
convert normal queries to distributed queries and requesting data from
multiple
database to improve performance.  This would be advantageous for large
amount of data.

Thanks,

Hasnul



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

pgsql-performance by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Similar tables, different indexes performance
Next
From: Joe Conway
Date:
Subject: Re: Trying to create multi db query in one large querie