Re: Trying to create multi db query in one large querie

From: Spiegelberg, Greg
Subject: Re: Trying to create multi db query in one large querie
Date: ,
Msg-id: 387C22290D3FD71195D300508BF7DB5238AFB6@colmail01.cranel.local
(view: Whole thread, Raw)
Responses: Re: Trying to create multi db query in one large querie  (Joe Conway)
List: pgsql-performance

Hello,

My experience with dblink() is that each dblink() is executed serially, in
part I would guess, due to the plan for the query.  To have each query run
in parallel you would need to execute both dblink()'s simultaneously saving
each result into a table.  I'm not sure if the same table could be
specified.  Would depend on the constaint's I suppose.

#!/bin/sh
# Query 1
psql -d mydb -c "select * into mytable from dblink('db1','select * from
customer_data where timestamp between timestamp \'01-01-2004\' and timestamp
\'06-30-2004\'') as t1(c1 int, c2 text, ...);" & PID1=$!
# Query 2
psql -d mydb -c "select * into mytable from dblink('db2','select * from
customer_data where timestamp between timestamp \'01-07-2004\' and timestamp
\'12-31-2004\'') as t2(c1 int, c2 text, ...);" & PID2=$!
# wait
wait $PID1
wait $PID2
# Do more on mydb.mytable
...

Something like that so no guaranties.  I do remember testing with this a
while back and it is useful for JOIN's.

Greg


-----Original Message-----
From: Hasnul Fadhly bin Hasan
To: 
Sent: 12/13/04 8:44 PM
Subject: [PERFORM] Trying to create multi db query in one large queries

Hi,

I am not sure if this is the place to ask this question, but since the
question is trying to improve the performance.. i guess i am not that
far off.

My question is if there is a query design that would query multiple
server simultaneously.. would that improve the performance?

To make it clear.. let's say we have 3 db servers.  1 server is just
designed to take the queries while the other 2 server is the ones that
actually
holds the data.  let's say we have a query of 'select * from
customer_data' and we change it to
select * from
(
dblink('db1','select * from customer_data where timestamp between
timestamp \'01-01-2004\' and timestamp \'06-30-2004\'')
union
dblink('db2','select * from customer_data where timestamp between
timestamp \'01-07-2004\' and timestamp \'12-31-2004\'')
)

Would the subquery above be done simultaneously by postgres before doing

the end query? or would it just execute one at a time?

If it does execute simultaneously.. it's possible to create code to
convert normal queries to distributed queries and requesting data from
multiple
database to improve performance.  This would be advantageous for large
amount of data.

Thanks,

Hasnul



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster


pgsql-performance by date:

From: Andrew Hood
Date:
Subject: Re: Hardware purchase question
From: Rod Taylor
Date:
Subject: Speeding up pg_dump