Re: is parallel union all possible over dblink? - Mailing list pgsql-performance

From Greg Spiegelberg
Subject Re: is parallel union all possible over dblink?
Date
Msg-id BANLkTin_CFVj24_XX=+o_gZKtTCuqgENQg@mail.gmail.com
Whole thread Raw
In response to Re: is parallel union all possible over dblink?  ("Marinos Yannikos" <mjy@geizhals.at>)
Responses Re: is parallel union all possible over dblink?
Re: is parallel union all possible over dblink?
List pgsql-performance
On Wed, Jun 29, 2011 at 12:37 PM, Marinos Yannikos <mjy@geizhals.at> wrote:
On Wed, 29 Jun 2011 13:55:58 +0200, Svetlin Manavski <svetlin.manavski@gmail.com> wrote:

Question: Is there a way to get the same result from within a PL/pgSQL
function but running all the sub-queries in parallel? In case it is not
directly available, which one would be the simplest way to implement it in
my application? (I am very keen to avoid the obvious solution of an
additional multi-threaded layer which would do it out of the RDBMS)

Have you tried dblink_send_query() + dblink_get_results() yet?

http://www.postgresql.org/docs/current/static/contrib-dblink-send-query.html

You'd have to do something like this to your queries [untested]:

select dblink_send_query('remote1','select * from

appqosfe.F_total_utilization(1306918800000000000::INT8, NULL,
60000000000::INT8, NULL)');

(select * from appqosfe.F_total_utilization(1306918800000000000::INT8, NULL,
60000000000::INT8, NULL))
UNION ALL
(SELECT * from dblink_get_result('remote1') as T1(detectroid numeric, timegroup numeric,
numbytes numeric, numpackets numeric))
order by timegroup asc;

i.e. start your remote query/-ies asynchronously, then collect the results in the UNION query. At least in theory it should work...


This does work however you'll need to add a little more to it to ensure your UNION succeeds.  In pseudo...

connection #1:
CREATE TABLE target_1 ...
BEGIN;
LOCK TABLE target_1 IN ACCESS EXCLUSIVE MODE;
INSERT INTO target_1 SELECT ...
COMMIT;

connection #2:
CREATE TABLE target_2 ...
BEGIN;
LOCK TABLE target_2 IN ACCESS EXCLUSIVE MODE;
INSERT INTO target_2 SELECT ...
COMMIT;

connection #3:
SELECT * FROM target_1 UNION SELECT * FROM target_2;

Connections 1 and 2 can be done in simultaneously and after both have reached the LOCK statement then the SELECT on connection 3 can be executed.  Same fundamentals if all three connections are to different databases and connection 3 uses dblink to pull the data.

Another alternative is to use GridSQL.  I haven't used it myself but seen it in action on a large install with 4 backend databases.  Pretty slick.

Greg

pgsql-performance by date:

Previous
From: "Marinos Yannikos"
Date:
Subject: Re: is parallel union all possible over dblink?
Next
From: Shaun Thomas
Date:
Subject: Re: is parallel union all possible over dblink?