Re: How to run in parallel in Postgres, EXECUTE_PARALLEL - Mailing list pgsql-performance

From Lars Aksel Opsahl
Subject Re: How to run in parallel in Postgres, EXECUTE_PARALLEL
Date
Msg-id HE1P189MB026617A76AEAF06FDADA20619D590@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM
Whole thread Raw
List pgsql-performance

> From: Joe Conway

> Sent: Sunday, December 8, 2019 9:04 PM

> To: Lars Aksel Opsahl; Laurenz Albe; pgsql-performance@lists.postgresql.org

> Subject: Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

> 

> On 12/8/19 1:14 PM, Lars Aksel Opsahl wrote:

> > Do you or anybody know if there are any plans for a function call that

> > support the calling structure below or something like it and that then

> > could finish in 1 second ? (If you are calling a void function, the

> > return value should not be any problem.)

> > 

> > DO

> > $body$

> > *DECLARE* 

> > command_string_list text[3];

> > *BEGIN*

> > command_string_list[0] = 'SELECT pg_sleep(1)';

> > command_string_list[1] = 'SELECT pg_sleep(1)';

> > command_string_list[2] = 'SELECT pg_sleep(1)';

> > EXECUTE_PARALLEL command_string_list;

> > *END*

> > $body$;

> > 

> > The only way to this today as I understand it, is to open 3 new

> > connections back to the database which you can be done in different ways. 

> 

> Yes, correct.

> 

> > If we had a parallel functions like the one above it's easier to

> > make parallel sql without using complex scripts, java, python or other

> > system.

> 

> It does require one connection per statement, but with dblink it is not

> necessarily all that complex. For example (granted, this could use more

> error checking, etc.):

> 

> 8<----------------

> CREATE OR REPLACE FUNCTION

>  execute_parallel(stmts text[])

> RETURNS text AS

> $$

> declare

>   i int;

>   retv text;

>   conn text;

>   connstr text;

>   rv int;

>   db text := current_database();

> begin

>   for i in 1..array_length(stmts,1) loop

>     conn := 'conn' || i::text;

>     connstr := 'dbname=' || db;

>     perform dblink_connect(conn, connstr);

>     rv := dblink_send_query(conn, stmts[i]);

>   end loop;

>   for i in 1..array_length(stmts,1) loop

>     conn := 'conn' || i::text;

>     select val into retv

>     from dblink_get_result(conn) as d(val text);

>   end loop;

>   for i in 1..array_length(stmts,1) loop

>     conn := 'conn' || i::text;

>     perform dblink_disconnect(conn);

>   end loop;

>   return 'OK';

>   end;

> $$ language plpgsql;

> 8<----------------

> 

> And then:

> 

> 8<----------------

> \timing

> DO $$

>   declare

>     stmts text[];

>   begin

>     stmts[1] = 'select pg_sleep(1)';

>     stmts[2] = 'select pg_sleep(1)';

>     stmts[3] = 'select pg_sleep(1)';

>   PERFORM execute_parallel(stmts);

>   end;

> $$ LANGUAGE plpgsql;

> DO

> Time: 1010.831 ms (00:01.011)

> 8<----------------

> 

> HTH,

> 

> Joe

> -- 

> Crunchy Data - http://crunchydata.com

> PostgreSQL Support for Secure Enterprises

> Consulting, Training, & Open Source Development

Hi

(The test is failing now because it seems like drop EXTENSION dblink; is not cleaning up every thing)

As you say we need some error handling. And maybe  some retry if not enough free connections and a parameter for max parallel connections and so on.

So far this is best solution I have seen.

Thanks.

Lars 

pgsql-performance by date:

Previous
From: Joe Conway
Date:
Subject: Re: How to run in parallel in Postgres, EXECUTE_PARALLEL
Next
From: Justin Pryzby
Date:
Subject: Re: Specific query taking time to process