Thread: How to run in parallel in Postgres

How to run in parallel in Postgres

From
Lars Aksel Opsahl
Date:
Hi

I have a function that prepares data, so the big job can be run it in parallel. 

Today I have solved this by using "Gnu parallel" like this.
psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4  psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be installed. 

Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql  you see that

find_overlap_gap_make_run_cmd generates as set of 28 sql calls. 


So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs  ?


When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on the next step of work.


Thanks .


Lars



 

   

Re: How to run in parallel in Postgres

From
Laurenz Albe
Date:
On Thu, 2019-12-05 at 12:10 +0000, Lars Aksel Opsahl wrote:
> have a function that prepares data, so the big job can be run it in parallel. 
> 
> Today I have solved this by using "Gnu parallel" like this.
> psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT
find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\";
parallel-j 4 
 
> psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;
> 
> The problem here is that I depend on external code which may not be installed. 
> 
> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
> 
> If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
 
> find_overlap_gap_make_run_cmd generates as set of 28 sql calls. 
> 
> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I
don'thave dependent on externally install programs  ?
 
> 
> When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the
functionthat triggers parallel calls wait for them complete and then may start on
 
> the next step of work.

You cannot run several queries in parallel in a PostgreSQL function.

You may want to have a look at PL/Proxy which might be used for things like that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: How to run in parallel in Postgres

From
Lars Aksel Opsahl
Date:

>From: Laurenz Albe <laurenz.albe@cybertec.at>

>Sent: Thursday, December 5, 2019 5:42 PM

>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: How to run in parallel in Postgres

> 

>On Thu, 2019-12-05 at 12:10 +0000, Lars Aksel Opsahl wrote:

>> have a function that prepares data, so the big job can be run it in parallel. 

>> 

>> Today I have solved this by using "Gnu parallel" like this.

>> psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4 

>> psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

>> 

>> The problem here is that I depend on external code which may not be installed. 

>> 

>> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

>> 

>> If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql  you see that

>> find_overlap_gap_make_run_cmd generates as set of 28 sql calls. 

>> 

>> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs  ?

>> 

>> When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on

>> the next step of work.

>

>You cannot run several queries in parallel in a PostgreSQL function.

>

>You may want to have a look at PL/Proxy which might be used for things like that.

>

>Yours,

>Laurenz Albe

>-- 

>Cybertec | https://www.cybertec-postgresql.com


Hi


Thanks, I checked it out.


If I understand it correct I have to write the code using plproxy syntax and this means if plproxy is not installed the code will fail.


So the only way now to use built in parallel functionality in Postgres is to use C ?


Do you believe it will possible in the future to run parallel calls from a PostgresSQL function (or is impossible/difficult because of design) ?


Lars


Re: How to run in parallel in Postgres

From
Ondrej Ivanič
Date:
Hi Lars,

I have two suggestions:

- `xargs` almost always present and it can run in parallel (-P) but script needs to be changed:
for((i=1;i<=28;i++)); do echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${I},28);"; done | xargs -n1 -P 10 psql ... 

- `UNION ALL` might trigger parallel execution (you need to mess with the cost of the function and perhaps other settings):
SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28) UNION ALL
SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28)
...

Cheers,

On Thu, 5 Dec 2019 at 23:11, Lars Aksel Opsahl <Lars.Opsahl@nibio.no> wrote:
Hi

I have a function that prepares data, so the big job can be run it in parallel. 

Today I have solved this by using "Gnu parallel" like this.
psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4  psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be installed. 

Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql  you see that

find_overlap_gap_make_run_cmd generates as set of 28 sql calls. 


So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs  ?


When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on the next step of work.


Thanks .


Lars



 

   


--
Ondrej

Re: How to run in parallel in Postgres

From
Justin Pryzby
Date:
On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:
> I have a function that prepares data, so the big job can be run it in parallel.
> 
> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
> 
> If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
 
> 
> find_overlap_gap_make_run_cmd generates as set of 28 sql calls.
>
> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I
don'thave dependent on externally install programs  ?
 

SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);
SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);
SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);
...

I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.
Maybe you could consider rewriting it to return data to its caller instead.
You'd also need to mark it as PARALLEL SAFE, of course.
Your other functions involved should be PARALLEL SAFE too.

Justin



Re: How to run in parallel in Postgres

From
Lars Aksel Opsahl
Date:

 >From: Ondrej Ivanič <ondrej.ivanic@gmail.com>

 >Sent: Saturday, December 7, 2019 2:23 AM

 >Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

 >Subject: Re: How to run in parallel in Postgres

 > 

 >Hi Lars,

 >

 >I have two suggestions:

 >

 >- `xargs` almost always present and it can run in parallel (-P) but script needs to be changed:

 >for((i=1;i<=28;i++)); do echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${I},28);"; done | xargs -n1 -P 10 psql ... 

 >

 >- `UNION ALL` might trigger parallel execution (you need to mess with the cost of the function and perhaps other settings):

 >SELECT

 > find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28)

 >UNION ALL

 >SELECT

 > find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28)

 >...

 >

 >

 >Cheers,

 >

Hi Ondrej

Thanks.

Lars


Re: How to run in parallel in Postgres

From
Lars Aksel Opsahl
Date:

> From: Justin Pryzby <pryzby@telsasoft.com>

> Sent: Saturday, December 7, 2019 2:25 AM

> To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>

> Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

> Subject: Re: How to run in parallel in Postgres

>  

> On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:

> > I have a function that prepares data, so the big job can be run it in parallel.

> > 

> > Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

> > 

> > If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql  you see that

> > 

> > find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

> >

> > So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs  ?

> 

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);

> ...

> 

> I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.

> Maybe you could consider rewriting it to return data to its caller instead.

> You'd also need to mark it as PARALLEL SAFE, of course.

> Your other functions involved should be PARALLEL SAFE too.

> 

> Justin


Hi Justin

The reason why I don't return the results Is that on very bug tables I usually  get memory problems if I return all the results to the master function. So I usually break thing up into small unlogged tables. Then I work on each table separately or in groups. When all steps are done i merge all the small tables together. I this case we only single step, but usually I work many more steps.

But I will keep mind that it may work i parallel if I don't create any child tables but returns the result.

Thanks.

Lars

Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

From
Lars Aksel Opsahl
Date:
>You cannot run several queries in parallel in a PostgreSQL function.
>
>You may want to have a look at PL/Proxy which might be used for things like that.
>
>Yours,
>Laurenz Albe
>--
>Cybertec | https://www.cybertec-postgresql.com

Hi Laurenz

The code below takes 3, seconds 
DO

$body$

DECLARE 

BEGIN

EXECUTE 'SELECT pg_sleep(1); SELECT pg_sleep(1); SELECT pg_sleep(1);';

END

$body$;

Do you or anybody know if there are any plans for a function call that support the calling structure below or something like it and that then could finish in 1 second ? (If you are calling a void function, the return value should not be any problem.)

DO

$body$

DECLARE 

command_string_list text[3];

BEGIN

command_string_list[0] = 'SELECT pg_sleep(1)';

command_string_list[1] = 'SELECT pg_sleep(1)';

command_string_list[2] = 'SELECT pg_sleep(1)';

EXECUTE_PARALLEL command_string_list;

END

$body$;


The only way to this today as I understand it, is to open 3 new connections back to the database which you can be done in different ways. 
 
If we had a parallel functions like the one above it's easier to make parallel sql without using complex scripts, java, python or other system.

Thanks.

Lars

Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

From
Joe Conway
Date:
On 12/8/19 1:14 PM, Lars Aksel Opsahl wrote:
> Do you or anybody know if there are any plans for a function call that
> support the calling structure below or something like it and that then
> could finish in 1 second ? (If you are calling a void function, the
> return value should not be any problem.)
>
> DO
> $body$
> *DECLARE* 
> command_string_list text[3];
> *BEGIN*
> command_string_list[0] = 'SELECT pg_sleep(1)';
> command_string_list[1] = 'SELECT pg_sleep(1)';
> command_string_list[2] = 'SELECT pg_sleep(1)';
> EXECUTE_PARALLEL command_string_list;
> *END*
> $body$;
>
> The only way to this today as I understand it, is to open 3 new
> connections back to the database which you can be done in different ways. 

Yes, correct.

> If we had a parallel functions like the one above it's easier to
> make parallel sql without using complex scripts, java, python or other
> system.

It does require one connection per statement, but with dblink it is not
necessarily all that complex. For example (granted, this could use more
error checking, etc.):

8<----------------
CREATE OR REPLACE FUNCTION
 execute_parallel(stmts text[])
RETURNS text AS
$$
declare
  i int;
  retv text;
  conn text;
  connstr text;
  rv int;
  db text := current_database();
begin
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    connstr := 'dbname=' || db;
    perform dblink_connect(conn, connstr);
    rv := dblink_send_query(conn, stmts[i]);
  end loop;
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    select val into retv
    from dblink_get_result(conn) as d(val text);
  end loop;
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    perform dblink_disconnect(conn);
  end loop;
  return 'OK';
  end;
$$ language plpgsql;
8<----------------

And then:

8<----------------
\timing
DO $$
  declare
    stmts text[];
  begin
    stmts[1] = 'select pg_sleep(1)';
    stmts[2] = 'select pg_sleep(1)';
    stmts[3] = 'select pg_sleep(1)';
  PERFORM execute_parallel(stmts);
  end;
$$ LANGUAGE plpgsql;
DO
Time: 1010.831 ms (00:01.011)
8<----------------

HTH,

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment