Thread: How to run in parallel in Postgres

How to run in parallel in Postgres

From

Lars Aksel Opsahl

Date:

05 December 2019, 12:10:42

I have a function that prepares data, so the big job can be run it in parallel.

Today I have solved this by using "Gnu parallel" like this.

psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4 psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be installed.

Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that

find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?

When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on the next step of work.

Thanks .

Lars

Re: How to run in parallel in Postgres

From

Laurenz Albe

Date:

05 December 2019, 16:42:06

On Thu, 2019-12-05 at 12:10 +0000, Lars Aksel Opsahl wrote:
> have a function that prepares data, so the big job can be run it in parallel. 
> 
> Today I have solved this by using "Gnu parallel" like this.
> psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT
find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\";
parallel-j 4 
 
> psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;
> 
> The problem here is that I depend on external code which may not be installed. 
> 
> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
> 
> If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
 
> find_overlap_gap_make_run_cmd generates as set of 28 sql calls. 
> 
> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I
don'thave dependent on externally install programs  ?
 
> 
> When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the
functionthat triggers parallel calls wait for them complete and then may start on
 
> the next step of work.

You cannot run several queries in parallel in a PostgreSQL function.

You may want to have a look at PL/Proxy which might be used for things like that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: How to run in parallel in Postgres

From

Lars Aksel Opsahl

Date:

06 December 2019, 08:39:55

>From: Laurenz Albe <laurenz.albe@cybertec.at>

>Sent: Thursday, December 5, 2019 5:42 PM

>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: How to run in parallel in Postgres

>On Thu, 2019-12-05 at 12:10 +0000, Lars Aksel Opsahl wrote:

>> have a function that prepares data, so the big job can be run it in parallel.

>> Today I have solved this by using "Gnu parallel" like this.

>> psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4

>> psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

>> The problem here is that I depend on external code which may not be installed.

>> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

>> If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that

>> find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

>> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?

>> When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on

>> the next step of work.

>You cannot run several queries in parallel in a PostgreSQL function.

>You may want to have a look at PL/Proxy which might be used for things like that.

>Yours,

>Laurenz Albe

>--

>Cybertec | https://www.cybertec-postgresql.com

Thanks, I checked it out.

If I understand it correct I have to write the code using plproxy syntax and this means if plproxy is not installed the code will fail.

So the only way now to use built in parallel functionality in Postgres is to use C ?

Do you believe it will possible in the future to run parallel calls from a PostgresSQL function (or is impossible/difficult because of design) ?

Lars

Re: How to run in parallel in Postgres

From

Ondrej Ivanič

Date:

07 December 2019, 01:23:15

Hi Lars,

I have two suggestions:

- `xargs` almost always present and it can run in parallel (-P) but script needs to be changed:

for((i=1;i<=28;i++)); do echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${I},28);"; done | xargs -n1 -P 10 psql ...

- `UNION ALL` might trigger parallel execution (you need to mess with the cost of the function and perhaps other settings):

SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28) UNION ALL

SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28)

...

Cheers,

On Thu, 5 Dec 2019 at 23:11, Lars Aksel Opsahl <Lars.Opsahl@nibio.no> wrote:

Hi

I have a function that prepares data, so the big job can be run it in parallel.

Today I have solved this by using "Gnu parallel" like this.
psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4 psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;

The problem here is that I depend on external code which may not be installed.

Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?

When this 28 sql calls are done, the find_overlap_gap_make_run_cmd may continue to the next step of work. So the function that triggers parallel calls wait for them complete and then may start on the next step of work.

Thanks .

Lars

Ondrej

Re: How to run in parallel in Postgres

From

Justin Pryzby

Date:

07 December 2019, 01:25:21

On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:
> I have a function that prepares data, so the big job can be run it in parallel.
> 
> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
> 
> If you look at
https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that

> 
> find_overlap_gap_make_run_cmd generates as set of 28 sql calls.
>
> So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I
don'thave dependent on externally install programs  ?

SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);
SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);
SELECT
find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);
...

I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.
Maybe you could consider rewriting it to return data to its caller instead.
You'd also need to mark it as PARALLEL SAFE, of course.
Your other functions involved should be PARALLEL SAFE too.

Justin

Re: How to run in parallel in Postgres

From

Lars Aksel Opsahl

Date:

07 December 2019, 11:17:25

>From: Ondrej Ivanič <ondrej.ivanic@gmail.com>

>Sent: Saturday, December 7, 2019 2:23 AM

>Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

>Subject: Re: How to run in parallel in Postgres

>Hi Lars,

>I have two suggestions:

>- `xargs` almost always present and it can run in parallel (-P) but script needs to be changed:

>for((i=1;i<=28;i++)); do echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${I},28);"; done | xargs -n1 -P 10 psql ...

>- `UNION ALL` might trigger parallel execution (you need to mess with the cost of the function and perhaps other settings):

>SELECT

> find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28)

>UNION ALL

>SELECT

> find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28)

>...

>Cheers,

Hi Ondrej

Yes using xargs seems be an alternative to GNU parallel and I will have that in mind.
I did a test using UNION ALL in the branch https://github.com/larsop/find-overlap-and-gap/tree/union_all_parallel but I was not able to trigger Postgres parallel . That may be related to what Justin say about create tables.

Thanks.

Lars

Re: How to run in parallel in Postgres

From

Lars Aksel Opsahl

Date:

07 December 2019, 11:27:56

> From: Justin Pryzby <pryzby@telsasoft.com>

> Sent: Saturday, December 7, 2019 2:25 AM

> To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>

> Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

> Subject: Re: How to run in parallel in Postgres

> On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:

> > I have a function that prepares data, so the big job can be run it in parallel.

> >

> > Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

> >

> > If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that

> >

> > find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

> >

> > So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);

> ...

> I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.

> Maybe you could consider rewriting it to return data to its caller instead.

> You'd also need to mark it as PARALLEL SAFE, of course.

> Your other functions involved should be PARALLEL SAFE too.

> Justin

Hi Justin

The reason why I don't return the results Is that on very bug tables I usually get memory problems if I return all the results to the master function. So I usually break thing up into small unlogged tables. Then I work on each table separately or in groups. When all steps are done i merge all the small tables together. I this case we only single step, but usually I work many more steps.

But I will keep mind that it may work i parallel if I don't create any child tables but returns the result.

Thanks.

Lars

Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

From

Lars Aksel Opsahl

Date:

08 December 2019, 18:14:14

>You cannot run several queries in parallel in a PostgreSQL function.

>
>You may want to have a look at PL/Proxy which might be used for things like that.
>
>Yours,
>Laurenz Albe
>--
>Cybertec | https://www.cybertec-postgresql.com

Hi Laurenz

The code below takes 3, seconds

$body$

DECLARE

BEGIN

EXECUTE 'SELECT pg_sleep(1); SELECT pg_sleep(1); SELECT pg_sleep(1);';

END

$body$;

Do you or anybody know if there are any plans for a function call that support the calling structure below or something like it and that then could finish in 1 second ? (If you are calling a void function, the return value should not be any problem.)

$body$

DECLARE

command_string_list text[3];

BEGIN

command_string_list[0] = 'SELECT pg_sleep(1)';

command_string_list[1] = 'SELECT pg_sleep(1)';

command_string_list[2] = 'SELECT pg_sleep(1)';

EXECUTE_PARALLEL command_string_list;

END

$body$;

The only way to this today as I understand it, is to open 3 new connections back to the database which you can be done in different ways.

If we had a parallel functions like the one above it's easier to make parallel sql without using complex scripts, java, python or other system.

Thanks.

Lars

Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

From

Joe Conway

Date:

08 December 2019, 20:04:05

On 12/8/19 1:14 PM, Lars Aksel Opsahl wrote:
> Do you or anybody know if there are any plans for a function call that
> support the calling structure below or something like it and that then
> could finish in 1 second ? (If you are calling a void function, the
> return value should not be any problem.)
>
> DO
> $body$
> *DECLARE* 
> command_string_list text[3];
> *BEGIN*
> command_string_list[0] = 'SELECT pg_sleep(1)';
> command_string_list[1] = 'SELECT pg_sleep(1)';
> command_string_list[2] = 'SELECT pg_sleep(1)';
> EXECUTE_PARALLEL command_string_list;
> *END*
> $body$;
>
> The only way to this today as I understand it, is to open 3 new
> connections back to the database which you can be done in different ways. 

Yes, correct.

> If we had a parallel functions like the one above it's easier to
> make parallel sql without using complex scripts, java, python or other
> system.

It does require one connection per statement, but with dblink it is not
necessarily all that complex. For example (granted, this could use more
error checking, etc.):

8<----------------
CREATE OR REPLACE FUNCTION
 execute_parallel(stmts text[])
RETURNS text AS
$$
declare
  i int;
  retv text;
  conn text;
  connstr text;
  rv int;
  db text := current_database();
begin
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    connstr := 'dbname=' || db;
    perform dblink_connect(conn, connstr);
    rv := dblink_send_query(conn, stmts[i]);
  end loop;
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    select val into retv
    from dblink_get_result(conn) as d(val text);
  end loop;
  for i in 1..array_length(stmts,1) loop
    conn := 'conn' || i::text;
    perform dblink_disconnect(conn);
  end loop;
  return 'OK';
  end;
$$ language plpgsql;
8<----------------

And then:

8<----------------
\timing
DO $$
  declare
    stmts text[];
  begin
    stmts[1] = 'select pg_sleep(1)';
    stmts[2] = 'select pg_sleep(1)';
    stmts[3] = 'select pg_sleep(1)';
  PERFORM execute_parallel(stmts);
  end;
$$ LANGUAGE plpgsql;
DO
Time: 1010.831 ms (00:01.011)
8<----------------

HTH,

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

signature.asc