Thread: How to run in parallel in Postgres
find_overlap_gap_make_run_cmd generates a set of 28 SQL calls.
So is it possible, in a simple way, to use the Postgres parallel functionality to call these 28 functions in parallel, so I don't depend on externally installed programs?
When these 28 SQL calls are done, find_overlap_gap_make_run_cmd may continue to the next step of the work. So the function that triggers the parallel calls waits for them to complete and then may start on the next step.
Thanks.
Lars
>From: Laurenz Albe <laurenz.albe@cybertec.at>
>Sent: Thursday, December 5, 2019 5:42 PM
>To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>; pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
>Subject: Re: How to run in parallel in Postgres
>
>On Thu, 2019-12-05 at 12:10 +0000, Lars Aksel Opsahl wrote:
>> I have a function that prepares data, so the big job can be run in parallel.
>>
>> Today I have solved this by using "Gnu parallel" like this.
>> psql testdb -c"\! psql -t -q -o /tmp/run_cmd.sql testdb -c\"SELECT find_overlap_gap_make_run_cmd('sl_lop.overlap_gap_input_t1','geom',4258,'sl_lop.overlap_gap_input_t1_res',50);\"; parallel -j 4
>> psql testdb -c :::: /tmp/run_cmd.sql" 2>> /tmp/analyze.log;
>>
>> The problem here is that I depend on external code which may not be installed.
>>
>> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
>>
>> If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
>> find_overlap_gap_make_run_cmd generates a set of 28 SQL calls.
>>
>> So is it possible, in a simple way, to use the Postgres parallel functionality to call these 28 functions in parallel, so I don't depend on externally installed programs?
>>
>> When these 28 SQL calls are done, find_overlap_gap_make_run_cmd may continue to the next step of the work. So the function that triggers the parallel calls waits for them to complete and then may start on
>> the next step of the work.
>
>You cannot run several queries in parallel in a PostgreSQL function.
>
>You may want to have a look at PL/Proxy which might be used for things like that.
>
>Yours,
>Laurenz Albe
>--
>Cybertec | https://www.cybertec-postgresql.com
Hi
Thanks, I checked it out.
If I understand it correctly, I have to write the code using the PL/Proxy syntax, which means that if PL/Proxy is not installed the code will fail.
So is using C the only way, for now, to use the built-in parallel functionality in Postgres?
Do you believe it will be possible in the future to run parallel calls from a PostgreSQL function, or is that impossible/difficult because of the design?
Lars
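[Editor's note: until something like that exists in core, the fan-out/join pattern described above can be had with plain POSIX shell job control, with no GNU parallel and no PL/Proxy. A minimal sketch, with `sleep 1` standing in for the real `psql testdb -c "SELECT ..."` calls:]

```shell
# Launch each statement in the background, then wait for all of them
# before continuing -- the same fan-out/join that
# find_overlap_gap_make_run_cmd needs between its steps.
for i in 1 2 3; do
  sleep 1 &   # stand-in for: psql testdb -c "SELECT ...;" &
done
wait          # returns only when every background job has finished
echo "all jobs done"
```

Total wall time is about 1 second instead of 3, since the three jobs overlap.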
On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:
> I have a function that prepares data, so the big job can be run in parallel.
>
> Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
>
> If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
> find_overlap_gap_make_run_cmd generates a set of 28 SQL calls.
>
> So is it possible, in a simple way, to use the Postgres parallel functionality to call these 28 functions in parallel, so I don't depend on externally installed programs?

SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);
SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);
SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);
...

I see that find_overlap_gap_single_cell creates tables, so it cannot be run in parallel.
Maybe you could consider rewriting it to return data to its caller instead.
You'd also need to mark it as PARALLEL SAFE, of course.
Your other functions involved should be PARALLEL SAFE too.

Justin
>From: Ondrej Ivanič <ondrej.ivanic@gmail.com>
>Sent: Saturday, December 7, 2019 2:23 AM
>Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
>Subject: Re: How to run in parallel in Postgres
>
>Hi Lars,
>
>I have two suggestions:
>
>- `xargs` almost always present and it can run in parallel (-P) but script needs to be changed:
>for ((i=1; i<=28; i++)); do echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${i},28);"; done | xargs -n1 -P 10 psql ...
>
>- `UNION ALL` might trigger parallel execution (you need to mess with the cost of the function and perhaps other settings):
>SELECT
> find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28)
>UNION ALL
>SELECT
> find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28)
>...
>
>
>Cheers,
>
- Yes, using xargs seems to be an alternative to GNU parallel, and I will keep that in mind.
- I did a test using UNION ALL in the branch https://github.com/larsop/find-overlap-and-gap/tree/union_all_parallel but I was not able to trigger Postgres parallelism. That may be related to what Justin says about creating tables.
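[Editor's note: Ondrej's xargs pipeline can be dry-run end to end by swapping `psql` for `echo`; everything else below follows the thread. `-d '\n'` is a GNU xargs option, needed here because the generated statements contain single quotes, which default xargs quote processing would reject:]

```shell
# Generate the 28 statements, one per line, then fan them out to at most
# 4 concurrent workers. In real use the consumer would be psql, e.g.:
#   ... | xargs -d '\n' -n1 -P 4 psql testdb -c
for i in $(seq 1 28); do
  echo "SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',${i},28);"
done | xargs -d '\n' -n1 -P 4 echo "would run:"
```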
> From: Justin Pryzby <pryzby@telsasoft.com>
> Sent: Saturday, December 7, 2019 2:25 AM
> To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
> Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>
> Subject: Re: How to run in parallel in Postgres
>
> On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:
> > I have a function that prepares data, so the big job can be run it in parallel.
> >
> > Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.
> >
> > If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql you see that
> >
> > find_overlap_gap_make_run_cmd generates as set of 28 sql calls.
> >
> > So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs ?
>
> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);
> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);
> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);
> ...
>
> I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.
> Maybe you could consider rewriting it to return data to its caller instead.
> You'd also need to mark it as PARALLEL SAFE, of course.
> Your other functions involved should be PARALLEL SAFE too.
>
> Justin
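[Editor's note: to make Justin's suggestion concrete, here is a hypothetical sketch of the reshaped function. The parameter list mirrors the calls quoted above, but the signature and body are guesses, not the real code from the repository:]

```sql
-- Hypothetical: return rows to the caller instead of creating a result
-- table, and mark the function PARALLEL SAFE so that parallel workers
-- are allowed to execute it.
CREATE OR REPLACE FUNCTION find_overlap_gap_single_cell(
    table_name text, geom_column text, srid int,
    result_name text, cell int, num_cells int)
RETURNS SETOF record
LANGUAGE plpgsql
PARALLEL SAFE
AS $$
BEGIN
  RETURN QUERY EXECUTE '...';  -- the per-cell overlap/gap query, elided
END
$$;
```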
DO
$body$
BEGIN
  EXECUTE 'SELECT pg_sleep(1); SELECT pg_sleep(1); SELECT pg_sleep(1);';
END
$body$;
DO
$body$
DECLARE
  command_string_list text[3];
BEGIN
  command_string_list[0] = 'SELECT pg_sleep(1)';
  command_string_list[1] = 'SELECT pg_sleep(1)';
  command_string_list[2] = 'SELECT pg_sleep(1)';
  EXECUTE_PARALLEL command_string_list;
END
$body$;
On 12/8/19 1:14 PM, Lars Aksel Opsahl wrote:
> Do you or anybody know if there are any plans for a function call that
> supports the calling structure below, or something like it, that could
> then finish in 1 second? (If you are calling a void function, the
> return value should not be any problem.)
>
> DO
> $body$
> DECLARE
>   command_string_list text[3];
> BEGIN
>   command_string_list[0] = 'SELECT pg_sleep(1)';
>   command_string_list[1] = 'SELECT pg_sleep(1)';
>   command_string_list[2] = 'SELECT pg_sleep(1)';
>   EXECUTE_PARALLEL command_string_list;
> END
> $body$;
>
> The only way to do this today, as I understand it, is to open 3 new
> connections back to the database, which can be done in different ways.

Yes, correct.

> If we had a parallel function like the one above it would be easier to
> make parallel SQL without using complex scripts, Java, Python or other
> systems.

It does require one connection per statement, but with dblink it is not necessarily all that complex. For example (granted, this could use more error checking, etc.):

8<----------------
CREATE OR REPLACE FUNCTION execute_parallel(stmts text[])
RETURNS text AS
$$
declare
  i int;
  retv text;
  conn text;
  connstr text;
  rv int;
  db text := current_database();
begin
  for i in 1..array_length(stmts, 1) loop
    conn := 'conn' || i::text;
    connstr := 'dbname=' || db;
    perform dblink_connect(conn, connstr);
    rv := dblink_send_query(conn, stmts[i]);
  end loop;
  for i in 1..array_length(stmts, 1) loop
    conn := 'conn' || i::text;
    select val into retv from dblink_get_result(conn) as d(val text);
  end loop;
  for i in 1..array_length(stmts, 1) loop
    conn := 'conn' || i::text;
    perform dblink_disconnect(conn);
  end loop;
  return 'OK';
end;
$$ language plpgsql;
8<----------------

And then:

8<----------------
\timing
DO $$
declare
  stmts text[];
begin
  stmts[1] = 'select pg_sleep(1)';
  stmts[2] = 'select pg_sleep(1)';
  stmts[3] = 'select pg_sleep(1)';
  PERFORM execute_parallel(stmts);
end;
$$ LANGUAGE plpgsql;
DO
Time: 1010.831 ms (00:01.011)
8<----------------

HTH,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development