Re: How to run in parallel in Postgres - Mailing list pgsql-performance

From Lars Aksel Opsahl
Subject Re: How to run in parallel in Postgres
Date
Msg-id HE1P189MB0266944A7C861C8711453F7B9D5E0@HE1P189MB0266.EURP189.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: How to run in parallel in Postgres  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance

> From: Justin Pryzby <pryzby@telsasoft.com>

> Sent: Saturday, December 7, 2019 2:25 AM

> To: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>

> Cc: pgsql-performance@lists.postgresql.org <pgsql-performance@lists.postgresql.org>

> Subject: Re: How to run in parallel in Postgres

>  

> On Thu, Dec 05, 2019 at 12:10:42PM +0000, Lars Aksel Opsahl wrote:

> > I have a function that prepares data, so the big job can be run it in parallel.

> > 

> > Since Postgres now supports parallel I was wondering if it's easy to trigger parallel dynamically created SQL calls.

> > 

> > If you look at https://github.com/larsop/find-overlap-and-gap/blob/master/src/test/sql/regress/find_overlap_and_gap.sql  you see that

> > 

> > find_overlap_gap_make_run_cmd generates as set of 28 sql calls.

> >

> > So is it in a simple way possible to use Postgres parallel functionality to call this 28 functions i parallel so I don't have dependent on externally install programs  ?

> 

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',1,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',2,28);

> SELECT find_overlap_gap_single_cell('test_data.overlap_gap_input_t1','geom',4258,'test_data.overlap_gap_input_t1_res',3,28);

> ...

> 

> I see that find_overlap_gap_single_cell creates tables, so cannot be run in parallel.

> Maybe you could consider rewriting it to return data to its caller instead.

> You'd also need to mark it as PARALLEL SAFE, of course.

> Your other functions involved should be PARALLEL SAFE too.

> 

> Justin


Hi Justin

The reason why I don't return the results Is that on very bug tables I usually  get memory problems if I return all the results to the master function. So I usually break thing up into small unlogged tables. Then I work on each table separately or in groups. When all steps are done i merge all the small tables together. I this case we only single step, but usually I work many more steps.

But I will keep mind that it may work i parallel if I don't create any child tables but returns the result.

Thanks.

Lars

pgsql-performance by date:

Previous
From: Lars Aksel Opsahl
Date:
Subject: Re: How to run in parallel in Postgres
Next
From: Jean-David Beyer
Date:
Subject: Re: Legal disclaimers on emails to this group