Re: A reloption for partitioned tables - parallel_workers - Mailing list pgsql-hackers

From Amit Langote
Subject Re: A reloption for partitioned tables - parallel_workers
Date
Msg-id CA+HiwqHMmU=GwUvpWEScQJzDgSF9-voZco8C5ttX_BzYqEuB6w@mail.gmail.com
In response to Re: A reloption for partitioned tables - parallel_workers  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: A reloption for partitioned tables - parallel_workers  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:
> > On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09@gmail.com> wrote:
> > > Attached a new version rebased over c8f78b616, with the grouping
> > > relation partitioning enhancements as a separate patch 0001.  Sorry
> > > about the delay.
> >
> > I had a quick look at this and wondered if the partitioned table's
> > parallel workers shouldn't be limited to the sum of the parallel
> > workers of the Append's subpaths?
> >
> > It seems a bit weird to me that the following case requests 4 workers:
> >
> > # create table lp (a int) partition by list(a);
> > # create table lp1 partition of lp for values in(1);
> > # insert into lp select 1 from generate_series(1,10000000) x;
> > # alter table lp1 set (parallel_workers = 2);
> > # alter table lp set (parallel_workers = 4);
> > # set max_parallel_workers_per_gather = 8;
> > # explain select count(*) from lp;
> >                                         QUERY PLAN
> > -------------------------------------------------------------------------------------------
> >  Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
> >    ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
> >          Workers Planned: 4
> >          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
> >                ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
> > (5 rows)
> >
> > I can see a good argument that there should only be 2 workers here.
>
> Good point, I agree.
>
> > If someone sets the partitioned table's parallel_workers high so that
> > they get a large number of workers when no partitions are pruned
> > during planning, do they really want the same number of workers in
> > queries where a large number of partitions are pruned?
> >
> > This problem gets a bit more complex in generic plans where the
> > planner can't prune anything but run-time pruning prunes many
> > partitions. I'm not so sure what to do about that, but the problem
> > does exist today to a lesser extent with the current method of
> > determining the append parallel workers.
>
> Also a good point.  That would require changing the actual number of
> parallel workers at execution time, but that is tricky.
> If we go with your suggestion above, we'd have to disambiguate if
> the number of workers is set because a partition is large enough
> to warrant a parallel scan (then it shouldn't be reduced if the executor
> prunes partitions) or if it is because of the number of partitions
> (then it should be reduced).

Maybe we really want a parallel_append_workers for partitioned tables,
instead of piggybacking on parallel_workers?
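
For reference, the cap David suggests above (limit the partitioned table's
workers to the sum of the workers of the Append's subpaths) can be sketched
roughly as follows. This is only an illustration of the arithmetic, not
planner code: the function name capped_append_workers and its parameters are
hypothetical, and it assumes the per-partition worker counts are already
known for the subpaths that survive pruning.

```python
def capped_append_workers(parent_workers, subpath_workers, max_per_gather):
    """Hypothetical sketch of the proposed cap, not PostgreSQL source.

    parent_workers  -- parallel_workers set on the partitioned table
    subpath_workers -- workers requested by each surviving Append subpath
    max_per_gather  -- value of max_parallel_workers_per_gather
    """
    # Never request more workers than the surviving subpaths could use,
    # and never exceed the per-Gather limit.
    return min(parent_workers, sum(subpath_workers), max_per_gather)


# David's example: lp asks for 4, its only partition lp1 asks for 2.
print(capped_append_workers(4, [2], 8))  # prints 2
```

With three surviving partitions that each request 2 workers, the same call
would return 4, so the partitioned table's setting only acts as the limit when
enough partitions remain unpruned.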

> I don't know if Seamus is still working on that; if not, we might
> mark it as "returned with feedback".

I have to agree given the time left.

> Perhaps Amit's patch 0001 should go in independently.

Perhaps, but maybe we should wait until something really needs that.

--
Amit Langote
EDB: http://www.enterprisedb.com