Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Amit Kapila
---|---
Subject | Re: Parallel Seq Scan
Date |
Msg-id | CAA4eK1Ks+OP3i9AmORVV7oerEKmvGKC+zvo38MeNQeZ+dYw=Yg@mail.gmail.com
In response to | Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
List | pgsql-hackers
On Wed, Mar 18, 2015 at 10:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Mar 14, 2015 at 1:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> # EXPLAIN SELECT DISTINCT bid FROM pgbench_accounts;
> >> ERROR: too many dynamic shared memory segments
> >
> > This happens because we have a maximum limit on the number of
> > dynamic shared memory segments in the system.
> >
> > In function dsm_postmaster_startup(), it is defined as follows:
> >
> > maxitems = PG_DYNSHMEM_FIXED_SLOTS
> > + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;
> >
> > In the above case, it is choosing a parallel plan for each of the
> > Append's child relations (because of seq_page_cost = 1000), and that
> > causes the test to cross the max limit of DSM segments.
>
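
To put rough numbers on that limit: a small arithmetic sketch, assuming
what I believe are the 9.5-era defaults of PG_DYNSHMEM_FIXED_SLOTS = 64
and PG_DYNSHMEM_SLOTS_PER_BACKEND = 2 (please verify against dsm.c), and
an illustrative MaxBackends of 104:

#include <stdio.h>

/*
 * Assumed values, mirroring what I believe are the 9.5-era defaults in
 * src/backend/storage/ipc/dsm.c; check the actual source, since they may
 * differ across versions.
 */
#define PG_DYNSHMEM_FIXED_SLOTS         64
#define PG_DYNSHMEM_SLOTS_PER_BACKEND    2

int
main(void)
{
    int max_backends = 104;  /* illustrative: max_connections = 100 plus
                              * a few autovacuum/auxiliary workers */
    int maxitems = PG_DYNSHMEM_FIXED_SLOTS
        + PG_DYNSHMEM_SLOTS_PER_BACKEND * max_backends;

    /* 64 + 2 * 104 = 272 control-segment slots in total */
    printf("max dynamic shared memory segments = %d\n", maxitems);
    return 0;
}

With only a few hundred control-segment slots available, a plan that sets
up one segment per child of a large inheritance hierarchy exhausts them
quickly.
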
> The problem here is, of course, that each parallel sequential scan is
> trying to create an entirely separate group of workers. Eventually, I
> think we should fix this by rejiggering things so that when there are
> multiple parallel nodes in a plan, they all share a pool of workers.
> So each worker would actually get a list of plan nodes instead of a
> single plan node. Maybe it works on the first node in the list until
> that's done, and then moves onto the next, or maybe it round-robins
> among all the nodes and works on the ones where the output tuple
> queues aren't currently full, or maybe the master somehow notifies the
> workers which nodes are most useful to work on at the present time.
Good idea.  I think for this particular case, we might want to optimize
the work distribution such that each worker gets one independent relation
segment to scan.
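
Just to make that concrete, here is a hypothetical sketch (illustrative
standalone C, not proposed backend code) of a single shared worker pool
where the child relations of the inheritance hierarchy are handed out
round-robin, so each worker ends up scanning its own independent
relation(s); all names here are invented for the example:

#include <stdio.h>
#include <stdlib.h>

/* One entry per worker in the shared pool: the list of plan nodes
 * (child relations) that worker is responsible for scanning. */
typedef struct WorkerTask
{
    int    *nodes;   /* plan-node ids assigned to this worker */
    int     nnodes;
} WorkerTask;

int
main(void)
{
    int         nworkers = 4;
    int         nchildren = 10;   /* child relations under the Append */
    WorkerTask *pool = calloc(nworkers, sizeof(WorkerTask));
    int         i;

    for (i = 0; i < nworkers; i++)
        pool[i].nodes = calloc(nchildren, sizeof(int));

    /* Round-robin: child relation i goes to worker i % nworkers, instead
     * of each parallel node launching its own separate worker group. */
    for (i = 0; i < nchildren; i++)
    {
        WorkerTask *w = &pool[i % nworkers];

        w->nodes[w->nnodes++] = i;
    }

    for (i = 0; i < nworkers; i++)
    {
        int j;

        printf("worker %d scans child relations:", i);
        for (j = 0; j < pool[i].nnodes; j++)
            printf(" %d", pool[i].nodes[j]);
        printf("\n");
        free(pool[i].nodes);
    }
    free(pool);
    return 0;
}
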
> But I think trying to figure this out is far too ambitious for 9.5,
> and I think we can have a useful feature without implementing any of
> it.
>
Agreed.
> But, we can't just ignore the issue right now, because erroring out on
> a large inheritance hierarchy is no good. Instead, we should fall
> back to non-parallel operation in this case. By the time we discover
> the problem, it's too late to change the plan, because it's already
> execution time. So we are going to be stuck executing the parallel
> node - just with no workers to help. However, what I think we can do
> is use a slab of backend-private memory instead of a dynamic shared
> memory segment, and in that way avoid this error. We do something
> similar when starting the postmaster in stand-alone mode: the main
> shared memory segment is replaced by a backend-private allocation with
> the same contents that the shared memory segment would normally have.
> The same fix will work here.
>
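For reference, a minimal sketch of the fallback shape described above,
written against DSM and memory-context APIs as they exist in later
releases (dsm_create with DSM_CREATE_NULL_IF_MAXSEGMENTS,
dsm_segment_address, MemoryContextAllocZero); treat those names as
assumptions against the tree this targets, and the helper name
create_parallel_workspace is made up for illustration:

#include "postgres.h"

#include "storage/dsm.h"
#include "utils/memutils.h"

static void *
create_parallel_workspace(Size segsize, dsm_segment **segp)
{
    dsm_segment *seg;

    /* Ask for a segment, but get NULL instead of an error if the
     * control segment is already full. */
    seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
    if (seg != NULL)
    {
        *segp = seg;
        return dsm_segment_address(seg);
    }

    /* No segment available: no workers will be launched, so plain
     * backend-private memory laid out the same way is good enough,
     * much like stand-alone mode substitutes private memory for the
     * main shared memory segment. */
    *segp = NULL;
    return MemoryContextAllocZero(TopMemoryContext, segsize);
}

The caller would lay out the private allocation exactly as it would the
segment, and simply run the parallel node with zero workers when *segp
comes back NULL.
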
> Even once we make the planner and executor smarter, so that they don't
> create lots of shared memory segments and lots of separate worker
> pools in this type of case, it's probably still useful to have this as
> a fallback approach, because there's always the possibility that some
> other client of the dynamic shared memory system could gobble up all
> the segments. So, I'm going to go try to figure out the best way to
> implement this.
>
Thanks.