Re: Buffers from parallel workers not accumulated to upper nodes with gather merge - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: Buffers from parallel workers not accumulated to upper nodes with gather merge
Date
Msg-id CAA4eK1L28FjP4D7f7F3FMiq+x9wa5maJWG7vrBXUr0XFv20MMw@mail.gmail.com
In response to Re: Buffers from parallel workers not accumulated to upper nodes with gather merge  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
On Mon, Jul 20, 2020 at 11:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Jul 18, 2020 at 7:32 PM Jehan-Guillaume de Rorthais
> <jgdr@dalibo.com> wrote:
> >
> > Hi,
> >
> > A bug report[1] on the pev2 project exposes another bug in EXPLAIN ANALYZE
> > buffer accounting, similar to the one fixed two years ago[2].
> >
> > In this report, buffers from the workers are not propagated to upper nodes
> > on top of the Gather Merge node. In contrast, buffers from workers ARE
> > propagated to upper nodes on top of a Gather node.
> >
> > Here is some SQL to reproduce:
> >
> >   CREATE TABLE contacts (contactSourceName text, filler text)
> >       WITH ( autovacuum_enabled = off);
> >   INSERT INTO contacts
> >       SELECT rpad(chr(64+i)::text,10,'0'),
> >           rpad(j::text,500,'0')
> >       FROM generate_series(1,26) i
> >       CROSS JOIN  generate_series(1,3200) j;
> >
> >   SET work_mem TO '4MB';
> >   SET random_page_cost to 4;
> >   SET jit TO off;
> >   SET parallel_setup_cost TO 0;
> >   SET parallel_tuple_cost TO 0;
> >
> >   EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF, TIMING OFF)
> >   SELECT contactSourceName, count(*)
> >   FROM contacts
> >   GROUP BY contactSourceName;
> >
> >   Finalize GroupAggregate (actual rows=26 loops=1)
> >     Buffers: shared hit=2087 read=1463 written=32
> >     ->  Gather Merge (actual rows=70 loops=1)
> >           Workers Planned: 2
> >           Workers Launched: 2
> >           Buffers: shared hit=3910 read=2049 written=96
> >           ->  Sort (actual rows=23 loops=3)
> >                 Buffers: shared hit=3910 read=2049 written=96
> >                 Worker 0:
> >                   Buffers: shared hit=914 read=287 written=32
> >                 Worker 1:
> >                   Buffers: shared hit=909 read=299 written=32
> >   [...]
> >
> > Gather Merge exposes shared hit=3910.
> > Finalize GroupAggregate exposes shared hit=2087.
> > The 909 and 914 buffers from workers have not been propagated.
> >
> > Comparing with a Gather node:
> >
> >   EXPLAIN (ANALYZE, BUFFERS, VERBOSE, COSTS OFF, TIMING OFF)
> >   SELECT count(*) AS c FROM contacts;
> >
> >   Finalize Aggregate (actual rows=1 loops=1)
> >     Output: count(*)
> >     Buffers: shared hit=3894 read=2049 written=96
> >     ->  Gather (actual rows=3 loops=1)
> >           Workers Planned: 2
> >           Workers Launched: 2
> >           Buffers: shared hit=3894 read=2049 written=96
> >           ->  Partial Aggregate (actual rows=1 loops=3)
> >                 Buffers: shared hit=3894 read=2049 written=96
> >                 Worker 0:
> >                   Buffers: shared hit=870 read=299 written=32
> >                 Worker 1:
> >                   Buffers: shared hit=887 read=301 written=32
> >   [...]
> >
> > Buffers are correctly propagated to the Finalize Aggregate node.
> >
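As a quick sanity check of the numbers in the two plans above (this is plain arithmetic on the reported counters, not PostgreSQL code):

```python
# Gather Merge plan: node totals taken from the EXPLAIN output above.
gather_merge_total = 3910      # shared hit at the Gather Merge node
finalize_group_agg = 2087      # shared hit at Finalize GroupAggregate
worker_hits = [914, 909]       # per-worker shared hits under the Sort node

# The upper node is short by exactly the workers' buffer counts.
assert finalize_group_agg + sum(worker_hits) == gather_merge_total

# Gather plan: Finalize Aggregate and Gather report the same total,
# so the workers' buffers (870 + 887) are already folded in there.
gather_total = 3894
finalize_agg_total = 3894
assert finalize_agg_total == gather_total
```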
> > I'm not perfectly clear with the code, but here is my analysis of the issue
> > and a fix proposal.
> >
> > The Gather node works correctly because gather_readnext calls
> > ExecShutdownGatherWorkers as soon as the workers are exhausted. Because of
> > this, buffers from workers are already accounted for and propagated to
> > upper nodes before the recursive call of ExecShutdownNode on each node.
> > There is no similar call to ExecShutdownGatherMergeWorkers for Gather
> > Merge. Adding a call to ExecShutdownGatherMergeWorkers in
> > gather_merge_getnext when the workers are exhausted seems to fix the
> > issue, but I feel like this is the wrong place to fix it.
> >
>
> Why do you think so?  I think irrespective of whether we want to call
> ExecShutdownGatherMergeWorkers in gather_merge_getnext (when we know
> we are done aka binaryheap_empty) to fix this particular issue, it is
> better to shutdown the workers as soon as we are done similar to what
> we do for Gather node.  It is good to release resources as soon as we
> can.
>

Apart from the above, we should also do what you are suggesting in the
patch, as that is also a problem, and your fix seems correct to me.
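To illustrate the timing issue under discussion, here is a toy Python model (not PostgreSQL code: the counter name, the node methods, and the hit counts are only stand-ins, assuming that each node records the delta of a backend-local counter between its start and stop, and that worker counters are folded into that counter only when the workers are shut down):

```python
# Toy model of instrumentation missing worker buffer counts.
pgBufferUsage = 0              # stand-in for the backend-local counter

class Node:
    """Records the counter delta between start and stop, like a plan node."""
    def __init__(self, name):
        self.name = name
        self.start = 0
        self.total = 0
    def instr_start(self):
        self.start = pgBufferUsage
    def instr_stop(self):
        self.total += pgBufferUsage - self.start

def shutdown_workers(worker_hits):
    # Stand-in for shutdown folding workers' usage into the leader's counter.
    global pgBufferUsage
    pgBufferUsage += sum(worker_hits)

leader_hits = 2087
worker_hits = [914, 909]

# Case 1: workers shut down only in the recursive ExecShutdownNode pass,
# after the upper node has already stopped its instrumentation.
upper = Node("Finalize GroupAggregate")
upper.instr_start()
pgBufferUsage += leader_hits   # leader's own buffer accesses
upper.instr_stop()             # upper node freezes its counters...
shutdown_workers(worker_hits)  # ...then worker stats arrive: too late
print(upper.total)             # 2087 -- worker buffers missing

# Case 2: workers shut down as soon as they are exhausted,
# before the upper node stops its instrumentation.
pgBufferUsage = 0
upper2 = Node("Finalize Aggregate")
upper2.instr_start()
pgBufferUsage += leader_hits
shutdown_workers(worker_hits)  # worker stats folded in before stop
upper2.instr_stop()
print(upper2.total)            # 2087 + 914 + 909 = 3910
```

The first case mirrors the Gather Merge plan above, where the Finalize node reports only the leader's 2087 hits; the second mirrors Gather, where shutting down the workers as soon as they are exhausted lets the full 3910 propagate upward.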

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


