Re: Explain buffers wrong counter with parallel plans - Mailing list pgsql-hackers

From: Jonathan S. Katz
Subject: Re: Explain buffers wrong counter with parallel plans
Msg-id: BEC5CFD6-ADF1-4D12-938D-F2C38575EC63@excoventures.com
In response to: Re: Explain buffers wrong counter with parallel plans (Amit Kapila <amit.kapila16@gmail.com>)

On Jul 7, 2018, at 12:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Sat, Jul 7, 2018 at 7:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sat, Jul 7, 2018 at 12:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Jul 6, 2018 at 9:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> I have tried this idea, but it doesn't completely solve the problem.
>>>> The problem is that nodes below LIMIT won't get a chance to accumulate
>>>> the stats as they won't be able to call InstrStopNode.

>>> I'm not sure I understand.  Why not?  I see that we'd need to insert
>>> an extra call to InstrStopNode() if we were stopping the node while it
>>> was running, because then InstrStartNode() would have already been
>>> done, but the corresponding call to InstrStopNode() would not have
>>> been done.  But I'm not sure how that would happen in this case.  Can
>>> you explain further?


>> Okay, let me try.  The code flow is that for each node we call
>> InstrStartNode() -> ExecProcNodeReal() -> InstrStopNode().  Now let's
>> say we have to execute the plan Limit -> Gather -> Parallel Seq Scan.
>> Here, for the Limit node we first call InstrStartNode() and
>> ExecProcNodeReal(), and then for Gather we call InstrStartNode(),
>> ExecProcNodeReal(), and InstrStopNode().  Now the Limit node decides
>> that it needs to shut down all the nodes (ExecShutdownNode), and only
>> after that does it call InstrStopNode() for the Limit node.  So in
>> this flow, after the nodes are shut down, the Gather node never gets a
>> chance to use the stats collected during ExecShutdownNode.
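
To make that sequence concrete, here is a minimal standalone C sketch of
the problem (this is not PostgreSQL code; Node, instr_start, and
instr_stop are simplified stand-ins for PlanState, InstrStartNode, and
InstrStopNode):

    #include <stdio.h>
    #include <stdbool.h>

    /* Simplified stand-ins for PostgreSQL's PlanState/Instrumentation. */
    typedef struct Node
    {
        const char  *name;
        struct Node *child;
        long         pending;   /* stats handed over during shutdown    */
        long         counted;   /* stats accumulated via the stop call  */
        bool         running;
    } Node;

    static void instr_start(Node *n) { n->running = true; }

    static void instr_stop(Node *n)
    {
        /* Only a stop call folds pending stats into the node's totals. */
        n->counted += n->pending;
        n->pending = 0;
        n->running = false;
    }

    /* Models the shutdown walk before the fix: workers hand their
     * counters to Gather here, but nothing calls instr_stop(), so the
     * numbers never make it into counted. */
    static void shutdown_tree(Node *n)
    {
        for (; n != NULL; n = n->child)
            n->pending += 100;  /* pretend workers reported 100 units */
    }

    int main(void)
    {
        Node scan   = {"Parallel Seq Scan", NULL, 0, 0, false};
        Node gather = {"Gather", &scan, 0, 0, false};
        Node limit  = {"Limit", &gather, 0, 0, false};

        instr_start(&limit);    /* Limit starts producing its row      */
        instr_start(&gather);   /* ... which runs Gather to completion */
        instr_stop(&gather);    /* Gather's regular stop call          */

        shutdown_tree(&gather); /* Limit is satisfied: shut down       */
        instr_stop(&limit);     /* Limit's own stop call               */

        printf("%s: counted=%ld, lost=%ld\n",
               gather.name, gather.counted, gather.pending);
        return 0;
    }

Running it prints "Gather: counted=0, lost=100": the stats reported at
shutdown are exactly the ones the Gather node never accumulates.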


> I went ahead and tried the solution I had mentioned yesterday, that
> is, to allow ExecShutdownNode to count stats.  Apart from fixing this
> problem, it also fixes the problem with Gather Merge reported by
> Adrien [1], because now Gather Merge also gets a chance to count stats
> after shutting down workers.
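
A sketch of that idea, reusing the toy types from the sketch above (the
real change would live inside ExecShutdownNode itself; this is only an
illustration):

    /* Fixed variant: treat each node as running while it is shut down,
     * so stats handed over during shutdown do get accumulated. */
    static void shutdown_tree_fixed(Node *n)
    {
        for (; n != NULL; n = n->child)
        {
            instr_start(n);
            n->pending += 100;  /* workers report their counters   */
            instr_stop(n);      /* pending is folded into counted  */
        }
    }

With this variant the earlier program would print "Gather: counted=100,
lost=0", which is the behavior the patch is after.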

> Note that I have changed the location of InstrStartParallelQuery in
> ParallelQueryMain so that buffer usage stats are accumulated only for
> plan execution, which is what we do for the instrumentation
> information as well.  If we don't do that, it will count some
> additional stats for ExecutorStart, which won't match what we have in
> the Instrumentation structure of each node.
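
To illustrate why the location matters, here is a minimal standalone C
sketch of the snapshot-and-diff idea (simplified stand-ins again, not
the actual InstrStartParallelQuery/InstrEndParallelQuery code):

    #include <stdio.h>

    /* Global counter standing in for the worker's buffer-usage state. */
    static long buffer_reads = 0;
    static long snapshot;

    /* Remember the counter at the chosen starting point. */
    static void start_parallel_query(void) { snapshot = buffer_reads; }

    /* Report only what happened since the starting point. */
    static long end_parallel_query(void) { return buffer_reads - snapshot; }

    int main(void)
    {
        buffer_reads += 7;      /* reads done during executor startup */

        /* Patched placement: take the snapshot AFTER startup, so only
         * plan execution is attributed to the worker's reported usage,
         * matching the per-node Instrumentation data. */
        start_parallel_query();
        buffer_reads += 100;    /* reads done during plan execution   */

        printf("reported=%ld (startup reads excluded)\n",
               end_parallel_query());
        return 0;
    }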

I tried running the following on both 11beta2 and on HEAD with the
patch applied:

CREATE UNLOGGED TABLE t1 (c1 int);
INSERT INTO t1 SELECT generate_series(1,100000000);
/* restart PostgreSQL */
EXPLAIN (analyze,buffers,timing off,costs off)
SELECT count(*) FROM t1;
/* repeat the above EXPLAIN a second time */

I have identical postgresql.conf files on both instances as well.

11beta2
======
    buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
    from t1;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Finalize Aggregate (actual rows=1 loops=1)
       Buffers: shared read=63175
       ->  Gather (actual rows=7 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Buffers: shared read=63175
             ->  Partial Aggregate (actual rows=1 loops=7)
                   Buffers: shared read=442478
                   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
                         Buffers: shared read=442478
     Planning Time: 1.422 ms
     Execution Time: 3214.407 ms
    (12 rows)

    buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
    from t1;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Finalize Aggregate (actual rows=1 loops=1)
       Buffers: shared hit=27 read=64960
       ->  Gather (actual rows=7 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Buffers: shared hit=27 read=64960
             ->  Partial Aggregate (actual rows=1 loops=7)
                   Buffers: shared hit=224 read=442254
                   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
                         Buffers: shared hit=224 read=442254
     Planning Time: 0.080 ms
     Execution Time: 3774.091 ms
    (12 rows)

HEAD
=====
    buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
    from t1;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Finalize Aggregate (actual rows=1 loops=1)
       Buffers: shared read=442478
       ->  Gather (actual rows=7 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Buffers: shared read=442478
             ->  Partial Aggregate (actual rows=1 loops=7)
                   Buffers: shared read=442478
                   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
                         Buffers: shared read=442478
     Planning Time: 1.594 ms
     Execution Time: 6207.799 ms
    (12 rows)

    buffers=# explain (analyze,buffers,timing off,costs off) select count(*)
    from t1;
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Finalize Aggregate (actual rows=1 loops=1)
       Buffers: shared hit=224 read=442254
       ->  Gather (actual rows=7 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Buffers: shared hit=224 read=442254
             ->  Partial Aggregate (actual rows=1 loops=7)
                   Buffers: shared hit=224 read=442254
                   ->  Parallel Seq Scan on t1 (actual rows=14285714 loops=7)
                         Buffers: shared hit=224 read=442254
     Planning Time: 0.074 ms
     Execution Time: 5006.395 ms
    (12 rows)

Notice that the “read” numbers attached to the “Finalize Aggregate”
node are much higher with the patch (shared read=442478 on HEAD vs.
63175 on 11beta2), and there appears to be a performance hit (roughly
6.2s vs. 3.2s on the first run, and 5.0s vs. 3.8s on the second).

Jonathan
