From: Robert Haas
Subject: Re: plan_rows confusion with parallel queries
Msg-id: CA+TgmoY4XN2G05Ka8qNmegL4WkRX0HtiZgun9keNKPG8+BXBrA@mail.gmail.com
In response to: Re: plan_rows confusion with parallel queries (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Nov 2, 2016 at 10:44 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Although - it is estimating 1M rows, but only "per worker" estimates are
> shown, and because there are 2 workers planned it says 1M/2.4 which is the
> 416k. I agree it's a bit unclear, but at least it's consistent with how we
> treat loops (i.e. that the numbers are per loop).

Right.  Which I think was a horrible decision.  I think that it would
be best to change EXPLAIN so that the row counts and costs are never
divided by nloops.  That would be a backward-incompatible change, but
I think it would be worth it.  What you typically want to know is the
total effort expended in a particular plan node, and the current
system makes that incredibly difficult to work out, especially
because we then round off the row counts to the nearest integer, so
that you can't even reverse the division when you want to (which you
always do).
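
To spell out the arithmetic: in your first example the planner
expects the scan to return 1M rows in total, and with 2 planned
workers it divides by 2.4 (2 workers plus an assumed 0.4 contribution
from the leader), so EXPLAIN prints the per-worker estimate:

    1,000,000 / 2.4 = 416,666.67  ->  rows=416667

At execution time the actual counts are divided by loops=3 (2 workers
plus the leader) and rounded, so a scan that really produced
1,000,000 rows is reported as rows=333333.  Multiplying back out
recovers only 333,333 * 3 = 999,999; the lost row is unrecoverable,
which is exactly the problem with dividing and then rounding.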

> But there's more fun with joins - consider for example this simple join:
>
>                                QUERY PLAN
> ------------------------------------------------------------------------------
>  Gather  (cost=19515.96..43404.82 rows=96957 width=12)
>          (actual time=295.167..746.312 rows=99999 loops=1)
>    Workers Planned: 2
>    Workers Launched: 2
>    ->  Hash Join  (cost=18515.96..32709.12 rows=96957 width=12)
>                   (actual time=249.281..670.309 rows=33333 loops=3)
>          Hash Cond: (t2.a = t1.a)
>          ->  Parallel Seq Scan on t2
>              (cost=0.00..8591.67 rows=416667 width=8)
>              (actual time=0.100..184.315 rows=333333 loops=3)
>          ->  Hash  (cost=16925.00..16925.00 rows=96957 width=8)
>                    (actual time=246.760..246.760 rows=99999 loops=3)
>                Buckets: 131072  Batches: 2  Memory Usage: 2976kB
>                ->  Seq Scan on t1
>                    (cost=0.00..16925.00 rows=96957 width=8)
>                    (actual time=0.065..178.385 rows=99999 loops=3)
>                      Filter: (b < 100000)
>                      Rows Removed by Filter: 900001
>  Planning time: 0.763 ms
>  Execution time: 793.653 ms
> (13 rows)
>
> Suddenly we don't show per-worker estimates for the hash join - both the
> Hash Join and the Gather have exactly the same cardinality estimate.

I'm not sure why that's happening, but I haven't made any changes to
the costing for a node like hash join.  It doesn't treat the parallel
sequential scan that is coming as its first input any differently than
it would if that were a non-parallel plan.  It's just costing the join
normally, based on an input row count that is lower than what it would
be if it were going to see every row from t2 rather than only some of
them.
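
One consistent reading of the numbers, purely from the arithmetic
(I'm not asserting this is what the code does): the join's row
estimate looks like it is computed from the total relation sizes
rather than from the per-worker path rows.  With the usual
equality-join selectivity of 1/1,000,000 here:

    1,000,000 * 96,957 / 1,000,000 = 96,957   (what both nodes show)
      416,667 * 96,957 / 1,000,000 ~  40,399  (what the per-worker
                                               input would give)

which would explain why the Hash Join and the Gather end up with
exactly the same estimate.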

> So, different join method but same result - 2 workers, loops=3. But let's
> try with small tables (100k rows instead of 1M rows):
>
>                                   QUERY PLAN
> ----------------------------------------------------------------------------
>  Gather  (cost=0.29..36357.94 rows=100118 width=12)
>          (actual time=13.219..589.723 rows=100000 loops=1)
>    Workers Planned: 1
>    Workers Launched: 1
>    Single Copy: true
>    ->  Nested Loop  (cost=0.29..36357.94 rows=100118 width=12)
>                     (actual time=0.288..442.821 rows=100000 loops=1)
>          ->  Seq Scan on t1  (cost=0.00..1444.18 rows=100118 width=8)
>                       (actual time=0.148..49.308 rows=100000 loops=1)
>          ->  Index Scan using t2_a_idx on t2
>                       (cost=0.29..0.34 rows=1 width=8)
>                       (actual time=0.002..0.002 rows=1 loops=100000)
>                Index Cond: (a = t1.a)
>  Planning time: 0.483 ms
>  Execution time: 648.941 ms
> (10 rows)
>
> Suddenly, we get nworkers=1 with loops=1 (and not nworkers+1 as before).
> FWIW I've only seen this with force_parallel_mode=on, and the row counts are
> correct, so perhaps that's OK. single_copy seems a bit underdocumented,
> though.

This is entirely as expected.  Single-copy means that
there's one process running the non-parallel plan beneath it, and
that's it.  So the Gather is just a pass-through node here, like a
Materialize or Sort: the number of input rows and the number of output
rows have to be the same.
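
For anyone who wants to see this, a minimal way to reproduce it
(assuming the same t1/t2 tables from your examples):

    SET force_parallel_mode = on;
    EXPLAIN (ANALYZE)
    SELECT * FROM t1 JOIN t2 ON t2.a = t1.a;

With that setting, a plan that wouldn't otherwise be parallelized
still gets wrapped in a single-copy Gather, and the Gather's row
count necessarily matches its child's.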

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


