From: Thomas Munro
Subject: Partial path row estimates
Date:
Msg-id: CA+hUKG+rwqQH=E5+SQ2PZO4pWSU22pVaceqMJPrdPJmqjAdyaA@mail.gmail.com
List: pgsql-hackers

Hello hackers,

I don't like this:

 ->  Parallel Hash  (... rows=416667 ...) (... rows=333333 ...)

I think the logic in get_parallel_divisor() only makes sense for
queries like this one (output produced with a nearby patch applied, to
show the leader's contribution):

postgres=# set parallel_tuple_cost = 0;
SET
postgres=# set parallel_setup_cost = 0;
SET
postgres=# explain (analyze, verbose) select * from t;
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..8591.67 rows=1000000 width=4) (actual time=0.460..411.944 rows=1000000 loops=1)
   Output: i
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.t  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.066..136.777 rows=333333 loops=3)
         Output: i
         Leader: actual time=0.022..8.061 rows=30058 loops=1    <--- poor contribution
         Worker 0: actual time=0.150..201.007 rows=502574 loops=1
         Worker 1: actual time=0.027..201.263 rows=467368 loops=1
 Planning Time: 0.071 ms
 Execution Time: 495.700 ms
(11 rows)
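
To spell out where that estimate comes from, here is a standalone
illustration of the arithmetic as I read get_parallel_divisor() in
costsize.c (the 0.3-per-worker leader discount and the
parallel_leader_participation check are my paraphrase of that code, not
a verbatim copy):

#include <stdio.h>
#include <stdbool.h>

/* Mirrors the parallel_leader_participation GUC; assume the default (on). */
static bool parallel_leader_participation = true;

/*
 * Paraphrase of get_parallel_divisor(): the divisor is the number of
 * workers plus a guess at how much the leader will help, which shrinks
 * as workers are added because the leader gets busier servicing the
 * tuple queues.
 */
static double
parallel_divisor_sketch(int parallel_workers)
{
    double      divisor = parallel_workers;

    if (parallel_leader_participation)
    {
        double      leader_contribution = 1.0 - 0.3 * parallel_workers;

        if (leader_contribution > 0)
            divisor += leader_contribution;
    }
    return divisor;
}

int
main(void)
{
    double      total_rows = 1000000.0;
    double      divisor = parallel_divisor_sketch(2);

    /* divisor = 2 + (1.0 - 0.6) = 2.4, so 1000000 / 2.4 = ~416667 */
    printf("estimated rows per process = %.0f\n", total_rows / divisor);
    return 0;
}

That's where rows=416667 comes from.  The rows=333333 reported as
"actual" is just the per-loop average, 1000000 / 3, even though the two
workers really processed roughly 500k rows each and the leader only
about 30k.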

For anything that consumes its entire input before emitting a tuple
(Partial Aggregate, Sort, Hash, and so on), it doesn't make sense to
mangle the child paths' row estimates.  For example:

 ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=263.402..263.403 rows=333333 loops=3)
       Output: t2.i
       Buckets: 131072  Batches: 16  Memory Usage: 3520kB
       Leader: actual time=266.861..266.861 rows=341888 loops=1    <--- equal contribution
       Worker 0: actual time=261.521..261.522 rows=322276 loops=1
       Worker 1: actual time=261.824..261.825 rows=335836 loops=1
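
Here, unlike the scan above, each participant really did process about
1000000 / 3 = 333333 rows, because the node is greedy: all three
processes keep loading tuples until the input is exhausted, before
anything reaches the Gather.  So the rows=416667 estimate overstates
the per-participant row count by about 25% with two workers.  A quick
illustration of how that overstatement varies with the worker count
(reusing the leader-discount formula sketched above and assuming the
leader participates; purely illustrative):

#include <stdio.h>

int
main(void)
{
    double      total_rows = 1000000.0;

    for (int workers = 1; workers <= 4; workers++)
    {
        double      leader = 1.0 - 0.3 * workers;
        double      divisor = workers + (leader > 0 ? leader : 0);
        int         participants = workers + 1;    /* workers + leader */
        double      estimated = total_rows / divisor;
        double      greedy_share = total_rows / participants;

        printf("workers=%d  estimate=%.0f  greedy share=%.0f  off by %.0f%%\n",
               workers, estimated, greedy_share,
               100.0 * (estimated / greedy_share - 1.0));
    }
    return 0;
}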

get_parallel_divisor() effectively assumes that every tuple emitted by
the path will cause a tuple to reach the Gather node, at which point
the gather distraction quotient needs to be estimated, so it does that
up front.  Whether that really happens depends on where the path ends
up being used.

I wonder if it would be better to get rid of that logic completely,
and instead tweak the gather path's run cost to account for the
processing asymmetry.  In cases where the leader contributes very
little, you could argue that it's not OK to ignore leader distraction
in child paths (and therefore to use avg(cardinality), not
max(cardinality), over all processes in, say, a nestloop), but on the
other hand, for cases that use it for something important like
estimating memory consumption (hash, sort, agg), that's exactly what
you want anyway because they're greedy.

With this approach, I suspect gather merge doesn't need to do anything
at all, because there we expect the leader to be forced to contribute
equally by the ordering requirement.

Here's an experimental patch to show what I mean.  Not tested much,
just trying out the idea.

