Re: incremental sort vs. gather paths - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: incremental sort vs. gather paths
Date
Msg-id 2d00f31d-08e7-05b8-ccf3-b0645943d80a@enterprisedb.com
Whole thread Raw
In response to incremental sort vs. gather paths  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: incremental sort vs. gather paths
List pgsql-hackers

On 12/16/21 17:48, Robert Haas wrote:
> This code introduced in ba3e76cc571eba3dea19c9465ff15ac3ac186576 looks
> wrong to me:
> 
>              total_groups = cheapest_partial_path->rows *
>                  cheapest_partial_path->parallel_workers;
>              path = (Path *)
>                  create_gather_merge_path(root, ordered_rel,
>                                           path,
>                                           path->pathtarget,
>                                           root->sort_pathkeys, NULL,
>                                           &total_groups);
> 
> This too:
> 
>                  total_groups = input_path->rows *
>                      input_path->parallel_workers;
> 
> This came to my attention because Dave Page sent me a query plan that
> looks like this:
> 
> Gather Merge  (cost=22617.94..22703.35 rows=732 width=97) (actual
> time=2561.476..2561.856 rows=879 loops=1)
>     Workers Planned: 2
>     Workers Launched: 2
>     ->  Sort  (cost=21617.92..21618.83 rows=366 width=97) (actual
> time=928.329..928.378 rows=293 loops=3)
>           Sort Key: aid
>           Sort Method: quicksort  Memory: 155kB
>           Worker 0:  Sort Method: quicksort  Memory: 25kB
>           Worker 1:  Sort Method: quicksort  Memory: 25kB
>           ->  Parallel Seq Scan on accounts_1m  (cost=0.00..21602.33
> rows=366 width=97) (actual time=74.391..74.518 rows=293 loops=3)
>                 Filter: (aid < 10000000)
>                 Rows Removed by Filter: 333040
> 
> If you look at the actual row count estimates, you see that the Gather
> Merge produces 3 times the number of rows that the Parallel Seq Scan
> produces, which is completely correct, because the raw number is 897
> in both cases, but EXPLAIN unhelpfully divides the displayed row count
> by the loop count, which in this case is 3. If you look at the
> estimated row count, you see that the Gather Merge is estimated to
> produce exactly 2 times the number of rows that the nodes under it
> would produce. That's not a very good estimate, unless
> parallel_leader_participation=off, which in this case it isn't.
> 
> What's happening here is that the actual number of rows produced by
> accounts_1m is actually 879 and is estimated as 879.
> get_parallel_divisor() decides to divide that number by 2.4, and gets
> 366, because it thinks the leader will do less work than the other
> workers. Then the code above fires and, instead of letting the
> original row count estimate for accounts_1m apply to the Gather Merge
> path, it overrides it with 2 * 366 = 732. This cannot be right.
> Really, I don't think it should be overriding the row count estimate
> at all. But if it is, multiplying by the number of workers can't be
> right, because the leader can also do stuff.
> 

Maybe, but other places (predating incremental sort) creating Gather 
Merge do the same thing, and commit ba3e76cc57 merely copied this. For 
example generate_gather_paths() does this:

     foreach(lc, rel->partial_pathlist)
     {
         Path       *subpath = (Path *) lfirst(lc);
         GatherMergePath *path;

         if (subpath->pathkeys == NIL)
             continue;

         rows = subpath->rows * subpath->parallel_workers;
         path = create_gather_merge_path(root, rel, subpath,
                                         rel->reltarget,
                                     subpath->pathkeys, NULL, rowsp);
         add_path(rel, &path->path);
     }

i.e. it's doing the same (rows * parallel_workers) calculation.

It may not not be the right way to estimate this, of course. But I'd say 
if ba3e76cc57 is doing it wrong, so are the older places.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Use generation context to speed up tuplesorts
Next
From: Andrew Dunstan
Date:
Subject: Re: Buildfarm support for older versions