Re: Possible incorrect row estimation for Gather paths - Mailing list pgsql-hackers

From Rafia Sabih
Subject Re: Possible incorrect row estimation for Gather paths
Date
Msg-id CA+FpmFfaCbKy1D9r2cr90oKXRwDiat4mKWFVnT5b82xm8xidRw@mail.gmail.com
Whole thread Raw
In response to Possible incorrect row estimation for Gather paths  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Responses Re: Possible incorrect row estimation for Gather paths
List pgsql-hackers
Hello Anthonin,

I spent some time on this problem and your proposed solution.
As I understand it, this is the correction for the row count when the
number of parallel workers < 4.
Once the number of workers is 4 or more, the output from
parallel_divisor is the same as the number of parallel workers.
And then the number of rows for such cases would be the same with or
without the proposed patch.
So that way I think it is good to fix this case for a smaller number of workers.

But I don't quite understood the purpose of this,
+ total_groups = input_rel->rows;
+
+ /*
+ * If the number of rows is unknown, fallback to gather rows
+ * estimation
+ */
+ if (total_groups == 0)
+ total_groups = gather_rows_estimate(input_path);

why not just use total_groups = gather_rows_estimate(input_path), what
is the importance of having total_groups = input_rel->rows?

With respect to the change introduced by the patch in the regression
test, I wonder if we should test it on the tables of a larger scale
and check for performance issues.
Imagine the case when Parallel Seq Scan on extremely_skewed s
(cost=0.00..167.01 rows=1 width=4) returns 1000 rows instead of 1 ...
I wonder which plan would perform better then or will there be a
totally different plan.

However, my hunch is that there should not be serious problems,
because before this patch the number of estimated rows was incorrect
anyway.

I don't see a problem in merging the two patches.


On Fri, 24 May 2024 at 11:35, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> Hi,
>
> While experimenting on an explain option to display all plan candidates (very rough prototype here [1]), I've noticed
somediscrepancies in some generated plans. 
>
> EXPLAIN (ALL_CANDIDATES) SELECT * FROM pgbench_accounts order by aid;
>  Plan 1
> ->  Gather Merge  (cost=11108.32..22505.38 rows=100000 width=97)
>      Workers Planned: 1
>      ->  Sort  (cost=10108.31..10255.37 rows=58824 width=97)
>            Sort Key: aid
>            ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..2228.24 rows=58824 width=97)
>  Plan 2
> ->  Gather Merge  (cost=11108.32..17873.08 rows=58824 width=97)
>      Workers Planned: 1
>      ->  Sort  (cost=10108.31..10255.37 rows=58824 width=97)
>            Sort Key: aid
>            ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..2228.24 rows=58824 width=97)
>
> The 2 plans are similar except one Gather Merge has a lower 58K estimated rows.
>
> The first plan is created with generate_useful_gather_paths with override_rows=false then create_gather_merge_path
andwill use rel->rows as the row count (so the 100K rows of pgbench_accounts). 
> #0: create_gather_merge_path(...) at pathnode.c:1885:30
> #1: generate_useful_gather_paths(... override_rows=false) at allpaths.c:3286:11
> #2: apply_scanjoin_target_to_paths(...) at planner.c:7744:3
> #3: grouping_planner(...) at planner.c:1638:3
>
> The second plan is created through create_ordered_paths then create_gather_merge_path and the number of rows is
estimatedto (worker_rows * parallel_workers). Since we only have 1 parallel worker, this yields 58K rows. 
> #0: create_gather_merge_path(...) at pathnode.c:1885:30
> #1: create_ordered_paths(...) at planner.c:5287:5
> #2: grouping_planner(...) at planner.c:1717:17
>
> The 58K row estimate looks possibly incorrect. A worker row count is estimated using total_rows/parallel_divisor. The
parallel_divisorwill include the possible leader participation and will be 1.7 for 1 worker thus the 58K rows
(100K/1.7=58K)
> However, the gather merge will only do 58K*1, dropping the leader participation component.
>
> I have a tentative patch split in two changes:
> 1: This is a related refactoring to remove an unnecessary and confusing assignment of rows in
create_gather_merge_path.This value is never used and immediately overwritten in cost_gather_merge 
> 2: This changes the row estimation of gather path to use worker_rows*parallel_divisor to get a more accurate
estimation.Additionally, when creating a gather merge path in create_ordered_paths, we try to use the source's rel rows
whenavailable. 
>
> The patch triggered a small change in the hash_join regression test. Pre patch, we had the following candidates.
> Plan 4
> ->  Aggregate  (cost=511.01..511.02 rows=1 width=8)
>      ->  Gather  (cost=167.02..511.00 rows=2 width=0)
>            Workers Planned: 1
>            ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
>                  Hash Cond: (r.id = s.id)
>                  ->  Parallel Seq Scan on simple r  (cost=0.00..299.65 rows=11765 width=4)
>                  ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
>                        ->  Parallel Seq Scan on extremely_skewed s  (cost=0.00..167.01 rows=1 width=4)
> Plan 5
> ->  Finalize Aggregate  (cost=510.92..510.93 rows=1 width=8)
>      ->  Gather  (cost=510.80..510.91 rows=1 width=8)
>            Workers Planned: 1
>            ->  Partial Aggregate  (cost=510.80..510.81 rows=1 width=8)
>                  ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
>                        Hash Cond: (r.id = s.id)
>                        ->  Parallel Seq Scan on simple r  (cost=0.00..299.65 rows=11765 width=4)
>                        ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
>                              ->  Parallel Seq Scan on extremely_skewed s  (cost=0.00..167.01 rows=1 width=4)
>
> Post patch, the plan candidates became:
> Plan 4
> ->  Finalize Aggregate  (cost=511.02..511.03 rows=1 width=8)
>      ->  Gather  (cost=510.80..511.01 rows=2 width=8)
>            Workers Planned: 1
>            ->  Partial Aggregate  (cost=510.80..510.81 rows=1 width=8)
>                  ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
>                        Hash Cond: (r.id = s.id)
>                        ->  Parallel Seq Scan on simple r  (cost=0.00..299.65 rows=11765 width=4)
>                        ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
>                              ->  Parallel Seq Scan on extremely_skewed s  (cost=0.00..167.01 rows=1 width=4)
> Plan 5
> ->  Aggregate  (cost=511.01..511.02 rows=1 width=8)
>      ->  Gather  (cost=167.02..511.00 rows=2 width=0)
>            Workers Planned: 1
>            ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
>                  Hash Cond: (r.id = s.id)
>                  ->  Parallel Seq Scan on simple r  (cost=0.00..299.65 rows=11765 width=4)
>                  ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
>                        ->  Parallel Seq Scan on extremely_skewed s  (cost=0.00..167.01 rows=1 width=4)
>
> The FinalizeAggregate plan has an increased cost of 1 post patch due to the number of rows in the Gather node that
wentfrom 1 to 2 (rint(1 * 1.7)=2). This was enough to make the Agggregate plan cheaper. The test is to check the
parallelhash join so updating it with the new cheapest plan looked fine. 
>
> Regards,
> Anthonin
>
> [1]: https://github.com/bonnefoa/postgres/tree/plan-candidates



--
Regards,
Rafia Sabih



pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: Is it possible to create a cursor with hold using extended query protocol
Next
From: "David E. Wheeler"
Date:
Subject: Re: jsonpath: Inconsistency of timestamp_tz() Output