Re: Possible incorrect row estimation for Gather paths - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Possible incorrect row estimation for Gather paths
Date
Msg-id CAMbWs4-ZkH9t40LH8LMyZUuqbBww1k9OD+CH+O_7LJ7TwP3Zhw@mail.gmail.com
In response to Re: Possible incorrect row estimation for Gather paths  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Responses Re: Possible incorrect row estimation for Gather paths
List pgsql-hackers
I can reproduce this problem with the query below.

explain (costs on) select * from tenk1 order by twenty;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
   Workers Planned: 1
   ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
         Sort Key: twenty
         ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
(5 rows)

On Tue, Jul 16, 2024 at 3:56 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> The initial goal was to use the source tuples if available and avoid
> possible rounding errors. Though I realise that the difference would
> be minimal. For example, 200K tuples and 3 workers would yield
> int(int(200000 / 2.4) * 2.4)=199999. That is probably not worth the
> additional complexity, I've updated the patch to just use
> gather_rows_estimate.
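
The rounding loss quoted above can be checked with a few lines of C. This is only a sketch of the arithmetic as written in the quoted message (truncating at each step); round_trip_rows is a hypothetical name, and the planner itself may round rather than truncate.

```c
#include <assert.h>

/*
 * Divide a total by the parallel divisor, truncate, then multiply back
 * and truncate again, mirroring int(int(total / divisor) * divisor).
 */
static long
round_trip_rows(long total_rows, double divisor)
{
    long        per_process;

    assert(divisor > 0);
    per_process = (long) (total_rows / divisor);
    return (long) (per_process * divisor);
}
```

Calling round_trip_rows(200000, 2.4) yields 199999: the per-process estimate is 83333, and scaling it back up gives 199999.2 before truncation, so only one row is lost.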

I wonder if the changes in create_ordered_paths should also be reduced
to 'total_groups = gather_rows_estimate(path);'.

> I've also realised from the comments in optimizer.h that
> nodes/pathnodes.h should not be included there and fixed it.

I think perhaps it's better to declare gather_rows_estimate() in
cost.h rather than optimizer.h.
(BTW, I wonder if compute_gather_rows() would be a better name?)

I noticed another issue in generate_useful_gather_paths() -- *rowsp
would hold an uninitialized value if override_rows is true and we use
incremental sort for gather merge.  I think we should fix this too.

Thanks
Richard


