Home > mailing lists

Re: Fix incorrect start up costs for WindowAgg paths (bug #17862) - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Fix incorrect start up costs for WindowAgg paths (bug #17862)
Date	April 13, 2023 00:16:21
Msg-id	CAKU4AWo0K+85q-R8Hs_r9n47CV6_m6-GG_hCFCJcqhmBCbRDhw@mail.gmail.com Whole thread Raw
In response to	Re: Fix incorrect start up costs for WindowAgg paths (bug #17862) (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

Tree view

On Thu, Apr 13, 2023 at 6:09 AM David Rowley <dgrowleyml@gmail.com> wrote:

.On Thu, 13 Apr 2023 at 02:28, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> The concept of startup_tuples for a WindowAgg looks good to me, but I
> can't follow up with the below line:
>
> + return clamp_row_est(partition_tuples * DEFAULT_INEQ_SEL);
>
> # select count(*) over() from tenk1 limit 1;
> count
> -------
> 10000 --> We need to scan all the tuples.
>
> Should we just return clamp_row_est(partition_tuples)?

For the case you've shown, it will. It's handled by this code:

if (wc->orderClause == NIL)
return clamp_row_est(partition_tuples);

My fault. I should have real debugging to double check my

understanding, surely I will next time.

It would take something like the following to hit the code you're
concerned about:

explain select count(*) over(order by unique1 rows between unbounded
preceding and 10*random() following) from tenk1;
QUERY PLAN
--------------------------------------------------------------------------------------------
WindowAgg (cost=140.23..420.29 rows=10000 width=12)
-> Index Only Scan using tenk1_unique1 on tenk1
(cost=0.29..270.29 rows=10000 width=4)
(2 rows)

You can see the startup cost is about 33% of the total cost for that,
which is from the DEFAULT_INEQ_SEL. I'm not exactly set on that
having to be DEFAULT_INEQ_SEL, but I'm not really sure what we could
put that's better. I don't really follow why assuming all rows are
required is better. That'll just mean we favour cheap startup plans
less, but there might be a case where a cheap startup plan is
favourable. I was opting for a happy medium when I thought to use
DEFAULT_INEQ_SEL.

That looks reasonable to me. My suggestion came from my misreading

before, It was a bit late in my time zone when writing. Thanks for the

detailed explanation!

I also see I might need to do a bit more work on this as the following
is not handled correctly:

select count(*) over(rows between unbounded preceding and 10
following) from tenk1;

it's assuming all rows due to lack of ORDER BY, but it seems like it
should be 10 rows due to the 10 FOLLOWING end bound.

True to me.

Best Regards

Andy Fan

pgsql-hackers by date:

From: Daniel Gustafsson
Date: 12 April 2023, 23:25:05
Subject: Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

From: Michael Paquier
Date: 13 April 2023, 00:16:27
Subject: Clean up hba.c of code freeing regexps

Re: Fix incorrect start up costs for WindowAgg paths (bug #17862) - Mailing list pgsql-hackers

Previous

Next