.On Thu, 13 Apr 2023 at 02:28, Andy Fan <zhihui.fan1213@gmail.com> wrote: > The concept of startup_tuples for a WindowAgg looks good to me, but I > can't follow up with the below line: > > + return clamp_row_est(partition_tuples * DEFAULT_INEQ_SEL); > > # select count(*) over() from tenk1 limit 1; > count > ------- > 10000 --> We need to scan all the tuples. > > Should we just return clamp_row_est(partition_tuples)?
For the case you've shown, it will. It's handled by this code:
if (wc->orderClause == NIL) return clamp_row_est(partition_tuples);
My fault. I should have real debugging to double check my
understanding, surely I will next time.
It would take something like the following to hit the code you're concerned about:
explain select count(*) over(order by unique1 rows between unbounded preceding and 10*random() following) from tenk1; QUERY PLAN -------------------------------------------------------------------------------------------- WindowAgg (cost=140.23..420.29 rows=10000 width=12) -> Index Only Scan using tenk1_unique1 on tenk1 (cost=0.29..270.29 rows=10000 width=4) (2 rows)
You can see the startup cost is about 33% of the total cost for that, which is from the DEFAULT_INEQ_SEL. I'm not exactly set on that having to be DEFAULT_INEQ_SEL, but I'm not really sure what we could put that's better. I don't really follow why assuming all rows are required is better. That'll just mean we favour cheap startup plans less, but there might be a case where a cheap startup plan is favourable. I was opting for a happy medium when I thought to use DEFAULT_INEQ_SEL.
That looks reasonable to me. My suggestion came from my misreading
before, It was a bit late in my time zone when writing. Thanks for the
detailed explanation!
I also see I might need to do a bit more work on this as the following is not handled correctly:
select count(*) over(rows between unbounded preceding and 10 following) from tenk1;
it's assuming all rows due to lack of ORDER BY, but it seems like it should be 10 rows due to the 10 FOLLOWING end bound.