On Thu, 23 Mar 2023 at 07:04, PG Bug reporting form
<noreply@postgresql.org> wrote:
> SELECT large_table.*, count(*) OVER ()
> FROM generate_series(1, 1000000000000) large_table
> LIMIT 10
>
> I would have expected a query plan something like this, with a large overall
> cost:
>
> Limit (cost=0.00..22500000000.00 rows=10 width=16)
> -> WindowAgg (cost=0.00..22500000000.00 rows=1000000000000 width=16)
> -> Function Scan on generate_series large_table
> (cost=0.00..10000000000.00 rows=1000000000000 width=8)
>
> But I actually get this query plan, with a cost of 0.23:
>
> Limit (cost=0.00..0.23 rows=10 width=16)
> -> WindowAgg (cost=0.00..22500000000.00 rows=1000000000000 width=16)
> -> Function Scan on generate_series large_table
> (cost=0.00..10000000000.00 rows=1000000000000 width=8)
It would likely be possible to adjust cost_windowagg() to figure out a
startup_cost for producing the first row from a WindowAgg. Doing so
would require looking at the frame options and working out how many
rows must be read before the first row can be emitted. If you'd
written count(*) OVER (ROWS BETWEEN CURRENT ROW AND 10 FOLLOWING),
then we'd only need to look forward 10 rows from the current row. I'm
just not sure this is worth the trouble unless you or someone else can
demonstrate that it's causing actual problems.
> I believe this (on a more complicated query) is affecting the plan chosen by
> the optimizer.
I don't immediately see what alternative plans could be considered but
not chosen as a result of this. Can you give an example?
David