Bogus startup cost for WindowAgg - Mailing list pgsql-performance

From Ants Aasma
Subject Bogus startup cost for WindowAgg
Date
Msg-id AANLkTin=Zy22T1o88DkRhCpzq50j44PNs9zesXb7Y_rK@mail.gmail.com
Whole thread Raw
Responses Re: Bogus startup cost for WindowAgg  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
Re: Bogus startup cost for WindowAgg  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Bogus startup cost for WindowAgg  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
List pgsql-performance
I hit an issue with window aggregate costing while experimenting with
providing a count of the full match along side a limited result set.
Seems that the window aggregate node doesn't take into account that it
has to consume the whole input before outputting the first row. When
this is combined with a limit, the resulting cost estimate is wildly
underestimated, leading to suboptimal plans.

Is this a known issue? I couldn't find anything referring to this on
the mailing list or todo.

Code to reproduce follows:

ants=# CREATE TABLE test (a int, b int);
CREATE TABLE
ants=# INSERT INTO test (a,b) SELECT random()*1e6, random()*1e6 FROM
generate_series(1,1000000);
INSERT 0 1000000
ants=# CREATE INDEX a_idx ON test (a);
CREATE INDEX
ants=# CREATE INDEX b_idx ON test (b);
CREATE INDEX
ants=# ANALYZE test;
ANALYZE
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a <
2500 ORDER BY b LIMIT 10;
                                                             QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..195.31 rows=10 width=8) (actual
time=728.325..728.339 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..46209.93 rows=2366 width=8) (actual
time=728.324..728.337 rows=10 loops=1)
         ->  Index Scan using b_idx on test  (cost=0.00..46180.36
rows=2366 width=8) (actual time=0.334..727.221 rows=2512 loops=1)
               Filter: (a < 2500)
 Total runtime: 728.401 ms
(5 rows)

ants=# SET enable_indexscan = off;
SET
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a <
2500 ORDER BY b LIMIT 10;
                                                              QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3986.82..3986.85 rows=10 width=8) (actual
time=7.186..7.189 rows=10 loops=1)
   ->  Sort  (cost=3986.82..3992.74 rows=2366 width=8) (actual
time=7.185..7.187 rows=10 loops=1)
         Sort Key: b
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=46.70..3935.69 rows=2366 width=8)
(actual time=4.181..6.508 rows=2512 loops=1)
               ->  Bitmap Heap Scan on test  (cost=46.70..3906.12
rows=2366 width=8) (actual time=0.933..3.555 rows=2512 loops=1)
                     Recheck Cond: (a < 2500)
                     ->  Bitmap Index Scan on a_idx  (cost=0.00..46.10
rows=2366 width=0) (actual time=0.512..0.512 rows=2512 loops=1)
                           Index Cond: (a < 2500)
 Total runtime: 7.228 ms
(10 rows)

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: SQL functions vs. PL/PgSQL functions
Next
From: Jesper Krogh
Date:
Subject: Re: Slow count(*) again...