Re: Bogus startup cost for WindowAgg - Mailing list pgsql-performance

From Ants Aasma
Subject Re: Bogus startup cost for WindowAgg
Date
Msg-id AANLkTimz9344yJ2ribypx=Sun7Rmgx-HWmPoTapKvcuL@mail.gmail.com
Whole thread Raw
In response to Re: Bogus startup cost for WindowAgg  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
List pgsql-performance
On Wed, Oct 13, 2010 at 10:35 PM, Mladen Gogala
<mladen.gogala@vmsinfo.com> wrote:
> You will see that for most of the columns, the length of the histogram array
> corresponds to the value of the default_statistics_target parameter. For
> those that are smaller, the size is the total number of values in the column
> in the sample taken by the "analyze" command. The longer histogram, the
> better plan. In this case, the size does matter.

The issue here isn't that the statistics are off. The issue is, as Tom
said, that the optimizer doesn't consider them for the cost model of
the window aggregate. The trivial case I put forward wouldn't be too
hard to cover - if there's no partitioning of the window and the frame
is over the full partition, the startup cost should be nearly the same
as the full cost. But outside of the trick I tried, I'm not sure if
the trivial case matters much. I can also see how the estimation gets
pretty hairy when partitioning, frames and real window functions come
into play.

One idea would be to cost three different cases. If the aggregate
needs to read ahead some most likely constant number of rows, i.e. is
not using an unbounded following frame, leave the startup cost as is.
If there is partitioning, estimate the number of groups produced by
the partitioning and add one n-th of the difference between startup
and total cost. Otherwise, if the frame is to the end of the partition
and there is no partitioning, set the startup cost equal to total
cost, or in terms of the previous case, n=1. I don't know how accurate
estimating the number of groups would be, or even if it is feasible to
do it. If those assumptions hold, then it seems to me that this method
should at-least cover any large O(n) effects.

pgsql-performance by date:

Previous
From: "mark"
Date:
Subject: Re: Slow count(*) again...
Next
From: Merlin Moncure
Date:
Subject: Re: SQL functions vs. PL/PgSQL functions