On Wed, 2009-12-30 at 14:55 -0500, Tom Lane wrote:
> Actually, in the problematic cases, it's interesting to consider the
> following strategy: when scalarineqsel notices that it's being asked for
> a range estimate that's outside the current histogram bounds, first try
> to obtain the actual current max() or min() of the column value --- this
> is something we can get fairly cheaply if there's a btree index on the
> column. If we can get it, plug it into the histogram, replacing the
> high or low bin boundary. Then estimate as we currently do. This would
> work reasonably well as long as re-analyzes happen at a time scale such
> that the histogram doesn't move much overall, ie, the number of
> insertions between analyzes isn't a lot compared to the number of rows
> per bin. We'd have some linear-in-the-bin-size estimation error because
> the modified last or first bin actually contains more rows than other
> bins, but it would certainly work a lot better than it does now.
Histograms often move quickly, but they seldom change shape.
Why not get both max() and min(), then rebase the histogram according to
those values. That way the histogram can still move significantly and
the technique will still work.
-- Simon Riggs www.2ndQuadrant.com