Re: Thoughts on statistics for continuously advancing columns - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Thoughts on statistics for continuously advancing columns
Date
Msg-id 1262287538.19367.11937.camel@ebony
Whole thread Raw
In response to Re: Thoughts on statistics for continuously advancing columns  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Thoughts on statistics for continuously advancing columns  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2009-12-30 at 14:55 -0500, Tom Lane wrote:

> Actually, in the problematic cases, it's interesting to consider the
> following strategy: when scalarineqsel notices that it's being asked for
> a range estimate that's outside the current histogram bounds, first try
> to obtain the actual current max() or min() of the column value --- this
> is something we can get fairly cheaply if there's a btree index on the
> column.  If we can get it, plug it into the histogram, replacing the
> high or low bin boundary.  Then estimate as we currently do.  This would
> work reasonably well as long as re-analyzes happen at a time scale such
> that the histogram doesn't move much overall, ie, the number of
> insertions between analyzes isn't a lot compared to the number of rows
> per bin.  We'd have some linear-in-the-bin-size estimation error because
> the modified last or first bin actually contains more rows than other
> bins, but it would certainly work a lot better than it does now.

Histograms often move quickly, but they seldom change shape.

Why not get both max() and min(), then rebase the histogram according to
those values. That way the histogram can still move significantly and
the technique will still work.

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Thoughts on statistics for continuously advancing columns
Next
From: Heikki Linnakangas
Date:
Subject: Re: Re-enabling SET ROLE in security definer functions