Re: self-tuning histograms - Mailing list pgsql-hackers

From Neil Conway
Subject Re: self-tuning histograms
Date
Msg-id 20020530004210.2adb6137.nconway@klamath.dyndns.org
Whole thread Raw
In response to Re: self-tuning histograms  (Gavin Sherry <swm@linuxworld.com.au>)
List pgsql-hackers
On Thu, 30 May 2002 13:52:08 +1000 (EST)
"Gavin Sherry" <swm@linuxworld.com.au> wrote:
> On Wed, 29 May 2002, Neil Conway wrote:
> > Histogram refinement can take place in two possible ways: online
> > (as queries are executed, the histograms are updated immediately),
> > or offline (the necessary data is written to a log after every
> > query, which is processed on a regular basis to refine the
> > histograms).
> 
> I would have thought that offline would have meant that the histogram
> refinement could be run at the DBA's leisure.

Yeah -- that makes more sense.

> > (2) Performance: As Aboulnaga and Shaudhuri point out, online
> >     histogram refinement can become a point of contention.
> >     Obviously, we want to avoid that.
> 
> This should be fine as long as the refinement system works through MVCC.

Good point -- the current pg_statistic routines are MVCC aware, so
there's no reason to change that. In that case, my concerns about
contention over histogram refinement may be unfounded. I still think we
should avoid redundant histogram refinements (i.e. don't update the
histograms on every single query), but I'm glad that MVCC solves most
of this problem for us.

> There is another consideration. If a database is using histogram
> refinement then the 'base' data it works on must be accurate. If not,
> refinement would compound the inaccuracy of the histogram.

Aboulnaga and Shaudhuri doesn't require this. The initial assumption
is that the attribute is uniformly distributed over the initial
buckets of the histogram. Naturally, this is incorrect: as queries
are executed, the initial histogram is modified by "refinement"
(the frequences of individual buckets are adjusted) and
"restructuring" (bucket boundaries are adjusted). For more
information (and the exact algorithms used), see sections 3.2
and 3.3 of the paper.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: ipv6
Next
From: Oleg Bartunov
Date:
Subject: Re: Null values in indexes