Re: self-tuning histograms - Mailing list pgsql-hackers
From | Gavin Sherry |
---|---|
Subject | Re: self-tuning histograms |
Date | |
Msg-id | Pine.LNX.4.21.0205301322500.31132-100000@linuxworld.com.au Whole thread Raw |
In response to | self-tuning histograms (Neil Conway <nconway@klamath.dyndns.org>) |
Responses |
Re: self-tuning histograms
|
List | pgsql-hackers |
Neil, I've also been thinking about this but haven't had time to collect my thoughts. On Wed, 29 May 2002, Neil Conway wrote: > Histogram refinement can take place in two possible ways: online > (as queries are executed, the histograms are updated immediately), > or offline (the necessary data is written to a log after every > query, which is processed on a regular basis to refine the > histograms). I would have thought that offline would have meant that the histogram refinement could be run at the DBA's leisure. > There are some disadvantages, however: > > (1) Reproduceability: At the moment, the system's performance > only changes when the data is changed, or the DBA makes a > configuration change. With this (and other "self-tuning" > techniques, which are becoming very popular among > commercial databases), the system can change the state of > the system without the intervention of the DBA. While I'd > hope that those changes are for the better (i.e. histograms > eventually converging toward "perfect" accuracy), that > won't always be the case. I don't really see a way around > this, other than letting the DBA disable ST histograms > when debugging problems. Self-tuning would have to be optional. > (2) Performance: As Aboulnaga and Shaudhuri point out, online > histogram refinement can become a point of contention. > Obviously, we want to avoid that. I think online refinement > is still possible as long as we: > > (a) don't block waiting for locks: try to acquire the > necessary locks to refine the histograms, > immediately give up if not possible > > (b) delay histogram refinement so it doesn't interfere > with the user: for example, store histogram data > locally and only update the system catalogs when > the backend is idle This should be fine as long as the refinement system works through MVCC. There is another consideration. If a database is using histogram refinement then the 'base' data it works on must be accurate. If not, refinement would compound the inaccuracy of the histogram. As such, ANALYZE would have to scan the whole table (if/when run), COPY would have to update the statistics, etc. Gavin
pgsql-hackers by date: