Measuring relation free space - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Measuring relation free space |
Date | |
Msg-id | 4EB5FA1B.1090305@2ndQuadrant.com Whole thread Raw |
Responses |
Re: Measuring relation free space
(Magnus Hagander <magnus@hagander.net>)
Re: Measuring relation free space (Bernd Helmle <mailings@oopsware.de>) a fast bloat measurement tool (was Re: Measuring relation free space) (Abhijit Menon-Sen <ams@2ndQuadrant.com>) |
List | pgsql-hackers |
Attached patch adds a new function to the pageinspect extension for measuring total free space, in either tables or indexes. It returns the free space as a percentage, so higher numbers mean more bloat. After trying a couple of ways to quantify it, I've found this particular measure correlates well with the nastiest bloat issues I've ran into in production recently. For example, an index that had swelled to over 5X its original size due to autovacuum issues registered at 0.86 on this scale. I could easily see people putting an alert at something like 0.40 and picking candidates to reindex based on it triggering. That would be about a million times smarter than how I've been muddling through this class of problems so far. Code by Jaime Casanova, based on a prototype by me. Thanks to attendees and sponsors of the PgWest conference for helping to fund some deeper exploration of this idea. Here's a test case showing it in action: create extension pageinspect; create table t (k serial,v integer); insert into t(v) (select generate_series(1,100000)); create index t_idx on t(k); delete from t where k<50000; vacuum t; gsmith=# select relation_free_space('t'); relation_free_space --------------------- 0.445466 gsmith=# select relation_free_space('t_idx'); relation_free_space --------------------- 0.550946 Some open questions in my mind: -Given this is doing a full table scan, should it hook into a ring buffer to keep from trashing the buffer cache? Or might it loop over the relation in a different way all together? I was thinking about eyeing the FSM instead at one point, didn't explore that yet. There's certainly a few ways to approach this, we just aimed at the easiest way to get a working starter implementation, and associated results to compare others against. -Should there be a non-superuser version of this? We'd certainly need to get a less cache demolishing version before that would seem wise. -There were related things in the pageinspect module, but a case could be made for this being a core function instead. It's a bit more likely to be used in production than the rest of that extension. -What if anything related to TOAST should this handle? We're also planning to do a sampling version of this, using the same approach ANALYZE does. Grab a number of blocks, extrapolate from there. It shouldn't take many samples before the accuracy is better than how people are estimated this now. That work is just waiting on some better thinking about how to handle the full relation version first. And, yes, the explanation in the docs and code should be clear that it's returning a percentage, which I just realized when writing this. At least I remembered to document something; still ahead of the average new patch... -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Attachment
pgsql-hackers by date: