Re: Measuring relation free space - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Measuring relation free space
Date
Msg-id 20111218165625.GB6393@tornado.leadboat.com
Whole thread Raw
In response to Re: Measuring relation free space  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Dec 16, 2011 at 02:02:03AM -0500, Greg Smith wrote:
> On 12/15/2011 04:11 PM, Noah Misch wrote:
>> Is one of those index figures simply wrong, or do they measure two senses of
>> free space, both of which are interesting to DBAs?
>
> I think the bigger one--the one I was aiming to measure--also includes  
> fill-factor space.  It should be possible to isolate whether that's true  
> by running the function against a fresh index, or by trying tests with a  
> table where there's no useful fill.  I need to add some of those to the  
> test example suite.

No, both measures include fillfactor space.  From a brief look at the code, the
proposed function counts space in non-leaf pages, while pgstattuple does not.
Also, the proposed function counts half-dead pages like live pages, while
pgstattuple counts them like dead pages.

One could perhaps justify those choices either way, but they seem too esoteric
for DBA exposure.  I recommend choosing a policy on each and making both
pgstattuple() and any new code respect that policy.

> Shaking out the alternate implementation ideas was really my goal for  
> this CF here.  The major goal of the next revision is to present the  
> options with a measure of their respective accuracy and runtime.  If I  
> have to give up just a of bit of accuracy and make it much faster,  
> that's probably what most people want as an option.  When Jaime and I  
> come back with an update, it really needs to have benchmarks and  
> accuracy numbers for each option.  That may be complicated a bit  
> depending on how much of the table or index is cached, so isolating that  
> out will be a pain.

The previous submission seemed to boil down to a speedier version of "SELECT
free_percent FROM pgstattuple('foo')".  (Some of the other statistics aren't
cheap.)  Considering that, the code does belong in the pgstattuple module.

The sampling approach you have mentioned sounds promising, especially for
indexes.  For heap bloat, it may be hard to improve on pg_freespacemap-based and
check_postgres-style estimates with anything less than a full heap scan.

Thanks,
nm


pgsql-hackers by date:

Previous
From: amit sehas
Date:
Subject: array behavior
Next
From: Josh Berkus
Date:
Subject: Re: unite recovery.conf and postgresql.conf