Re: a fast bloat measurement tool (was Re: Measuring relation free space) - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date
Msg-id 54EA8E61.9080306@BlueTreble.com
In response to Re: a fast bloat measurement tool (was Re: Measuring relation free space)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: a fast bloat measurement tool (was Re: Measuring relation free space)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 2/22/15 5:41 PM, Tomas Vondra wrote:
> Otherwise, the code looks OK to me. Now, there are a few features I'd
> like to have for production use (to minimize the impact):
>
> 1) no index support:-(
>
>     I'd like to see support for more relation types (at least btree
>     indexes). Are there any plans for that? Do we have an idea on how to
>     compute that?

It'd be cleaner if we had an actual AM function for this, but see below.

> 2) sampling just a portion of the table
>
>     For example, being able to sample just 5% of blocks, making it less
>     obtrusive, especially on huge tables. Interestingly, there's a
>     TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
>     of the methods (e.g. functions behind SYSTEM sampling)?
>
> 3) throttling
>
>     Another feature minimizing impact of running this on production might
>     be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
>     or something along those lines.
>
> 4) prefetch
>
>     fbstat_heap is using visibility map to skip fully-visible pages,
>     which is nice, but if we skip too many pages it breaks readahead
>     similarly to bitmap heap scan. I believe this is another place where
>     effective_io_concurrency (i.e. prefetch) would be appropriate.

All of those wishes are solved in one way or another by vacuum and/or
analyze. If we had a hook in the tuple scanning loop and one at the end
of vacuum, you could just piggy-back on vacuum. But really, all we'd need
for vacuum to be able to report this info is one more field in LVRelStats,
a call to GetRecordedFreeSpace for all-visible pages, and some logic to
deal with pages skipped because we couldn't get the vacuum lock.
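
To make the accounting concrete, here is a rough standalone sketch of what I have in mind. It is not PostgreSQL code: PageOutcome, PageInfo, BloatStats and accumulate_free_space are all made-up names for illustration, and recorded_free just stands in for whatever GetRecordedFreeSpace would return. Falling back to the recorded value for lock-skipped pages is only one possible policy, assumed here for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page outcome of a vacuum pass (not PostgreSQL's own types). */
typedef enum {
    PAGE_SCANNED,       /* page was read; free space measured directly */
    PAGE_ALL_VISIBLE,   /* skipped thanks to the visibility map */
    PAGE_LOCK_SKIPPED   /* skipped because the lock could not be obtained */
} PageOutcome;

typedef struct {
    PageOutcome outcome;
    size_t measured_free;  /* free bytes seen on the page (meaningful if scanned) */
    size_t recorded_free;  /* free bytes per the free space map; may be stale */
} PageInfo;

/* Stand-in for the "one more field" idea, kept next to the other vacuum stats. */
typedef struct {
    size_t free_space;       /* total free bytes accumulated */
    size_t estimated_pages;  /* pages counted from recorded, not measured, values */
} BloatStats;

/*
 * Sum free space over all pages. Scanned pages contribute the measured
 * value; all-visible and lock-skipped pages contribute the recorded
 * value (an assumed policy) and are counted as estimates.
 */
void
accumulate_free_space(const PageInfo *pages, size_t npages, BloatStats *stats)
{
    assert(stats != NULL);
    assert(pages != NULL || npages == 0);

    stats->free_space = 0;
    stats->estimated_pages = 0;

    for (size_t i = 0; i < npages; i++) {
        switch (pages[i].outcome) {
        case PAGE_SCANNED:
            stats->free_space += pages[i].measured_free;
            break;
        case PAGE_ALL_VISIBLE:
        case PAGE_LOCK_SKIPPED:
            stats->free_space += pages[i].recorded_free;
            stats->estimated_pages++;
            break;
        }
    }
}
```

Reporting estimated_pages alongside the total would let the caller judge how much of the number rests on possibly stale recorded values rather than on pages vacuum actually looked at.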

Should we just add this to vacuum instead?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


