Re: a fast bloat measurement tool (was Re: Measuring relation free space) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date
Msg-id 54EA852F.8070000@2ndquadrant.com
Whole thread Raw
In response to Re: a fast bloat measurement tool (was Re: Measuring relation free space)  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: a fast bloat measurement tool (was Re: Measuring relation free space)  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: a fast bloat measurement tool (was Re: Measuring relation free space)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi!

On 28.1.2015 05:03, Abhijit Menon-Sen wrote:
> At 2015-01-27 17:00:27 -0600, Jim.Nasby@BlueTreble.com wrote:
>>
>> It would be best to get this into a commit fest so it's not lost.
> 
> It's there already.
> 
> -- Abhijit
> 
> 

I looked at this patch today, so a few comments from me:

1) I believe the patch is slightly broken, because the version was  bumped to 1.3 but only partially, so I get this on
"makeinstall"
 
   $ make -j9 -s install   /usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such   file or directory
../../src/makefiles/pgxs.mk:129:recipe for target 'install' failed   make[1]: *** [install] Error 1   Makefile:85:
recipefor target 'install-pgstattuple-recurse' failed   make: *** [install-pgstattuple-recurse] Error 2   make: ***
Waitingfor unfinished jobs....
 
  The problem seems to be that Makefile already lists --1.3.sql in the  DATA section, but the file was not renamed
(there'sjust --1.2.sql).
 
  After fixing that, it's also necessary to fix the version in the  control file, otherwise the CREATE EXTENSION
fails.
     default_version = '1.2' -> '1.3'
  At least that fixed the install for me.


2) Returning Datum from fbstat_heap, which is a static function seems a  bit strange to me, I'd use the HeapTuple
directly,but meh ...
 


3) The way the tuple is built by first turning the values into strings  and then turned back into Datums by calling
BuildTupleFromCStrings is more serious I guess. Why not to just keep the Datums and call  heap_form_tuple directly?
 
  I see the other functions in pgstattuple.c do use the string way, so  maybe there's a reason for that? Or is this
justto keep the code  consistent in the extension?
 

4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat'  to make that consistent with the rest of
pgstattuplefunctions. Or  something similar, but 'fastbloat' is just plain confusing.
 


Otherwise, the code looks OK to me. Now, there are a few features I'd
like to have for production use (to minimize the impact):

1) no index support :-(
  I'd like to see support for more relation types (at least btree  indexes). Are there any plans for that? Do we have
anidea on how to  compute that?
 

2) sampling just a portion of the table
  For example, being able to sample just 5% of blocks, making it less  obtrusive, especially on huge tables.
Interestingly,there's a  TABLESAMPLE patch in this CF, so maybe it's possible to reuse some  of the methods (e.g.
functionsbehind SYSTEM sampling)?
 

3) throttling
  Another feature minimizing impact of running this on production might  be some sort of throttling, e.g. saying 'limit
thescan to 4 MB/s'  or something along those lines.
 

4) prefetch
  fbstat_heap is using visibility map to skip fully-visible pages,  which is nice, but if we skip too many pages it
breaksreadahead  similarly to bitmap heap scan. I believe this is another place where  effective_io_concurrency (i.e.
prefetch)would be appropriate.
 

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: restrict global access to be readonly
Next
From: Jim Nasby
Date:
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)