Re: Table and Index compression - Mailing list pgsql-hackers

From: Sam Mason
Subject: Re: Table and Index compression
Msg-id: 20090807123835.GD5407@samason.me.uk
In response to: Re: Table and Index compression (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers

On Fri, Aug 07, 2009 at 12:59:57PM +0100, Greg Stark wrote:
> On Fri, Aug 7, 2009 at 12:48 PM, Sam Mason<sam@samason.me.uk> wrote:
> >> Well most users want compression for the space savings. So running out
> >> of space sooner than without compression when most of the space is
> >> actually unused would disappoint them.
> >
> > Note that, as far as I can tell, for a filesystem you only need to keep
> > enough reserved for the amount of uncompressed dirty buffers you have in
> > memory.  As space runs out in the filesystem all that happens is that
> > the amount of (uncompressed?) dirty buffers you can safely have around
> > decreases.
> 
> And when it drops to zero?

That was why I said you need to have one page left "to handle the base
case".  I was treating the inductive case as the interesting common case
and considered the base case of lesser interest.

> > In PG's case, it would seem possible to do the compression and then
> > check to see if the resulting size is greater than 4kB.  If it is you
> > write into the 4kB page size and write uncompressed data.  Upon reading
> > you do the inverse, if it's 4kB then no need to decompress.  I believe
> > TOAST does this already.
> 
> It does, as does gzip and afaik every compression system.

It's still a case that needs to be handled explicitly by the code.  Just
for reference, gzip does not appear to do this when I test it:
 echo -n 'a' | gzip > tmp.gz
 gzip -l --verbose tmp.gz

says the compression ratio is "-200%" (an empty string results in
an infinite increase in size yet gets displayed as "0%" for some
strange reason).  It's only when you hit six 'a's that you start to get
positive ratios.  Note that this is taking headers into account;
the compressed size is 23 bytes for both 'aaa' and 'aaaaaa' but the
uncompressed size obviously changes.
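
A rough way to repeat this measurement without the command-line tool is
sketched below, using Python's standard gzip module.  The byte counts it
prints may differ slightly from those of the gzip binary, since header
fields and compression settings are not necessarily identical, so treat
them as illustrative.  The helper name gzip_sizes and the constant
GZIP_FRAMING are made up for this example.

```python
import gzip

# Fixed cost of the gzip container: 10-byte header plus 8-byte trailer
# (CRC32 and input length), assuming no optional header fields are set.
GZIP_FRAMING = 18


def gzip_sizes(n: int) -> tuple[int, int]:
    """Return (uncompressed, compressed) sizes for a run of n 'a' bytes."""
    data = b"a" * n
    return len(data), len(gzip.compress(data))


if __name__ == "__main__":
    for n in (0, 1, 3, 6, 12, 100):
        raw, packed = gzip_sizes(n)
        # For tiny inputs the framing alone exceeds the input size.
        print(f"{raw:4d} bytes in -> {packed:3d} bytes out "
              f"(framing accounts for at least {GZIP_FRAMING})")
```

For very short inputs the output is always larger than the input, which
is the expansion case a caller has to handle itself.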

gzip does indeed have a "copy" method, but it doesn't seem to be used
here.
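
The explicit fallback described above for the 4kB page case could be
sketched roughly as follows.  This is only an illustration of the idea,
not how TOAST or any PostgreSQL code does it: the names pack_block and
unpack_block are hypothetical, zlib stands in for whatever compressor
would actually be used, and the stored length is assumed to be available
to the reader as the only marker of whether a block was compressed.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, taken from the 4kB figure above


def pack_block(page: bytes) -> bytes:
    """Compress a page, falling back to the raw bytes if that is not smaller."""
    if len(page) != BLOCK_SIZE:
        raise ValueError("this sketch only handles full-size pages")
    compressed = zlib.compress(page)
    # Strictly smaller, so a stored length of BLOCK_SIZE always means "raw".
    return compressed if len(compressed) < BLOCK_SIZE else page


def unpack_block(stored: bytes) -> bytes:
    """Invert pack_block: a full-size block was stored uncompressed."""
    if len(stored) == BLOCK_SIZE:
        return stored
    return zlib.decompress(stored)


if __name__ == "__main__":
    samples = (("zeros", bytes(BLOCK_SIZE)), ("random", os.urandom(BLOCK_SIZE)))
    for name, page in samples:
        stored = pack_block(page)
        assert unpack_block(stored) == page
        print(f"{name}: stored {len(stored)} bytes")
```

Requiring the compressed form to be strictly smaller than the block keeps
the length check unambiguous on the read side.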

-- 
  Sam  http://samason.me.uk/

