On Fri, Aug 07, 2009 at 12:59:57PM +0100, Greg Stark wrote:
> On Fri, Aug 7, 2009 at 12:48 PM, Sam Mason<sam@samason.me.uk> wrote:
> >> Well most users want compression for the space savings. So running out
> >> of space sooner than without compression when most of the space is
> >> actually unused would disappoint them.
> >
> > Note that, as far as I can tell, for a filesystem you only need to keep
> > enough reserved for the amount of uncompressed dirty buffers you have in
> > memory. As space runs out in the filesystem all that happens is that
> > the amount of (uncompressed?) dirty buffers you can safely have around
> > decreases.
>
> And when it drops to zero?
That was why I said you need to have one page left "to handle the base
case". I was treating the inductive case as the interesting common case
and considered the base case of lesser interest.
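To make that concrete, here's a rough sketch of the reservation
accounting I have in mind (the names and numbers are invented; this
isn't code from any real filesystem):

  #include <stdbool.h>
  #include <stddef.h>

  /* Space not yet promised to any dirty buffer; a real filesystem
   * would derive this from its free-space maps. */
  static size_t free_space = 1 << 20;

  /* Inductive case: before a buffer may be dirtied, reserve its full
   * uncompressed size, since in the worst case it compresses not at
   * all.  Base case: when less than one page remains we refuse, which
   * forces a writeback (freeing a reservation) before more dirtying. */
  static bool
  reserve_dirty_buffer(size_t page_size)
  {
      if (free_space < page_size)
          return false;
      free_space -= page_size;
      return true;
  }

  /* Once the page has hit disk we know its real (compressed) size and
   * can give back whatever the worst-case reservation over-claimed. */
  static void
  release_after_writeback(size_t page_size, size_t on_disk_size)
  {
      free_space += page_size - on_disk_size;
  }

As the filesystem fills up, fewer reservations succeed, which is the
"amount of dirty buffers you can safely have around decreases" part
above.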
> > In PG's case, it would seem possible to do the compression and then
> > check to see if the resulting size is greater than 4kB. If it is, you
> > write the uncompressed data into the 4kB page. Upon reading you do
> > the inverse: if it's 4kB then there's no need to decompress. I believe
> > TOAST does this already.
>
> It does, as does gzip and afaik every compression system.
It's still a case that needs to be handled explicitly by the code. Just
for reference, gzip does not appear to do this when I test it:
  echo -n 'a' | gzip > tmp.gz
  gzip -l --verbose tmp.gz
says the compression ratio is "-200%" (an empty string results in
an infinite increase in size yet gets displayed as "0%" for some
strange reason). It's only when you hit six 'a's that you start to get
positive ratios. Note that this is taking headers into account;
the compressed size is 23 bytes for both 'aaa' and 'aaaaaa' but the
uncompressed size obviously changes.
gzip does indeed have a "copy" method, but it doesn't seem to be used
here.
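For what it's worth, the explicit handling I mean is roughly the
following (a sketch only, using zlib as the compressor; the names and
the page_t layout are invented, and this is not the actual TOAST code):

  #include <string.h>
  #include <zlib.h>

  #define PAGE_SIZE 4096

  typedef struct {
      int    is_compressed;   /* read path checks this to skip inflate */
      uLongf len;
      Bytef  data[PAGE_SIZE];
  } page_t;

  static void
  store_page(page_t *out, const Bytef *src, uLong len)
  {
      Bytef  tmp[PAGE_SIZE];
      uLongf clen = sizeof tmp;

      /* Keep the compressed form only if it is strictly smaller; if
       * compression grew the data (compress2 then returns Z_BUF_ERROR
       * for our page-sized scratch buffer) fall back to raw bytes. */
      if (compress2(tmp, &clen, src, len, Z_DEFAULT_COMPRESSION) == Z_OK
          && clen < len)
      {
          out->is_compressed = 1;
          out->len = clen;
          memcpy(out->data, tmp, clen);
      }
      else
      {
          out->is_compressed = 0;
          out->len = len;
          memcpy(out->data, src, len);
      }
  }

The read path just does the inverse: memcpy() when is_compressed is
zero, uncompress() otherwise.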
--
  Sam  http://samason.me.uk/