Re: Table and Index compression - Mailing list pgsql-hackers

From Sam Mason
Subject Re: Table and Index compression
Date
Msg-id 20090807090950.GZ5407@samason.me.uk
In response to Re: Table and Index compression  (Pierre Frédéric Caillaud <lists@peufeu.com>)
List pgsql-hackers
On Fri, Aug 07, 2009 at 10:36:39AM +0200, Pierre Frédéric Caillaud wrote:
> Also, about compressed NTFS : it can give you disk-full errors on read().
> While this may appear stupid, it is in fact very good.

Is this not just because they've broken the semantics of read?

> As a side note, I have also tested lzjb (the ZFS compressor) and lzo is  
> much faster, and compresses much better (sometimes 2x better).

Disks are fast and cheap; a basic IDE disk runs at over 100MB/s now, and
it's doing this in the background while your CPU is doing other stuff.
If you're also decompressing stuff you're serializing even more, and
you're doing so with a much more power-hungry device (the CPU).

How fast is decompression (as that seems to be your selling point)?  Lzo
claims to run at about a third of main memory bandwidth, which is
nice, however research projects found this to be far too slow and were
only getting positive results when decompression stayed in secondary
cache.  Basically decompression has to run at several GB/s for it to
have much measurable benefit.
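
One way to frame the question is a back-of-envelope model of read
throughput. The sketch below is only that: the function names, the two
cases (I/O overlapped with decompression, or fully serialized) and the
figures in the usage note are illustrative assumptions, not measurements
of lzo or of any real disk.

```python
def effective_read_rate(disk_mb_s, ratio, decompress_mb_s, overlapped):
    """Uncompressed MB/s delivered when reading compressed data.

    disk_mb_s:       sequential disk rate, in compressed MB/s
    ratio:           uncompressed size / compressed size
    decompress_mb_s: decompressor output rate, in uncompressed MB/s
    overlapped:      True if I/O and decompression run concurrently
    """
    io_rate = disk_mb_s * ratio  # uncompressed MB/s the disk can supply
    if overlapped:
        # The slower of the two pipeline stages sets the pace.
        return min(io_rate, decompress_mb_s)
    # Serialized: the time per uncompressed MB is the sum of both stages.
    return 1.0 / (1.0 / io_rate + 1.0 / decompress_mb_s)


def serialized_break_even(disk_mb_s, ratio):
    """Decompression rate at which serialized reads only match raw reads.

    Solves 1 / (1/(disk*ratio) + 1/c) == disk for c; requires ratio > 1.
    """
    return disk_mb_s * ratio / (ratio - 1.0)
```

With an assumed 100MB/s disk and 2:1 compression, the serialized case
only breaks even once the decompressor emits 200MB/s; anything slower is
a net loss before CPU contention and power are counted. The model says
nothing about data that is already in memory, which is where the much
higher rates mentioned above come in.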

> I made a quick check before implementing it, using python scripts to play  
> with sparse files on ext3 :
> 
> - writing a sparse file is a bit (not much) slower than a regular file,
> - reading from a non-fragmented sparse file is as fast as reading a  
> regular file
> - holes do not eat OS disk cache (which is the most interesting point)
> - reading from cache is as fast as reading a regular file (and faster if  
> you don't read the holes because you know they are holes, which is the  
> case here)

Numbers?
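
Something along these lines would produce them. This is a minimal
sketch, not the script used for the check quoted above: the helper names
and the file size are made up, it assumes a Unix filesystem that
supports holes, and it relies on st_blocks being counted in 512-byte
units (true on Linux).

```python
import os
import tempfile
import time


def make_file(path, size, sparse):
    """Create a file of `size` bytes, either as one hole or zero-filled."""
    with open(path, "wb") as f:
        if sparse:
            f.truncate(size)       # extend the file without writing data
        else:
            f.write(b"\0" * size)  # explicitly write every byte
        f.flush()
        os.fsync(f.fileno())


def sizes(path):
    """Return (logical bytes, allocated bytes) for a file."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # 512-byte units assumed (Linux)


def read_seconds(path, chunk=1 << 20):
    """Time one sequential read of the whole file."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start


def report(size=8 << 20):
    """Build a sparse and a regular file and collect their figures."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        for name, sparse in (("sparse", True), ("regular", False)):
            path = os.path.join(d, name)
            make_file(path, size, sparse)
            logical, allocated = sizes(path)
            results[name] = {
                "logical": logical,
                "allocated": allocated,
                "read_s": read_seconds(path),
            }
    return results


if __name__ == "__main__":
    for name, row in report().items():
        print(name, row)
```

The reads here happen straight after the writes, so they mostly measure
the OS cache; comparing cold reads would need the cache dropped or a
file larger than RAM.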

> And, also, growing a sparse file by plugging the holes in it WILL allocate  
> blocks all over the place and render IO extremely inefficient.
> You can defrag it, of course (using fs-specific tools or just cpio), but  
> that's not "high-availability"...

That would not seem too difficult to solve.

> I forgot to talk about SSDs in the previous message. SSDs are quite  
> expensive, but seek really fast.

SSDs are about decreasing latency; if you're putting compression in
there you're pushing latency up as well.  If you don't care about
latency you get traditional rotating media.

--  Sam  http://samason.me.uk/

