Re: Significantly larger toast tables on 8.4? - Mailing list pgsql-hackers

From Gregory Maxwell
Subject Re: Significantly larger toast tables on 8.4?
Date
Msg-id e692861c0901070644y6f55f441gb39397ab4aca736b@mail.gmail.com
In response to Re: Significantly larger toast tables on 8.4?  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Fri, Jan 2, 2009 at 5:48 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> So you compromise. You split the data into say 1MB blobs and compress
> each individually. Then if someone does a substring at offset 3MB you
> can find it quickly. This barely costs you anything in compression
> ratio.
>
> Implementation though, that's harder. The size of the blobs is tunable
> also. I imagine the optimal value will probably be around 100KB. (12
> blocks uncompressed).

Or have the database do that internally: with the available fast
compression algorithms (zlib, LZO, LZF, etc.) the diminishing returns
from larger compression block sizes kick in rather quickly. Other
algorithms such as LZMA or bzip2 gain more from bigger block sizes, but
I expect all of them are too slow to ever consider using in PostgreSQL.

So I expect that the compression loss from compressing in chunks of
64 kB would be minimal. The database could then store a list of
offsets for the 64 kB chunks at the beginning of the field, or
something like that. A short substring would then require
decompressing just one or two chunks, far less overhead than
decompressing everything.
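
A minimal sketch of that idea, outside of PostgreSQL and purely for
illustration (the 64 kB chunk size and the offset-list layout are my
assumptions, and zlib stands in for whatever fast compressor is used):

    # Compress a value in fixed-size chunks and keep the offset of each
    # compressed chunk, so a substring only decompresses the chunks it touches.
    import zlib

    CHUNK = 64 * 1024  # 64 kB of uncompressed data per chunk (assumed)

    def compress_chunked(data: bytes):
        """Return (offsets, blob); offsets[i] is where compressed chunk i starts."""
        offsets, parts, pos = [], [], 0
        for i in range(0, len(data), CHUNK):
            comp = zlib.compress(data[i:i + CHUNK])
            offsets.append(pos)
            parts.append(comp)
            pos += len(comp)
        return offsets, b"".join(parts)

    def substring_chunked(offsets, blob, start: int, length: int) -> bytes:
        """Decompress only the chunks covering [start, start + length)."""
        first = start // CHUNK
        last = (start + length - 1) // CHUNK
        pieces = []
        for i in range(first, last + 1):
            lo = offsets[i]
            hi = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
            pieces.append(zlib.decompress(blob[lo:hi]))
        joined = b"".join(pieces)
        skip = start - first * CHUNK
        return joined[skip:skip + length]

    # A substring at a multi-megabyte offset touches only one or two chunks:
    offsets, blob = compress_chunked(b"abcdefgh" * (1 << 20))  # ~8 MB of input
    print(substring_chunked(offsets, blob, 3 * 1024 * 1024, 16))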

It would probably be worthwhile to graph compression ratio vs. block
size for some reasonable input. I'd offer to do it, but I doubt I
have a reasonable test set for this.
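
For anyone who does have representative data, a rough sketch of that
measurement (again assuming zlib; the input is just whatever large file
you have at hand) could look like:

    # Compress the same data at several chunk sizes and print the resulting
    # ratios, to see where the diminishing returns set in.
    import sys
    import zlib

    def chunked_ratio(data: bytes, chunk: int) -> float:
        compressed = sum(len(zlib.compress(data[i:i + chunk]))
                         for i in range(0, len(data), chunk))
        return compressed / len(data)

    data = open(sys.argv[1], "rb").read()  # any reasonably large test file
    for kb in (8, 16, 32, 64, 128, 256, 1024):
        print(f"{kb:5d} kB chunks: {chunked_ratio(data, kb * 1024):.3f}")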

