Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date
Msg-id 871vvhjqv5.fsf@oxford.xeocode.com
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Mark Mielke <mark@mark.mielke.cc>)
Responses Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Mark Mielke <mark@mark.mielke.cc>)
List pgsql-hackers
Mark Mielke <mark@mark.mielke.cc> writes:

> It seems to me that transparent file system compression doesn't have limits
> like "files must be less than 1 Mbyte to be compressed", and it doesn't
> exhibit poor file system performance.

Well, I imagine those implementations are more complex than toast is, so I'm
not sure what lessons we can draw from their behaviour directly.

> I remember back in the 386/486 days that I would always DriveSpace-compress
> everything, because hard disks were so slow then that DriveSpace would
> actually increase performance.

Surely this depends on whether your machine was CPU-starved or disk-starved?
Do you happen to recall which camp those anecdotal early-'90s machines fell into?

> The toast tables already give a sort of block-addressable scheme.
> Compression can be done per block or per set of blocks, allowing seeks
> into the block,

The current toast architecture compresses the whole datum and then stores the
result either inline or via the same external chunking mechanism we use for
uncompressed data. So per-block compression doesn't fit at all.
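
To make that concrete, the existing flow is essentially "compress first,
chunk second". A minimal standalone sketch, with zlib standing in for pglz
and made-up names and sizes throughout (this is not the actual tuptoaster
code):

#include <stdlib.h>
#include <zlib.h>

#define CHUNK_SIZE 2000         /* made-up stand-in for the toast chunk size */

/* Stand-in for inserting one chunk row into the toast table. */
static void
store_chunk(const char *data, size_t n)
{
    (void) data;
    (void) n;
}

/* Compress the whole datum in one go, then cut the *result* into
 * fixed-size chunks for external storage.  The chunks are opaque
 * slices of a single compressed stream, so you can't decompress
 * chunk 5 without first fetching and decompressing chunks 0..4. */
static void
toast_whole_then_chunk(const char *datum, size_t len)
{
    uLongf clen = compressBound(len);
    char  *cbuf = malloc(clen);

    compress2((Bytef *) cbuf, &clen, (const Bytef *) datum, len,
              Z_DEFAULT_COMPRESSION);

    for (size_t off = 0; off < clen; off += CHUNK_SIZE)
    {
        size_t n = (clen - off < CHUNK_SIZE) ? clen - off : CHUNK_SIZE;

        store_chunk(cbuf + off, n);
    }
    free(cbuf);
}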

It does seem like an interesting idea to have toast chunks that are compressed
individually. Each chunk could be, say, an 8kB chunk of plaintext, stored at
whatever size it ends up being after compression. That would allow us to do
random access into external chunks, as well as let us overlap the CPU cost of
decompression with the I/O costs. It would get a lower compression ratio than
compressing the whole object together, but we would have to experiment to see
how big a problem that is.
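
A sketch of how the per-chunk variant might look, again with zlib standing in
for whatever compressor we end up with; the chunk_map type is made up for
illustration, standing in for however we'd really record where each
compressed chunk starts:

#include <stdlib.h>
#include <zlib.h>

#define RAW_CHUNK 8192          /* plaintext bytes per chunk */

/* Made-up bookkeeping so a reader can find chunk i later. */
typedef struct
{
    int     nchunks;
    size_t *offset;             /* start of chunk i in the compressed stream */
    size_t *length;             /* compressed length of chunk i */
} chunk_map;

/* Compress src[0..srclen) chunk by chunk into a malloc'd buffer,
 * recording each chunk's offset and compressed length in *map. */
static char *
compress_chunked(const char *src, size_t srclen,
                 chunk_map *map, size_t *outlen)
{
    size_t maxchunks = (srclen + RAW_CHUNK - 1) / RAW_CHUNK;
    char  *dst = malloc(compressBound(RAW_CHUNK) * maxchunks);
    size_t in = 0, out = 0;

    map->offset = malloc(maxchunks * sizeof(size_t));
    map->length = malloc(maxchunks * sizeof(size_t));
    map->nchunks = 0;

    while (in < srclen)
    {
        uLong  raw = (srclen - in < RAW_CHUNK) ? srclen - in : RAW_CHUNK;
        uLongf clen = compressBound(raw);

        compress2((Bytef *) (dst + out), &clen,
                  (const Bytef *) (src + in), raw, Z_DEFAULT_COMPRESSION);

        map->offset[map->nchunks] = out;
        map->length[map->nchunks] = clen;
        map->nchunks++;
        in += raw;
        out += clen;
    }
    *outlen = out;
    return dst;
}

Plaintext offset X then lives in chunk X / 8192, so a substring fetch knows
exactly which chunks to pull, and decompressing one chunk can overlap with
the I/O for the next.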

It would pretty much mean rewriting the toast mechanism for external
compressed data, though. Currently the storage and the compression are handled
separately; this would tie the two together in a separate code path.

Hm, it occurs to me we could almost use the existing code: store the value as
a regular uncompressed external datum, but allow the toaster to operate on the
toast table's data column (which it's normally not allowed to do) to compress
each chunk, without storing those chunks externally in turn.
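
The read side is where either variant pays off. Continuing the hypothetical
chunk_map sketch above (in the reuse-the-existing-code variant the chunk
sequence number itself would play the map's role), a range fetch only
decompresses the chunks it actually overlaps:

#include <string.h>

/* Fetch plaintext bytes [off, off+len) from the compressed stream,
 * decompressing only the chunks that range overlaps.  Reuses RAW_CHUNK
 * and chunk_map from the sketch above; assumes the range lies within
 * the datum and result has room for len bytes. */
static void
fetch_range(const char *cdata, const chunk_map *map,
            size_t off, size_t len, char *result)
{
    size_t first = off / RAW_CHUNK;
    size_t last = (off + len - 1) / RAW_CHUNK;
    size_t copied = 0;
    char   slice[RAW_CHUNK];

    for (size_t i = first; i <= last; i++)
    {
        uLongf rawlen = RAW_CHUNK;

        uncompress((Bytef *) slice, &rawlen,
                   (const Bytef *) (cdata + map->offset[i]),
                   map->length[i]);

        /* copy the part of this chunk that falls inside [off, off+len) */
        size_t start = (i == first) ? off % RAW_CHUNK : 0;
        size_t n = (size_t) rawlen - start;

        if (n > len - copied)
            n = len - copied;
        memcpy(result + copied, slice + start, n);
        copied += n;
    }
}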

> or if compression doesn't seem to be working for the first few blocks, the
> later blocks can be stored uncompressed? Or is that too complicated compared
> to what we have now? :-)

Actually, we do that now; it was part of the same patch we're discussing.


-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

