Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date
Msg-id 4962B6E9.2060703@mark.mielke.cc
In response to Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Gregory Stark wrote:
> Mark Mielke <mark@mark.mielke.cc> writes:
>
>> It seems to me that transparent file system compression doesn't have
>> limits like "files must be less than 1 Mbyte to be compressed". They
>> don't exhibit poor file system performance.
>
> Well I imagine those implementations are more complex than toast is. I'm
> not sure what lessons we can learn from their behaviour directly.
>
>> I remember back in the 386/486 days, that I would always DriveSpace
>> compress everything, because hard disks were so slow then that
>> DriveSpace would actually increase performance.
>
> Surely this depends on whether your machine was cpu starved or disk
> starved? Do you happen to recall which camp these anecdotal machines
> from 1980 fell in?

I agree. I'm sure it was disk I/O starved - and maybe not just the disk. The motherboard might have contributed. :-)
My production machine in 2008/2009 still seems I/O bound for my uses. The main database server I use is 2 x Intel Xeon 3.0 GHz (dual-core) = 4 cores, and the uptime load average for the whole system is currently 0.10. The database and web server use their own 4 drives with RAID 10 (the main system is on two other drives). Yes, I could always upgrade to a fancy/larger RAID array, SAS, 15k RPM drives, etc., but if a PostgreSQL tweak were to give me 30% more performance at a 15% CPU cost... I think that would be a great alternative option. :-)
Memory may also play a part. My server at home has 4 Mbytes of L2 cache and 4 Gbytes of RAM running with 5-5-5-18 DDR2 at 1000 MHz. At these speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s, and my L1/L2 operate at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least for me the benefit of having something in L1/L2 cache vs RAM isn't great. However, my disks in the RAID 10 configuration only read/write at ~150 Mbyte/s sustained, and much less if seeking is required. Compressing the data means 30% more data may fit into RAM, or a 30% increase in data read from disk, as I assume many compression algorithms can beat 150 Mbyte/s.
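To put rough numbers on that (illustrative only - the 30% figure is an assumption from above, and it only holds if decompression keeps pace with the disk):

#include <stdio.h>

int main(void)
{
    double disk_rate = 150.0;   /* sustained Mbyte/s from the RAID 10 above */
    double expansion = 1.30;    /* assumed: 30% more data per stored byte */

    /* Only holds if decompression itself runs faster than disk_rate. */
    printf("effective read rate: ~%.0f Mbyte/s of uncompressed data\n",
           disk_rate * expansion);   /* ~195 Mbyte/s */
    return 0;
}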
Is my configuration typical? It's probably becoming more so. Certainly more common than the 10+ disk hardware RAID configurations.
type="cite"><blockquotetype="cite"><pre wrap="">The toast tables already give a sort of block-addressable scheme.
 
Compression can be on a per block or per set of blocks basis allowing for
seek into the block,   </pre></blockquote><pre wrap="">
The current toast architecture is that we compress the whole datum, then store
the datum either inline or using the same external blocking mechanism that we
use when not compressing. So this doesn't fit at all.
It does seem like an interesting idea to have toast chunks which are
compressed individually. So each chunk could be, say, an 8kb chunk of
plaintext and stored as whatever size it ends up being after compression. That
would allow us to do random access into external chunks as well as allow
overlaying the cpu costs of decompression with the i/o costs. It would get a
lower compression ratio than compressing the whole object together but we
would have to experiment to see how big a problem that was.

It would be pretty much rewriting the toast mechanism for external compressed
data though. Currently the storage and the compression are handled separately.
This would tie the two together in a separate code path.

Hm, It occurs to me we could almost use the existing code. Just store it as a
regular uncompressed external datum but allow the toaster to operate on the
data column (which it's normally not allowed to) to compress it, but not store
it externally. </pre></blockquote> Yeah - sounds like it could be messy.<br /><br /><blockquote
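Just to make the chunked idea concrete, something along these lines (a rough sketch only - the chunk index and the stand-in compressor are hypothetical, not how toast stores things today):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RAW_CHUNK_SIZE 8192   /* 8kb of plaintext per chunk, as suggested */

/*
 * Stand-in for a real codec (pglz, QuickLZ, ...): it just copies.
 * The interesting part is the chunk index, not the compressor.
 */
static size_t
compress_chunk(const char *src, size_t srclen, char *dst)
{
    memcpy(dst, src, srclen);
    return srclen;
}

int main(void)
{
    size_t  datum_len = 5 * RAW_CHUNK_SIZE + 123;
    char   *plain = calloc(datum_len, 1);
    char   *store = malloc(datum_len);  /* identity codec: same size */
    size_t  nchunks = (datum_len + RAW_CHUNK_SIZE - 1) / RAW_CHUNK_SIZE;
    size_t *chunk_off = malloc((nchunks + 1) * sizeof(size_t));
    size_t  i, pos = 0;

    /* Compress each chunk independently, recording where it lands. */
    for (i = 0; i < nchunks; i++)
    {
        size_t raw = (i + 1 < nchunks) ? RAW_CHUNK_SIZE
                                       : datum_len - i * RAW_CHUNK_SIZE;

        chunk_off[i] = pos;
        pos += compress_chunk(plain + i * RAW_CHUNK_SIZE, raw, store + pos);
    }
    chunk_off[nchunks] = pos;

    /*
     * Random access: plaintext byte N lives in chunk N / RAW_CHUNK_SIZE,
     * so a substring fetch decompresses only the chunks it touches
     * instead of everything up to N.
     */
    {
        size_t want = 3 * RAW_CHUNK_SIZE + 17;
        size_t c = want / RAW_CHUNK_SIZE;

        printf("byte %zu: decompress chunk %zu (stored bytes %zu..%zu)\n",
               want, c, chunk_off[c], chunk_off[c + 1]);
    }

    free(plain);
    free(store);
    free(chunk_off);
    return 0;
}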
cite="mid:871vvhjqv5.fsf@oxford.xeocode.com"type="cite"><blockquote type="cite"><pre wrap="">or if compression doesn't
seemto be working for the first few blocks, the
 
later blocks can be stored uncompressed? Or is that too complicated compared
to what we have now? :-)   </pre></blockquote><pre wrap="">
Actually we do that now, it was part of the same patch we're discussing. </pre></blockquote><br /> Cheers,<br />
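Good to know. I imagine the bail-out test looks roughly like this (a sketch only - the codec is the stand-in compress_chunk() above, the 25% threshold is made up, and I haven't read the patch):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical codec, same shape as the stand-in in the earlier sketch. */
extern size_t compress_chunk(const char *src, size_t srclen, char *dst);

/*
 * Probe the first block; if it doesn't shrink enough, skip compression
 * for the rest of the datum.  Already-compressed input (images,
 * archives) fails this test after a single block.
 */
static bool
worth_compressing(const char *data, size_t len, char *scratch)
{
    size_t sample = len < 8192 ? len : 8192;
    size_t comp = compress_chunk(data, sample, scratch);

    return comp * 100 <= sample * 75;   /* require >= 25% savings */
}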
Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>
