Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) - Mailing list pgsql-hackers
From | Mark Mielke
---|---
Subject | Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date |
Msg-id | 4962B6E9.2060703@mark.mielke.cc
In response to | Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) (Gregory Stark <stark@enterprisedb.com>)
List | pgsql-hackers
Gregory Stark wrote:
> Mark Mielke <mark@mark.mielke.cc> writes:
>
>> It seems to me that transparent file system compression doesn't have
>> limits like "files must be less than 1 Mbyte to be compressed". They
>> don't exhibit poor file system performance.
>
> Well I imagine those implementations are more complex than toast is. I'm
> not sure what lessons we can learn from their behaviour directly.
>
>> I remember back in the 386/486 days, that I would always DriveSpace
>> compress everything, because hard disks were so slow then that
>> DriveSpace would actually increase performance.
>
> Surely this depends on whether your machine was cpu starved or disk
> starved? Do you happen to recall which camp these anecdotal machines
> from 1980 fell in?

I agree. I'm sure it was disk I/O starved - and maybe not just the disk. The motherboard might have contributed. :-)

My production machine in 2008/2009 for my uses still seems I/O bound. The main database server I use is 2 x Intel Xeon 3.0 GHz (dual-core) = 4 cores, and the uptime load average for the whole system is currently 0.10. The database and web server use their own 4 drives with RAID 10 (the main system is on two other drives). Yes, I could always upgrade to a fancy/larger RAID array, SAS, 15k RPM drives, etc., but if a PostgreSQL tweak were to give me 30% more performance at a 15% CPU cost... I think that would be a great alternative option. :-)

Memory may also play a part. My server at home has 4 Mbytes of L2 cache and 4 Gbytes of RAM running with 5-5-5-18 DDR2 at 1000 MHz. At these speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s, and my L1/L2 operate at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least for me the benefit of having something in L1/L2 cache vs. RAM isn't great; however, my disks in the RAID 10 configuration only read/write at ~150 Mbyte/s sustained, and much less if seeking is required. Compressing the data means 30% more data may fit into RAM, or a 30% increase in data read from disk, as I assume many compression algorithms can beat 150 Mbyte/s.

Is my configuration typical? It's probably becoming more so. Certainly more common than the 10+ disk hardware RAID configurations.
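As a back-of-envelope sketch (assuming the 30% savings above, i.e. a 0.7 compressed/raw ratio, and purely sequential reads):

    effective read rate ~= raw disk rate / compression ratio
                         = 150 Mbyte/s / 0.7
                        ~= 214 Mbyte/s

So on a box like this, compression is a net win for reads whenever the decompressor can produce plaintext faster than roughly 214 Mbyte/s, a rate the fast LZ-family algorithms under discussion are generally reported to manage.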
>> The toast tables already give a sort of block-addressable scheme.
>> Compression can be on a per-block or per-set-of-blocks basis, allowing
>> for seek into the block,
>
> The current toast architecture is that we compress the whole datum, then
> store the datum either inline or using the same external blocking
> mechanism that we use when not compressing. So this doesn't fit at all.
>
> It does seem like an interesting idea to have toast chunks which are
> compressed individually. So each chunk could be, say, an 8kb chunk of
> plaintext and stored as whatever size it ends up being after
> compression. That would allow us to do random access into external
> chunks as well as allow overlaying the cpu costs of decompression with
> the i/o costs. It would get a lower compression ratio than compressing
> the whole object together, but we would have to experiment to see how
> big a problem that was.
>
> It would be pretty much rewriting the toast mechanism for external
> compressed data though. Currently the storage and the compression are
> handled separately. This would tie the two together in a separate code
> path. Hm, it occurs to me we could almost use the existing code. Just
> store it as a regular uncompressed external datum, but allow the toaster
> to operate on the data column (which it's normally not allowed to) to
> compress it, but not store it externally.

Yeah - sounds like it could be messy.
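For illustration only, here is a minimal sketch of the per-chunk layout Greg describes (hypothetical names and fields, not actual toast code, assuming a fixed 8 KB of plaintext per chunk):

    #include <stdbool.h>
    #include <stdint.h>

    /* Fixed amount of plaintext covered by each chunk; the stored
     * (compressed) size varies per chunk. */
    #define RAW_CHUNK_SIZE 8192

    /*
     * Hypothetical header for one independently compressed chunk.  A
     * chunk whose plaintext doesn't shrink is simply stored raw, which
     * mirrors the existing "give up if it isn't compressing" fallback.
     */
    typedef struct CompressedChunkHeader
    {
        uint32_t raw_len;        /* plaintext bytes, <= RAW_CHUNK_SIZE */
        uint32_t stored_len;     /* bytes actually stored */
        bool     is_compressed;  /* false => data stored uncompressed */
    } CompressedChunkHeader;

    /*
     * Random access: every chunk covers exactly RAW_CHUNK_SIZE bytes of
     * plaintext, so a substring starting at 'offset' needs only the
     * chunks it spans decompressed, never the whole datum.
     */
    static inline uint32_t
    chunk_for_offset(uint64_t offset)
    {
        return (uint32_t) (offset / RAW_CHUNK_SIZE);
    }

The tradeoff Greg mentions falls out directly: each chunk is compressed with its own fresh history, so the overall ratio is worse than compressing the whole datum, but any slice can be read back by decompressing only the chunks it touches.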
>> or if compression doesn't seem to be working for the first few blocks,
>> the later blocks can be stored uncompressed? Or is that too complicated
>> compared to what we have now? :-)
>
> Actually we do that now, it was part of the same patch we're discussing.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>