Re: Are there plans to add data compression feature to postgresql? - Mailing list pgsql-general
From | Ivan Sergio Borgonovo |
---|---|
Subject | Re: Are there plans to add data compression feature to postgresql? |
Date | |
Msg-id | 20081031151206.4bc99c4a@dawn.webthatworks.it |
In response to | Re: Are there plans to add data compression feature to postgresql? (Gregory Stark <stark@enterprisedb.com>) |
Responses | Re: Are there plans to add data compression feature to postgresql? |
List | pgsql-general |
On Fri, 31 Oct 2008 08:49:56 +0000
Gregory Stark <stark@enterprisedb.com> wrote:

> "Scott Marlowe" <scott.marlowe@gmail.com> writes:
>
> > What is the torn page problem? Note I'm no big fan of
> > compressed file systems, but I can't imagine them not working
> > with databases, as I've seen them work quite reliably under
> > Exchange server running a db oriented storage subsystem. And I
> > can't imagine them not being invisible to an application,
> > otherwise you'd just be asking for trouble.

> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you
> could make a compressed filesystem safe without a huge performance
> hit.

Pardon my naiveté, but I can't see why compression and data integrity
should always be considered clashing factors.

DB operations are supposed to be atomic if fsync actually does what it
is supposed to do. So coherency is assured by a proper execution of
"fsync" going down through all the HW levels before the data reaches
permanent storage.

Now suppose your problem is "avoiding losing data", not avoiding
losing coherency, e.g. you have a very fast stream of data coming from
the LHC. The faster you write to the disk, the lower the chance of
losing data if you hit some kind of hardware failure during the write.

Whether you choose data compression or not depends on which kind of
failure you think is more probable on your hardware, and on the
associated costs. If you expect gamma rays cooking your SCSI cables or
an asteroid splashing your UPS, compression may be a good choice... it
will make your data reach permanent storage faster. If you expect your
permanent storage to store data unreliably (and not report back), the
loss of one sector may correspond to a much larger loss of data.

Another thing I think should go into the equation when working out
where your risk of data loss lies is whether your "data source" has
some form of "data persistence". If it has, you could introduce one
more layer of "fsyncing": your data source does not wipe its original
copy until your DB reports back that everything went fine (no asteroid
etc...).

So data compression may be just one more tool to manage your budget
for asteroid shelters.

An annoyance of compression may be that, while it *on average* lets
you put data on permanent storage faster, it increases the uncertainty
in the instantaneous transfer speed, especially if fs-level and
db-level compression are not aware of each other and fs-level
compression knows less about which data is worth compressing.

If I had to push harder for data compression, I'd make it data-type
aware and switchable (or auto-switchable based on ANALYZE or stats
results).

Of course, if you expect to have faulty "permanent storage", data
compression *may* not be a good bet... but it still depends on
hardware cost, compression ratio, the specific kind of failure...
e.g. the more you compress, the cheaper RAID becomes...

I understand, Tom, that DBAs are paid to be paranoid, and I really
really really appreciate data stored in a format that doesn't require
a long chain of tools to be read. I do really hate dependencies that
translate into hours of *boring* work if something turns bad.

BTW, I gave a glance at the MonetDB papers posted earlier, and it
seems their compression algorithms are strongly optimised for
read-only searches.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
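
To make the fsync point above concrete, here is a minimal sketch (in
Python, purely illustrative; the file name and record are invented,
and this is not how PostgreSQL itself writes) of the ordering being
described: flush and fsync before anything is considered to be on
permanent storage, and only then acknowledge back to the data source
so it can drop its own copy.

    import os

    def durable_append(path, payload):
        # Append a record, then force it down to permanent storage
        # before reporting success to whoever handed us the data.
        with open(path, "ab") as f:
            f.write(payload)
            f.flush()             # push user-space buffers to the kernel
            os.fsync(f.fileno())  # ask the kernel/drive to make it durable

    # Only after durable_append() returns would the "data source" be told
    # it is safe to wipe its original copy (the extra layer of "fsyncing"
    # mentioned above).
    durable_append("/tmp/stream.log", b"one detector event\n")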
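
And a rough way to see the "faster on average, less predictable"
point: time fsynced writes of compressible vs. incompressible blocks,
with and without compression. Again only a sketch, with zlib and
arbitrary block sizes picked for illustration; it says nothing about
what PostgreSQL or any particular filesystem actually does.

    import os
    import time
    import zlib

    def write_blocks(path, blocks, compress=False):
        # Return per-block wall-clock times: compression shrinks the I/O,
        # but each block's cost now depends on how compressible it is.
        times = []
        with open(path, "wb") as f:
            for block in blocks:
                start = time.perf_counter()
                data = zlib.compress(block) if compress else block
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
                times.append(time.perf_counter() - start)
        return times

    # Alternate highly compressible and incompressible 1 MB blocks: the
    # average may favour compression, the spread usually does not.
    blocks = [b"x" * 2**20, os.urandom(2**20)] * 4
    plain = write_blocks("/tmp/plain.bin", blocks, compress=False)
    packed = write_blocks("/tmp/packed.bin", blocks, compress=True)
    print("plain : avg %.4fs spread %.4fs"
          % (sum(plain) / len(plain), max(plain) - min(plain)))
    print("packed: avg %.4fs spread %.4fs"
          % (sum(packed) / len(packed), max(packed) - min(packed)))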