Re: Are there plans to add data compression feature to postgresql? - Mailing list pgsql-general

From Sam Mason
Subject Re: Are there plans to add data compression feature to postgresql?
Date
Msg-id 20081030114602.GA2459@frubble.xen.chris-lamb.co.uk
In response to Re: Are there plans to add data compression feature to postgresql?  (Grant Allen <gxallen@gmail.com>)
Responses Re: Are there plans to add data compression feature to postgresql?  ("Grzegorz Jaśkiewicz" <gryzman@gmail.com>)
List pgsql-general
On Thu, Oct 30, 2008 at 03:50:20PM +1100, Grant Allen wrote:
> One other thing I forgot to mention:  Compression by the DB trumps
> filesystem compression in one very important area - shared_buffers! (or
> buffer_cache, bufferpool or whatever your favourite DB calls its working
> memory for caching data).  Because the data stays compressed in the
> block/page when cached by the database in one of its buffers, you get
> more bang for your memory buck in many circumstances!  Just another angle
> to contemplate :-)

The database research project known as MonetDB/X100 has been looking at
this recently; the first paper below gives a bit of an introduction to
the design of the database and the second looks at the effects of
different compression schemes:

  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuBoNeHe:DEBULL:05
  http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=ZuHeNeBo:ICDE:06

The important thing seems to be that you don't want a storage-efficient
compression scheme; decent RAID subsystems demand a very lightweight
scheme that can be decompressed at several GB/s (i.e. two or three
cycles per tuple, not 50 to 100 as with traditional schemes like zlib
or bzip).  It's very interesting reading (references to "commercial DBMS
`X'" being somewhat comical), but it's a *long* way from being directly
useful to Postgres.
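
To give a feel for what "lightweight" means here, below is a minimal
sketch of frame-of-reference encoding, one simple member of that class.
It is an illustration only, not the exact scheme from the papers, and
the function names and the choice of 8-bit offsets are my own
assumptions.  Each value is stored as a small offset from a per-block
base, so decoding is one add per value in a branch-free loop.  As rough
arithmetic, assuming a 3 GHz core and 32-bit values, three cycles per
value is about 4 GB/s, whereas 50 to 100 cycles per value is only 120
to 240 MB/s.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Frame-of-reference sketch (hypothetical names, illustration only).
 * A block is a 32-bit base plus one unsigned 8-bit offset per value.
 */

/* Encode n values as offsets from base.
 * Returns 0 on success, -1 if any value is outside [base, base + 255]. */
int for_encode(const uint32_t *in, size_t n, uint32_t base, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] < base || in[i] - base > UINT8_MAX)
            return -1;          /* value does not fit in 8 bits */
        out[i] = (uint8_t)(in[i] - base);
    }
    return 0;
}

/* Decode n offsets back to 32-bit values.  The loop body has no
 * branches and no data dependencies between iterations, which is what
 * makes this kind of scheme cheap to decompress. */
void for_decode(uint32_t base, const uint8_t *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base + in[i];
}
```

A real scheme also has to handle values that don't fit the offset width
and pick the base per block; this sketch simply rejects such input.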

It's worth bearing in mind some of the things they talk about when
writing new code; the importance of cache-conscious design, both when
designing algorithms and when writing the code, seems to have stuck in
my mind the most.  Am I just old-fashioned, or is this focus on
cache-conscious design quite a new thing and somewhat undervalued in
the rest of the software world?


  Sam

p.s. if you're interested, there are more papers about MonetDB here:

  http://monetdb.cwi.nl/projects/monetdb/Development/Research/Articles/index.html
