Thread: page compression

page compression

From
Andy Colson
Date:
I know it's been discussed before, and one big problem is license and 
patent problems.

Would this project be a problem:

http://oldhome.schmorp.de/marc/liblzf.html


-Andy


Re: page compression

From
Joachim Wieland
Date:
On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy@squeakycode.net> wrote:
> I know it's been discussed before, and one big problem is license and patent
> problems.
>
> Would this project be a problem:
>
> http://oldhome.schmorp.de/marc/liblzf.html

It looks like even liblzf is not going to be accepted. I proposed
linking pg_dump against liblzf only when it is available, and somehow
even that failed; see:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00824.php

Remember that PostgreSQL has TOAST tables to compress large values and
store them externally, so it still has to be proven that page
compression would benefit PostgreSQL as much as it does other
databases.

Ironically, we already use an LZ compression algorithm for TOAST
compression (defined in pg_lzcompress.c). I still fail to understand
why linking against liblzf would take us deeper into the
compression-patent minefield than we already are by hardwiring and
shipping that other algorithm in pg_lzcompress.c.
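
For what it's worth, the liblzf API is tiny. A round-trip over an 8kB
page looks roughly like this (illustrative only; assumes liblzf is
installed and you link with -llzf):

#include <stdio.h>
#include <string.h>
#include <lzf.h>                /* lzf_compress / lzf_decompress */

#define PAGE_SIZE 8192

int
main(void)
{
    char        page[PAGE_SIZE];
    char        packed[PAGE_SIZE];
    char        restored[PAGE_SIZE];
    unsigned int clen, rlen;

    memset(page, 0, sizeof(page));          /* stand-in for a heap page */

    /* lzf_compress returns 0 if the output doesn't fit into out_len */
    clen = lzf_compress(page, PAGE_SIZE, packed, PAGE_SIZE - 1);
    if (clen == 0)
    {
        printf("incompressible, would store the page raw\n");
        return 0;
    }

    rlen = lzf_decompress(packed, clen, restored, PAGE_SIZE);
    printf("compressed %d -> %u bytes, round-trip %s\n",
           PAGE_SIZE, clen, rlen == PAGE_SIZE ? "ok" : "FAILED");
    return 0;
}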


Joachim


Re: page compression

From
Robert Haas
Date:
On Dec 28, 2010, at 10:33 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy@squeakycode.net> wrote:
>> I know it's been discussed before, and one big problem is license and patent
>> problems.
>>
>> Would this project be a problem:
>>
>> http://oldhome.schmorp.de/marc/liblzf.html
>
> It looks like even liblzf is not going to be accepted. I proposed
> linking pg_dump against liblzf only when it is available, and somehow
> even that failed; see:

I thought that was mostly about not wanting multiple changes in one patch. I don't see why liblzf would be
objectionable in general.

...Robert

Re: page compression

From
Simon Riggs
Date:
On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:

> I know it's been discussed before, and one big problem is license and 
> patent problems.

I would like to see a design for that. There are a few different ways we
might want to do it, and I'm interested to see whether it's possible to
make compressed pages indexable as well.

For example, if you compress 2 pages into one 8kB block, then you do one
I/O and out pop two buffers. That would work nicely with ring buffers.

Or you might try to store pages > 8kB in one block, which would mean
decompressing every time you access the page. That wouldn't be much of a
problem if we were just seq scanning.

Or you might want to compress the whole table at once, so it can only be
read by seq scan. Efficient, but no indexes.
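
To make the first variant a bit more concrete, here's a rough sketch of
the read path (every name here is made up; only BLCKSZ, pread and the
liblzf calls are real):

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <lzf.h>

#define BLCKSZ 8192

/* Hypothetical layout: two compressed logical pages per physical block */
typedef struct TwinBlockHeader
{
    uint16_t    lengths[2];     /* compressed size of each logical page */
} TwinBlockHeader;

/* One physical read; two decompressed buffers pop out the other end */
static void
read_twin_block(int fd, off_t offset, char *buf1, char *buf2)
{
    char        block[BLCKSZ];
    TwinBlockHeader hdr;
    const char *payload = block + sizeof(TwinBlockHeader);

    pread(fd, block, BLCKSZ, offset);       /* the single I/O */
    memcpy(&hdr, block, sizeof(hdr));

    lzf_decompress(payload, hdr.lengths[0], buf1, BLCKSZ);
    lzf_decompress(payload + hdr.lengths[0], hdr.lengths[1], buf2, BLCKSZ);
}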

It would be interesting to explore pre-populating the compression
dictionary with some common patterns.
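
As far as I know neither liblzf nor pg_lzcompress takes a preset
dictionary, but zlib does (deflateSetDictionary), so an experiment along
those lines could look something like this; just a sketch, not a
suggestion to grow a zlib dependency:

#include <string.h>
#include <zlib.h>

/* Made-up sample of "common patterns": zeroed header/padding bytes */
static const unsigned char page_dict[] = { 0, 0, 0, 0, 0, 0, 0, 0 };

/* Compress one page with the dictionary pre-loaded;
 * returns the compressed size, or -1 if it didn't fit. */
static int
compress_with_dict(const unsigned char *page, unsigned int page_len,
                   unsigned char *out, unsigned int out_len)
{
    z_stream    zs;
    int         rc;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;
    deflateSetDictionary(&zs, page_dict, sizeof(page_dict));

    zs.next_in = (Bytef *) page;
    zs.avail_in = page_len;
    zs.next_out = out;
    zs.avail_out = out_len;

    rc = deflate(&zs, Z_FINISH);
    deflateEnd(&zs);
    return (rc == Z_STREAM_END) ? (int) zs.total_out : -1;
}

The inflate side would have to hand over the same dictionary via
inflateSetDictionary when inflate reports Z_NEED_DICT.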

Anyway, interesting topic.

-- 
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services



Re: page compression

From
Jim Nasby
Date:
On Jan 2, 2011, at 5:36 PM, Simon Riggs wrote:
> On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:
>
>> I know it's been discussed before, and one big problem is license and
>> patent problems.
>
> I would like to see a design for that. There are a few different ways we
> might want to do it, and I'm interested to see whether it's possible to
> make compressed pages indexable as well.
>
> For example, if you compress 2 pages into one 8kB block, then you do one
> I/O and out pop two buffers. That would work nicely with ring buffers.
>
> Or you might try to store pages > 8kB in one block, which would mean
> decompressing every time you access the page. That wouldn't be much of a
> problem if we were just seq scanning.
>
> Or you might want to compress the whole table at once, so it can only be
> read by seq scan. Efficient, but no indexes.

FWIW, last time I looked at how Oracle handled compression, it would only compress existing data. As soon as you
modified a row, it ended up un-compressed, presumably in a different page that was also un-compressed.

I wonder if it would be feasible to use a fork to store where a compressed page lives inside the heap... if we could do
that I don't see any reason why indexes wouldn't work. The changes required to support that might not be too horrific
either...
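
Something like one entry per logical heap block in that fork, e.g.
(names made up, just to show how small it could be):

#include <stdint.h>

/* Hypothetical per-page entry in a "compression map" fork */
typedef struct CompressedPageMapEntry
{
    uint32_t    physical_block;     /* block in the main fork holding the data */
    uint16_t    offset;             /* byte offset of the compressed image */
    uint16_t    comp_len;           /* compressed length; 0 = stored raw */
} CompressedPageMapEntry;

/* 8 bytes per entry, so one 8kB map page covers ~1000 heap pages (~8MB). */
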
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: page compression

From
Robert Haas
Date:
On Mon, Jan 3, 2011 at 4:02 AM, Jim Nasby <jim@nasby.net> wrote:
> FWIW, last time I looked at how Oracle handled compression, it would only compress existing data. As soon as you
> modified a row, it ended up un-compressed, presumably in a different page that was also un-compressed.

IIUC, InnoDB basically compresses a block as small as it'll go, and
then stores it in a regular size block.  That leaves free space at the
end, which can be used to cram additional tuples into the page.
Eventually that free space is exhausted, at which point you try to
recompress the whole page and see if that gives you room to cram in
even more stuff.

I thought that was a pretty clever approach.
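
Roughly, the bookkeeping could look something like this (a sketch of the
idea as I understand it, not InnoDB's actual scheme; lzf just stands in
for whatever compressor):

#include <stdbool.h>
#include <string.h>
#include <lzf.h>

#define BLCKSZ 8192

typedef struct PackedBlock
{
    unsigned int comp_len;      /* bytes used by the compressed page image */
    unsigned int slack_used;    /* bytes of tuples crammed in after it */
    char         data[BLCKSZ - 2 * sizeof(unsigned int)];
} PackedBlock;

/* Caller keeps the uncompressed logical page up to date in page_image
 * (already containing the new tuple); we only manage the packed block. */
static bool
packed_block_add(PackedBlock *blk, const char *page_image,
                 const char *tuple, unsigned int tuple_len)
{
    unsigned int slack = sizeof(blk->data) - blk->comp_len - blk->slack_used;

    if (tuple_len <= slack)
    {
        /* enough free space at the end: just cram the tuple in */
        memcpy(blk->data + blk->comp_len + blk->slack_used, tuple, tuple_len);
        blk->slack_used += tuple_len;
        return true;
    }

    /* free space exhausted: recompress the whole page and hope it shrank */
    blk->comp_len = lzf_compress(page_image, BLCKSZ, blk->data, sizeof(blk->data));
    blk->slack_used = 0;
    return blk->comp_len != 0;  /* 0 = no longer fits; time to split the page */
}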

> I wonder if it would be feasible to use a fork to store where a compressed page lives inside the heap... if we could
> do that I don't see any reason why indexes wouldn't work. The changes required to support that might not be too
> horrific either...

At first blush, that sounds like a recipe for large amounts of
undesirable random I/O.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company