Andrew Piskorski <atp@piskorski.com> writes:
> Things like enums and 1 bit booleans certainly could be useful, but
> they cannot take advantage of duplicate values across multiple rows at
> all, even if 1000 rows have the exact same value in their "date"
> column and are all in the same disk block, right?
That's an interesting direction to go in. Generic algorithms would still help
in that case, since the identical value would occur more frequently than the
other values and would therefore be encoded with a smaller symbol. But there's
going to be a limit to how much they can compress the data.
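
To make the "smaller symbol" point concrete, here's a toy sketch (my own
illustration, nothing from the server) of what a frequency-based coder does:
an ideal coder spends roughly -log2(p) bits on a symbol of probability p, so
the dominant date costs almost nothing per row, but the output as a whole can
never shrink below the entropy of the column. The values are made up.

    import math
    from collections import Counter

    # Hypothetical column: 1000 rows share one date, plus a few stragglers.
    values = ["2004-05-01"] * 1000 + ["2004-05-02", "2004-05-03"] * 5

    counts = Counter(values)
    total = sum(counts.values())
    for value, n in counts.items():
        p = n / total
        # Ideal code length for a symbol of probability p is -log2(p) bits.
        print(f"{value}: ~{-math.log2(p):.2f} bits per occurrence")

    # The repeated date gets a tiny code, but the total output can't drop
    # below the entropy of the column, so the gain has a floor.
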
The ideal way to handle the situation you're describing would be to interleave
the tuples so that you have all 1000 values of the first column, followed by
all 1000 values of the second column and so on. Then you run a generic
algorithm on this and it achieves very high compression rates since there are
a lot of repeating patterns.
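
As a rough illustration (made-up column values, just to show the effect), you
can compare compressing the same 1000 rows laid out row-by-row versus
column-by-column with an off-the-shelf compressor like zlib:

    import zlib

    dates    = [b"2004-05-01"] * 1000                    # identical values
    amounts  = [b"%08d" % (i % 7) for i in range(1000)]  # few distinct values
    comments = [b"row %04d" % i for i in range(1000)]    # mostly unique

    # Row-major: each tuple's columns stored together, as on a heap page.
    row_major = b"".join(d + a + c
                         for d, a, c in zip(dates, amounts, comments))
    # Column-major: all dates, then all amounts, then all comments.
    col_major = b"".join(dates) + b"".join(amounts) + b"".join(comments)

    print("row-major:", len(zlib.compress(row_major, 9)))
    print("col-major:", len(zlib.compress(col_major, 9)))

The column-major layout puts all the repeats next to each other, which is
exactly the kind of input a generic algorithm handles best, so it typically
compresses noticeably better.
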
I don't see how you build a working database with data in this form, however.
For example, a single insert would require updating small pieces of data
across the entire table. Perhaps there's some middle ground with interleaving
the tuples within a single compressed page, or something like that?
--
greg