Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Compression and on-disk sorting
Date	May 17, 2006 16:31:21
Msg-id	24764.1147894254@sss.pgh.pa.us
In response to	Re: Compression and on-disk sorting (Greg Stark <gsstark@mit.edu>)
List	pgsql-hackers

Greg Stark <gsstark@mit.edu> writes:
> The ideal way to handle the situation you're describing would be to interleave
> the tuples so that you have all 1000 values of the first column, followed by
> all 1000 values of the second column and so on. Then you run a generic
> algorithm on this and it achieves very high compression rates since there are
> a lot of repeating patterns.

It's not obvious to me that that yields a form more compressible than
what we have now.  As long as the previous value is within the lookback
window, an LZ-style compressor will still be able to use it.  More
importantly, the layout you describe would be unable to take advantage
of any cross-column correlation, which in real data is likely to be a
useful property for compression.
        regards, tom lane
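
One way to check the question above empirically, rather than argue it, is to serialize the same tuples both ways and feed each to an LZ-style compressor. The sketch below is only an illustration under stated assumptions: the three-column data set, the `CITY_TO_COUNTRY` mapping, and the helper names are all hypothetical, and Python's `zlib` (DEFLATE) stands in for whatever compressor would really be used on sort tapes. It reports the two compressed sizes and makes no claim about which layout wins; that depends on the data and on the lookback window.

```python
import random
import zlib

# Hypothetical data: the third column is fully determined by the second,
# a stand-in for the cross-column correlation discussed above.
CITY_TO_COUNTRY = {
    "Boston": "US",
    "Chicago": "US",
    "Lyon": "FR",
    "Paris": "FR",
    "Osaka": "JP",
}


def make_rows(n_rows=1000, seed=0):
    """Build n_rows tuples of (id, city, country) with a fixed seed."""
    rng = random.Random(seed)
    cities = sorted(CITY_TO_COUNTRY)
    rows = []
    for i in range(n_rows):
        city = rng.choice(cities)
        rows.append((str(i), city, CITY_TO_COUNTRY[city]))
    return rows


def row_major(rows):
    """Serialize tuple by tuple: one line per row, fields joined by '|'."""
    return "\n".join("|".join(row) for row in rows).encode()


def column_major(rows):
    """Serialize column by column: one line per column, values joined by '|'."""
    columns = zip(*rows)
    return "\n".join("|".join(col) for col in columns).encode()


def compressed_sizes(rows, level=6):
    """Return the DEFLATE-compressed size in bytes of each layout."""
    return {
        "row_major": len(zlib.compress(row_major(rows), level)),
        "column_major": len(zlib.compress(column_major(rows), level)),
    }


if __name__ == "__main__":
    rows = make_rows()
    for layout, size in compressed_sizes(rows).items():
        print(f"{layout}: {size} bytes")
```

Both layouts contain exactly the same bytes of payload and the same number of separators, so any difference in the output is due to ordering alone. Changing the data so that the columns are independent, or so that a batch exceeds the compressor's window, would be the obvious next experiments.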
