Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From	Andrew Piskorski
Subject	Re: Compression and on-disk sorting
Date	May 17, 2006 05:52:37
Msg-id	20060517085230.GA53017@tehun.pair.com
In response to	Re: Compression and on-disk sorting (Greg Stark <gsstark@mit.edu>)
Responses	Re: Compression and on-disk sorting
List	pgsql-hackers

On Tue, May 16, 2006 at 11:48:21PM -0400, Greg Stark wrote:

> There are some very fast decompression algorithms:
> 
> http://www.oberhumer.com/opensource/lzo/

Sure, and for some tasks in PostgreSQL perhaps it would be useful.
But at least as of July 2005, Sandor Heman, one of the MonetDB guys,
had looked at zlib, bzip2, LZRW, and LZO, and claimed that:
 "... in general, it is very unlikely that we could achieve any bandwidth
 gains with these algorithms. LZRW and LZO might increase bandwidth on
 relatively slow disk systems, with bandwidths up to 100MB/s, but this
 would induce high processing overheads, which interferes with query
 execution. On a fast disk system, such as our 350MB/s 12 disk RAID, all
 the generic algorithms will fail to achieve any speedup."

 http://www.google.com/search?q=MonetDB+LZO+Heman&btnG=Search
 http://homepages.cwi.nl/~heman/downloads/msthesis.pdf
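A rough way to see why disk speed decides the outcome: if reading and decompressing overlap, a compressed scan delivers uncompressed data at roughly the slower of (disk bandwidth × compression ratio) and the decompressor's output rate. The sketch below is a toy model under that assumption only; the 2:1 ratio and 300 MB/s decompression rate are made-up illustration values, not figures from the thesis, and the model does not charge decompression CPU against query execution, which is the overhead the quote warns about.

```python
def effective_scan_mb_s(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Uncompressed MB/s a sequential scan can deliver from compressed storage.

    Toy model: assumes disk reads and decompression overlap perfectly, so the
    slower stage sets the pace. CPU spent decompressing is NOT charged against
    query execution here, which flatters compression.
    """
    io_bound = disk_mb_s * ratio  # each compressed MB read expands to `ratio` MB
    return min(io_bound, decompress_mb_s)


def speedup(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Gain relative to reading uncompressed data at raw disk bandwidth."""
    return effective_scan_mb_s(disk_mb_s, ratio, decompress_mb_s) / disk_mb_s


if __name__ == "__main__":
    # Hypothetical figures, not measurements: 2:1 ratio, 300 MB/s decompressor.
    for disk in (100.0, 350.0):
        # speedup > 1 on the 100 MB/s disk, < 1 on the 350 MB/s RAID
        print(disk, speedup(disk, ratio=2.0, decompress_mb_s=300.0))
```

Under these assumed numbers the slow disk doubles its effective bandwidth, while the fast array ends up limited by the decompressor and runs slower than an uncompressed scan.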

> I think most of the mileage from "lookup tables" would be better implemented
> at a higher level by giving tools to data modellers that let them achieve
> denser data representations. Things like convenient enum data types, 1-bit
> boolean data types, short integer data types, etc.

Things like enums and 1-bit booleans certainly could be useful, but
they cannot take advantage of duplicate values across multiple rows at
all, even if 1000 rows have the exact same value in their "date"
column and are all in the same disk block, right?

Thus I suspect that the exact opposite is true: a good table
compression scheme would render special denser data types largely
redundant and obsolete.
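To make the cross-row point concrete, here is a minimal sketch of per-block dictionary encoding for the 1000-identical-dates case. It is one generic scheme picked for illustration, not Oracle's or PostgreSQL's implementation; the 4-byte width matches PostgreSQL's `date` type, while the bit-packed code array and the helper names are hypothetical.

```python
import math
from datetime import date

DATE_WIDTH = 4  # bytes per value, as for PostgreSQL's fixed-width date type


def dictionary_encode(values):
    """Store each distinct value once per block; rows keep only a small code."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column values from the block dictionary."""
    return [dictionary[c] for c in codes]


def encoded_size_bytes(dictionary, codes, value_width=DATE_WIDTH):
    """Hypothetical on-disk size: dictionary entries plus bit-packed codes."""
    code_bits = max(1, (len(dictionary) - 1).bit_length())
    return len(dictionary) * value_width + math.ceil(len(codes) * code_bits / 8)


if __name__ == "__main__":
    block = [date(2006, 5, 17)] * 1000  # 1000 rows sharing one "date" value
    dictionary, codes = dictionary_encode(block)
    # A dense per-row type still pays for every copy; the block dictionary does not.
    print(len(block) * DATE_WIDTH, encoded_size_bytes(dictionary, codes))  # 4000 129
```

No per-value type, however narrow, gets below one value per row, whereas the block-level scheme pays for the repeated date once and shrinks the per-row cost to a single bit here.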

Good table compression might be a lot harder to do, of course.
Certainly Oracle's implementation of it had some bugs which made it
difficult to use reliably in practice (in certain circumstances
updates could fail, or if not fail perhaps have pathological
performance), bugs which are supposed to be fixed in 10.2.0.2, which
was only released within the last few months.

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/
