Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Luke Lonergan
Subject Re: Compression and on-disk sorting
Msg-id 3E37B936B592014B978C4415F90D662D03489494@MI8NYCMAIL06.Mi8.com
In response to Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
Jim,

> http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
> I've run into a slight problem in that even at a compression
> level of -3, zlib is cutting the on-disk size of sorts by
> 25x. So my pgbench sort test with scale=150 that was
> producing a 2G on-disk sort is now producing a 80M sort,
> which obviously fits in memory. And cuts sort times by more than half.
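
The quoted numbers work out to roughly 2048 MB / 80 MB ≈ 25.6, which is the "25x" reduction. To see why sort spill data can compress that well, the Python sketch below compresses synthetic rows with the standard zlib module at level 3. The row layout is a hypothetical assumption: an integer key, a small number, and an 84-character blank-padded filler. It is not taken from the test above, and real ratios depend entirely on the data being sorted.

```python
import zlib


def make_rows(n: int) -> list[bytes]:
    # Hypothetical rows: key | small number | 84 blanks of padding (assumed layout)
    return [f"{i}|{(i * 7919) % 100000}|{'':84}\n".encode() for i in range(n)]


def compression_ratio(data: bytes, level: int = 3) -> float:
    # Ratio of raw size to zlib-compressed size at the given compression level
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    payload = b"".join(make_rows(100_000))
    print(f"raw={len(payload)} bytes, ratio={compression_ratio(payload):.1f}x")
```

Long runs of padding collapse to a few bytes each under DEFLATE, so rows like these shrink far more than rows of random-looking values would.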

When you're ready, we can test this on some other interesting cases and
on fast hardware.

BTW - external sorting is *still* 4x slower than a popular commercial DBMS
(PCDB) on a real workload when full rows are used in queries.  The results
we got after the last round of sort improvements were limited to cases
where only the sort column was used in the query. For that case the
improved external sort code was as fast as PCDB, provided lots of
work_mem was used. But when the whole contents of the row are consumed
(as in TPC-H and in many real-world cases), performance is still far
slower.

So, compression of the tuples may be just what we're looking for.
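
A minimal Python sketch of compressing tuples during an external sort. It assumes newline-terminated byte records and uses Python's zlib and heapq modules, not PostgreSQL's own sort code. Each sorted run is spilled to a temporary file as a level-3 zlib stream, and the runs are then decompressed on the fly and merged. The names external_sort and run_size are hypothetical.

```python
import heapq
import os
import tempfile
import zlib
from typing import Iterable, Iterator, List


def _write_run(sorted_records: List[bytes], level: int) -> str:
    # Spill one sorted run to a temp file as a single zlib stream
    f = tempfile.NamedTemporaryFile(delete=False, suffix=".run")
    comp = zlib.compressobj(level)
    for rec in sorted_records:
        f.write(comp.compress(rec))
    f.write(comp.flush())
    f.close()
    return f.name


def _read_run(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    # Stream-decompress a run, yielding complete newline-terminated records
    decomp = zlib.decompressobj()
    buf = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            buf += decomp.decompress(chunk)
            *lines, buf = buf.split(b"\n")
            for line in lines:
                yield line + b"\n"
    buf += decomp.flush()
    *lines, buf = buf.split(b"\n")
    for line in lines:
        yield line + b"\n"
    if buf:
        yield buf  # trailing record without a newline


def external_sort(records: Iterable[bytes], run_size: int = 10_000,
                  level: int = 3) -> Iterator[bytes]:
    """Sort byte records using zlib-compressed on-disk runs (hypothetical helper)."""
    paths: List[str] = []
    try:
        run: List[bytes] = []
        for rec in records:
            run.append(rec)
            if len(run) >= run_size:
                run.sort()
                paths.append(_write_run(run, level))
                run = []
        if run:
            run.sort()
            paths.append(_write_run(run, level))
        # k-way merge of the decompressed runs
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)
```

Compression trades CPU for I/O. It pays off when the spilled rows compress well, and more so when whole rows (not just the sort key) have to be written out and read back.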

- Luke
