Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Compression and on-disk sorting
Date
Msg-id 20060519030243.GW64371@pervasive.com
Whole thread Raw
In response to Re: Compression and on-disk sorting  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Compression and on-disk sorting  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Thu, May 18, 2006 at 04:55:17PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Actually, I guess the amount of memory used for zlib's lookback buffer
> > (or whatever they call it) could be pretty substantial, and I'm not sure
> > if there would be a way to combine that across all tapes.
> 
> But there's only one active write tape at a time.  My recollection of
> zlib is that compression is memory-hungry but decompression not so much,
> so it seems like this shouldn't be a huge deal.

It seems more appropriate to discuss results here, rather than on
-patches...

http://jim.nasby.net/misc/compress_sort.txt is preliminary results.
I've run into a slight problem in that even at a compression level of
-3, zlib is cutting the on-disk size of sorts by 25x. So my pgbench sort
test with scale=150 that was producing a 2G on-disk sort is now
producing a 80M sort, which obviously fits in memory. And cuts sort
times by more than half.

So, if nothing else, it looks like compression is definately a win if it
means you can now fit the sort within the disk cache. While that doesn't
sound like something very worthwhile, it essentially extends work_mem
from a fraction of memory to up to ~25x memory.

I'm currently loading up a pgbench database with a scaling factor of
15000; hopefully I'll have results from that testing in the morning.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: TupleDesc refcounting, again
Next
From: Thomas Hallgren
Date:
Subject: Re: [OT] MySQL is bad, but THIS bad?