Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Compression and on-disk sorting
Msg-id 20060524214018.GP59464@pervasive.com
In response to Re: Compression and on-disk sorting  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: Compression and on-disk sorting
List pgsql-hackers
On Wed, May 24, 2006 at 02:20:43PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> >Finally completed testing of a dataset that doesn't fit in memory with
> >compression enabled. Results are at
> >http://jim.nasby.net/misc/pgsqlcompression .
> >
> >Summary:
> >                    work_mem    compressed  not compressed  gain
> >in-memory           20000       400.1       797.7           49.8%
> >in-memory           2000        371.4       805.7           53.9%
> >not in-memory       20000       8537        17436           51.0%
> >not in-memory       2000        8152        17820           54.3%
> >
> >I find it very interesting that the gains are identical even when the
> >tapes should fit in memory. My guess is that for some reason the OS is
> >flushing those to disk anyway. In fact, watching gstat during a run, I
> >do see write activity hitting the drives. So if there was some way to
> >tune that behavior, the in-memory case would probably be much, much
> >faster. Anyone know FreeBSD well enough to suggest how to change this?
> >Anyone want to test on linux and see if the results are the same? This
> >could indicate that it might be advantageous to attempt an in-memory
> >sort with compressed data before spilling that compressed data to
> >disk...
> >
> 
> I can test it on linux just let me know what you need.

Actually, when I talked to Larry he mentioned that it'd be worth
checking whether we're opening the files with O_DIRECT, which I haven't
had a chance to do yet. It might be worth looking at that before running
more tests.
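
For anyone who wants to check this from inside a process rather than by
reading the source, here is a minimal sketch of how the O_DIRECT flag
can be queried on an already-open file descriptor. It is not PostgreSQL
code: fd_has_o_direct is a hypothetical helper name, and the sketch
assumes a platform where fcntl(F_GETFL) reports O_DIRECT (FreeBSD and
Linux both define the flag; on platforms without it the helper simply
reports 0).

```c
#define _GNU_SOURCE   /* needed on Linux/glibc to expose O_DIRECT */
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper, not part of PostgreSQL.
 * Returns 1 if fd has O_DIRECT set, 0 if it does not,
 * and -1 if fcntl() fails (for example on an invalid descriptor).
 */
int fd_has_o_direct(int fd)
{
    int flags = fcntl(fd, F_GETFL);   /* fetch the file status flags */

    if (flags == -1)
        return -1;
#ifdef O_DIRECT
    return (flags & O_DIRECT) != 0;
#else
    return 0;                         /* platform has no O_DIRECT flag */
#endif
}
```

Calling this on a sort temp file's descriptor would say whether the
kernel is being asked to bypass its buffer cache for that file. Tracing
the open() calls of a backend with a system call tracer would answer the
same question without touching the code.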

Anyway, I've posted the patch now as well, and compress_sort.txt has the
commands I was running. Those were run against a plain pgbench database
that's been freshly initialized (i.e., no dead tuples). I created two
install directories from a checkout of HEAD using --prefix=, one with
the patch and one without. Both use the same $PGDATA. I've posted the
postgresql.conf as well.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

