Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Jim C. Nasby
Subject Re: Compression and on-disk sorting
Msg-id 20060526163107.GD59464@pervasive.com
In response to Re: Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Compression and on-disk sorting
List pgsql-hackers
I've done some more testing with Tom's recently committed changes to
tuplesort.c, which remove the tuple headers from the sort data. It does
about 10% better than compression alone does. What's interesting is that
the gains are about 10% regardless of compression, which means
compression isn't helping very much on all the redundant header data.
I find that very odd, because the header data is very redundant:

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid limit 1;
  xmin  | xmax | cmin | cmax | aid
--------+------+------+------+-----
 280779 |    0 |    0 |    0 |   1
(1 row)

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid desc limit 1;
  xmin  | xmax | cmin | cmax |    aid
--------+------+------+------+-----------
 310778 |    0 |    0 |    0 | 300000000
(1 row)

That makes sense, since pgbench loads the database via a string of COPY
commands, each of which loads 10000 rows, so long runs of consecutive rows
share the same xmin.
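
As a quick sanity check, the two xmin values above can be compared with the
number of COPY batches needed for the table. This is only a sketch: it
assumes each COPY ran as its own transaction and that transaction IDs were
handed out consecutively during the load, neither of which is stated
explicitly above. The variable names are illustrative.

```python
# Figures taken from the two psql results above.
first_xmin = 280779        # xmin of the row with aid = 1
last_xmin = 310778         # xmin of the row with aid = 300000000
total_rows = 300_000_000   # highest aid in accounts
rows_per_copy = 10_000     # rows loaded per COPY, as stated above

# Assumption: one transaction ID per COPY, assigned consecutively.
xids_used = last_xmin - first_xmin + 1
copies_needed = total_rows // rows_per_copy

print(xids_used, copies_needed)
```

Under those assumptions the two numbers agree, which is consistent with only
a few tens of thousands of distinct xmin values across the whole table.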

Something else worth mentioning is that sort performance is worse with
larger work_mem for all cases except the old HEAD, prior to the
tuplesort.c changes. It looks like whatever was done to fix that will
need to be adjusted/rethought pending the outcome of using compression.

In any case, compression certainly seems to be a clear win, at least in
this case. If there's interest, I can test this on some larger hardware,
or if someone wants to produce a patch for pgbench that will load some
kind of real data into accounts.filler, I can test that as well.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

