Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Compression and on-disk sorting
Date
Msg-id 3134.1148064785@sss.pgh.pa.us
In response to Re: Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Compression and on-disk sorting  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
>> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
>> unbelievable. What's in the table? It would seem to imply that our
>> tuple format is far more compressible than we expected.

> It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> If the tape routines were actually storing visibility information, I'd
> expect that to be pretty compressible in this case since all the tuples
> were presumably created in a single transaction by pgbench.

It's worse than that: IIRC what passes through a heaptuple sort are
tuples manufactured by heap_form_tuple, which will have consistently
zeroed header fields.  However, the above isn't very helpful since the
rest of us have no idea what that "accounts" table contains.  How wide
is the tuple data, and what's in it?
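To get a rough feel for how much consistently zeroed headers plus
repetitive column data could contribute, here is a small sketch. It
is not a measurement of PostgreSQL itself: the 24-byte header size,
the three-integers-plus-blank-padded-filler row layout, and the use of
zlib are all assumptions made for illustration, and the helper names
are hypothetical.

```python
import struct
import zlib

HEADER_BYTES = 24   # assumed header size, filled with zeros (hypothetical)
FILLER_BYTES = 84   # assumed blank-padded text column (hypothetical)


def make_tuple(aid: int, bid: int, balance: int) -> bytes:
    """Build one synthetic tuple: zeroed header, three ints, blank filler."""
    header = bytes(HEADER_BYTES)
    payload = struct.pack("<iii", aid, bid, balance) + b" " * FILLER_BYTES
    return header + payload


def compression_ratio(n_tuples: int = 10_000) -> float:
    """Return raw size divided by zlib-compressed size for n synthetic tuples."""
    raw = b"".join(make_tuple(i, i % 10, 0) for i in range(n_tuples))
    return len(raw) / len(zlib.compress(raw))


if __name__ == "__main__":
    print(f"raw/compressed ratio: {compression_ratio():.1f}")
```

If the real rows look anything like this, most of each tuple is
identical to the one before it, so a large ratio would not be
surprising. With wider or more varied column data the ratio should
drop accordingly.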

(This suggests that we might try harder to strip unnecessary header info
from tuples being written to tape inside tuplesort.c.  I think most of
the required fields could be reconstructed given the TupleDesc.)
        regards, tom lane
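
The header-stripping idea in the parenthetical above can be sketched
roughly as follows. This is not tuplesort.c code: it assumes a
fixed-size header that is entirely zero, so that "reconstruction" is
just prepending zeros, whereas a real implementation would have to
rebuild the needed fields from the TupleDesc. The function names and
the 24-byte header size are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Optional

HEADER_BYTES = 24  # assumed fixed header size (hypothetical)


def write_stripped(tape: BinaryIO, tup: bytes) -> None:
    """Write only the tuple's data portion, preceded by a 4-byte length."""
    payload = tup[HEADER_BYTES:]
    tape.write(struct.pack("<I", len(payload)))
    tape.write(payload)


def read_rebuilt(tape: BinaryIO) -> Optional[bytes]:
    """Read one stripped tuple and re-attach a zeroed header; None at end."""
    prefix = tape.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack("<I", prefix)
    payload = tape.read(length)
    return bytes(HEADER_BYTES) + payload


if __name__ == "__main__":
    tape = io.BytesIO()
    original = bytes(HEADER_BYTES) + b"some column data"
    write_stripped(tape, original)
    tape.seek(0)
    print(read_rebuilt(tape) == original)
```

Under these assumptions each tuple on tape shrinks by the header size
minus the 4-byte length word, independent of any compression applied
afterwards.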
