Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Compression and on-disk sorting
Date
Msg-id 20060519192944.GI17873@svana.org
In response to Re: Compression and on-disk sorting  (Hannu Krosing <hannu@skype.net>)
Responses Re: Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > > If the tape routines were actually storing visibility information, I'd
> > > expect that to be pretty compressible in this case since all the tuples
> > > were presumably created in a single transaction by pgbench.
>
> Was he not using pgbench data?

Hmm, so there were only 3 integer fields and one varlena structure which
was always empty. Each tuple is prepended with a tuple header whose
fields are mostly blank, or at least repeated, so yes, I can see how we
might get a 25-to-1 compression.
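
A rough way to check that intuition outside the backend is sketched
below. It is only an illustration: the 24-byte header is a made-up
stand-in rather than the real tuple header layout, the row format is
simplified, and zlib stands in for whatever compressor the tape code
would use. The names make_row and compression_ratio are hypothetical.

```python
import struct
import zlib

# Hypothetical 24-byte stand-in for a tuple header: a few repeated
# bytes followed by blanks. The real header layout is different.
FAKE_HEADER = bytes([0x01, 0x02, 0x03, 0x04]) * 2 + bytes(16)


def make_row(aid: int, accounts_per_branch: int = 100_000) -> bytes:
    """Build one simplified accounts-like row: header, 3 ints, empty varlena."""
    bid = (aid - 1) // accounts_per_branch + 1
    abalance = 0
    # Assumed varlena encoding: a 4-byte length word and no payload.
    empty_filler = struct.pack("<i", 4)
    return FAKE_HEADER + struct.pack("<iii", aid, bid, abalance) + empty_filler


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


rows = b"".join(make_row(aid) for aid in range(1, 100_001))
ratio = compression_ratio(rows)
print(f"{len(rows)} bytes, roughly {ratio:.0f}-to-1 with zlib")
```

Only the low bytes of aid change from one row to the next, so almost
every row is a near copy of the previous one, which is the case a
dictionary compressor handles best.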

Maybe we need to change pgbench so that it puts random text in the
filler field; that would at least put some strain on the compression
algorithm...
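
To get a feel for how much difference that would make, here is a small
sketch comparing an empty filler against random text. The blank 24-byte
header, the 84-byte filler width, the row layout, and the use of zlib
are all assumptions made for illustration; none of this is pgbench or
tape code.

```python
import random
import string
import struct
import zlib

rng = random.Random(0)   # fixed seed so runs are repeatable
HEADER = bytes(24)       # hypothetical blank stand-in for a tuple header
FILLER_LEN = 84          # assumed filler width, for illustration only
N_ROWS = 20_000


def make_row(aid: int, filler: bytes) -> bytes:
    """Header, three ints, then a length-prefixed filler field."""
    ints = struct.pack("<iii", aid, 1, 0)
    length_word = struct.pack("<i", 4 + len(filler))
    return HEADER + ints + length_word + filler


def random_text(n: int) -> bytes:
    """n random ASCII letters."""
    return "".join(rng.choices(string.ascii_letters, k=n)).encode()


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


empty_ratio = compression_ratio(
    b"".join(make_row(aid, b"") for aid in range(1, N_ROWS + 1))
)
random_ratio = compression_ratio(
    b"".join(make_row(aid, random_text(FILLER_LEN)) for aid in range(1, N_ROWS + 1))
)
print(f"empty filler:  {empty_ratio:.1f}-to-1")
print(f"random filler: {random_ratio:.1f}-to-1")
```

Random letters still compress a little, since they use only 52 of the
256 possible byte values, but the ratio should fall well below the
empty-filler case.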

> I guess that tapefiles compress better than the average table because they
> are sorted, and thus at least a little more repetitive than the rest.
> If there are varlen types, then they usually also have abundance of
> small 4-byte integers, which should also compress at least better than
> 4/1, maybe a lot better.

Hmm, that makes sense. That also explains the 37-to-1 compression I was
seeing on indexes :).
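
The effect of sorting is easy to see in isolation. The sketch below
packs the same set of small 4-byte integers twice, once in random order
and once sorted, and compares zlib ratios. The key range and the helper
names are arbitrary choices for illustration, not anything taken from
the sort or index code.

```python
import random
import struct
import zlib


def pack_ints(values: list[int]) -> bytes:
    """Pack values as consecutive 4-byte little-endian integers."""
    return struct.pack(f"<{len(values)}i", *values)


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


rng = random.Random(0)  # fixed seed so runs are repeatable
# Small keys with many duplicates, loosely like a bid column.
keys = [rng.randint(1, 10) for _ in range(100_000)]

unsorted_ratio = compression_ratio(pack_ints(keys))
sorted_ratio = compression_ratio(pack_ints(sorted(keys)))
print(f"unsorted: {unsorted_ratio:.0f}-to-1")
print(f"sorted:   {sorted_ratio:.0f}-to-1")
```

Even unsorted, the three zero bytes in every small integer compress
well; sorting additionally turns the data into long runs of identical
values.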

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
