Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Compression and on-disk sorting
Date	May 19, 2006 16:07:13
Msg-id	1148065370.3833.9.camel@localhost.localdomain
In response to	Re: Compression and on-disk sorting (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Compression and on-disk sorting Re: Compression and on-disk sorting
List	pgsql-hackers

On Fri, 2006-05-19 at 14:53, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> >> unbelievable. What's in the table? It would seem to imply that our
> >> tuple format is far more compressible than we expected.
> 
> > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > If the tape routines were actually storing visibility information, I'd
> > expect that to be pretty compressible in this case since all the tuples
> > were presumably created in a single transaction by pgbench.
> 
> It's worse than that: IIRC what passes through a heaptuple sort are
> tuples manufactured by heap_form_tuple, which will have consistently
> zeroed header fields.  However, the above isn't very helpful since the
> rest of us have no idea what that "accounts" table contains.  How wide
> is the tuple data, and what's in it?

Was he not using pgbench data?

> (This suggests that we might try harder to strip unnecessary header info
> from tuples being written to tape inside tuplesort.c.  I think most of
> the required fields could be reconstructed given the TupleDesc.)

I guess that tape files compress better than the average table because they
are sorted, and thus at least a little more repetitive than the rest.
If there are varlen types, then they usually also have an abundance of
small 4-byte integers, which should also compress better than 4:1,
maybe a lot better.
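
A rough way to check this intuition is the Python sketch below. It is only
an illustration under stated assumptions, not what tuplesort.c does: it uses
zlib as a stand-in compressor, and the key column (random values 1..10
stored as 4-byte integers) is made up rather than taken from the accounts
table. It compares the compression ratio of the same values in unsorted and
sorted order.

```python
import random
import struct
import zlib


def compression_ratio(values):
    """Return raw_size / compressed_size for values packed as 4-byte ints."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return len(raw) / len(zlib.compress(raw))


# Hypothetical stand-in for a low-cardinality sort key: small integers,
# each stored in a full 4-byte field (so three of the four bytes are zero).
rng = random.Random(0)
keys = [rng.randint(1, 10) for _ in range(50_000)]

unsorted_ratio = compression_ratio(keys)
sorted_ratio = compression_ratio(sorted(keys))

print(f"unsorted: {unsorted_ratio:.1f}:1")
print(f"sorted:   {sorted_ratio:.1f}:1")
```

The unsorted case shows what the mostly-zero 4-byte fields buy on their own;
the sorted case adds the long runs of identical keys that a sorted tape
file would contain.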


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com
