Re: Compression and on-disk sorting - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Compression and on-disk sorting
Date
Msg-id 1148077306.3833.51.camel@localhost.localdomain
In response to Re: Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
List pgsql-hackers
One fine day, Fri, 2006-05-19 at 14:57, Jim C. Nasby wrote:
> On Fri, May 19, 2006 at 09:29:44PM +0200, Martijn van Oosterhout wrote:
> > On Fri, May 19, 2006 at 10:02:50PM +0300, Hannu Krosing wrote:
> > > > > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > > > > If the tape routines were actually storing visibility information, I'd
> > > > > expect that to be pretty compressible in this case since all the tuples
> > > > > were presumably created in a single transaction by pgbench.
> > > 
> > > Was he not using pg_bench data ?
> > 
> > Hmm, so there were only 3 integer fields and one varlena structure which
> > was always empty. Prepend a tuple header with mostly blank, or at least
> > repeated, fields and, yes, I can see how we might get a 25-to-1
> > compression.
> > 
> > Maybe we need to change pgbench so that it puts random text in the
> > filler field, that would at least put some strain on the compression
> > algorithm...
> 
> Wow, I thought there was actually something in there...
> 
> True random data wouldn't be such a great test either; what would
> probably be best is a set of random words, since in real life you're
> unlikely to have truly random data.

I usually use something like the following for my "random name" tests:

#!/usr/bin/python

import random

words = [line.strip() for line in open('/usr/share/dict/words')]

def make_random_name(min_items, max_items):
    l = []
    for w in range(random.randint(min_items, max_items)):
        l.append(random.choice(words))
    return ' '.join(l)
 

It gives somewhat justifiable but still quite amusing results:

>>> make_random_name(2,4)
'encroaches Twedy'
>>> make_random_name(2,4)
'annuloida Maiah commends imputatively'
>>> make_random_name(2,4)
'terebral wine-driven pacota'
>>> make_random_name(2,4)
'ballads disenfranchise cabriolets spiny-fruited'
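
To get a rough feel for how such "random name" filler would strain a compressor compared with blank or truly random filler, something like the sketch below could be used. It is only an illustration: the short WORDS list is a hypothetical stand-in for /usr/share/dict/words, zlib is assumed as the compressor (the thread does not say which algorithm the tape code would use), and sample_ratios and compression_ratio are names made up here.

```python
import random
import zlib

# Hypothetical stand-in vocabulary; the script above reads /usr/share/dict/words.
WORDS = ["ballads", "wine", "commends", "spiny", "terebral",
         "encroaches", "cabriolets", "pacota", "annuloida", "disenfranchise"]


def make_random_name(min_items, max_items, rng=random):
    """Join between min_items and max_items randomly chosen words."""
    count = rng.randint(min_items, max_items)
    return ' '.join(rng.choice(WORDS) for _ in range(count))


def compression_ratio(data):
    """Original size divided by zlib-compressed size (assumed compressor)."""
    return len(data) / len(zlib.compress(data))


def sample_ratios(rows=2000, seed=0):
    """Compare blank, random-word and random-byte filler of equal length."""
    rng = random.Random(seed)
    words_data = '\n'.join(
        make_random_name(2, 4, rng) for _ in range(rows)).encode('ascii')
    size = len(words_data)
    blank_data = b' ' * size
    random_data = bytes(rng.getrandbits(8) for _ in range(size))
    return {
        'blank': compression_ratio(blank_data),
        'words': compression_ratio(words_data),
        'random': compression_ratio(random_data),
    }


if __name__ == '__main__':
    for kind, ratio in sample_ratios().items():
        print('%-6s %.2f' % (kind, ratio))
```

With a vocabulary this small the word filler compresses much better than a real dictionary would, so the numbers only show the ordering (blank, then words, then random bytes), not realistic ratios.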


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



