Re: ext3 filesystem / linux 7.3 - Mailing list pgsql-performance

From Kevin Brown
Subject Re: ext3 filesystem / linux 7.3
Date
Msg-id 20030408232247.GH1847@filer
In response to Re: ext3 filesystem / linux 7.3  (Josh Berkus <josh@agliodbs.com>)
Responses Re: ext3 filesystem / linux 7.3  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Josh Berkus wrote:
> Jeffery,
>
> > Can't we generate data?  Random data stored in random formats at random
> > sizes would stress the file system wouldn't it?
>
> In my experience, randomly generated data tends to resemble real data very
> little in distribution patterns and data types.  This is one of the
> limitations of PGBench.

Okay, from this it sounds like what we need is information on the data
types typically used for real world applications and information on
the distribution patterns for each type (the latter could get
quite complex and varied, I'm sure, but since we're after something
that's typical, we only need a few examples).

So perhaps the first step in this is to write something that will show
what the distribution pattern for data in a table is?  With that
information, we *could* randomly generate data that would conform to
the statistical patterns seen in the real world.
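
As a rough illustration of what such a tool might collect, here is a
minimal Python sketch.  It assumes the column values have already been
fetched into a list; the function name profile_column and the
particular statistics chosen (null fraction, distinct count, most
common values, equi-depth histogram bounds) are my own assumptions,
not an existing utility.

```python
from collections import Counter


def profile_column(values, top_n=5, buckets=4):
    """Summarize one column's distribution (hypothetical helper)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    profile = {
        # Fraction of rows that are NULL (None in Python).
        "null_frac": (total - len(non_null)) / total if total else 0.0,
        # Number of distinct non-null values.
        "n_distinct": len(counts),
        # Most frequent values with their share of all rows.
        "most_common": [(v, c / total) for v, c in counts.most_common(top_n)],
    }
    numeric = sorted(
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )
    if numeric:
        # Equi-depth bounds: each bucket covers roughly the same number of rows.
        last = len(numeric) - 1
        profile["histogram_bounds"] = [
            numeric[(i * last) // buckets] for i in range(buckets + 1)
        ]
    return profile


if __name__ == "__main__":
    sample = [1, 1, 1, 2, 2, 3, 5, 8, None, None]
    print(profile_column(sample))
```

Only the summary leaves the database, not the rows themselves, which
is what would make it palatable to the owners of proprietary data.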

In fact, even though the databases you have access to are all
proprietary, I'm pretty sure their owners would agree to let you run a
program that would gather statistical distribution data about them.  Then (as
long as they agree) you could copy the schema itself, recreate it on
the test system, and randomly generate the data.
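
The generation step could be sketched roughly as follows, again in
Python.  Everything here is hypothetical: the profile dictionary
layout (a "null_frac" value plus a "most_common" list of value and
frequency pairs) and the function name generate_column are assumptions
made for illustration, and a real generator would also need to handle
values outside the most-common list.

```python
import random


def generate_column(profile, n_rows, seed=None):
    """Generate n_rows values following a simple column profile (sketch)."""
    rng = random.Random(seed)
    values = [v for v, _ in profile["most_common"]]
    weights = [f for _, f in profile["most_common"]]
    rows = []
    for _ in range(n_rows):
        if rng.random() < profile["null_frac"]:
            # Emit NULLs at the observed rate.
            rows.append(None)
        else:
            # Draw a value in proportion to its observed frequency.
            rows.append(rng.choices(values, weights=weights, k=1)[0])
    return rows


if __name__ == "__main__":
    # Hypothetical profile, standing in for one gathered from a real table.
    example_profile = {
        "null_frac": 0.2,
        "most_common": [("shipped", 0.6), ("pending", 0.2)],
    }
    print(generate_column(example_profile, 10, seed=42))
```

Passing a fixed seed makes a run repeatable, which matters if the
generated data is meant to serve as a benchmark.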



--
Kevin Brown                          kevin@sysexperts.com

