Re: One large v. many small - Mailing list pgsql-performance

From Curt Sampson
Subject Re: One large v. many small
Date
Msg-id Pine.NEB.4.51.0302011359390.610@angelic.cynic.net
Whole thread Raw
In response to Re: One large v. many small  ("Curtis Faith" <curtis@galtcapital.com>)
Responses Re: One large v. many small  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-performance
On Fri, 31 Jan 2003, Curtis Faith wrote:

> Depending on the way the records are accessed and the cache size,
> the exact opposite could be true. The index pages will most likely
> rarely be in memory when you have 3000 different tables. Meaning
> that each search will require at least three or four index page
> retrievals plus the tuple page.

Assuming you're using indexes at all. If you're tending to use table
scans, this doesn't matter.

From Noah's description it seemed he was--he said that a particular data
item couldn't be the primary key, presumably because he couldn't index
it reasonably. But this just my guess, not a fact.

> Combine a multi-part index (on both client and foo, which order
> would depend on the access required) that is clustered once a week
> or so using the admittedly non-optimal PostgreSQL CLUSTER command
> and I'll bet you can get equivalent or better performance...

I would say that, just after a CLUSTER, you're likely to see better
performance because this would have the effect, on a FFS or similar
filesystem where you've got plenty of free space, of physically
clustering data that would not have been clustered in the case of a lot
of small tables that see a lot of appending evenly over all of them over
the course of time.

So the tradeoff there is really, can you afford the time for the CLUSTER?
(In a system where you have a lot of maintenance time, probably. Though if
it's a huge table, this might need an entire weekend. In a system that needs
to be up 24/7, probably not, unless you have lots of spare I/O capacity.)
Just out of curiousity, how does CLUSTER deal with updates to the table while
the CLUSTER command is running?

> I don't think there is any substitute for just trying it out. It
> shouldn't be that hard to create a bunch of SQL statements that
> concatenate the tables into one large one.

I entirely agree! There are too many unknowns here to do more than
speculate on this list.

But thanks for enlightening me on situations where one big table perform
better.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: not using index for select min(...)
Next
From: Bruce Momjian
Date:
Subject: Re: One large v. many small