Re: many tables in db - Mailing list pgsql-general

From Tom Lane
Subject Re: many tables in db
Date
Msg-id 24822.995567160@sss.pgh.pa.us
Whole thread Raw
In response to many tables in db  (Kenneth Been <kennethb@telocity.com>)
List pgsql-general
Kenneth Been <kennethb@telocity.com> writes:
> I am considering a db schema with many tables (in the
> thousands).

Just out of curiosity, why would you need so many tables?  Couldn't you
represent the same info in fewer tables, maybe with more key columns?
If every table has a different set of columns, that wouldn't work, but
I'm wondering what the organizing principle really is.

> 2. One of the reasons I am considering breaking the data
> into multiple tables is that I want to cluster a large table
> (on an rtree index)

This isn't really relevant to your main point, but: since an rtree
doesn't have an associated sort order, it's not clear to me that this
operation makes any sense.  Have you determined that you'll actually get
any performance improvement as a result of the clustering?  I suspect
you may find that you're just rearranging the table into a different
random order.

> 3. Another reason for splitting up the data is that I am
> worried about the size of the rtree.  I have some suspicion
> that the rtree size grows more than linearly with the size
> of the table.  Is that the case?

Dunno.  The rtree code isn't very well maintained, because none of
the current crop of developers uses it or knows much about it.
It could well have some bugs that cause it to waste space.  (I realized
only a few days ago that it probably behaves completely horribly when
asked to store NULLs, for example.  Do you have any NULLs in the column
you're indexing?)

> 4. Another possible reason is query speed.  Any guesses on
> which would be faster, two or three queries on tables of
> about 40K records, or one query on a table of about 10M
> records?  (All queries are geometric range searches on an
> rtree index.)

Given the size difference, I'd suppose that the 10M table would actually
be the equivalent of a materialized join of the smaller tables?  I'd
guess that recomputing the join on the fly is better because it takes
so much less I/O, but that's just a guess.  You'd be well advised to
test both ways and see.

            regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: UNION syntax different for 7.1.2
Next
From: Tom Lane
Date:
Subject: Re: RELAX! - or more to the point, how do I temporarily relax a trigger/constraint?