Scale, Normalization, and Table Count - Mailing list pgsql-general

From: Tom Cross
Subject: Scale, Normalization, and Table Count
Msg-id: Pine.GSO.4.40.0210060210140.28070-100000@apogee.whack.org
Responses: Re: Scale, Normalization, and Table Count  (Martijn van Oosterhout <kleptog@svana.org>)
           Re: Scale, Normalization, and Table Count  (Richard Huxton <dev@archonet.com>)
           Re: Scale, Normalization, and Table Count  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-general
Much to my horror I got an error message while attempting to back up my
database tonight:

pg_dump: NOTICE:  ShmemAlloc: out of memory
pg_dump: Attempt to lock table "thread_204" failed.  ERROR:  LockAcquire:
lock table 1 is out of memory

Obviously this is related to my shared memory allocation. I'm not sure
how much shared memory my kernel allows; I'm running a fairly default
Red Hat 7.3 installation, and I plan on looking into this.
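
From the error, it looks like pg_dump locks every table inside a single
transaction, so with enough tables the shared lock table simply fills up.
If I understand the knobs involved, the fix is some combination of the
kernel's SHMMAX and Postgres's max_locks_per_transaction; here is what I
plan to try (the values are guesses, not recommendations):

    # current kernel ceiling for a single shared memory segment
    cat /proc/sys/kernel/shmmax

    # raise it to 128 MB until the next reboot (as root)
    echo 134217728 > /proc/sys/kernel/shmmax

    # in postgresql.conf: enlarge the lock table (the default is
    # 64 locks per transaction) and restart the postmaster
    max_locks_per_transaction = 256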

However, in investigating this problem I've found that it also seems to
imply that I have too many tables in my database. I found a post on the
Postgres discussion boards from someone who hit this error with 200,000
tables. He was told that this was too many, and that Postgres was not
designed to handle databases that were so poorly designed.

While I can't imagine that I have more than 10,000 tables at this time, I
think that as this system scales I will easily surpass 200,000. I don't
think this is due to poor database design, but rather to the requirements
of my application. However, the standard disclaimer applies: I'm an
engineer, but I'm not a DB specialist. This is why I'm asking.

Essentially, I have a discussion system. There could be millions of
messages in the system; let's pretend that there are 10 million for the
sake of argument. Of course, these messages are all on different topics.
There are, let's say, 1000 messages on each topic, and users will only
access one topic at a time. It seems intuitive that it would be less
expensive to create an individual table for each topic than it would be
to put all of the messages in a single table. If I can directly access
the 1000 messages that I need by loading the table they are in, I save
myself from having to search a table which contains 10 million messages.

In general, disk space is less expensive than memory or processor power,
and so this seems to be a reasonable approach.
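
For concreteness, the two designs I'm weighing look roughly like this
(the column lists are invented for the example; the real tables have
more fields):

    -- design A: one table per topic, created on the fly
    CREATE TABLE thread_204 (
        message_id  serial PRIMARY KEY,
        posted_at   timestamp NOT NULL DEFAULT now(),
        body        text
    );

    -- design B: one big table with an index on the topic
    CREATE TABLE messages (
        message_id  serial PRIMARY KEY,
        topic_id    integer NOT NULL,
        posted_at   timestamp NOT NULL DEFAULT now(),
        body        text
    );
    CREATE INDEX messages_topic_idx ON messages (topic_id);

    -- with the index, fetching one topic should be an index
    -- lookup rather than a scan of all 10 million rows:
    SELECT * FROM messages WHERE topic_id = 204;

My intuition favors design A because the working set is one small table,
but I may be underestimating what the index in design B buys me; that is
part of what I'm asking.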

It seems clear to me that there are some things I need to be careful
about in pursuing this.

1. The table names, as you'll note from my error message, are
automatically generated. There is a limit on table name length, and I
need to make sure my generated names stay well within it. That limit is
set at build time in Postgres (NAMEDATALEN), and even the stock
31-character names look ample: a "thread_" prefix leaves room for 24
digits, i.e. 10^24 distinct tables, which is astronomical...

2. I've learned this evening that there is a problem with shared memory
allocation. There also appears to be a concern about the number of files
in a UNIX directory; right now the directory for this database has some
20,000 files in it. This seems easy enough to solve: I'll create
dynamically generated databases to hold my dynamically generated tables,
and each database will only be allowed, say, 5000 tables. (Is this
too many?)
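
Something like this is what I have in mind (database names invented for
the example):

    -- cap each database at 5000 topics, routing by integer division
    CREATE DATABASE forum_0;   -- holds thread_0    through thread_4999
    CREATE DATABASE forum_1;   -- holds thread_5000 through thread_9999

    -- the application connects to forum_<topic_id / 5000> and then
    -- reads thread_<topic_id> inside it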

I wonder what the limits on database names are... I haven't researched it.

3. At some point I'm clearly going to have to contend with the number of
inodes available on my file system. This is also easy to solve, as I can
put each database on its own partition. In fact, I think it will
eventually make sense to put different databases on different database
servers.
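
If I remember the mechanics right, Postgres can already place a database
on a different filesystem via an alternate location, which would sidestep
the inode problem without symlink games. The paths here are invented for
the example, and I believe absolute paths may require the server to be
built with that option enabled:

    # as the postgres user: prepare a secondary data area
    # on another partition
    initlocation /mnt/disk2/pgdata

    -- then, from psql, create databases there:
    CREATE DATABASE forum_2 WITH LOCATION = '/mnt/disk2/pgdata';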

But I have to wonder...

Am I barking up the wrong tree here with all these dynamically generated
tables? Is it really more reasonable to search a table with 10 million
entries for the 1000 I want every time a user wants to read something? Are
there other pitfalls that I need to be aware of? Is there a key book that
discusses this sort of thing that I could have on my desk?

Thanks,
Tom Cross

