Re: Scale, Normalization, and Table Count - Mailing list pgsql-general

From: Martijn van Oosterhout
Subject: Re: Scale, Normalization, and Table Count
Msg-id: 20021007132610.GE24034@svana.org
In response to: Scale, Normalization, and Table Count (Tom Cross <decius@whack.org>)
List: pgsql-general
I can't speak about most of this, but as to the normalisation...

On Sun, Oct 06, 2002 at 02:39:57AM -0700, Tom Cross wrote:
> Essentially, I have a discussion system. There could be millions of
> messages in the system. Let's pretend that there are 10 million for the
> sake of argument. Of course, these messages are all on different topics.
> There are, let's say, 1000 messages on each topic. Users will only access
> one topic at a time. It seems intuitive that it would be less expensive
> to create an individual table for each topic than it would be to put all
> of the messages in a single table. If I can directly access the 1000
> messages that I need by loading the table that they are in, I save myself
> from having to search a table which contains 10 million messages.

A database system does not "load" tables; it pulls in only the parts it
needs. Databases are designed to solve exactly this problem. They will, as a
rule, do a much better job of storing data than people.

> 2. I've learned this evening that there is a problem with shared memory
> allocation. There also appears to be a concern about the number of files
> in a UNIX directory. Right now the directory for this database has some
> 20,000 files in it. This seems to be easy to solve. I'll just create
> dynamically generated databases which include my dynamically generated
> tables. Each database will only be allowed, say, 5000 tables. (Is this
> too many?)

As you can see, you're just digging deeper and deeper holes for yourself.
All on the (incorrect) assumption that accessing large tables is slow.
Perhaps you should try the everything-in-one-table approach. If it is
slower, that's a bug in the database, not in your code. Just imagine how
much it would simplify your code.

> 3. At some point I'm clearly going to have to contend with the number of
> inodes available on my file system. This is also easy to solve, as I can
> put each database on its own partition. I think that it will eventually
> make sense, in fact, to put different databases on different database
> servers.

Digging deeper...

> Am I barking up the wrong tree here with all these dynamically generated
> tables? Is it really more reasonable to search a table with 10 million
> entries for the 1000 I want every time a user wants to read something? Are
> there other pitfalls that I need to be aware of? Is there a key book that
> discusses this sort of thing that I could have on my desk?

This is why indexes were invented. We have tables of phone calls
totalling millions of rows; looking up the calls for a single customer
takes only milliseconds...
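A sketch of that kind of lookup, using the same hypothetical messages table
as an example: with an index on the topic column, fetching one topic's rows
costs roughly the same no matter how big the table gets.

```sql
-- Assuming an index on messages (topic_id, posted_at) exists, this
-- touches only the ~1000 rows for the requested topic, not all 10 million.
SELECT author, posted_at, body
  FROM messages
 WHERE topic_id = 42
 ORDER BY posted_at;

-- Running the query under EXPLAIN should show an index scan rather
-- than a sequential scan over the whole table.
EXPLAIN SELECT * FROM messages WHERE topic_id = 42;
```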

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
