Re: Query questions - Mailing list pgsql-general

From Chris Travers
Subject Re: Query questions
Date
Msg-id 431B8C67.8070002@metatrontech.com
Whole thread Raw
In response to Re: Query questions  (Poul Jensen <flyvholm@gfy.ku.dk>)
Responses Re: Query questions
SQL - planet redundant data
List pgsql-general
Poul Jensen wrote:

>
>>>> you want to create 1 million tables, all with one of
>>>> 2 schemas?
>>>>
>>>>
>>>>
>>>
>>> I started out with a schema for each file, thinking I could utilize
>>> the schema
>>> structure in queries, but I don't see how. Schemas are useful for
>>> grouping
>>> tables according to users/owners. Other than that, do they add anything
>>> but a dot in the table name?
>>>
>>
>>
>> The word "schema" is a bit overloaded--here it means "table
>> specification" rather than "namespace".
>>
>> A general rule of database design is: if you end up creating multiple
>> tables with the same schema, you're doing it wrong--they should all be
>> one table with an additional key (usually indexed) to distinguish the
>> data.
>>
>> -Doug
>>
>>
> Thank you for clarifying -that had me confused!
>
> Would the reason for this general rule be that if you combine same-schema
> tables, you can access them all with one query?

No.  It is so that the single query becomes manageable.  It also greatly
simplifies database maintenance.

> At least that would be the
> motivation in my case, but it isn't free. It would cost space (500
> million entries
> for that extra key, add index) and query time (1st condition need now be
> checked on 500 million entries of which "only" the 500,000 unique ones
> should have to be checked - indexing helps, but there must be a loss).

I suppose how the data is distributed in the table.  For the record, I
don't think that 500M records should be that problematic.

Now....
Suppose you need to further partition this  (best to wait until 8.1 for
performance reasons), but you could partition it into, say, 10
subtables.  You could use inheritance and check constraints to make this
transparent to the application and perform reasonably well.

>
> If this is really necessary it's because it's so unbelievably hard to
> access
> many tables in one query that I'm thinking the developers maintain
> restrictions
> on query options in order to force certain database designs - your
> general rule
> above. If so, what is the advantage? I see only disadvantages (a pain
> to be
> honest).

Look into inheritance.  It makes this easier.  However, I don't care
which RDBMS you use, management of 1000 identical tables is going to be
a real pain and I think that everyone here will probably suggest that it
is not exactly a sane thing to do.

Best Wishes,
Chris Travers
Metatron Technology COnsulting

Attachment

pgsql-general by date:

Previous
From: Leonard Soetedjo
Date:
Subject: PostgreSQL local replication
Next
From: "Thomas F. O'Connell"
Date:
Subject: Re: Insert Ignore or something similar...