Re: Query questions - Mailing list pgsql-general

From Poul Jensen
Subject Re: Query questions
Date
Msg-id 431D9600.5060603@gfy.ku.dk
Whole thread Raw
In response to Re: Query questions  (Chris Travers <chris@metatrontech.com>)
List pgsql-general
> Look into inheritance.  It makes this easier.  However, I don't care
> which RDBMS you use, management of 1000 identical tables is going to
> be a real pain and I think that everyone here will probably suggest
> that it is not exactly a sane thing to do.

Thank you, Chris. I have omitted two important points. One is:
The database needs no maintenance. Once created nothing in it will be
modified - there'll only
be SELECT queries to look up data satisfying certain criteria.

To summarize the task:
I have ~500,000 data files containing ~1,000 records each. The database
should then contain:
1) A detailed table (~1,000 rows, ~15 columns) for each file
2) A small table with file summary (1 row, ~30 columns) for each file
The typical query should then check file summaries to identify which
detailed tables/subtables
to check for further conditions.

The other important point I forgot about (much worse):
The detailed table for each file is created with a number of columns
which for most files have
the same value for all records. I had planned to delete columns
containing only a single
value and push it to the file summary instead. Hence, the 500,000
detailed tables wouldn't
all be identical. The initial check on the file summary would determine
whether the detailed
table for a given file should be searched and, if so, what columns are
found in it.

I guess I could either:
1) Add a lot of redundant data to the database so tables can be
combined, allowing single
    query searches. Judging from the size of my test database I would
end up with ~200 GB
    without redundant data, so I do consider this a problem.
2) Write code (e.g. in C) with a loop to do separate queries for every
file - I imagine this
    would be terribly inefficient compared to the single query option.

Question:
If I create a table with all the columns needed to combine the 500,000
tables and I simply
omit the redundant columns when entering the subtable for a given file,
will PostgreSQL
fill in values, not allowing me to save any space? Would I have to make
an inheritance
hierarchy to save the space? If so, all columns not in the parent table
can only be
accessed via separate, child-table specific queries which is
troublesome... :-|

Help!

Poul

pgsql-general by date:

Previous
From: Ben Grimm
Date:
Subject: Re: fix pg_autovacuum
Next
From: "Peter Cook"
Date:
Subject: Installation problem