Hi,
On 22 August 2012 07:07, Menelaos PerdikeasSemantix
<mperdikeas.semantix@gmail.com> wrote:
> Let's say you have a father-child (or master-detail if you wish) hierarchy
> of tables of not just 2 levels, but, say, 5 levels.
> E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:
>
> A ----1-to-N-----> B
> B ----1-to-N-----> C
> C ----1-to-N-----> D
> D ----1-to-N-----> E
>
> with appropriate foreign keys:
>
> * from E to D
> * from D to C
> * from C to B
> * from B to A
>
> This is normalized so far. Now assume that it is the case than in some
> queries on table E you also need to report a field that only exists on table
> A. This will mean a JOIN between five tables: E, D, C, B and A. Some
> questions follow:
>
> [1] assuming tables having a number of rows in the order of 100,000, after
> how many levels of depth would you feel justified to depart from the
> normalized schema and introduce some redundancy to speed up the queries?
>
> [3] do you feel this is a legitimate concern in a modern PostgreSQL database
> running on high end (200,000 USD) hardware and serving no more than 1000
> concurrent users with table sizes at the lowest (more detailed) level of the
> hierarchy in the order of a few tens of millions of rows at the most and
> dropping by a factor of 20 for each level up ?
I would ask different question(s): how "static" that tree structure is
and what kind of queries do you want to run:
- father-child: easy to understand; add new node; change leaf node;
hard to run some count(*) queries; and get hierarchy (CTEs are help
full)
- nested sets: pailful to move nodes around (even add new node); easy
to get tree subsets; ...
Anyway, I've found this summary:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
when I was googling for Joe Celko's Trees and Hierarchies book.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)