Thread: indexing of hierarchical data

indexing of hierarchical data

From

Dado Feigenblatt

Date:

06 July 2001, 19:17:20

We make cartoons here. <br />But let's say we were working on 3 different Cop movies. <br />Our projects are devided
intoproject, sequence, and shot. <p><tt>project    sequence         shot</tt><tt></tt><p><tt>CopFilm1   alley
shooting  death of the bad guy</tt><br /><tt>CopFilm2   car chase        death of the bad guy</tt><br /><tt>CopFilm3  
carchase        death of the bad guy</tt><tt></tt><p>At first I was indexing the shots just buy shot_ID (serial), and
storingthe sequence_ID it belongs to. <br />On the sequence record, I was storing the project_ID it belongs to. <p>So
ifI wanted to select <tt>CopFilm3, car chase, death of the bad guy</tt><br />I had to find the ID of the project
<tt>CopFilm3</tt>,the ID of the sequence <tt>car chase</tt> belonging to that project and then shot <tt>death of the
badguy </tt>belonging to that sequence. <br />As most of the operations happen at the shot level, for performance
reasonsI think it might be better to store the project and sequence with the shot, so I don't have to perform any
joins.<br />Also, projects and sequences have alphabetical codes assigned to them, which is usually the prefered way of
accessingthe data. <br />So, it is my impression that I should store those codes in the shot as foreign keys with
<tt>onupdate cascade</tt><br />should someone decide to rename projects and sequences, and their codes, which happens.
<br/>Is this approach ok or should I stick to serial ID's and make the lookups? <br />Any comment on problems like
this?<p>Thanks. <pre>-- 
 
Dado Feigenblatt                                 Wild Brain, Inc.   
Technical Director                               (415) 553-8000 x???
dado@wildbrain.com                               San Francisco, CA.</pre>

Re: indexing of hierarchical data

From

"Josh Berkus"

Date:

06 July 2001, 19:59:50

Dado,
Yeah, me again. What can I say? I'm procrastinating, and list stuff
is a great way to do it. And I prefer design theory issues.

project sequence shot

CopFilm1 alley shooting death of the bad guy
CopFilm2 car chase death of the bad guy
CopFilm3 car chase death of the bad guy

At first I was indexing the shots just buy shot_ID (serial), and storing
the sequence_ID it belongs to.
On the sequence record, I was storing the project_ID it belongs to.
This sounds fine so far.

So if I wanted to select CopFilm3, car chase, death of the bad guy
I had to find the ID of the project CopFilm3, the ID of the sequence car
chase belonging to that project and then shot death of the bad guy
belonging to that sequence.
As most of the operations happen at the shot level, for performance
reasons I think it might be better to store the project and sequence
with the shot, so I don't have to perform any joins.
What's wrong with joins?

Also, projects and sequences have alphabetical codes assigned to them,
which is usually the prefered way of accessing the data.
So, it is my impression that I should store those codes in the shot as
foreign keys with on update cascade
should someone decide to rename projects and sequences, and their codes,
which happens.
Is this approach ok or should I stick to serial ID's and make the
lookups?
Any comment on problems like this?
Yes. You want to have as primary and foriegn keys values that do not
change over the lifetime of the record. Any time that you choose
instead user-modifiable records you are introducing a world of headaches
and trigger maintainence.
The solution to the above design problem is simple:

table projectsproject_ID SERIAL PRIMARY KEYproject_code VARCHAR(5) NOT NULLproject_name VARCHAR(100) NOT NULL

table sequencessequence_ID SERIAL PRIMARY KEYproject_ID INT NOT NULL REFERENCES projects(project_ID)sequence_code
VARCHAR(5)NOT NULLsequence_name VARCHAR(100) NOT NULL

table shotsshot_ID SERIAL PRIMARY KEYsequence_ID INT NOT NULL REFERENCES sequences(sequence_ID)shot_code VARCHAR(5) NOT
NULLshot_nameVARCHAR(100) NOT NULL

CREATE VIEW vw_shots AS
SELECT project_code, project_name, sequence_code, sequence_name, shot_code, shot_name
FROM projects JOIN sequences USING (project_ID)JOIN shots USING (sequence_ID);

Then you users can access this view, and search by codes without being
aware that the numerical ID's even exist. For that matter, you can
impose UNIQUE constraints on codes within their context without tying up
those codes for all time or preventing your users from changing the
codes.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete
informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small
businesses fax 621-2533 and non-profit organizations. San Francisco

Re: indexing of hierarchical data

From

Dado Feigenblatt

Date:

06 July 2001, 20:22:09

Josh Berkus wrote: <blockquote type="CITE">Dado, <p>        Yeah, me again.  What can I say?  I'm procrastinating, and
liststuff <br />is a great way to do it.  And I prefer design theory issues. <p>project    sequence         shot
<p>CopFilm1  alley shooting   death of the bad guy <br />CopFilm2   car chase        death of the bad guy <br
/>CopFilm3  car chase        death of the bad guy <p>At first I was indexing the shots just buy shot_ID (serial), and
storing<br />the sequence_ID it belongs to. <br />On the sequence record, I was storing the project_ID it belongs to.
<p>       This sounds fine so far. <br />  <p>So if I wanted to select CopFilm3, car chase, death of the bad guy <br
/>Ihad to find the ID of the project CopFilm3, the ID of the sequence car <br />chase belonging to that project and
thenshot death of the bad guy <br />belonging to that sequence. <br />As most of the operations happen at the shot
level,for performance <br />reasons I think it might be better to store the project and sequence <br />with the shot,
soI don't have to perform any joins. <p>        What's wrong with joins? <p>Also, projects and sequences have
alphabeticalcodes assigned to them, <br />which is usually the prefered way of accessing the data. <br />So, it is my
impressionthat I should store those codes in the shot as <br />foreign keys with on update cascade <br />should someone
decideto rename projects and sequences, and their codes, <br />which happens. <br />Is this approach ok or should I
stickto serial ID's and make the <br />lookups? <br />Any comment on problems like this? <p>        Yes.  You want to
haveas primary and foriegn keys values that do not <br />change over the lifetime of the record.  Any time that you
choose<br />instead user-modifiable records you are introducing a world of headaches <br />and trigger maintainence.
<p>       The solution to the above design problem is simple: <p>table projects <br />        project_ID SERIAL PRIMARY
KEY<br />        project_code VARCHAR(5) NOT NULL <br />        project_name VARCHAR(100) NOT NULL <p>table sequences
<br/>        sequence_ID SERIAL PRIMARY KEY <br />        project_ID INT NOT NULL REFERENCES projects(project_ID) <br
/>       sequence_code VARCHAR(5) NOT NULL <br />        sequence_name VARCHAR(100) NOT NULL <p>table shots <br
/>       shot_ID SERIAL PRIMARY KEY <br />        sequence_ID INT NOT NULL REFERENCES sequences(sequence_ID) <br
/>       shot_code VARCHAR(5) NOT NULL <br />        shot_name VARCHAR(100) NOT NULL <p>CREATE VIEW vw_shots AS <br
/>SELECTproject_code, project_name, sequence_code, sequence_name, <br />        shot_code, shot_name <br />FROM
projectsJOIN sequences USING (project_ID) <br />        JOIN shots USING (sequence_ID); <p>Then you users can access
thisview, and search by codes without being <br />aware that the numerical ID's even exist.  For that matter, you can
<br/>impose UNIQUE constraints on codes within their context without tying up <br />those codes for all time or
preventingyour users from changing the <br />codes.</blockquote> I think that's where experience comes in, uh? <br />I
haven'tthought of creating a view that way. <br />That's another very helpfull hint. <br />I think if we don't get a
consultingbudget anytime soon I'll have to get you dinner ;) <p>Thanks. <p>PS: can't your e-mail client insert >'s
onreplies? It's kind of heard to sift through. <pre>-- 
 
Dado Feigenblatt                                 Wild Brain, Inc.   
Technical Director                               (415) 553-8000 x???
dado@wildbrain.com                               San Francisco, CA.</pre>