Thread: indexing of hierarchical data
We make cartoons here. <br />But let's say we were working on 3 different Cop movies. <br />Our projects are devided intoproject, sequence, and shot. <p><tt>project sequence shot</tt><tt></tt><p><tt>CopFilm1 alley shooting death of the bad guy</tt><br /><tt>CopFilm2 car chase death of the bad guy</tt><br /><tt>CopFilm3 carchase death of the bad guy</tt><tt></tt><p>At first I was indexing the shots just buy shot_ID (serial), and storingthe sequence_ID it belongs to. <br />On the sequence record, I was storing the project_ID it belongs to. <p>So ifI wanted to select <tt>CopFilm3, car chase, death of the bad guy</tt><br />I had to find the ID of the project <tt>CopFilm3</tt>,the ID of the sequence <tt>car chase</tt> belonging to that project and then shot <tt>death of the badguy </tt>belonging to that sequence. <br />As most of the operations happen at the shot level, for performance reasonsI think it might be better to store the project and sequence with the shot, so I don't have to perform any joins.<br />Also, projects and sequences have alphabetical codes assigned to them, which is usually the prefered way of accessingthe data. <br />So, it is my impression that I should store those codes in the shot as foreign keys with <tt>onupdate cascade</tt><br />should someone decide to rename projects and sequences, and their codes, which happens. <br/>Is this approach ok or should I stick to serial ID's and make the lookups? <br />Any comment on problems like this?<p>Thanks. <pre>-- Dado Feigenblatt Wild Brain, Inc. Technical Director (415) 553-8000 x??? dado@wildbrain.com San Francisco, CA.</pre>
Dado, Yeah, me again. What can I say? I'm procrastinating, and list stuff is a great way to do it. And I prefer design theory issues. project sequence shot CopFilm1 alley shooting death of the bad guy CopFilm2 car chase death of the bad guy CopFilm3 car chase death of the bad guy At first I was indexing the shots just buy shot_ID (serial), and storing the sequence_ID it belongs to. On the sequence record, I was storing the project_ID it belongs to. This sounds fine so far. So if I wanted to select CopFilm3, car chase, death of the bad guy I had to find the ID of the project CopFilm3, the ID of the sequence car chase belonging to that project and then shot death of the bad guy belonging to that sequence. As most of the operations happen at the shot level, for performance reasons I think it might be better to store the project and sequence with the shot, so I don't have to perform any joins. What's wrong with joins? Also, projects and sequences have alphabetical codes assigned to them, which is usually the prefered way of accessing the data. So, it is my impression that I should store those codes in the shot as foreign keys with on update cascade should someone decide to rename projects and sequences, and their codes, which happens. Is this approach ok or should I stick to serial ID's and make the lookups? Any comment on problems like this? Yes. You want to have as primary and foriegn keys values that do not change over the lifetime of the record. Any time that you choose instead user-modifiable records you are introducing a world of headaches and trigger maintainence. The solution to the above design problem is simple: table projectsproject_ID SERIAL PRIMARY KEYproject_code VARCHAR(5) NOT NULLproject_name VARCHAR(100) NOT NULL table sequencessequence_ID SERIAL PRIMARY KEYproject_ID INT NOT NULL REFERENCES projects(project_ID)sequence_code VARCHAR(5)NOT NULLsequence_name VARCHAR(100) NOT NULL table shotsshot_ID SERIAL PRIMARY KEYsequence_ID INT NOT NULL REFERENCES sequences(sequence_ID)shot_code VARCHAR(5) NOT NULLshot_nameVARCHAR(100) NOT NULL CREATE VIEW vw_shots AS SELECT project_code, project_name, sequence_code, sequence_name, shot_code, shot_name FROM projects JOIN sequences USING (project_ID)JOIN shots USING (sequence_ID); Then you users can access this view, and search by codes without being aware that the numerical ID's even exist. For that matter, you can impose UNIQUE constraints on codes within their context without tying up those codes for all time or preventing your users from changing the codes. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Josh Berkus wrote: <blockquote type="CITE">Dado, <p> Yeah, me again. What can I say? I'm procrastinating, and liststuff <br />is a great way to do it. And I prefer design theory issues. <p>project sequence shot <p>CopFilm1 alley shooting death of the bad guy <br />CopFilm2 car chase death of the bad guy <br />CopFilm3 car chase death of the bad guy <p>At first I was indexing the shots just buy shot_ID (serial), and storing<br />the sequence_ID it belongs to. <br />On the sequence record, I was storing the project_ID it belongs to. <p> This sounds fine so far. <br /> <p>So if I wanted to select CopFilm3, car chase, death of the bad guy <br />Ihad to find the ID of the project CopFilm3, the ID of the sequence car <br />chase belonging to that project and thenshot death of the bad guy <br />belonging to that sequence. <br />As most of the operations happen at the shot level,for performance <br />reasons I think it might be better to store the project and sequence <br />with the shot, soI don't have to perform any joins. <p> What's wrong with joins? <p>Also, projects and sequences have alphabeticalcodes assigned to them, <br />which is usually the prefered way of accessing the data. <br />So, it is my impressionthat I should store those codes in the shot as <br />foreign keys with on update cascade <br />should someone decideto rename projects and sequences, and their codes, <br />which happens. <br />Is this approach ok or should I stickto serial ID's and make the <br />lookups? <br />Any comment on problems like this? <p> Yes. You want to haveas primary and foriegn keys values that do not <br />change over the lifetime of the record. Any time that you choose<br />instead user-modifiable records you are introducing a world of headaches <br />and trigger maintainence. <p> The solution to the above design problem is simple: <p>table projects <br /> project_ID SERIAL PRIMARY KEY<br /> project_code VARCHAR(5) NOT NULL <br /> project_name VARCHAR(100) NOT NULL <p>table sequences <br/> sequence_ID SERIAL PRIMARY KEY <br /> project_ID INT NOT NULL REFERENCES projects(project_ID) <br /> sequence_code VARCHAR(5) NOT NULL <br /> sequence_name VARCHAR(100) NOT NULL <p>table shots <br /> shot_ID SERIAL PRIMARY KEY <br /> sequence_ID INT NOT NULL REFERENCES sequences(sequence_ID) <br /> shot_code VARCHAR(5) NOT NULL <br /> shot_name VARCHAR(100) NOT NULL <p>CREATE VIEW vw_shots AS <br />SELECTproject_code, project_name, sequence_code, sequence_name, <br /> shot_code, shot_name <br />FROM projectsJOIN sequences USING (project_ID) <br /> JOIN shots USING (sequence_ID); <p>Then you users can access thisview, and search by codes without being <br />aware that the numerical ID's even exist. For that matter, you can <br/>impose UNIQUE constraints on codes within their context without tying up <br />those codes for all time or preventingyour users from changing the <br />codes.</blockquote> I think that's where experience comes in, uh? <br />I haven'tthought of creating a view that way. <br />That's another very helpfull hint. <br />I think if we don't get a consultingbudget anytime soon I'll have to get you dinner ;) <p>Thanks. <p>PS: can't your e-mail client insert >'s onreplies? It's kind of heard to sift through. <pre>-- Dado Feigenblatt Wild Brain, Inc. Technical Director (415) 553-8000 x??? dado@wildbrain.com San Francisco, CA.</pre>