Thread: design of tables for sparse data
Hi, I need some help to improve my design skills. :) I lately read an article about table design, that teached one shouldn't designe tables where it's clear that some columns aren't relevant for every row. It didn't span into the dirty usage details beyond the table design. E.g. a really simple example like a school that stores pupils like this: pupil (pupil_id, pupil_name, attends_english, attends_history, attends_maths, attends_football, attends_swimming) 1) Some pupils don't attend to football, swimming or both. 2) Occasionally there will be new classes added and others get dropped. Say in a year a column "attends_knitting" gets introduced. Now all those 50,000 existing rows get a column where the person hadn't even the occasion to apply. If for some reason the knitting class gets discontinued every row in the future will still get this column. So it was better to create 3 normalized tables: pupil (pupil_id, pupil_name, start_date, exit_date) classes (class_id, class_name, is_available, output_order) attends_to (pupil_id, class_id, in_year) as an n:m-relation Fine. Now I got rid off those empty columns in the pupil table. MY QUESTIONS: 1) How would I SELECT a report that looks like the first version of the pupil table out of the 3 table design? There must be a nontrivial SELECT statement that combines all 3 tables. E.g. I want the result: pupil_id, pupil_name, attends_to_english, ....., attends_to_football, attends_to_swimming, attends_to_knitting (42, Frank Miller, yes, ...., no, yes, yes) (43, Suzy Smith, yes, ..., yes, yes, no) ... 2) Could I control the order in which those attends_to-columns appear by a numerical field output_order? 3) Could I restrict the classes list so that only those appear when there are pupils actually attending them in a given time frame? 3) a) Like "competitve knitting" was only available from 2000-2005. Now I'd produce a list of 2007 so there shouldn't appear an empty knitting-column. --> classes.is_availlable 3) b) Or it is availlable but no one has chosen it in 2007. --> attends_to.in_year Regards Andreas
> --- Andreas Wrote: --- > ... > > MY QUESTIONS: Your questions have a strong "home-work" look. > > 1) How would I SELECT a report that looks like the first version of > the pupil table out of the 3 table design? > There must be a nontrivial SELECT statement that combines all 3 tables. > E.g. I want the result: > pupil_id, pupil_name, attends_to_english, ....., attends_to_football, > attends_to_swimming, attends_to_knitting > (42, Frank Miller, yes, ...., no, yes, yes) > (43, Suzy Smith, yes, ..., yes, yes, no) > ... You should check out the JOIN clause in select statements. Simple example: Select t1.col1, t2.col1, t2.col2 from t1 inner join t2 b on (t1.col1 = t2.col1) > > 2) Could I control the order in which those attends_to-columns appear > by a numerical field output_order? > You specify the order of output columns in the select statement. If you want to do this dynamically (say each user wants to configure its own order) you are really better of programming in your front-end application. No trivial solution in a pure SQL solution. It would probably require some dynamic sql and another table which holds de column printout order. > 3) Could I restrict the classes list so that only those appear when > there are pupils actually attending them in a given time frame? > 3) a) Like "competitve knitting" was only available from 2000-2005. > Now I'd produce a list of 2007 so there shouldn't appear an empty > knitting-column. --> classes.is_availlable > 3) b) Or it is availlable but no one has chosen it in 2007. --> > attends_to.in_year > Yes, you could. Read about different JOINS and WHERE clauses. ;) Regards, Fernando.
Fernando Hevia schrieb: >> --- Andreas Wrote: --- >> ... >> >> MY QUESTIONS: >> > > Your questions have a strong "home-work" look. > Yes but I didn't want to bother everyone with my project's details. It's more like a CRM. Up until now I just tried to manage somehow with the sql basics and now I like to get better. One interesting thing are crosstabs because I could use them in the reporting module. I used this schoolbook scenario because it's such an easy example. ;) >> 1) How would I SELECT a report that looks like the first version of >> the pupil table out of the 3 table design? >> There must be a nontrivial SELECT statement that combines all 3 tables. >> > > You should check out the JOIN clause in select statements. > Simple example: > > Select t1.col1, t2.col1, t2.col2 > >from t1 inner join t2 b on (t1.col1 = t2.col1) > A simple JOIN won't do the trick. That would give me something like: (42, Frank Miller, Maths) (42, Frank Miller, English) (42, Frank Miller, Sports) (43, Suzy Smith, Maths) (43, Suzy Smith, History) But I want it turned around and a bit interpreted like: Column heads = (ID, Name, Maths, English, Sports, History) (42, Frank Miller, yes, yes, yes, no ) (43, Suzy Smith, yes, no, no, yes) Regards Andreas
On Nov 12, 2007, at 8:10 PM, Andreas wrote: > Fernando Hevia schrieb: >>> --- Andreas Wrote: --- >>> ... >>> >>> MY QUESTIONS: >>> >> >> Your questions have a strong "home-work" look. > > Yes but I didn't want to bother everyone with my project's details. > It's more like a CRM. > Up until now I just tried to manage somehow with the sql basics and > now I like to get better. > One interesting thing are crosstabs because I could use them in the > reporting module. > > I used this schoolbook scenario because it's such an easy > example. ;) > > >>> 1) How would I SELECT a report that looks like the first >>> version of >>> the pupil table out of the 3 table design? >>> There must be a nontrivial SELECT statement that combines all 3 >>> tables. >>> >> >> You should check out the JOIN clause in select statements. Simple >> example: >> >> Select t1.col1, t2.col1, t2.col2 >> >from t1 inner join t2 b on (t1.col1 = t2.col1) >> > > A simple JOIN won't do the trick. > That would give me something like: > (42, Frank Miller, Maths) > (42, Frank Miller, English) > (42, Frank Miller, Sports) > (43, Suzy Smith, Maths) > (43, Suzy Smith, History) > > But I want it turned around and a bit interpreted like: > > Column heads = (ID, Name, Maths, English, Sports, History) > (42, Frank Miller, yes, yes, yes, no ) > (43, Suzy Smith, yes, no, no, yes) You should look into the crosstab contrib package. Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com