Thread: design of tables for sparse data

design of tables for sparse data

From
Andreas
Date:
Hi,

I need some help to improve my design skills.   :)

I lately read an article about table design, that teached one shouldn't 
designe tables where it's clear that some columns aren't relevant for 
every row. It didn't span into the dirty usage details beyond the table 
design.

E.g. a really simple example like a school that stores pupils like this:
pupil (pupil_id, pupil_name, attends_english, attends_history, 
attends_maths, attends_football, attends_swimming)

1)   Some pupils don't attend to football, swimming or both.
2)   Occasionally there will be new classes added and others get 
dropped. Say in a year a column "attends_knitting" gets introduced. Now 
all those 50,000 existing rows get a column where the person hadn't even 
the occasion to apply.
If for some reason the knitting class gets discontinued every row in the 
future will still get this column.

So it was better to create 3 normalized tables:
pupil  (pupil_id,  pupil_name,  start_date,  exit_date)
classes  (class_id,  class_name,  is_available,  output_order)
attends_to  (pupil_id,  class_id,  in_year)   as an n:m-relation

Fine. Now I got rid off those empty columns in the pupil table.


MY QUESTIONS:

1)   How would I SELECT a report that looks like the first version of 
the pupil table out of the 3 table design?
There must be a nontrivial SELECT statement that combines all 3 tables.
E.g. I want the result:
pupil_id, pupil_name, attends_to_english, ....., attends_to_football, 
attends_to_swimming, attends_to_knitting
(42, Frank Miller, yes, ...., no, yes, yes)
(43, Suzy Smith, yes, ..., yes, yes, no)
...

2)   Could I control the order in which those attends_to-columns appear 
by a numerical field output_order?

3)   Could I restrict the classes list so that only those appear when 
there are pupils actually attending them in a given time frame?
3) a)   Like "competitve knitting" was only available from 2000-2005. 
Now I'd produce a list of 2007 so there shouldn't appear an empty 
knitting-column.  -->  classes.is_availlable
3) b)   Or it is availlable but no one has chosen it in 2007. --> 
attends_to.in_year


Regards
Andreas



Re: design of tables for sparse data

From
"Fernando Hevia"
Date:
> --- Andreas Wrote: ---
> ...
>
> MY QUESTIONS:

Your questions have a strong "home-work" look. 

> 
> 1)   How would I SELECT a report that looks like the first version of
> the pupil table out of the 3 table design?
> There must be a nontrivial SELECT statement that combines all 3 tables.
> E.g. I want the result:
> pupil_id, pupil_name, attends_to_english, ....., attends_to_football,
> attends_to_swimming, attends_to_knitting
> (42, Frank Miller, yes, ...., no, yes, yes)
> (43, Suzy Smith, yes, ..., yes, yes, no)
> ...

You should check out the JOIN clause in select statements. 
Simple example:

Select t1.col1, t2.col1, t2.col2
from t1 inner join t2 b on (t1.col1 = t2.col1)

> 
> 2)   Could I control the order in which those attends_to-columns appear
> by a numerical field output_order?
> 

You specify the order of output columns in the select statement. If you want
to do this dynamically (say each user wants to configure its own order) you
are really better of programming in your front-end application. No trivial
solution in a pure SQL solution. It would probably require some dynamic sql
and another table which holds de column printout order.

> 3)   Could I restrict the classes list so that only those appear when
> there are pupils actually attending them in a given time frame?
> 3) a)   Like "competitve knitting" was only available from 2000-2005.
> Now I'd produce a list of 2007 so there shouldn't appear an empty
> knitting-column.  -->  classes.is_availlable
> 3) b)   Or it is availlable but no one has chosen it in 2007. -->
> attends_to.in_year
> 

Yes, you could. Read about different JOINS and WHERE clauses. ;)


Regards,
Fernando.



Re: design of tables for sparse data

From
Andreas
Date:
Fernando Hevia schrieb:
>> --- Andreas Wrote: ---
>> ...
>>
>> MY QUESTIONS:
>>     
>
> Your questions have a strong "home-work" look. 
>   

Yes but I didn't want to bother everyone with my project's details.
It's more like a CRM.
Up until now I just tried to manage somehow with the sql basics and now 
I like to get better.
One interesting thing are crosstabs because I could use them in the 
reporting module.

I used this schoolbook scenario because it's such an easy example.   ;)


>> 1)   How would I SELECT a report that looks like the first version of
>> the pupil table out of the 3 table design?
>> There must be a nontrivial SELECT statement that combines all 3 tables.
>>     
>
> You should check out the JOIN clause in select statements. 
> Simple example:
>
> Select t1.col1, t2.col1, t2.col2
> >from t1 inner join t2 b on (t1.col1 = t2.col1)
>   

A simple JOIN won't do the trick.
That would give me something like:
(42, Frank Miller,  Maths)
(42, Frank Miller,  English)
(42, Frank Miller,  Sports)
(43, Suzy Smith,  Maths)
(43, Suzy Smith,  History)

But I want it turned around and a bit interpreted like:

Column heads = (ID, Name, Maths, English, Sports, History)
(42, Frank Miller, yes, yes, yes, no )
(43, Suzy Smith, yes, no, no, yes)


Regards
Andreas



Re: design of tables for sparse data

From
Erik Jones
Date:
On Nov 12, 2007, at 8:10 PM, Andreas wrote:

> Fernando Hevia schrieb:
>>> --- Andreas Wrote: ---
>>> ...
>>>
>>> MY QUESTIONS:
>>>
>>
>> Your questions have a strong "home-work" look.
>
> Yes but I didn't want to bother everyone with my project's details.
> It's more like a CRM.
> Up until now I just tried to manage somehow with the sql basics and
> now I like to get better.
> One interesting thing are crosstabs because I could use them in the
> reporting module.
>
> I used this schoolbook scenario because it's such an easy
> example.   ;)
>
>
>>> 1)   How would I SELECT a report that looks like the first
>>> version of
>>> the pupil table out of the 3 table design?
>>> There must be a nontrivial SELECT statement that combines all 3
>>> tables.
>>>
>>
>> You should check out the JOIN clause in select statements. Simple
>> example:
>>
>> Select t1.col1, t2.col1, t2.col2
>> >from t1 inner join t2 b on (t1.col1 = t2.col1)
>>
>
> A simple JOIN won't do the trick.
> That would give me something like:
> (42, Frank Miller,  Maths)
> (42, Frank Miller,  English)
> (42, Frank Miller,  Sports)
> (43, Suzy Smith,  Maths)
> (43, Suzy Smith,  History)
>
> But I want it turned around and a bit interpreted like:
>
> Column heads = (ID, Name, Maths, English, Sports, History)
> (42, Frank Miller, yes, yes, yes, no )
> (43, Suzy Smith, yes, no, no, yes)

You should look into the crosstab contrib package.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com