Thread: Best database model for canvassing (and analysing) opinion

Best database model for canvassing (and analysing) opinion

From
Sebastian Tennant
Date:
Hi all,

A school wants to offer a number of short courses on a number of different
dates.  Students apply online for a single course at a time and choose one or
more dates (from a list) which would suit them. Once the application period is
over which course is taught when is decided soley on the basis of maximising
the number of students that can attend.

Perhaps the simplest model is a database table 'application_forms' which
includes two text columns; 'course' and 'preferred_dates' with entries that
look like this:

         course: "Drama"
 prefered_dates: "Sat_22Aug09, Tue_25Aug09, Tue_08Sep09"

The data can then be usefully presented in a series of SELECT statements (one
for each date):

 AS SELECT count(*), course FROM application_forms WHERE preferred_dates like
 '%Sat_22Aug09%' GROUP BY course ORDER BY count DESC;

  count | course
 -------+---------
      7 | Drama
      3 | Readers
      1 | Self-study

but clearly this method doesn't scale very well as the number of dates
increases.

A single table of results looking something like this would be far better, but
how?

        date | course_suiting_most_applicants | num_applicants
 ----- ------+--------------------------------+---------------
 Sat_22Aug09 | Drama                          | 7
 Tue_25Aug09 | Readers                        | 4

Any advice/tips/pointers/suggestions for a database design newbie very much
appreciated.

Regards,

Sebastian

--
Emacs' AlsaPlayer - Music Without Jolts
Lightweight, full-featured and mindful of your idyllic happiness.
http://home.gna.org/eap

Re: Best database model for canvassing (and analysing) opinion

From
Sam Mason
Date:
On Tue, Aug 18, 2009 at 08:58:12AM +0000, Sebastian Tennant wrote:
> Perhaps the simplest model is a database table 'application_forms' which
> includes two text columns; 'course' and 'preferred_dates' with entries that
> look like this:
>
>          course: "Drama"
>  prefered_dates: "Sat_22Aug09, Tue_25Aug09, Tue_08Sep09"

I'd use the standard "date" data type for storing dates in, it's much
more useful than text values.  Then the conventional method of breaking
single (non-normalized) tables down into smaller normalized tables.
I've got a bit over the top here, but shows what could be done if this
was going to be a bigger database.

-- list of students, if you want
  CREATE TABLE students (
    student TEXT PRIMARY KEY,
    name    TEXT,
    email   TEXT
  );

-- list of courses, again only if you want database to be able to check
-- that people are signing up for valid courses
  CREATE TABLE courses (
    course      TEXT PRIMARY KEY,
    runby       TEXT,
    description TEXT
  );

-- which courses are available on which days
  CREATE TABLE course_availability (
    course TEXT REFERENCES courses,
    date   DATE,
      PRIMARY KEY (course,date)
  );

-- which students want to do which courses
  CREATE TABLE application_forms (
    student TEXT REFERENCES students,
    course  TEXT REFERENCES courses,
      PRIMARY KEY (studentnum,course)
  );

-- and on which days do they want to do them
  CREATE TABLE application_preferred_date (
    student TEXT,
    course  TEXT,
    date    DATE,
      PRIMARY KEY (student,course,date),
      FOREIGN KEY (student,course) REFERENCES application_forms,
      FOREIGN KEY (course,date) REFERENCES course_availability
  );

The only table that's really needed to solve your original problem would
be the last one, but the others provide all the checks that the data
is actually going in correctly and may or may not be useful depending
on your problem.  The main thing to notice is lots of tables with few
columns, the reason being is that the database normally takes care of
the rows and you, the DBA/programmer, take care of the columns.  Thus
the more work you can give to the database the better.

If my student number is 'cs1234' and I want to do a database course, I
would put in:

  INSERT INTO application_forms (studentnum,course) VALUES
    ('cs1234','database 101');
  INSERT INTO application_preferred_date (studentnum,course,date) VALUES
    ('cs1234','database 101','2009-08-26'),
    ('cs1234','database 101','2009-08-28'),
    ('cs1234','database 101','2009-08-31');

If I wanted to know how many people wanted to do each course on each
date, I'd just do:

  SELECT course, date, COUNT(*)
  FROM application_preferred_date
  GROUP BY course, date;

Hope that gives you some ideas!

--
  Sam  http://samason.me.uk/

Re: Best database model for canvassing (and analysing) opinion

From
John R Pierce
Date:
Sam Mason wrote:
>   SELECT course, date, COUNT(*)
>   FROM application_preferred_date
>   GROUP BY course, date;
>
> Hope that gives you some ideas!
>

the problem as stated is more complex than that.  A student could, in
theory, pick several different courses on the same dates on the
assumption that he doesn't care what course on which date.   so if the
optimizer/solver has counted this student for course1 on date1, he can't
be counted for course2 on that same date, only on an alternate date he
may have also specified.



Re: Best database model for canvassing (and analysing) opinion

From
Sam Mason
Date:
On Tue, Aug 18, 2009 at 05:24:52AM -0700, John R Pierce wrote:
> Sam Mason wrote:
> >  SELECT course, date, COUNT(*)
> >  FROM application_preferred_date
> >  GROUP BY course, date;
>
> the problem as stated is more complex than that.  A student could, in
> theory, pick several different courses on the same dates on the
> assumption that he doesn't care what course on which date.

Yup

> so if the
> optimizer/solver has counted this student for course1 on date1, he can't
> be counted for course2 on that same date, only on an alternate date he
> may have also specified.

Huh, fun optimisation problem.  There must be standard solutions to it
though and I'm pretty sure it's something I'd like to solve outside the
database.  Or am I missing something else?

I was just answering call for "advice/tips/pointers/suggestions for a
database design newbie".

--
  Sam  http://samason.me.uk/

Re: Best database model for canvassing (and analysing) opinion

From
Sebastian Tennant
Date:
Quoth Sam Mason <sam@samason.me.uk>:

[...]

> The only table that's really needed to solve your original problem would
> be the last one, but the others provide all the checks that the data
> is actually going in correctly and may or may not be useful depending
> on your problem.  The main thing to notice is lots of tables with few
> columns, the reason being is that the database normally takes care of
> the rows and you, the DBA/programmer, take care of the columns.  Thus
> the more work you can give to the database the better.

[...]

> Hope that gives you some ideas!

More than enough ideas.  Thank you _very_ much.

Presenting this kind of 'distributed' data in a useful way is more difficult
(at least for me) but I can see now that this is what _relational_ databses are
all about, and that once you've grasped how to do this, the advantages are
legion.

Many thanks once again.

Sebastian
--
Emacs' AlsaPlayer - Music Without Jolts
Lightweight, full-featured and mindful of your idyllic happiness.
http://home.gna.org/eap