Thread: Best database model for canvassing (and analysing) opinion
Hi all, A school wants to offer a number of short courses on a number of different dates. Students apply online for a single course at a time and choose one or more dates (from a list) which would suit them. Once the application period is over which course is taught when is decided soley on the basis of maximising the number of students that can attend. Perhaps the simplest model is a database table 'application_forms' which includes two text columns; 'course' and 'preferred_dates' with entries that look like this: course: "Drama" prefered_dates: "Sat_22Aug09, Tue_25Aug09, Tue_08Sep09" The data can then be usefully presented in a series of SELECT statements (one for each date): AS SELECT count(*), course FROM application_forms WHERE preferred_dates like '%Sat_22Aug09%' GROUP BY course ORDER BY count DESC; count | course -------+--------- 7 | Drama 3 | Readers 1 | Self-study but clearly this method doesn't scale very well as the number of dates increases. A single table of results looking something like this would be far better, but how? date | course_suiting_most_applicants | num_applicants ----- ------+--------------------------------+--------------- Sat_22Aug09 | Drama | 7 Tue_25Aug09 | Readers | 4 Any advice/tips/pointers/suggestions for a database design newbie very much appreciated. Regards, Sebastian -- Emacs' AlsaPlayer - Music Without Jolts Lightweight, full-featured and mindful of your idyllic happiness. http://home.gna.org/eap
On Tue, Aug 18, 2009 at 08:58:12AM +0000, Sebastian Tennant wrote: > Perhaps the simplest model is a database table 'application_forms' which > includes two text columns; 'course' and 'preferred_dates' with entries that > look like this: > > course: "Drama" > prefered_dates: "Sat_22Aug09, Tue_25Aug09, Tue_08Sep09" I'd use the standard "date" data type for storing dates in, it's much more useful than text values. Then the conventional method of breaking single (non-normalized) tables down into smaller normalized tables. I've got a bit over the top here, but shows what could be done if this was going to be a bigger database. -- list of students, if you want CREATE TABLE students ( student TEXT PRIMARY KEY, name TEXT, email TEXT ); -- list of courses, again only if you want database to be able to check -- that people are signing up for valid courses CREATE TABLE courses ( course TEXT PRIMARY KEY, runby TEXT, description TEXT ); -- which courses are available on which days CREATE TABLE course_availability ( course TEXT REFERENCES courses, date DATE, PRIMARY KEY (course,date) ); -- which students want to do which courses CREATE TABLE application_forms ( student TEXT REFERENCES students, course TEXT REFERENCES courses, PRIMARY KEY (studentnum,course) ); -- and on which days do they want to do them CREATE TABLE application_preferred_date ( student TEXT, course TEXT, date DATE, PRIMARY KEY (student,course,date), FOREIGN KEY (student,course) REFERENCES application_forms, FOREIGN KEY (course,date) REFERENCES course_availability ); The only table that's really needed to solve your original problem would be the last one, but the others provide all the checks that the data is actually going in correctly and may or may not be useful depending on your problem. The main thing to notice is lots of tables with few columns, the reason being is that the database normally takes care of the rows and you, the DBA/programmer, take care of the columns. Thus the more work you can give to the database the better. If my student number is 'cs1234' and I want to do a database course, I would put in: INSERT INTO application_forms (studentnum,course) VALUES ('cs1234','database 101'); INSERT INTO application_preferred_date (studentnum,course,date) VALUES ('cs1234','database 101','2009-08-26'), ('cs1234','database 101','2009-08-28'), ('cs1234','database 101','2009-08-31'); If I wanted to know how many people wanted to do each course on each date, I'd just do: SELECT course, date, COUNT(*) FROM application_preferred_date GROUP BY course, date; Hope that gives you some ideas! -- Sam http://samason.me.uk/
Sam Mason wrote: > SELECT course, date, COUNT(*) > FROM application_preferred_date > GROUP BY course, date; > > Hope that gives you some ideas! > the problem as stated is more complex than that. A student could, in theory, pick several different courses on the same dates on the assumption that he doesn't care what course on which date. so if the optimizer/solver has counted this student for course1 on date1, he can't be counted for course2 on that same date, only on an alternate date he may have also specified.
On Tue, Aug 18, 2009 at 05:24:52AM -0700, John R Pierce wrote: > Sam Mason wrote: > > SELECT course, date, COUNT(*) > > FROM application_preferred_date > > GROUP BY course, date; > > the problem as stated is more complex than that. A student could, in > theory, pick several different courses on the same dates on the > assumption that he doesn't care what course on which date. Yup > so if the > optimizer/solver has counted this student for course1 on date1, he can't > be counted for course2 on that same date, only on an alternate date he > may have also specified. Huh, fun optimisation problem. There must be standard solutions to it though and I'm pretty sure it's something I'd like to solve outside the database. Or am I missing something else? I was just answering call for "advice/tips/pointers/suggestions for a database design newbie". -- Sam http://samason.me.uk/
Quoth Sam Mason <sam@samason.me.uk>: [...] > The only table that's really needed to solve your original problem would > be the last one, but the others provide all the checks that the data > is actually going in correctly and may or may not be useful depending > on your problem. The main thing to notice is lots of tables with few > columns, the reason being is that the database normally takes care of > the rows and you, the DBA/programmer, take care of the columns. Thus > the more work you can give to the database the better. [...] > Hope that gives you some ideas! More than enough ideas. Thank you _very_ much. Presenting this kind of 'distributed' data in a useful way is more difficult (at least for me) but I can see now that this is what _relational_ databses are all about, and that once you've grasped how to do this, the advantages are legion. Many thanks once again. Sebastian -- Emacs' AlsaPlayer - Music Without Jolts Lightweight, full-featured and mindful of your idyllic happiness. http://home.gna.org/eap