Thread: Storing questionnaire data
Hi, Is there any optimal and generally agreed way to store questionnaire data in a database? The questionnaire would have to support both of the following: - different question types (e.g. What is your name? (free form text) Are you a smoker? (yes/no checkbox) Are you male or female? (radio buttons) Select which country you are from (drop-down box).) - multiple paths (e.g. if a user were asked what their primary mode of transport is and they answered "a motorbike" they would be asked if they carry pillion passengers and how many ccs the engine is, whereas if they said something like walking they would be asked how far they walk to work and how long it takes) I have previously had a questionnaire which had 5 tables, questions and answers and question types, questionnaire and results. questions ======= id (serial) [PK] question (text) question_type (int) question_types =========== id (serial) [PK] description (text) answers ====== id (serial) [PK] answer (text) next_question_id (int) [FK to questions.id] questionnaire ========== id (serial) [PK] questionnaire_date (timestamp) results ===== id (serial) [PK] questionnaire_id [FK to questionnaire.id] question_id (int) [FK to questions.id] answer_id (int) answer_text (text) If the question was for free form text, the answer_id would be 0, which seems a bit kludgey to me. Plus because an answer ID can't be required due to free form text answers, I can't enforce a foreign key. Is there a nice elegant solution anyone knows of? Thanks Thom
On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote: > I have previously had a questionnaire which had 5 tables, questions > and answers and question types, questionnaire and results. This design looks a lot like the EAV (entity-attribute-value) style of database design. This tends to be frowned upon here but is good in some situations. I personally have tended to just go for a simple field per question spread over several tables (if it's a big questionnaire). I.e. something like: CREATE TABLE questionnaire ( qnid SERIAL PRIMARY KEY, entryname TEXT, qndate DATE ); CREATE TABLE questionnaire_page1 ( qnid INTEGER PRIMARY KEY REFERENCES questionnaire, business_purpose TEXT, average_turnover NUMERIC, num_employees INTEGER ); CREATE TABLE questionnaire_buildings ( qnid INTEGER REFERENCES questionnaire, buildnum INTEGER, PRIMARY KEY (qnid, buildnum), buildingname TEXT, buildingsize NUMERIC, lightlevel INTEGER, ventilation INTEGER ); This way you can enforce useful constraints inside the database, but requires the database to be aware of what questionnaire data you're actually storing. The EAV style would, in my eyes, be appropriate if you're trying to write a generic program that could handle arbitrary questionnaire forms and not just one specific one. Hope that helps Sam
Hi Sam, thanks for your suggestion. I'm afraid such a rigid structure is completely tailored for a specific questionnaire. What if I, or even a client, wanted to generate different questionnaires? I would like the data to indicate the flow of questions and answers rather than just use the database as pure storage for a completely coded solution. On Thu, Oct 23, 2008 at 7:01 PM, Sam Mason <sam@samason.me.uk> wrote: > On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote: >> I have previously had a questionnaire which had 5 tables, questions >> and answers and question types, questionnaire and results. > > This design looks a lot like the EAV (entity-attribute-value) style of > database design. This tends to be frowned upon here but is good in some > situations. I personally have tended to just go for a simple field per > question spread over several tables (if it's a big questionnaire). I.e. > something like: > > CREATE TABLE questionnaire ( > qnid SERIAL PRIMARY KEY, > entryname TEXT, > qndate DATE > ); > > CREATE TABLE questionnaire_page1 ( > qnid INTEGER PRIMARY KEY REFERENCES questionnaire, > business_purpose TEXT, > average_turnover NUMERIC, > num_employees INTEGER > ); > > CREATE TABLE questionnaire_buildings ( > qnid INTEGER REFERENCES questionnaire, > buildnum INTEGER, > PRIMARY KEY (qnid, buildnum), > buildingname TEXT, > buildingsize NUMERIC, > lightlevel INTEGER, > ventilation INTEGER > ); > > This way you can enforce useful constraints inside the database, but > requires the database to be aware of what questionnaire data you're > actually storing. The EAV style would, in my eyes, be appropriate if > you're trying to write a generic program that could handle arbitrary > questionnaire forms and not just one specific one. > > Hope that helps > > > Sam > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote: > Hi, > > Is there any optimal and generally agreed way to store questionnaire > data in a database? > > The questionnaire would have to support both of the following: Without going EAV (almost always a mistake, this should get you a long way in the right direction :) <http://www.varlena.com/GeneralBits/110.php> Cheers, David. > - different question types (e.g. What is your name? (free form text) > Are you a smoker? (yes/no checkbox) Are you male or female? (radio > buttons) Select which country you are from (drop-down box).) > > - multiple paths (e.g. if a user were asked what their primary mode of > transport is and they answered "a motorbike" they would be asked if > they carry pillion passengers and how many ccs the engine is, whereas > if they said something like walking they would be asked how far they > walk to work and how long it takes) > > I have previously had a questionnaire which had 5 tables, questions > and answers and question types, questionnaire and results. > > questions > ======= > id (serial) [PK] > question (text) > question_type (int) > > question_types > =========== > id (serial) [PK] > description (text) > > answers > ====== > id (serial) [PK] > answer (text) > next_question_id (int) [FK to questions.id] > > questionnaire > ========== > id (serial) [PK] > questionnaire_date (timestamp) > > results > ===== > id (serial) [PK] > questionnaire_id [FK to questionnaire.id] > question_id (int) [FK to questions.id] > answer_id (int) > answer_text (text) > > If the question was for free form text, the answer_id would be 0, > which seems a bit kludgey to me. Plus because an answer ID can't be > required due to free form text answers, I can't enforce a foreign key. > > Is there a nice elegant solution anyone knows of? > > Thanks > > Thom > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, 2008-10-23 at 21:38 +0100, Thom Brown wrote: > I'm afraid such a rigid structure is completely tailored for a > specific questionnaire. What if I, or even a client, wanted to > generate different questionnaires? I would like the data to indicate > the flow of questions and answers rather than just use the database as > pure storage for a completely coded solution. If you construct a schema in such a way that there's really no constraint on the data at all, then the user of that permissive schema is effectively designing the database. This is not necessarily a bad thing (or rather, it may not be avoidable). For instance, if you have no idea what kind of questions might be asked by the questionnaire, nor any idea what kind of questions might be asked about the responses they receive to the questionnaire, there's not much you can do. Pretty much anything is going to look a lot like EAV. The cost of this, however, is that you (as the designer of the permissive schema) can no longer see any meaning in the data at all. You may be able to dig around manually a bit and find out a few specific things, but you can't do it in any automated way. This is because you aren't the real designer of the database, you've passed that job along to your users. Only they (hopefully) have any idea what it might mean. The users might not be good database designers, in which case they'll end up with a mess, and you won't be able to help them. Also, as a performance matter, the optimizer also has no idea what your data means, and so it can't take any useful shortcuts. So, it will probably be slow. The best you can really do is try to find whatever basic meaning you can. Usually there is something there: there are basic data types people will want (e.g. string, numeric, timestamp). There are questions, perhaps groups of questions, order in which the questions should be asked, order in which the questions are answered, time the question was answered, and respondents. There is one (or fewer) answer per question per respondent. Try to piece this stuff together in some way as to provide maximum meaning to you (and to PostgreSQL) without destroying the usefulness to your customers. I think the article David mentioned: http://www.varlena.com/GeneralBits/110.php Is a pretty reasonable compromise for many use-cases. Perhaps more can be done, but usually questionnaires are either too unimportant to really dig in, or so important that designing a database around it is the obvious thing to do. Regards, Jeff Davis
Thanks David and Jeff. I can see your point. The provided link might actually be useful, although I think I'd make some changes to it. I wouldn't have trouble data-mining such a structure for individual questionnaire results. The planner will be shrugging its shoulders, but I haven't actually tested that solution with many massive questionnaires for its query performance. I pretty much have my answer. Thanks for your input guys. Thom On Fri, Oct 24, 2008 at 12:31 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2008-10-23 at 21:38 +0100, Thom Brown wrote: >> I'm afraid such a rigid structure is completely tailored for a >> specific questionnaire. What if I, or even a client, wanted to >> generate different questionnaires? I would like the data to indicate >> the flow of questions and answers rather than just use the database as >> pure storage for a completely coded solution. > > If you construct a schema in such a way that there's really no > constraint on the data at all, then the user of that permissive schema > is effectively designing the database. > > This is not necessarily a bad thing (or rather, it may not be > avoidable). For instance, if you have no idea what kind of questions > might be asked by the questionnaire, nor any idea what kind of questions > might be asked about the responses they receive to the questionnaire, > there's not much you can do. Pretty much anything is going to look a lot > like EAV. > > The cost of this, however, is that you (as the designer of the > permissive schema) can no longer see any meaning in the data at all. You > may be able to dig around manually a bit and find out a few specific > things, but you can't do it in any automated way. This is because you > aren't the real designer of the database, you've passed that job along > to your users. Only they (hopefully) have any idea what it might mean. > The users might not be good database designers, in which case they'll > end up with a mess, and you won't be able to help them. > > Also, as a performance matter, the optimizer also has no idea what your > data means, and so it can't take any useful shortcuts. So, it will > probably be slow. > > The best you can really do is try to find whatever basic meaning you > can. Usually there is something there: there are basic data types people > will want (e.g. string, numeric, timestamp). There are questions, > perhaps groups of questions, order in which the questions should be > asked, order in which the questions are answered, time the question was > answered, and respondents. There is one (or fewer) answer per question > per respondent. Try to piece this stuff together in some way as to > provide maximum meaning to you (and to PostgreSQL) without destroying > the usefulness to your customers. > > I think the article David mentioned: > http://www.varlena.com/GeneralBits/110.php > Is a pretty reasonable compromise for many use-cases. Perhaps more can > be done, but usually questionnaires are either too unimportant to really > dig in, or so important that designing a database around it is the > obvious thing to do. > > Regards, > Jeff Davis > >
On Fri, Oct 24, 2008 at 09:34:20AM +0100, Thom Brown wrote: > Thanks David and Jeff. > > I can see your point. The provided link might actually be useful, > although I think I'd make some changes to it. Good :) It's not meant to be holy writ, just a way to see how you might approach this problem without getting the EAV monkey on your back. That monkey slowly turns into an 800-lb gorilla, and then grows very quickly, crushing your app. > I wouldn't have trouble data-mining such a structure for individual > questionnaire results. The planner will be shrugging its shoulders, > but I haven't actually tested that solution with many massive > questionnaires for its query performance. > > I pretty much have my answer. Thanks for your input guys. Happy to help :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
You may want to look at this article for some ideas: http://www.varlena.com/GeneralBits/110.php The article talks about doing dynamic schema design for online surveys... it's pretty interesting. Aaron Thul http://www.chasingnuts.com On Wed, Oct 22, 2008 at 10:59 AM, Thom Brown <thombrown@gmail.com> wrote: > Hi, > > Is there any optimal and generally agreed way to store questionnaire > data in a database? > > The questionnaire would have to support both of the following: > > - different question types (e.g. What is your name? (free form text) > Are you a smoker? (yes/no checkbox) Are you male or female? (radio > buttons) Select which country you are from (drop-down box).) > > - multiple paths (e.g. if a user were asked what their primary mode of > transport is and they answered "a motorbike" they would be asked if > they carry pillion passengers and how many ccs the engine is, whereas > if they said something like walking they would be asked how far they > walk to work and how long it takes) > > I have previously had a questionnaire which had 5 tables, questions > and answers and question types, questionnaire and results. > > questions > ======= > id (serial) [PK] > question (text) > question_type (int) > > question_types > =========== > id (serial) [PK] > description (text) > > answers > ====== > id (serial) [PK] > answer (text) > next_question_id (int) [FK to questions.id] > > questionnaire > ========== > id (serial) [PK] > questionnaire_date (timestamp) > > results > ===== > id (serial) [PK] > questionnaire_id [FK to questionnaire.id] > question_id (int) [FK to questions.id] > answer_id (int) > answer_text (text) > > If the question was for free form text, the answer_id would be 0, > which seems a bit kludgey to me. Plus because an answer ID can't be > required due to free form text answers, I can't enforce a foreign key. > > Is there a nice elegant solution anyone knows of? > > Thanks > > Thom > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >