Thread: Storing questionnaire data

Storing questionnaire data

From
"Thom Brown"
Date:
Hi,

Is there any optimal and generally agreed way to store questionnaire
data in a database?

The questionnaire would have to support both of the following:

- different question types (e.g. What is your name? (free form text)
Are you a smoker? (yes/no checkbox)  Are you male or female? (radio
buttons)  Select which country you are from (drop-down box).)

- multiple paths (e.g. if a user were asked what their primary mode of
transport is and they answered "a motorbike" they would be asked if
they carry pillion passengers and how many ccs the engine is, whereas
if they said something like walking they would be asked how far they
walk to work and how long it takes)

I have previously had a questionnaire which had 5 tables, questions
and answers and question types, questionnaire and results.

questions
=======
id (serial) [PK]
question (text)
question_type (int)

question_types
===========
id (serial) [PK]
description (text)

answers
======
id (serial) [PK]
answer (text)
next_question_id (int) [FK to questions.id]

questionnaire
==========
id (serial) [PK]
questionnaire_date (timestamp)

results
=====
id (serial) [PK]
questionnaire_id [FK to questionnaire.id]
question_id (int) [FK to questions.id]
answer_id (int)
answer_text (text)

If the question was for free form text, the answer_id would be 0,
which seems a bit kludgey to me.  Plus because an answer ID can't be
required due to free form text answers, I can't enforce a foreign key.

Is there a nice elegant solution anyone knows of?

Thanks

Thom

Re: Storing questionnaire data

From
Sam Mason
Date:
On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote:
> I have previously had a questionnaire which had 5 tables, questions
> and answers and question types, questionnaire and results.

This design looks a lot like the EAV (entity-attribute-value) style of
database design.  This tends to be frowned upon here but is good in some
situations.  I personally have tended to just go for a simple field per
question spread over several tables (if it's a big questionnaire).  I.e.
something like:

  CREATE TABLE questionnaire (
     qnid SERIAL PRIMARY KEY,
     entryname TEXT,
     qndate DATE
  );

  CREATE TABLE questionnaire_page1 (
    qnid INTEGER PRIMARY KEY REFERENCES questionnaire,
    business_purpose TEXT,
    average_turnover NUMERIC,
    num_employees INTEGER
  );

  CREATE TABLE questionnaire_buildings (
    qnid INTEGER REFERENCES questionnaire,
    buildnum INTEGER,
      PRIMARY KEY (qnid, buildnum),
    buildingname TEXT,
    buildingsize NUMERIC,
    lightlevel INTEGER,
    ventilation INTEGER
  );

This way you can enforce useful constraints inside the database, but
requires the database to be aware of what questionnaire data you're
actually storing.  The EAV style would, in my eyes, be appropriate if
you're trying to write a generic program that could handle arbitrary
questionnaire forms and not just one specific one.

Hope that helps


  Sam

Re: Storing questionnaire data

From
"Thom Brown"
Date:
Hi Sam, thanks for your suggestion.

I'm afraid such a rigid structure is completely tailored for a
specific questionnaire.  What if I, or even a client, wanted to
generate different questionnaires?  I would like the data to indicate
the flow of questions and answers rather than just use the database as
pure storage for a completely coded solution.

On Thu, Oct 23, 2008 at 7:01 PM, Sam Mason <sam@samason.me.uk> wrote:
> On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote:
>> I have previously had a questionnaire which had 5 tables, questions
>> and answers and question types, questionnaire and results.
>
> This design looks a lot like the EAV (entity-attribute-value) style of
> database design.  This tends to be frowned upon here but is good in some
> situations.  I personally have tended to just go for a simple field per
> question spread over several tables (if it's a big questionnaire).  I.e.
> something like:
>
>  CREATE TABLE questionnaire (
>     qnid SERIAL PRIMARY KEY,
>     entryname TEXT,
>     qndate DATE
>  );
>
>  CREATE TABLE questionnaire_page1 (
>    qnid INTEGER PRIMARY KEY REFERENCES questionnaire,
>    business_purpose TEXT,
>    average_turnover NUMERIC,
>    num_employees INTEGER
>  );
>
>  CREATE TABLE questionnaire_buildings (
>    qnid INTEGER REFERENCES questionnaire,
>    buildnum INTEGER,
>      PRIMARY KEY (qnid, buildnum),
>    buildingname TEXT,
>    buildingsize NUMERIC,
>    lightlevel INTEGER,
>    ventilation INTEGER
>  );
>
> This way you can enforce useful constraints inside the database, but
> requires the database to be aware of what questionnaire data you're
> actually storing.  The EAV style would, in my eyes, be appropriate if
> you're trying to write a generic program that could handle arbitrary
> questionnaire forms and not just one specific one.
>
> Hope that helps
>
>
>  Sam
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Storing questionnaire data

From
David Fetter
Date:
On Wed, Oct 22, 2008 at 03:59:07PM +0100, Thom Brown wrote:
> Hi,
>
> Is there any optimal and generally agreed way to store questionnaire
> data in a database?
>
> The questionnaire would have to support both of the following:

Without going EAV (almost always a mistake, this should get you a long
way in the right direction :)
<http://www.varlena.com/GeneralBits/110.php>

Cheers,
David.
> - different question types (e.g. What is your name? (free form text)
> Are you a smoker? (yes/no checkbox)  Are you male or female? (radio
> buttons)  Select which country you are from (drop-down box).)
>
> - multiple paths (e.g. if a user were asked what their primary mode of
> transport is and they answered "a motorbike" they would be asked if
> they carry pillion passengers and how many ccs the engine is, whereas
> if they said something like walking they would be asked how far they
> walk to work and how long it takes)
>
> I have previously had a questionnaire which had 5 tables, questions
> and answers and question types, questionnaire and results.
>
> questions
> =======
> id (serial) [PK]
> question (text)
> question_type (int)
>
> question_types
> ===========
> id (serial) [PK]
> description (text)
>
> answers
> ======
> id (serial) [PK]
> answer (text)
> next_question_id (int) [FK to questions.id]
>
> questionnaire
> ==========
> id (serial) [PK]
> questionnaire_date (timestamp)
>
> results
> =====
> id (serial) [PK]
> questionnaire_id [FK to questionnaire.id]
> question_id (int) [FK to questions.id]
> answer_id (int)
> answer_text (text)
>
> If the question was for free form text, the answer_id would be 0,
> which seems a bit kludgey to me.  Plus because an answer ID can't be
> required due to free form text answers, I can't enforce a foreign key.
>
> Is there a nice elegant solution anyone knows of?
>
> Thanks
>
> Thom
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Storing questionnaire data

From
Jeff Davis
Date:
On Thu, 2008-10-23 at 21:38 +0100, Thom Brown wrote:
> I'm afraid such a rigid structure is completely tailored for a
> specific questionnaire.  What if I, or even a client, wanted to
> generate different questionnaires?  I would like the data to indicate
> the flow of questions and answers rather than just use the database as
> pure storage for a completely coded solution.

If you construct a schema in such a way that there's really no
constraint on the data at all, then the user of that permissive schema
is effectively designing the database.

This is not necessarily a bad thing (or rather, it may not be
avoidable). For instance, if you have no idea what kind of questions
might be asked by the questionnaire, nor any idea what kind of questions
might be asked about the responses they receive to the questionnaire,
there's not much you can do. Pretty much anything is going to look a lot
like EAV.

The cost of this, however, is that you (as the designer of the
permissive schema) can no longer see any meaning in the data at all. You
may be able to dig around manually a bit and find out a few specific
things, but you can't do it in any automated way. This is because you
aren't the real designer of the database, you've passed that job along
to your users. Only they (hopefully) have any idea what it might mean.
The users might not be good database designers, in which case they'll
end up with a mess, and you won't be able to help them.

Also, as a performance matter, the optimizer also has no idea what your
data means, and so it can't take any useful shortcuts. So, it will
probably be slow.

The best you can really do is try to find whatever basic meaning you
can. Usually there is something there: there are basic data types people
will want (e.g. string, numeric, timestamp). There are questions,
perhaps groups of questions, order in which the questions should be
asked, order in which the questions are answered, time the question was
answered, and respondents. There is one (or fewer) answer per question
per respondent. Try to piece this stuff together in some way as to
provide maximum meaning to you (and to PostgreSQL) without destroying
the usefulness to your customers.

I think the article David mentioned:
http://www.varlena.com/GeneralBits/110.php
Is a pretty reasonable compromise for many use-cases. Perhaps more can
be done, but usually questionnaires are either too unimportant to really
dig in, or so important that designing a database around it is the
obvious thing to do.

Regards,
    Jeff Davis


Re: Storing questionnaire data

From
"Thom Brown"
Date:
Thanks David and Jeff.

I can see your point.  The provided link might actually be useful,
although I think I'd make some changes to it.

I wouldn't have trouble data-mining such a structure for individual
questionnaire results.  The planner will be shrugging its shoulders,
but I haven't actually tested that solution with many massive
questionnaires for its query performance.

I pretty much have my answer.  Thanks for your input guys.

Thom

On Fri, Oct 24, 2008 at 12:31 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2008-10-23 at 21:38 +0100, Thom Brown wrote:
>> I'm afraid such a rigid structure is completely tailored for a
>> specific questionnaire.  What if I, or even a client, wanted to
>> generate different questionnaires?  I would like the data to indicate
>> the flow of questions and answers rather than just use the database as
>> pure storage for a completely coded solution.
>
> If you construct a schema in such a way that there's really no
> constraint on the data at all, then the user of that permissive schema
> is effectively designing the database.
>
> This is not necessarily a bad thing (or rather, it may not be
> avoidable). For instance, if you have no idea what kind of questions
> might be asked by the questionnaire, nor any idea what kind of questions
> might be asked about the responses they receive to the questionnaire,
> there's not much you can do. Pretty much anything is going to look a lot
> like EAV.
>
> The cost of this, however, is that you (as the designer of the
> permissive schema) can no longer see any meaning in the data at all. You
> may be able to dig around manually a bit and find out a few specific
> things, but you can't do it in any automated way. This is because you
> aren't the real designer of the database, you've passed that job along
> to your users. Only they (hopefully) have any idea what it might mean.
> The users might not be good database designers, in which case they'll
> end up with a mess, and you won't be able to help them.
>
> Also, as a performance matter, the optimizer also has no idea what your
> data means, and so it can't take any useful shortcuts. So, it will
> probably be slow.
>
> The best you can really do is try to find whatever basic meaning you
> can. Usually there is something there: there are basic data types people
> will want (e.g. string, numeric, timestamp). There are questions,
> perhaps groups of questions, order in which the questions should be
> asked, order in which the questions are answered, time the question was
> answered, and respondents. There is one (or fewer) answer per question
> per respondent. Try to piece this stuff together in some way as to
> provide maximum meaning to you (and to PostgreSQL) without destroying
> the usefulness to your customers.
>
> I think the article David mentioned:
> http://www.varlena.com/GeneralBits/110.php
> Is a pretty reasonable compromise for many use-cases. Perhaps more can
> be done, but usually questionnaires are either too unimportant to really
> dig in, or so important that designing a database around it is the
> obvious thing to do.
>
> Regards,
>        Jeff Davis
>
>

Re: Storing questionnaire data

From
David Fetter
Date:
On Fri, Oct 24, 2008 at 09:34:20AM +0100, Thom Brown wrote:
> Thanks David and Jeff.
>
> I can see your point.  The provided link might actually be useful,
> although I think I'd make some changes to it.

Good :)

It's not meant to be holy writ, just a way to see how you might
approach this problem without getting the EAV monkey on your back.
That monkey slowly turns into an 800-lb gorilla, and then grows very
quickly, crushing your app.

> I wouldn't have trouble data-mining such a structure for individual
> questionnaire results.  The planner will be shrugging its shoulders,
> but I haven't actually tested that solution with many massive
> questionnaires for its query performance.
>
> I pretty much have my answer.  Thanks for your input guys.

Happy to help :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Storing questionnaire data

From
Aaron
Date:
You may want to look at this article for some ideas:
http://www.varlena.com/GeneralBits/110.php
The article talks about doing dynamic schema design for online
surveys... it's pretty interesting.

Aaron Thul
http://www.chasingnuts.com



On Wed, Oct 22, 2008 at 10:59 AM, Thom Brown <thombrown@gmail.com> wrote:
> Hi,
>
> Is there any optimal and generally agreed way to store questionnaire
> data in a database?
>
> The questionnaire would have to support both of the following:
>
> - different question types (e.g. What is your name? (free form text)
> Are you a smoker? (yes/no checkbox)  Are you male or female? (radio
> buttons)  Select which country you are from (drop-down box).)
>
> - multiple paths (e.g. if a user were asked what their primary mode of
> transport is and they answered "a motorbike" they would be asked if
> they carry pillion passengers and how many ccs the engine is, whereas
> if they said something like walking they would be asked how far they
> walk to work and how long it takes)
>
> I have previously had a questionnaire which had 5 tables, questions
> and answers and question types, questionnaire and results.
>
> questions
> =======
> id (serial) [PK]
> question (text)
> question_type (int)
>
> question_types
> ===========
> id (serial) [PK]
> description (text)
>
> answers
> ======
> id (serial) [PK]
> answer (text)
> next_question_id (int) [FK to questions.id]
>
> questionnaire
> ==========
> id (serial) [PK]
> questionnaire_date (timestamp)
>
> results
> =====
> id (serial) [PK]
> questionnaire_id [FK to questionnaire.id]
> question_id (int) [FK to questions.id]
> answer_id (int)
> answer_text (text)
>
> If the question was for free form text, the answer_id would be 0,
> which seems a bit kludgey to me.  Plus because an answer ID can't be
> required due to free form text answers, I can't enforce a foreign key.
>
> Is there a nice elegant solution anyone knows of?
>
> Thanks
>
> Thom
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>