Thread: How do i store arbitrary questions and answers in SQL?
Dear group, I have been bugged by this problem on and off for years, and would like to put it out to general discussion. I suspect it is a common SQL problem but I have never found a satisfactory answer - maybe there is not one. The problem : to store, in SQL/RDBMS, an arbitrary set of questions and their answers, where the questions are text (obviously) but the answers can be dates, text, integers, money etc. think of it as a big questionnaire where at design time we have no idea what the questions will be. My usual solution to this problem is to store everything in varchar and flag the type, converting as I extract data. It is not a desirable solution. i.e. CREATE TABLE Qu (ID INT, Question VARCHAR(64)) CREATE TABLE Answer(ID INT, questionID INT, answer_text VARCHAR(255), datatype INT) Are there any other solutions out there? cheers PS I think I am essentially looking for a database that stores a datatype of VARIANT (for those of MS VB background). However my (patchy) knowledge of RDBMS internals says this is essentially breaks all the rules for optimisation and storage so is never offered. I can think of several ideas but they all fall short in some way 0. (current one) I have to cast any search term to string first but after that the SQL performs as one expects. 1. store the serialised object in binary form. (how does one search this? Totally dependant on choice of middleware language) 2. Store different types in diff columns table answer (questionID, ans_text VARCHAR, ans_money MONEY, ans_int INT .... But this makes searching through SQL even harder than casting, as in each query i must what answer to expect. 3. Different answer tables each with answer types - same problem as 2 but even harder. 4. I suspect pl/python might be useful, but I cannot see how at the moment...
Hi, On Wed, Aug 23, 2006 at 02:40:29AM -0700, lifeisgood wrote: > CREATE TABLE Answer(ID INT, questionID INT, answer_text VARCHAR(255), > datatype INT) You can actually store answer_text as a BLOB in the database that is packed in any format your application can handle. You can serialize a class that represents the answer you need and store it, for example. -HTH -- Enver
Attachment
On Wed, Aug 23, 2006 at 04:13:24PM +0300, Enver ALTIN wrote: > > CREATE TABLE Answer(ID INT, questionID INT, answer_text VARCHAR(255), > > datatype INT) > > You can actually store answer_text as a BLOB in the database that is > packed in any format your application can handle. You can serialize a > class that represents the answer you need and store it, for example. There was an article on that on Elein's General Bits not too long ago. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
lifeisgood wrote: > The problem : to store, in SQL/RDBMS, an arbitrary set of questions and > their answers, where the questions are text (obviously) but the answers > can be dates, text, integers, money etc. > think of it as a big questionnaire where at design time we have no idea > what the questions will be. > > My usual solution to this problem is to store everything in varchar and > flag the type, converting > as I extract data. It is not a desirable solution. > > i.e. > CREATE TABLE Qu (ID INT, Question VARCHAR(64)) > > CREATE TABLE Answer(ID INT, questionID INT, answer_text VARCHAR(255), > datatype INT) Use text, not varchar - varchar is just a waste of time and space. This approach isn't all that bad, since if you're using libpq or similar to read and write the values (without using a binary cursor), you're using a text representation anyway. Just use the same text representation of your data that the db interface is going to use. > Are there any other solutions out there? The other solution is to partition your table, make some number of tables, one for each data type, with the value column using the correct native type. Your code that reads and writes values then needs to be smart enough to use the correct table depending on the data type. We've used both approaches, and they both work fine for us. The text for everything approach has the advantage of simplicity of interface code, whereas partitioning on data type gives you better query planning and better performance when you have a lot (ie millions of rows plus) of data. > I can think of several ideas but they all fall short in some way > > 0. (current one) I have to cast any search term to string first but > after that the SQL performs as one expects. No, you don't have to "cast" things to text - quite the reverse; if you are querying on the contents of your value (answer) column and your criterion depends on the correct type (eg find questions with integer answers greater than 42) then you have to cast the text to integer in the query. > 1. store the serialised object in binary form. (how does one search > this? Totally dependant on choice of middleware language) I'd avoid this one - for the reason you've mentioned, among others. Not sure what the middleware language has to do with it, though - if your choice of middleware makes things harder then it's the wrong choice. If middleware doesn't make things easier, then what use is it? > 2. Store different types in diff columns > table answer (questionID, ans_text VARCHAR, ans_money MONEY, > ans_int INT .... > But this makes searching through SQL even harder than casting, as > in each query i must what answer to expect. Definitely avoid this one. Lots of wasted space, as well as extra software complexity, with little payoff. > 3. Different answer tables each with answer types - same problem as 2 > but even harder. This is the other option I mentioned above. It's not hard at all. > 4. I suspect pl/python might be useful, but I cannot see how at the > moment... I can't see it either :). Decide what you want to do first, _then_ work out how to implement it. Tim -- ----------------------------------------------- Tim Allen tim@proximity.com.au Proximity Pty Ltd http://www.proximity.com.au/
I agree with Tim, that your option 3 is really not all that hard, and I suggest would provide you with the best solution. I'm assuming, however, that your application asks the question and checks the user supplied answer with the answer in the db (and not Jeopardy style). I might add a 'join' or 'answertype' field in the questions (qu) table to assist when drawing the application and to immediately know what table to verify the supplied answer against. Otherwise, I think you might need a query to check the type on the answer the user supplies and make a possible few passes (through the numeric and integer tables for example when the user's response is 42) - or some combination of outers to find the non null value field. For what it's worth, I'd probably also include a 'type' table to hold my application specific mask for the user's reply, and perhaps a attribute to hold the joining table name (I'm thinking about the future requirement to add a new type without much coding changes on the application side). So, now I'm interested in this, and I could see a use for something similar in the near future, I'll put the following out there for comment. Obviously I haven't put it into practice but perhaps it could work. One of differences from your original post is the absense of a serial field on the responses. I've only put a primary key which would link back to the the questions.qid field (so maybe it should be a fk? see my note about my pg newness in a moment). I'm assuming this is not a multiple choice type of situation. Each question has one answer (although not currently enforced in the db layer here since there is nothing stopping you from placing an answer in more than one of the response tables - I'm relatively new to pg and not sure of anyway to deal with this). It's early, haven't had my first cup yet, but I would start with something like the following: CREATE TABLE questions -- holds the questions or challenges ( qid serial NOT NULL, qchallenge text, qtype int2, -- linking to types.tid CONSTRAINT questions_pkey PRIMARY KEY (qid) ) WITHOUT OIDS; CREATE TABLE types -- mostly to assist application development ( tid serial NOT NULL, tdescription varchar(25), tmask varchar(25), -- just a thought, could be useful when building a web app tjoin varchar, -- again, just thinking about ease of new additions CONSTRAINT types_pkey PRIMARY KEY (tid) ) WITHOUT OIDS; CREATE TABLE response_numeric ( rnqid int2 NOT NULL, -- linking to questions.qid rnvalue numeric, CONSTRAINT response_numeric_pkey PRIMARY KEY (rnqid) ) WITHOUT OIDS; CREATE TABLE response_integer ( riqid int2 NOT NULL, rivalue int4, CONSTRAINT response_integer_pkey PRIMARY KEY (riqid) ) WITHOUT OIDS; CREATE TABLE response_text ( rtqid int2 NOT NULL, rtvalue text, CONSTRAINT response_text_pkey PRIMARY KEY (rtqid) ) WITHOUT OIDS; CREATE TABLE response_date ( rdqid int2 NOT NULL, rdvalue date, CONSTRAINT response_date_pkey PRIMARY KEY (rdqid) ) WITHOUT OIDS; Tim Allen wrote: > > 3. Different answer tables each with answer types - same problem as 2 > > but even harder. > > This is the other option I mentioned above. It's not hard at all.
Dang. I re-read your post and see that you are not checking for the correct answer, you simply want to store the responses like a questionare. Well I still think a different table for each type of response would be beneficial, obviously, adding a serial field like you have done and the foreign key linking to the qid. I had a completely different problem on my mind (testing user's knowledge). Sorry about that. macgillivary wrote: > I agree with Tim, that your option 3 is really not all that hard, and I > suggest would provide you with the best solution. I'm assuming, > however, that your application asks the question and checks the user > supplied answer with the answer in the db (and not Jeopardy style). I > might add a 'join' or 'answertype' field in the questions (qu) table to > assist when drawing the application and to immediately know what table > to verify the supplied answer against. Otherwise, I think you might > need a query to check the type on the answer the user supplies and make > a possible few passes (through the numeric and integer tables for > example when the user's response is 42) - or some combination of outers > to find the non null value field. > > For what it's worth, I'd probably also include a 'type' table to hold > my application specific mask for the user's reply, and perhaps a > attribute to hold the joining table name (I'm thinking about the future > requirement to add a new type without much coding changes on the > application side). > > So, now I'm interested in this, and I could see a use for something > similar in the near future, I'll put the following out there for > comment. Obviously I haven't put it into practice but perhaps it could > work. One of differences from your original post is the absense of a > serial field on the responses. I've only put a primary key which would > link back to the the questions.qid field (so maybe it should be a fk? > see my note about my pg newness in a moment). I'm assuming this is not > a multiple choice type of situation. Each question has one answer > (although not currently enforced in the db layer here since there is > nothing stopping you from placing an answer in more than one of the > response tables - I'm relatively new to pg and not sure of anyway to > deal with this). It's early, haven't had my first cup yet, but I would > start with something like the following: > > CREATE TABLE questions -- holds the questions or challenges > ( > qid serial NOT NULL, > qchallenge text, > qtype int2, -- linking to types.tid > CONSTRAINT questions_pkey PRIMARY KEY (qid) > ) > WITHOUT OIDS; > > CREATE TABLE types -- mostly to assist application development > ( > tid serial NOT NULL, > tdescription varchar(25), > tmask varchar(25), -- just a thought, could be useful when building > a web app > tjoin varchar, -- again, just thinking about ease of new additions > CONSTRAINT types_pkey PRIMARY KEY (tid) > ) > WITHOUT OIDS; > > CREATE TABLE response_numeric > ( > rnqid int2 NOT NULL, -- linking to questions.qid > rnvalue numeric, > CONSTRAINT response_numeric_pkey PRIMARY KEY (rnqid) > ) > WITHOUT OIDS; > > CREATE TABLE response_integer > ( > riqid int2 NOT NULL, > rivalue int4, > CONSTRAINT response_integer_pkey PRIMARY KEY (riqid) > ) > WITHOUT OIDS; > > CREATE TABLE response_text > ( > rtqid int2 NOT NULL, > rtvalue text, > CONSTRAINT response_text_pkey PRIMARY KEY (rtqid) > ) > WITHOUT OIDS; > > CREATE TABLE response_date > ( > rdqid int2 NOT NULL, > rdvalue date, > CONSTRAINT response_date_pkey PRIMARY KEY (rdqid) > ) > WITHOUT OIDS; > > > > Tim Allen wrote: > > > 3. Different answer tables each with answer types - same problem as 2 > > > but even harder. > > > > This is the other option I mentioned above. It's not hard at all.
I agree that option 3 is the way to go. There is a little reading at: http://www.varlena.com/varlena/GeneralBits/110.php which may be of interest. Tim Allen wrote: > lifeisgood wrote: > > 3. Different answer tables each with answer types - same problem as 2 > > but even harder. > > This is the other option I mentioned above. It's not hard at all.