Re: My honours project - databases using dynamically attached entity-properties - Mailing list pgsql-hackers
From | Eddie Stanley |
---|---|
Subject | Re: My honours project - databases using dynamically attached entity-properties |
Date | |
Msg-id | 45F84C0B.9030306@paradise.net.nz Whole thread Raw |
In response to | Re: My honours project - databases using dynamically attached entity-properties (David Fetter <david@fetter.org>) |
List | pgsql-hackers |
David Fetter wrote: > On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote: > >> On Wed, 14 Mar 2007, David Fetter wrote: >> >>> On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote: >>> >>>> David Fetter wrote: >>>> >>>>> On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: >>>>> >>>>>> David Fetter wrote: >>>>>> >>>>>>> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: >>>>>>> >>>>>>>> * Another good example is the "questionnaire". >>>>>>>> >>>>>>> With all due respect, this is a solved problem *without EAV or >>>>>>> run-time DDL*. The URL below has one excellent approach to this. >>>>>>> >>>>>>> <http://www.varlena.com/GeneralBits/110.php> >>>>>>> >>>>>> Which broadly speaking was the solution I used for my >>>>>> questionnaire, except I had a restricted set of types so basically >>>>>> just coerced them to text and side-stepped the inheritance issue. >>>>>> To the extent that it's dynamic, it's still just EAV though. >>>>>> >>>>> That's precisely the difference between the above solution and >>>>> yours, and it's the difference between a good design and one that >>>>> will come up and bit you on the as^Hnkle. >>>>> >>>> It's still basically EAV (either approach). The key fault with EAV >>>> is that the tables have no semantic meaning - answer_int contains >>>> number of oranges, days since birth and the price of a tube ticket >>>> in pennies. >>>> >>> Stuffing all of those into an answer_int is *precisely* what the end >>> user must not do. That's pilot error. >>> >>> >>>> Now, with a questionnaire that might not matter because everything >>>> is an "answer" and you're not necessarily going to do much more than >>>> count/aggregate it. >>>> >>> See above. >>> >>> >>>>>> It doesn't remove the need for run-time DDL if you allow users to >>>>>> add their own questions. >>>>>> >>>>> Sure it does. When a user, who should be talking with you, wants >>>>> to ask a new kind of question, that's the start of a discussion >>>>> about what new kind(s) of questions would be generally applicable >>>>> in the questionnaire schema. Then, when you come to an agreement, >>>>> you roll it into the new schema, and the whole system gets an >>>>> improvement. >>>>> >>>> Fine, but if you're not letting the user extend the system, then >>>> it's not really addressing Edward's original posting, is it? >>>> >>> It's my contention that Edward's original idea is ill-posed. SQL is >>> just fine for doing this kind of thing, and it's *not that hard*. >>> >>> >>>> If the user's talking to me, I might as well just write the DDL >>>> myself - it's the talk that'll take the time, not writing a dozen >>>> lines of SQL. >>>> >>> It's the talk that's the important part. Machines are really bad at >>> seeing the broader picture. In the attempt to "save" a few minutes' >>> discussion, he's trying to borrow that time from a system asked to do >>> things that computers are inherently bad at doing, and every end user >>> will pay that time back at a very high rate of interest. This is >>> precisely the kind of false economy that so plagues software >>> development and maintenance these days. >>> >>> >>>> The interesting part of the problem (from a Comp-Sci point of view) >>>> is precisely in automating part of that discussion. It's providing >>>> an abstraction so that you don't end up with a mass of attributes >>>> while still providing freedom to the user. >>>> >>> This freedom and efficiency you're talking about is better supplied, >>> IMHO, by putting a standard DDL for questionnaires up on a pgfoundry >>> or an SF.net. That way, improvements to the DDL get spread all over >>> the world, and a very large amount of wheel reinvention gets avoided. >>> Reusable components are a big chunk of both freedom and efficiency. :) >>> >>> Cheers, >>> D >>> >> Maybe I should rethink the problem a bit - from the very brief >> initial research I've done, it seems EAV schemas have two common >> uses: >> > > >> 1) When new attributes have to be created on-the-fly >> 2) When the number of possible properties for an entity greatly (orders of >> magnitude) exceeds the number of properties any one entity is likely to have. >> > > Um, no. The first use case is bad coding practice, and the second is > a classic case for a join table, which is the standard way to handle > M:N relationships. > > >> I'm not sure about solving the first problem - there seems to be a lot of >> debate around this. I can see reasons for and against allowing this. However >> I think the second is a very real problem. One such example is a patient >> record system. >> >> For each patient we have a table of common data (dob, sex, height, weight etc) >> but as well as this a patient can present with many symptoms. This might be a >> table of 40,000 possible symptoms. >> > > Here's how I'd do that: > > CREATE TABLE patient ( > patient_id SERIAL PRIMARY KEY, /* for simplicity. Some > combination of columns in the > table would also have a UNIQUE > NOT NULL constraint on it. > */ > ... > ); > > CREATE TABLE symptom ( > symptom_id SERIAL PRIMARY KEY, /* See above. */ > ... > ); > > CREATE TABLE patient_presents_with ( > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > UNIQUE(patient_id, symptom_id) > ); > > >> Lets say we want to run a query on these symptoms (using a boolean expression) >> > > I'd use something like the following: > > SELECT > p.patient_id, > p.f_name, > p.l_name, > s.symptom_name, > s.symptom_desc > FROM > patient p > JOIN > patient_presents_with ppw > USING (patient_id) > JOIN > symptom s > USING (symptom_id) > WHERE > s.symptom_name = ALL('foo','bar','baz') > AND > s.symptom_name = ANY('quux','fleeg'); > > Are the ALL and ANY functions new to Postgresql 8? I haven't met them before. Anyway this will work for some queries but not others. What about the following expression? ('foo' && 'bar') | ('baz' && ! 'quxx') | 'fleeg' Maybe I have misunderstood how these functions work, but I don't think they will handle anything but trivial examples of this problem. >> to return the patient records which match the query string on the symptoms. >> >> (This turns out to be a very similar problem to the 'tags' example I first >> presented) - assume a similar schema. With more than a couple of symptoms and >> a complex tree, the resulting SQL can span pages. >> > > Not really. See above :) > > >> When I first started thinking about this project I believed the two problems >> essentially to be the same class of problem, but this may not be the case. >> > > EAV will bite you. It's not *that* much work to keep its from biting > you. :) > > Cheers, > D >
pgsql-hackers by date: