Thread: My honours project - databases using dynamically attached entity-properties
My honours project - databases using dynamically attached entity-properties
From
Edward Stanley
Date:
Hi, Was wondering if people would mind having a read over what I plan to do for my undergraduate honours project - you can get the proposal here: http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf What I'd basically like to know is a) Is this problem worth solving? b) Is there already a good solution (particularly, within PostgreSQL)? c) Any other feedback or comments you may have. Regards Eddie Stanley
Re: My honours project - databases using dynamically attached entity-properties
From
Heikki Linnakangas
Date:
Edward Stanley wrote: > Was wondering if people would mind having a read over what I plan to do for my > undergraduate honours project - you can get the proposal here: > > http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf ISTM that what you're calling "dynamically attached properties" are generally called Entity-Attribute-Value schemas. In my experience, EAV schemas are usually result of improper database design by someone not understanding the relational theory and the principles of normalization. EAV offers a false sense of flexibility: sure, you can store whatever you want in an EAV database, but because the data isn't well structured, building queries becomes extremely complicated. No amount of syntactic sugar is going to fix that, you just need to normalize your data. > What I'd basically like to know is > a) Is this problem worth solving? > b) Is there already a good solution (particularly, within PostgreSQL)? You could create a view on top of the normalized schema. There's also a contrib module called "hstore" which might be relevant. > c) Any other feedback or comments you may have. I don't know how wedded you are to your subject or your ambitions, but let me suggest a few other ideas anyway: - Temporal extensions. IIRC, two people/groups have recently come up on hackers, planning to do their thesis on that subject. It's a very real problem, so more people working on it wouldn't hurt. See http://www.amazon.co.uk/Temporal-Data-Relational-Model-Investigation/dp/1558608559 for the right way to solve it - Recursive queries. - Materialized views. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
Josh Berkus
Date:
Edward, Heikki, > In my experience, EAV > schemas are usually result of improper database design by someone not > understanding the relational theory and the principles of normalization. Edward, you should be aware that EAV schema are a source of disagreement among database designers. Some developers like Heikki and David Fetter hate them, and other like me use them frequently when the application business logic warrants it -- primarily cases where each item has a variable number of attributes and a large number of the attributes are null for most items, or when the application is expected to define new attributes which can't be anticipated by the database designer. No matter how much Heikki hates them, I think he'd agree that EAV tables are better than having the application execute DDL at runtime. ;-) > > What I'd basically like to know is > > a) Is this problem worth solving? > > b) Is there already a good solution (particularly, within PostgreSQL)? It's pretty easy to do EAV ad-hoc using existing tools. I'm not clear on what you're planning to implement that's not already available. Could you be more specific? -- Josh Berkus PostgreSQL @ Sun San Francisco
Re: My honours project - databases using dynamically attached entity-properties
From
"Andrew Hammond"
Date:
On Mar 11, 12:47 pm, j...@agliodbs.com (Josh Berkus) wrote: > No matter how much Heikki hates them, I think he'd agree that EAV tables are > better than having the application execute DDL at runtime. EAV moves the structure that is typically in the design of the tables into the contents of the tables. With an EAV database you have effectively destroyed the semantic difference between DML and DDL. I'm willing to concede that there may be situations where EAV is actually the right answer to a design problem. I have yet to encounter one, but Josh has more experience, and more varied experience than I do. To me, EAV is a perfect example of ignoring the YAGNI principal. http://c2.com/xp/YouArentGonnaNeedIt.html > > > What I'd basically like to know is > > > a) Is this problem worth solving? I think you're solving a symptom, not the root cause of the problem. Clarify the root cause, and then you have an interesting project. Andrew
Re: My honours project - databases using dynamically attached entity-properties
From
Josh Berkus
Date:
Amdrew, > I have yet to encounter one, but > Josh has more experience, and more varied experience than I do. To me, > EAV is a perfect example of ignoring the YAGNI principal. I've done plenty of applications where part of the specification for the application was "User Defined Fields" allowing the users to customize the data structure at runtime. This is a very, very common thing; of the 15 or so commercial applications I implemented from scratch as a consultant probably 10 of them had it. I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: My honours project - databases using dynamically attached entity-properties
From
"Joshua D. Drake"
Date:
Josh Berkus wrote: > Amdrew, > >> I have yet to encounter one, but >> Josh has more experience, and more varied experience than I do. To me, >> EAV is a perfect example of ignoring the YAGNI principal. > > I've done plenty of applications where part of the specification for the > application was "User Defined Fields" allowing the users to customize the > data structure at runtime. This is a very, very common thing; of the 15 > or so commercial applications I implemented from scratch as a consultant > probably 10 of them had it. Just to add a note from YAJ, custom fields are unfortunately a requirement in most apps I have seen, from a CMS to Quickbooks. The reality is, you don't know everything the person wants to know about a particular set of data. Joshua D. Drake > > I really don't see any way you could implement UDFs other than EAV that > wouldn't be immensely awkward, or result in executing DDL at runtime. > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
Josh Berkus wrote: > I really don't see any way you could implement UDFs other than EAV that > wouldn't be immensely awkward, or result in executing DDL at runtime. What's so horrible about DDL at runtime? Obviously, you're only going to allow specific additions to specific schemas/tables, but why not? -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
"Andrew Hammond"
Date:
On 3/12/07, Richard Huxton <dev@archonet.com> wrote: > Josh Berkus wrote: > > I really don't see any way you could implement UDFs other than EAV that > > wouldn't be immensely awkward, or result in executing DDL at runtime. > > What's so horrible about DDL at runtime? Obviously, you're only going to > allow specific additions to specific schemas/tables, but why not? More to the point, since EAV is effectively smearing the semantics of DDL with DML, what, if any of the arguments against doing DDL at runtime don't apply equally to EAV? Well, aside from being able to say "hey, I'm not executing DDL at runtime". :) I see the issue as one of cost: it's substantially harder to implement DDL at runtime than to work around the problem using EAV. If that analysis is reasonable, then it would be a very interesting research project to see how to cut down that cost of implementation. Andrew
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
Andrew Hammond wrote: > On 3/12/07, Richard Huxton <dev@archonet.com> wrote: >> Josh Berkus wrote: >> > I really don't see any way you could implement UDFs other than EAV that >> > wouldn't be immensely awkward, or result in executing DDL at runtime. >> >> What's so horrible about DDL at runtime? Obviously, you're only going to >> allow specific additions to specific schemas/tables, but why not? > > More to the point, since EAV is effectively smearing the semantics of > DDL with DML, what, if any of the arguments against doing DDL at > runtime don't apply equally to EAV? Well, aside from being able to say > "hey, I'm not executing DDL at runtime". :) > > I see the issue as one of cost: it's substantially harder to implement > DDL at runtime than to work around the problem using EAV. If that > analysis is reasonable, then it would be a very interesting research > project to see how to cut down that cost of implementation. Well the cost depends on where/how complex the extra fields are. If you're just talking about adding columns usercol01..NN with different types and possibly a lookup to a single client_attributes table, it's not difficult. Of course, if inheritence worked fully, you could just have core and user versions of relevant tables. -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
Gregory Stark
Date:
"Richard Huxton" <dev@archonet.com> writes: > Well the cost depends on where/how complex the extra fields are. If you're just > talking about adding columns usercol01..NN with different types and possibly a > lookup to a single client_attributes table, it's not difficult. And then what? dynamically construct all your SQL queries? Sure, sounds like a simple solution to me... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
Josh Berkus
Date:
> And then what? dynamically construct all your SQL queries? > Sure, sounds like a simple solution to me... Not to mention DB security issues. How do you secure your database when your web client has DDL access? So, Edward, the really *interesting* idea would be to come up with a secure, normalized way to do UDFs *without* EAV tables. People would be very impressed. BTW, Google Summer of Code opens Wednesday: http://www.postgresql.org/developer/summerofcode.html -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
Gregory Stark wrote: > "Richard Huxton" <dev@archonet.com> writes: > >> Well the cost depends on where/how complex the extra fields are. If you're just >> talking about adding columns usercol01..NN with different types and possibly a >> lookup to a single client_attributes table, it's not difficult. > > And then what? dynamically construct all your SQL queries? > Sure, sounds like a simple solution to me... No different to dynamically constructing a query for a report. Simpler, since in my experience most of these user-defined fields are just slots for extra codes/tags/notes rather than something you'd join on. -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
"Sean Utt"
Date:
And then what? Make the search box on www.postgresql.org able to handle an email address as search text without throwing a shoe? Search for linus@torvalds.com or any other 'email' address from the postgres home page. Barfage every time. Easy for some isn't easy for all, apparently. Left that out as a test case did we? Someone searching a mailing list for an email address? Who wudda thunk it? It works without the . -- don't know why, but then I also don't know why someone hasn't tried that before me. "Sure, sounds like a simple solution to me..." Richard said sarcastically. Would be funnier if the search on the website wasn't broken in a completely stupid, almost ironic way. Ah, irony and sarcasm -- the ugly twins. Yeah, we have to dynamically generate queries day in and day out. But then some of us actually work for a living. Since we already have to do that, maybe someone could make that easier? Isn't that really the point here? Someone asked if something would be useful, and the people who use the database to do real work said YES, and here's how I might use it. Like full text seach and recursive queries, user defined (fields|attributes|properties) and the ability to manage them would be BUTTER! Is it a difficult problem? YES, but if it wasn't, why should it be worth an advanced degree? Or maybe 'we' only solve the trivial and obvious problems, like searching a mailing list for an email address? Sarcastically yours, Sean ----- Original Message ----- From: "Richard Huxton" <dev@archonet.com> To: "Gregory Stark" <stark@enterprisedb.com> Cc: "Andrew Hammond" <andrew.george.hammond@gmail.com>; <josh@agliodbs.com>; <pgsql-hackers@postgresql.org> Sent: Monday, March 12, 2007 7:30 PM Subject: Re: [HACKERS] My honours project - databases using dynamically attached entity-properties > Gregory Stark wrote: >> "Richard Huxton" <dev@archonet.com> writes: >> >>> Well the cost depends on where/how complex the extra fields are. If >>> you're just >>> talking about adding columns usercol01..NN with different types and >>> possibly a >>> lookup to a single client_attributes table, it's not difficult. >> >> And then what? dynamically construct all your SQL queries? Sure, sounds >> like a simple solution to me... > > No different to dynamically constructing a query for a report. Simpler, > since in my experience most of these user-defined fields are just slots > for extra codes/tags/notes rather than something you'd join on. > > -- > Richard Huxton > Archonet Ltd > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > >
Re: My honours project - databases using dynamically attached entity-properties
From
Magnus Hagander
Date:
On Mon, Mar 12, 2007 at 10:05:58PM -0700, Sean Utt wrote: > And then what? Make the search box on www.postgresql.org able to handle an > email address as search text without throwing a shoe? > > Search for linus@torvalds.com or any other 'email' address from the > postgres home page. Barfage every time. > Easy for some isn't easy for all, apparently. Left that out as a test case > did we? Someone searching a mailing list for an email address? Who wudda > thunk it? It works without the . -- don't know why, but then I also don't > know why someone hasn't tried that before me. Obviously not, since nobody has reported it before. Fixed, insofar that we don't barf on it, but show you the no hits message. Because all email addresses are obfuscated in the archives, they're not actually searchable. Thanks for reporting the issue. You're advised to report issues with the website to pgsql-www instead for a faster response, since web people don't always monitor all the other lists. //Magnus
Re: My honours project - databases using dynamically attached entity-properties
From
Csaba Nagy
Date:
On Tue, 2007-03-13 at 00:43, Richard Huxton wrote: > Josh Berkus wrote: > > I really don't see any way you could implement UDFs other than EAV that > > wouldn't be immensely awkward, or result in executing DDL at runtime. > > What's so horrible about DDL at runtime? Obviously, you're only going to > allow specific additions to specific schemas/tables, but why not? Well, exclusively locking the table for DDL is not always possible in production systems. We also shortly had a solution where we added new columns on the fly, and we had to ditch it... it was simply not working. The users were usually adding columns in the peek business hours, and in the same hours it was simply impossible to take an exclusive lock on that table. I think DDL will actually also exclusively lock parent tables of FK relationships (I might be mistaken here, but I think I have seen such lock), which is even worse. After it caused extensive perceived downtime for hours, we simply redesigned the feature so that the users need now to announce beforehand how many different types of new columns they will need and we just create a few extra of them, and assign them to the users needs as they need it... the unused columns stay null and hopefully don't have too big overhead, but it's the simplest solution we found which actually works. When the user runs out of spare columns, the admin will create some new spares in quiet hours. Cheers, Csaba.
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
Sean Utt wrote: > And then what? Make the search box on www.postgresql.org able to handle > an email address as search text without throwing a shoe? > > Search for linus@torvalds.com or any other 'email' address from the > postgres home page. Barfage every time. > Easy for some isn't easy for all, apparently. Left that out as a test > case did we? Someone searching a mailing list for an email address? Who > wudda thunk it? It works without the . -- don't know why, but then I > also don't know why someone hasn't tried that before me. Had a bad day? And I'll be damned if you're not right - it doesn't return results for dev@archonet.com but does for dev@archonet. Presumably something to do with (not ?) splitting the email address on ".". Can't believe no-one has noticed this before (me, for example). I guess that even though I search a lot, it's not on email addrs. Have you reported it to the www team? > "Sure, sounds like a simple solution to me..." Richard said > sarcastically. Would be funnier if the search on the website wasn't > broken in a completely stupid, almost ironic way. Ah, irony and sarcasm > -- the ugly twins. Actually, it was Greg who said that. And it was *me* the (really very gentle) sarcasm was directed at. > Yeah, we have to dynamically generate queries day in and day out. But > then some of us actually work for a living. Umm - like all of us? > Since we already have to do that, maybe someone could make that easier? Good idea! > Isn't that really the point here? Not as I was reading the discussion. > Someone asked if something would be > useful, and the people who use the database to do real work said YES, > and here's how I might use it. Like full text seach and recursive > queries, user defined (fields|attributes|properties) and the ability to > manage them would be BUTTER! Is it a difficult problem? YES, but if it > wasn't, why should it be worth an advanced degree? I think the main discussion has been around: 1. Whether Edward's final-year project is basically EAV (in which case he'll probably need to work hard to get good marks). 2. Whether dynamically generating DDL is safe/practical in a business setting. This seems to split some *very* experienced people about 50:50. Half of them agree with me and the other half are inexplicably wrong ;-) If you read section 0.3 of Edward's project proposal he argues that dynamic DDL causes problems for the application because: "However, SQL does not provide an easy syntax for querying these properties." (meaning the changed structure of the database). I'm not saying this part is easy, but I'm not convinced it's harder than doing it the "wrong" way. At least not if you do as Edward does and enforce types. Now, in 0.3.1 he describes a normalised webpage=>tags table pair and shows some cumbersome-looking SQL. However, I'd argue this is due to the verbose nature of the SQL rather than the underlying expressions. He then looks at what I think of as the "system settings table"* problem, where you have a bunch of configuration-settings you'd tend to store in a single table (setting_name, value), except you'd like to have different types for each setting (a further wrinkle is that you might like lists of settings - do you use arrays or a sub-table?). This is your classic EAV scenario. Now, he proposes a set of tables - one per property, so you can enforce type constraints, but he will need to create syntax to make this easier to query. Presumably it'll need an automatically-generated view over the top. (Edward - tip: read up on Date's thoughts on automatically determining what table you can insert a row in based on its type). This certainly looks like a variant on EAV to me, and I'm not convinced that it's gaining much since you'll have possibly huge numbers of joins going on in the background while not having any semantics to your table definitions. The other variant (which I've used myself) is to have a "type" column and a trigger to enforce the text-value is valid for "type". Now, if you do it "properly", that is define tables as you would as a human designer, then you do have the problem of informing your application on the relevant structure. Now, the syntax issues of this are largely solved - plenty of ActiveRecord-style classes out there for Ruby,Python,Perl,PHP,... The interesting question then becomes where the practical limits in such flexibility are. Simple attribute/value pairs are easy enough. How do you feel about changes to existing tables though? How about new tables that add relationships between existing tables? At what point does the application just become pgAdmin? * Another good example is the "questionnaire". Users need to be able to define their own lists of questions and if "driving_licence=no" then don't bother to ask "type of car=?". Edward - if you want to see a schema that implements this sort of thing, contact me off list and I'll see what I can do. The client for that project will probably be fine with sharing it with one student. -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
David Fetter
Date:
On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: > Sean Utt wrote: > >And then what? Make the search box on www.postgresql.org able to > >handle an email address as search text without throwing a shoe? > > > >Search for linus@torvalds.com or any other 'email' address from the > >postgres home page. Barfage every time. Easy for some isn't easy > >for all, apparently. Left that out as a test case did we? Someone > >searching a mailing list for an email address? Who wudda thunk it? > >It works without the . -- don't know why, but then I also don't > >know why someone hasn't tried that before me. > > Had a bad day? Today was a good day ;) > >Since we already have to do that, maybe someone could make that easier? > > Good idea! Hear, hear! > > Someone asked if something would be > >useful, and the people who use the database to do real work said YES, > >and here's how I might use it. Like full text seach and recursive > >queries, user defined (fields|attributes|properties) and the ability to > >manage them would be BUTTER! Is it a difficult problem? YES, but if it > >wasn't, why should it be worth an advanced degree? > > I think the main discussion has been around: > 1. Whether Edward's final-year project is basically EAV (in which case > he'll probably need to work hard to get good marks). As Josh mentioned, I'm on of the people who says EAV is never justified. We do have a way of expressing wide ranges of constraints not known in advance: it's called SQL, and people need to do some design using it, however much they may fear that they've left something out somehow. :) > 2. Whether dynamically generating DDL is safe/practical in a > business setting. This seems to split some *very* experienced people > about 50:50. Half of them agree with me and the other half are > inexplicably wrong ;-) For the record, I'm in the "ag'in" column. > * Another good example is the "questionnaire". With all due respect, this is a solved problem *without EAV or run-time DDL*. The URL below has one excellent approach to this. <http://www.varlena.com/GeneralBits/110.php> Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
David Fetter wrote: > On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: >> * Another good example is the "questionnaire". > > With all due respect, this is a solved problem *without EAV or > run-time DDL*. The URL below has one excellent approach to this. > > <http://www.varlena.com/GeneralBits/110.php> Which broadly speaking was the solution I used for my questionnaire, except I had a restricted set of types so basically just coerced them to text and side-stepped the inheritance issue. To the extent that it's dynamic, it's still just EAV though. It doesn't remove the need for run-time DDL if you allow users to add their own questions. If I add a "National Insurance No." (Social security No) then either it's: 1. Plain text, and can take clearly invalid codes 2. A user-space construct with regexp matches etc (basically recreating DDL) 3. DDL. And as the example says, you need to create the table types in advance. If you want to add e.g. "Work History" (employer, from_date, to_date) to a questionnaire then you'll need dynamic DDL (of form #2 or #3 above). -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
Andrew Dunstan
Date:
David Fetter wrote: >> I think the main discussion has been around: >> 1. Whether Edward's final-year project is basically EAV (in which case >> he'll probably need to work hard to get good marks). >> > > As Josh mentioned, I'm on of the people who says EAV is never > justified. We do have a way of expressing wide ranges of constraints > not known in advance: it's called SQL, and people need to do some > design using it, however much they may fear that they've left > something out somehow. :) > > ISTM that the biggest problem with EAV is that is is far too seductive, and allows for lazy design (or lack of design). There might be odd cases (pace David) that require it, but I strongly suspect in most cases it is not necessary. Perhaps we need some sort of discriminant record type ;-) cheers andrew
Re: My honours project - databases using dynamically attached entity-properties
From
Teodor Sigaev
Date:
Hmm, hstore + (optionally) functional indexes. Is it answer? Edward Stanley wrote: > Hi, > > > Was wondering if people would mind having a read over what I plan to do for my > undergraduate honours project - you can get the proposal here: > > http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf > > What I'd basically like to know is > a) Is this problem worth solving? > b) Is there already a good solution (particularly, within PostgreSQL)? > c) Any other feedback or comments you may have. > > > Regards > Eddie Stanley > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Re: My honours project - databases using dynamically attached entity-properties
From
David Fetter
Date:
On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: > David Fetter wrote: > >On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: > >>* Another good example is the "questionnaire". > > > >With all due respect, this is a solved problem *without EAV or > >run-time DDL*. The URL below has one excellent approach to this. > > > ><http://www.varlena.com/GeneralBits/110.php> > > Which broadly speaking was the solution I used for my questionnaire, > except I had a restricted set of types so basically just coerced > them to text and side-stepped the inheritance issue. To the extent > that it's dynamic, it's still just EAV though. That's precisely the difference between the above solution and yours, and it's the difference between a good design and one that will come up and bit you on the as^Hnkle. > It doesn't remove the need for run-time DDL if you allow users to add > their own questions. Sure it does. When a user, who should be talking with you, wants to ask a new kind of question, that's the start of a discussion about what new kind(s) of questions would be generally applicable in the questionnaire schema. Then, when you come to an agreement, you roll it into the new schema, and the whole system gets an improvement. > If I add a "National Insurance No." (Social > security No) then either it's: > 1. Plain text, and can take clearly invalid codes > 2. A user-space construct with regexp matches etc (basically recreating DDL) > 3. DDL. DDL, yes. Run-time, no. > And as the example says, you need to create the table types in advance. > If you want to add e.g. "Work History" (employer, from_date, to_date) to > a questionnaire then you'll need dynamic DDL (of form #2 or #3 above). Again see above for new types of questions and answers. You can pay for it once up front in a re-jigger of the schema, or you will pay a much larger price when you discover you've got EAV goo all over the place. There is a third option, of course, which is, "I'm sorry, Dave. I can't do that." ;) Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
David Fetter wrote: > On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: >> David Fetter wrote: >>> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: >>>> * Another good example is the "questionnaire". >>> With all due respect, this is a solved problem *without EAV or >>> run-time DDL*. The URL below has one excellent approach to this. >>> >>> <http://www.varlena.com/GeneralBits/110.php> >> Which broadly speaking was the solution I used for my questionnaire, >> except I had a restricted set of types so basically just coerced >> them to text and side-stepped the inheritance issue. To the extent >> that it's dynamic, it's still just EAV though. > > That's precisely the difference between the above solution and yours, > and it's the difference between a good design and one that will come > up and bit you on the as^Hnkle. It's still basically EAV (either approach). The key fault with EAV is that the tables have no semantic meaning - answer_int contains number of oranges, days since birth and the price of a tube ticket in pennies. Now, with a questionnaire that might not matter because everything is an "answer" and you're not necessarily going to do much more than count/aggregate it. >> It doesn't remove the need for run-time DDL if you allow users to add >> their own questions. > > Sure it does. When a user, who should be talking with you, wants to > ask a new kind of question, that's the start of a discussion about > what new kind(s) of questions would be generally applicable in the > questionnaire schema. Then, when you come to an agreement, you roll > it into the new schema, and the whole system gets an improvement. Fine, but if you're not letting the user extend the system, then it's not really addressing Edward's original posting, is it? If the user's talking to me, I might as well just write the DDL myself - it's the talk that'll take the time, not writing a dozen lines of SQL. The interesting part of the problem (from a Comp-Sci point of view) is precisely in automating part of that discussion. It's providing an abstraction so that you don't end up with a mass of attributes while still providing freedom to the user. -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
David Fetter
Date:
On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote: > David Fetter wrote: > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: > >>David Fetter wrote: > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: > >>>>* Another good example is the "questionnaire". > >>>With all due respect, this is a solved problem *without EAV or > >>>run-time DDL*. The URL below has one excellent approach to this. > >>> > >>><http://www.varlena.com/GeneralBits/110.php> > >>Which broadly speaking was the solution I used for my > >>questionnaire, except I had a restricted set of types so basically > >>just coerced them to text and side-stepped the inheritance issue. > >>To the extent that it's dynamic, it's still just EAV though. > > > >That's precisely the difference between the above solution and > >yours, and it's the difference between a good design and one that > >will come up and bit you on the as^Hnkle. > > It's still basically EAV (either approach). The key fault with EAV > is that the tables have no semantic meaning - answer_int contains > number of oranges, days since birth and the price of a tube ticket > in pennies. Stuffing all of those into an answer_int is *precisely* what the end user must not do. That's pilot error. > Now, with a questionnaire that might not matter because everything > is an "answer" and you're not necessarily going to do much more than > count/aggregate it. See above. > >>It doesn't remove the need for run-time DDL if you allow users to > >>add their own questions. > > > >Sure it does. When a user, who should be talking with you, wants > >to ask a new kind of question, that's the start of a discussion > >about what new kind(s) of questions would be generally applicable > >in the questionnaire schema. Then, when you come to an agreement, > >you roll it into the new schema, and the whole system gets an > >improvement. > > Fine, but if you're not letting the user extend the system, then > it's not really addressing Edward's original posting, is it? It's my contention that Edward's original idea is ill-posed. SQL is just fine for doing this kind of thing, and it's *not that hard*. > If the user's talking to me, I might as well just write the DDL > myself - it's the talk that'll take the time, not writing a dozen > lines of SQL. It's the talk that's the important part. Machines are really bad at seeing the broader picture. In the attempt to "save" a few minutes' discussion, he's trying to borrow that time from a system asked to do things that computers are inherently bad at doing, and every end user will pay that time back at a very high rate of interest. This is precisely the kind of false economy that so plagues software development and maintenance these days. > The interesting part of the problem (from a Comp-Sci point of view) > is precisely in automating part of that discussion. It's providing > an abstraction so that you don't end up with a mass of attributes > while still providing freedom to the user. This freedom and efficiency you're talking about is better supplied, IMHO, by putting a standard DDL for questionnaires up on a pgfoundry or an SF.net. That way, improvements to the DDL get spread all over the world, and a very large amount of wheel reinvention gets avoided. Reusable components are a big chunk of both freedom and efficiency. :) Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: My honours project - databases using dynamically attached entity-properties
From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, Mar 13, 2007 at 05:39:05PM +0300, Teodor Sigaev wrote: > Hmm, hstore + (optionally) functional indexes. Is it answer? I have used it in a (yet) test system. It works surprisingly well. Thanks - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFF96sWBcgs9XrR2kYRApgIAJwJ/dbyHc5gnsgm8ljMvF0RL5qSiwCfXzSz PVrJkIbi497Ezns2+vXOLM8= =xGj+ -----END PGP SIGNATURE-----
Re: My honours project - databases using dynamically attached entity-properties
From
David Fetter
Date:
On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote: > On Wed, 14 Mar 2007, David Fetter wrote: > > On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote: > > > David Fetter wrote: > > > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: > > > >>David Fetter wrote: > > > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: > > > >>>>* Another good example is the "questionnaire". > > > >>> > > > >>>With all due respect, this is a solved problem *without EAV or > > > >>>run-time DDL*. The URL below has one excellent approach to this. > > > >>> > > > >>><http://www.varlena.com/GeneralBits/110.php> > > > >> > > > >>Which broadly speaking was the solution I used for my > > > >>questionnaire, except I had a restricted set of types so basically > > > >>just coerced them to text and side-stepped the inheritance issue. > > > >>To the extent that it's dynamic, it's still just EAV though. > > > > > > > >That's precisely the difference between the above solution and > > > >yours, and it's the difference between a good design and one that > > > >will come up and bit you on the as^Hnkle. > > > > > > It's still basically EAV (either approach). The key fault with EAV > > > is that the tables have no semantic meaning - answer_int contains > > > number of oranges, days since birth and the price of a tube ticket > > > in pennies. > > > > Stuffing all of those into an answer_int is *precisely* what the end > > user must not do. That's pilot error. > > > > > Now, with a questionnaire that might not matter because everything > > > is an "answer" and you're not necessarily going to do much more than > > > count/aggregate it. > > > > See above. > > > > > >>It doesn't remove the need for run-time DDL if you allow users to > > > >>add their own questions. > > > > > > > >Sure it does. When a user, who should be talking with you, wants > > > >to ask a new kind of question, that's the start of a discussion > > > >about what new kind(s) of questions would be generally applicable > > > >in the questionnaire schema. Then, when you come to an agreement, > > > >you roll it into the new schema, and the whole system gets an > > > >improvement. > > > > > > Fine, but if you're not letting the user extend the system, then > > > it's not really addressing Edward's original posting, is it? > > > > It's my contention that Edward's original idea is ill-posed. SQL is > > just fine for doing this kind of thing, and it's *not that hard*. > > > > > If the user's talking to me, I might as well just write the DDL > > > myself - it's the talk that'll take the time, not writing a dozen > > > lines of SQL. > > > > It's the talk that's the important part. Machines are really bad at > > seeing the broader picture. In the attempt to "save" a few minutes' > > discussion, he's trying to borrow that time from a system asked to do > > things that computers are inherently bad at doing, and every end user > > will pay that time back at a very high rate of interest. This is > > precisely the kind of false economy that so plagues software > > development and maintenance these days. > > > > > The interesting part of the problem (from a Comp-Sci point of view) > > > is precisely in automating part of that discussion. It's providing > > > an abstraction so that you don't end up with a mass of attributes > > > while still providing freedom to the user. > > > > This freedom and efficiency you're talking about is better supplied, > > IMHO, by putting a standard DDL for questionnaires up on a pgfoundry > > or an SF.net. That way, improvements to the DDL get spread all over > > the world, and a very large amount of wheel reinvention gets avoided. > > Reusable components are a big chunk of both freedom and efficiency. :) > > > > Cheers, > > D > > Maybe I should rethink the problem a bit - from the very brief > initial research I've done, it seems EAV schemas have two common > uses: > 1) When new attributes have to be created on-the-fly > 2) When the number of possible properties for an entity greatly (orders of > magnitude) exceeds the number of properties any one entity is likely to have. Um, no. The first use case is bad coding practice, and the second is a classic case for a join table, which is the standard way to handle M:N relationships. > I'm not sure about solving the first problem - there seems to be a lot of > debate around this. I can see reasons for and against allowing this. However > I think the second is a very real problem. One such example is a patient > record system. > > For each patient we have a table of common data (dob, sex, height, weight etc) > but as well as this a patient can present with many symptoms. This might be a > table of 40,000 possible symptoms. Here's how I'd do that: CREATE TABLE patient ( patient_id SERIAL PRIMARY KEY, /* for simplicity. Some combinationof columns in the table would also have a UNIQUE NOT NULL constraint on it. */ ... ); CREATE TABLE symptom ( symptom_id SERIAL PRIMARY KEY, /* See above. */ ... ); CREATE TABLE patient_presents_with ( patient_id INTEGER NOT NULL REFERENCES patient(patient_id), symptom_id INTEGER NOTNULL REFERENCES symptom(symptom_id), UNIQUE(patient_id, symptom_id) ); > Lets say we want to run a query on these symptoms (using a boolean expression) I'd use something like the following: SELECT p.patient_id, p.f_name, p.l_name, s.symptom_name, s.symptom_desc FROM patient p JOIN patient_presents_with ppw USING (patient_id) JOIN symptom s USING (symptom_id) WHERE s.symptom_name = ALL('foo','bar','baz') AND s.symptom_name = ANY('quux','fleeg'); > to return the patient records which match the query string on the symptoms. > > (This turns out to be a very similar problem to the 'tags' example I first > presented) - assume a similar schema. With more than a couple of symptoms and > a complex tree, the resulting SQL can span pages. Not really. See above :) > When I first started thinking about this project I believed the two problems > essentially to be the same class of problem, but this may not be the case. EAV will bite you. It's not *that* much work to keep its from biting you. :) Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: My honours project - databases using dynamically attached entity-properties
From
Gregory Stark
Date:
"David Fetter" <david@fetter.org> writes: > CREATE TABLE symptom ( > symptom_id SERIAL PRIMARY KEY, /* See above. */ > ... > ); > > CREATE TABLE patient_presents_with ( > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > UNIQUE(patient_id, symptom_id) > ); I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are all boolean values. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
mark@mark.mielke.cc
Date:
On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > "David Fetter" <david@fetter.org> writes: > > CREATE TABLE symptom ( > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > ... > > ); > > > > CREATE TABLE patient_presents_with ( > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > UNIQUE(patient_id, symptom_id) > > ); > > I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are > all boolean values. Where is the boolean above? It is M:N, with each having whatever data is required. The issue I have with the above is that it seems unnecessarily inefficient. Whenever mapping from a patient to a symptom, or a symptom to a patient, it requires searching indexes for three tables. Perhaps this would work well if there was heavy overlap of symptoms for different patients. For the cases I have hit this problem, however, there may be overlap, but it is not easy to detect, and even if it was detected, we would end with some sort of garbage collection requirements where symptoms are removed once all references to the symptoms are removed. The case most familiar to me, is a set of viewing preferences for web pages. Some users specify no preferences, while others have dozens of preferences. As I have no requirements to search for users with a particular preference, I chose to solve this by packing many of the preferences together into a TEXT field, and having the application pack/unpack the data. I still have tables that map object id to attribute/value, but they are used for the data that can require longer queries. Without clustering the data, searching for a dozen of these attributes requires either querying all attributes, where the attributes could be scattered throughout the table, or querying them one by one, which is worse. If there was an efficient way to do this for both of my use cases, I would be strongly tempted to use it. :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Re: My honours project - databases using dynamically attached entity-properties
From
"Joshua D. Drake"
Date:
Gregory Stark wrote: > "David Fetter" <david@fetter.org> writes: > >> CREATE TABLE symptom ( >> symptom_id SERIAL PRIMARY KEY, /* See above. */ >> ... >> ); >> >> CREATE TABLE patient_presents_with ( >> patient_id INTEGER NOT NULL REFERENCES patient(patient_id), >> symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), >> UNIQUE(patient_id, symptom_id) >> ); > > I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are > all boolean values. With a unique across (patient_id,symptom_id) you could have a single patient with as many unique symptoms as could be listed. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: My honours project - databases using dynamically attached entity-properties
From
Gregory Stark
Date:
<mark@mark.mielke.cc> writes: > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: >> "David Fetter" <david@fetter.org> writes: >> > CREATE TABLE symptom ( >> > symptom_id SERIAL PRIMARY KEY, /* See above. */ >> > ... >> > ); >> > >> > CREATE TABLE patient_presents_with ( >> > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), >> > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), >> > UNIQUE(patient_id, symptom_id) >> > ); >> >> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are >> all boolean values. > > Where is the boolean above? It is M:N, with each having whatever data > is required. No, the above schema can only show whether a patient has or doesn't have a symptom. There is nowhere to store *where* the pain, inflammation, swelling, aneurism, etc is, or how severe it is, or when it occurred, etc. In any case the above arguably *is* an EA schema anyways. Your "symptom" is just as much an abstract meaningless concept from a database point of view as the questionnaire's "answer" or the bug tracker's "tag". Especially once you start actually having to record information *about* the symptom. This is a silly argument. The only reasonable conclusion is that any dogmatic principle that doesn't take into account the application requirements is wrong. In some cases you want a flexible abstract schema because the application is flexible and abstract, in others you need the database schema to understand your specific data structures so it can help you manipulate it. You have to pick which is more useful for your application, you can't have your cake and eat it too. And all of the sudden I have a craving for cake... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
Csaba Nagy
Date:
On Wed, 2007-03-14 at 16:08, mark@mark.mielke.cc wrote: > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > > "David Fetter" <david@fetter.org> writes: > > > CREATE TABLE symptom ( > > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > > ... > > > ); > > > > > > CREATE TABLE patient_presents_with ( > > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > > UNIQUE(patient_id, symptom_id) > > > ); > > > > I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are > > all boolean values. > > Where is the boolean above? It is M:N, with each having whatever data > is required. The boolean is assumed in the symptoms table. In any case, even if it's not a boolean value, even if maybe the symptoms table is a complex one on it's own, it still is one single type for all symptoms of all patients. The real problem is that in some real world applications you have a mix of wildly varying types of attributes a user might want to use, and you can't know what those will be beforehand... the symptoms thing is simple to solve in the way David did it, but there really are other situations which a simple m:n can't easily cover. How would you handle a data base of user settings for 10K different applications and 100M different users where each application must be able to store it's own (type safe !!) settings in the same structure, and applications come and go with their own settings ? Come up with a good solution to this combined with queries like "give me all the users who have this set of settings set to these values" running fast, and then you're talking. Cheers, Csaba.
Re: My honours project - databases using dynamically attached entity-properties
From
Andrew Dunstan
Date:
Joshua D. Drake wrote: > Gregory Stark wrote: > >> "David Fetter" <david@fetter.org> writes: >> >> >>> CREATE TABLE symptom ( >>> symptom_id SERIAL PRIMARY KEY, /* See above. */ >>> ... >>> ); >>> >>> CREATE TABLE patient_presents_with ( >>> patient_id INTEGER NOT NULL REFERENCES patient(patient_id), >>> symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), >>> UNIQUE(patient_id, symptom_id) >>> ); >>> >> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are >> all boolean values. >> > > With a unique across (patient_id,symptom_id) you could have a single > patient with as many unique symptoms as could be listed. > > > > I think Greg is suggesting that the table needs to be augmented with, say, a severity field. cheers andrew
Re: My honours project - databases using dynamically attached entity-properties
From
mark@mark.mielke.cc
Date:
On Wed, Mar 14, 2007 at 03:25:48PM +0000, Gregory Stark wrote: > <mark@mark.mielke.cc> writes: > > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > >> "David Fetter" <david@fetter.org> writes: > >> > CREATE TABLE symptom ( > >> > symptom_id SERIAL PRIMARY KEY, /* See above. */ > >> > ... > >> > ); > >> > CREATE TABLE patient_presents_with ( > >> > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > >> > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > >> > UNIQUE(patient_id, symptom_id) > >> > ); > >> I'm just glad I don't have your doctor. I hope mine doesn't think > >> symptoms are all boolean values. > > Where is the boolean above? It is M:N, with each having whatever data > > is required. > No, the above schema can only show whether a patient has or doesn't have a > symptom. There is nowhere to store *where* the pain, inflammation, swelling, > aneurism, etc is, or how severe it is, or when it occurred, etc. What do you think the '...' is in the symptom table? :-) I'm not saying I would do it that way - but it seems to me that this is a bit of a religious debate, without people listening to each other. For one person to suggest a workable solution, and another person to write it off as quickly as you did, misunderstanding, or misrepresenting it, is what I would call "not listening". :-) > In any case the above arguably *is* an EA schema anyways. Your "symptom" is > just as much an abstract meaningless concept from a database point of view as > the questionnaire's "answer" or the bug tracker's "tag". Especially once you > start actually having to record information *about* the symptom. > This is a silly argument. The only reasonable conclusion is that any dogmatic > principle that doesn't take into account the application requirements is > wrong. In some cases you want a flexible abstract schema because the > application is flexible and abstract, in others you need the database schema > to understand your specific data structures so it can help you manipulate it. > You have to pick which is more useful for your application, you can't have > your cake and eat it too. No. Another reasonable conclusion is that the answer is not simple. This doesn't mean the answer is undesirable. It means that people need more time. :-) I prefer abstractions, especially if they are more efficient than if I were to roll my own each time. Relational databases do a good job today. They can do a better job tomorrow. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Re: My honours project - databases using dynamically attached entity-properties
From
Gregory Stark
Date:
<mark@mark.mielke.cc> writes: > On Wed, Mar 14, 2007 at 03:25:48PM +0000, Gregory Stark wrote: >> <mark@mark.mielke.cc> writes: >> > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: >> >> "David Fetter" <david@fetter.org> writes: >> >> > CREATE TABLE symptom ( >> >> > symptom_id SERIAL PRIMARY KEY, /* See above. */ >> >> > ... >> >> > ); >> >> > CREATE TABLE patient_presents_with ( >> >> > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), >> >> > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), >> >> > UNIQUE(patient_id, symptom_id) >> >> > ); >> >> I'm just glad I don't have your doctor. I hope mine doesn't think >> >> symptoms are all boolean values. >> > Where is the boolean above? It is M:N, with each having whatever data >> > is required. >> No, the above schema can only show whether a patient has or doesn't have a >> symptom. There is nowhere to store *where* the pain, inflammation, swelling, >> aneurism, etc is, or how severe it is, or when it occurred, etc. > > What do you think the '...' is in the symptom table? :-) Ah, I did in fact miss that. So then this is just a standard EA schema. I thought you were on the other side of the debate? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
David Fetter
Date:
On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > "David Fetter" <david@fetter.org> writes: > > > CREATE TABLE symptom ( > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > ... > > ); > > > > CREATE TABLE patient_presents_with ( > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > UNIQUE(patient_id, symptom_id) > > ); > > I'm just glad I don't have your doctor. I hope mine doesn't think > symptoms are all boolean values. What's in the symptom table is up to the doctor. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: My honours project - databases using dynamically attached entity-properties
From
Richard Huxton
Date:
David Fetter wrote: > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: >> "David Fetter" <david@fetter.org> writes: >> >>> CREATE TABLE symptom ( >>> symptom_id SERIAL PRIMARY KEY, /* See above. */ >>> ... >>> ); >>> >>> CREATE TABLE patient_presents_with ( >>> patient_id INTEGER NOT NULL REFERENCES patient(patient_id), >>> symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), >>> UNIQUE(patient_id, symptom_id) >>> ); >> I'm just glad I don't have your doctor. I hope mine doesn't think >> symptoms are all boolean values. > > What's in the symptom table is up to the doctor. Surely the point is that for "high temperature" I need a temperature value, whereas for "tooth decay" I'll want a tooth number (or whatever they use). Which brings us back to where we started... -- Richard Huxton Archonet Ltd
Re: My honours project - databases using dynamically attached entity-properties
From
Csaba Nagy
Date:
On Wed, 2007-03-14 at 16:50, David Fetter wrote: > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > > "David Fetter" <david@fetter.org> writes: > > > > > CREATE TABLE symptom ( > > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > > ... > > > ); > > > > > > CREATE TABLE patient_presents_with ( > > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > > UNIQUE(patient_id, symptom_id) > > > ); > > > > I'm just glad I don't have your doctor. I hope mine doesn't think > > symptoms are all boolean values. > > What's in the symptom table is up to the doctor. OK, and here's the problem: each doctor might want to put something else in the symptom table. Each doctor might want to do it in a type safe way, e.g. so that the application enforces an enumeration of "high/moderate/low" for the symptom fever (or maybe another doctor wants it in exact degrees)... you can all stuff it in a string field, but you know how reliable that will be. Cheers, Csaba.
Re: My honours project - databases using dynamically attached entity-properties
From
Edward Stanley
Date:
On Wed, 14 Mar 2007, David Fetter wrote: > On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote: > > David Fetter wrote: > > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: > > >>David Fetter wrote: > > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: > > >>>>* Another good example is the "questionnaire". > > >>> > > >>>With all due respect, this is a solved problem *without EAV or > > >>>run-time DDL*. The URL below has one excellent approach to this. > > >>> > > >>><http://www.varlena.com/GeneralBits/110.php> > > >> > > >>Which broadly speaking was the solution I used for my > > >>questionnaire, except I had a restricted set of types so basically > > >>just coerced them to text and side-stepped the inheritance issue. > > >>To the extent that it's dynamic, it's still just EAV though. > > > > > >That's precisely the difference between the above solution and > > >yours, and it's the difference between a good design and one that > > >will come up and bit you on the as^Hnkle. > > > > It's still basically EAV (either approach). The key fault with EAV > > is that the tables have no semantic meaning - answer_int contains > > number of oranges, days since birth and the price of a tube ticket > > in pennies. > > Stuffing all of those into an answer_int is *precisely* what the end > user must not do. That's pilot error. > > > Now, with a questionnaire that might not matter because everything > > is an "answer" and you're not necessarily going to do much more than > > count/aggregate it. > > See above. > > > >>It doesn't remove the need for run-time DDL if you allow users to > > >>add their own questions. > > > > > >Sure it does. When a user, who should be talking with you, wants > > >to ask a new kind of question, that's the start of a discussion > > >about what new kind(s) of questions would be generally applicable > > >in the questionnaire schema. Then, when you come to an agreement, > > >you roll it into the new schema, and the whole system gets an > > >improvement. > > > > Fine, but if you're not letting the user extend the system, then > > it's not really addressing Edward's original posting, is it? > > It's my contention that Edward's original idea is ill-posed. SQL is > just fine for doing this kind of thing, and it's *not that hard*. > > > If the user's talking to me, I might as well just write the DDL > > myself - it's the talk that'll take the time, not writing a dozen > > lines of SQL. > > It's the talk that's the important part. Machines are really bad at > seeing the broader picture. In the attempt to "save" a few minutes' > discussion, he's trying to borrow that time from a system asked to do > things that computers are inherently bad at doing, and every end user > will pay that time back at a very high rate of interest. This is > precisely the kind of false economy that so plagues software > development and maintenance these days. > > > The interesting part of the problem (from a Comp-Sci point of view) > > is precisely in automating part of that discussion. It's providing > > an abstraction so that you don't end up with a mass of attributes > > while still providing freedom to the user. > > This freedom and efficiency you're talking about is better supplied, > IMHO, by putting a standard DDL for questionnaires up on a pgfoundry > or an SF.net. That way, improvements to the DDL get spread all over > the world, and a very large amount of wheel reinvention gets avoided. > Reusable components are a big chunk of both freedom and efficiency. :) > > Cheers, > D Maybe I should rethink the problem a bit - from the very brief initial research I've done, it seems EAV schemas have two common uses: 1) When new attributes have to be created on-the-fly 2) When the number of possible properties for an entity greatly (orders of magnitude) exceeds the number of properties any one entity is likely to have. I'm not sure about solving the first problem - there seems to be a lot of debate around this. I can see reasons for and against allowing this. However I think the second is a very real problem. One such example is a patient record system. For each patient we have a table of common data (dob, sex, height, weight etc) but as well as this a patient can present with many symptoms. This might be a table of 40,000 possible symptoms. Lets say we want to run a query on these symptoms (using a boolean expression) to return the patient records which match the query string on the symptoms. (This turns out to be a very similar problem to the 'tags' example I first presented) - assume a similar schema. With more than a couple of symptoms and a complex tree, the resulting SQL can span pages. When I first started thinking about this project I believed the two problems essentially to be the same class of problem, but this may not be the case. What do people think? - Also, thanks everyone for your input thus far. It has been very valuable. Eddie Stanley
Re: My honours project - databases using dynamically attached entity-properties
From
Trent Shipley
Date:
On Wednesday 2007-03-14 08:26, Csaba Nagy wrote: > On Wed, 2007-03-14 at 16:08, mark@mark.mielke.cc wrote: > > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote: > > > "David Fetter" <david@fetter.org> writes: > > > > CREATE TABLE symptom ( > > > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > > > ... > > > > ); > > > > > > > > CREATE TABLE patient_presents_with ( > > > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > > > UNIQUE(patient_id, symptom_id) > > > > ); > > > > > > I'm just glad I don't have your doctor. I hope mine doesn't think > > > symptoms are all boolean values. > > > > Where is the boolean above? It is M:N, with each having whatever data > > is required. > > The boolean is assumed in the symptoms table. In any case, even if it's > not a boolean value, even if maybe the symptoms table is a complex one > on it's own, it still is one single type for all symptoms of all > patients. The real problem is that in some real world applications you > have a mix of wildly varying types of attributes a user might want to > use, and you can't know what those will be beforehand... the symptoms > thing is simple to solve in the way David did it, but there really are > other situations which a simple m:n can't easily cover. How would you > handle a data base of user settings for 10K different applications and > 100M different users where each application must be able to store it's > own (type safe !!) settings in the same structure, and applications come > and go with their own settings ? Come up with a good solution to this > combined with queries like "give me all the users who have this set of > settings set to these values" running fast, and then you're talking. > > Cheers, > Csaba. At that point traditional referential database design starts to break down. If you need THAT MUCH flexibility it's probably time to look at network databases, knowledge bases, extensible knowledge bases, and ad hoc knowledge bases (OWL, RDF, etc). Flexibility, friendliness to marketeers or accountants, extesiblity, none are really the strong points of relational database. Databases as they exist today do best with finite domains that can be formally organized.
Re: My honours project - databases using dynamically attached entity-properties
From
Eddie Stanley
Date:
David Fetter wrote: > On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote: > >> On Wed, 14 Mar 2007, David Fetter wrote: >> >>> On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote: >>> >>>> David Fetter wrote: >>>> >>>>> On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote: >>>>> >>>>>> David Fetter wrote: >>>>>> >>>>>>> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote: >>>>>>> >>>>>>>> * Another good example is the "questionnaire". >>>>>>>> >>>>>>> With all due respect, this is a solved problem *without EAV or >>>>>>> run-time DDL*. The URL below has one excellent approach to this. >>>>>>> >>>>>>> <http://www.varlena.com/GeneralBits/110.php> >>>>>>> >>>>>> Which broadly speaking was the solution I used for my >>>>>> questionnaire, except I had a restricted set of types so basically >>>>>> just coerced them to text and side-stepped the inheritance issue. >>>>>> To the extent that it's dynamic, it's still just EAV though. >>>>>> >>>>> That's precisely the difference between the above solution and >>>>> yours, and it's the difference between a good design and one that >>>>> will come up and bit you on the as^Hnkle. >>>>> >>>> It's still basically EAV (either approach). The key fault with EAV >>>> is that the tables have no semantic meaning - answer_int contains >>>> number of oranges, days since birth and the price of a tube ticket >>>> in pennies. >>>> >>> Stuffing all of those into an answer_int is *precisely* what the end >>> user must not do. That's pilot error. >>> >>> >>>> Now, with a questionnaire that might not matter because everything >>>> is an "answer" and you're not necessarily going to do much more than >>>> count/aggregate it. >>>> >>> See above. >>> >>> >>>>>> It doesn't remove the need for run-time DDL if you allow users to >>>>>> add their own questions. >>>>>> >>>>> Sure it does. When a user, who should be talking with you, wants >>>>> to ask a new kind of question, that's the start of a discussion >>>>> about what new kind(s) of questions would be generally applicable >>>>> in the questionnaire schema. Then, when you come to an agreement, >>>>> you roll it into the new schema, and the whole system gets an >>>>> improvement. >>>>> >>>> Fine, but if you're not letting the user extend the system, then >>>> it's not really addressing Edward's original posting, is it? >>>> >>> It's my contention that Edward's original idea is ill-posed. SQL is >>> just fine for doing this kind of thing, and it's *not that hard*. >>> >>> >>>> If the user's talking to me, I might as well just write the DDL >>>> myself - it's the talk that'll take the time, not writing a dozen >>>> lines of SQL. >>>> >>> It's the talk that's the important part. Machines are really bad at >>> seeing the broader picture. In the attempt to "save" a few minutes' >>> discussion, he's trying to borrow that time from a system asked to do >>> things that computers are inherently bad at doing, and every end user >>> will pay that time back at a very high rate of interest. This is >>> precisely the kind of false economy that so plagues software >>> development and maintenance these days. >>> >>> >>>> The interesting part of the problem (from a Comp-Sci point of view) >>>> is precisely in automating part of that discussion. It's providing >>>> an abstraction so that you don't end up with a mass of attributes >>>> while still providing freedom to the user. >>>> >>> This freedom and efficiency you're talking about is better supplied, >>> IMHO, by putting a standard DDL for questionnaires up on a pgfoundry >>> or an SF.net. That way, improvements to the DDL get spread all over >>> the world, and a very large amount of wheel reinvention gets avoided. >>> Reusable components are a big chunk of both freedom and efficiency. :) >>> >>> Cheers, >>> D >>> >> Maybe I should rethink the problem a bit - from the very brief >> initial research I've done, it seems EAV schemas have two common >> uses: >> > > >> 1) When new attributes have to be created on-the-fly >> 2) When the number of possible properties for an entity greatly (orders of >> magnitude) exceeds the number of properties any one entity is likely to have. >> > > Um, no. The first use case is bad coding practice, and the second is > a classic case for a join table, which is the standard way to handle > M:N relationships. > > >> I'm not sure about solving the first problem - there seems to be a lot of >> debate around this. I can see reasons for and against allowing this. However >> I think the second is a very real problem. One such example is a patient >> record system. >> >> For each patient we have a table of common data (dob, sex, height, weight etc) >> but as well as this a patient can present with many symptoms. This might be a >> table of 40,000 possible symptoms. >> > > Here's how I'd do that: > > CREATE TABLE patient ( > patient_id SERIAL PRIMARY KEY, /* for simplicity. Some > combination of columns in the > table would also have a UNIQUE > NOT NULL constraint on it. > */ > ... > ); > > CREATE TABLE symptom ( > symptom_id SERIAL PRIMARY KEY, /* See above. */ > ... > ); > > CREATE TABLE patient_presents_with ( > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > UNIQUE(patient_id, symptom_id) > ); > > >> Lets say we want to run a query on these symptoms (using a boolean expression) >> > > I'd use something like the following: > > SELECT > p.patient_id, > p.f_name, > p.l_name, > s.symptom_name, > s.symptom_desc > FROM > patient p > JOIN > patient_presents_with ppw > USING (patient_id) > JOIN > symptom s > USING (symptom_id) > WHERE > s.symptom_name = ALL('foo','bar','baz') > AND > s.symptom_name = ANY('quux','fleeg'); > > Are the ALL and ANY functions new to Postgresql 8? I haven't met them before. Anyway this will work for some queries but not others. What about the following expression? ('foo' && 'bar') | ('baz' && ! 'quxx') | 'fleeg' Maybe I have misunderstood how these functions work, but I don't think they will handle anything but trivial examples of this problem. >> to return the patient records which match the query string on the symptoms. >> >> (This turns out to be a very similar problem to the 'tags' example I first >> presented) - assume a similar schema. With more than a couple of symptoms and >> a complex tree, the resulting SQL can span pages. >> > > Not really. See above :) > > >> When I first started thinking about this project I believed the two problems >> essentially to be the same class of problem, but this may not be the case. >> > > EAV will bite you. It's not *that* much work to keep its from biting > you. :) > > Cheers, > D >
Re: My honours project - databases using dynamically attached entity-properties
From
Ron Mayer
Date:
Josh Berkus wrote: >> And then what? dynamically construct all your SQL queries? >> Sure, sounds like a simple solution to me... > > Not to mention DB security issues. How do you secure your database when > your web client has DDL access? > > So, Edward, the really *interesting* idea would be to come up with a > secure, normalized way to do UDFs *without* EAV tables. People would be > very impressed. > I have a system with many essentially user-defined fields, and was thinking of creating something similar to an Array type and writing some GIST indexes for it. My current workaround is to store them as a YAML document and use tsearch to index it (with application logic to further refine the results) - but a EAV datatype that could be put in tables and effectively indexed would be of quite a bit of interest here. And yes, a better say to do UDFs would be even cooler.
Re: My honours project - databases using dynamically attached entity-properties
From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, Mar 15, 2007 at 08:31:24AM -0700, Ron Mayer wrote: > Josh Berkus wrote: > >> And then what? dynamically construct all your SQL queries? > >> Sure, sounds like a simple solution to me... > > > > Not to mention DB security issues. How do you secure your database when > > your web client has DDL access? > > > > So, Edward, the really *interesting* idea would be to come up with a > > secure, normalized way to do UDFs *without* EAV tables. People would be > > very impressed. > > > > I have a system with many essentially user-defined fields, and was > thinking of creating something similar to an Array type and writing > some GIST indexes for it. Do give hstore a try. It indexes over name/value mappings (the same authors have intarray, indexing over arrays). If it doesn't solve your problem it'll be at least a very good starting point for what you mention above (basically it's a lossy GiST index based on a hash bitmap. I was at least quite impressed). Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFF+WumBcgs9XrR2kYRAukhAJ9mPk89FfLX4E3dIBGlhfVTwqup0QCdE8Tx nh05Y1WpyX36uKI+9qspO60= =C/5K -----END PGP SIGNATURE-----
On Mar 15, 2007, at 11:31 , Ron Mayer wrote: > Josh Berkus wrote: >>> And then what? dynamically construct all your SQL queries? >>> Sure, sounds like a simple solution to me... >> >> Not to mention DB security issues. How do you secure your >> database when >> your web client has DDL access? >> >> So, Edward, the really *interesting* idea would be to come up with a >> secure, normalized way to do UDFs *without* EAV tables. People >> would be >> very impressed. >> > > I have a system with many essentially user-defined fields, and was > thinking of creating something similar to an Array type and writing > some GIST indexes for it. > > My current workaround is to store them as a YAML document and use > tsearch to index it (with application logic to further refine the > results) - but a EAV datatype that could be put in tables and > effectively indexed would be of quite a bit of interest here. > And yes, a better say to do UDFs would be even cooler. Out of all the databases that I have used, postgresql offers the most flexible DDL- mostly for one reason: they can operate within transactions. To handle arbitrary strings as column identifiers, the column names could actually be stripped down to lower-case letters and the "real title" could be stored in a separate table or as column comments. Mr. Berkus' concern regarding the security implications is already handled by privilege separation or security-definer functions. The OP's concern about the difficulty about querying a schema structure is alleviated via any number of APIs in Perl, JDBC, etc. It seems to me that postgresql is especially well-suited to run DDL at runtime, so what's the issue? -M
Re: My honours project - databases using dynamically attached entity-properties
From
Gregory Stark
Date:
"Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes: > I have a system with many essentially user-defined fields, and was > thinking of creating something similar to an Array type and writing > some GIST indexes for it. > > My current workaround is to store them as a YAML document and use > tsearch to index it (with application logic to further refine the > results) - but a EAV datatype that could be put in tables and > effectively indexed would be of quite a bit of interest here. > And yes, a better say to do UDFs would be even cooler. I think that's what hstore does. Honestly I'm not entirely certain though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: My honours project - databases using dynamically attached entity-properties
From
Csaba Nagy
Date:
On Thu, 2007-03-15 at 17:01, A.M. wrote: > It seems to me that postgresql is especially well-suited to run DDL > at runtime, so what's the issue? The issue is that some applications are not well suited to run DDL at runtime :-) As I already mentioned in another post in this thread, our application also has a requirement of user defined fields in one table. Problem is, that table is so accessed in peak hours, that it is simply impossible to take an exclusive lock on it without causing an extended perceived downtime of the application. And guess what, users will always want to add new fields in peak hours... We did solve this in our case with some application logic, but a generic solution would be nice ;-) Cheers, Csaba.
Re: My honours project - databases using dynamically attached entity-properties
From
Andrew Dunstan
Date:
Gregory Stark wrote: > "Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes: > > >> I have a system with many essentially user-defined fields, and was >> thinking of creating something similar to an Array type and writing >> some GIST indexes for it. >> >> My current workaround is to store them as a YAML document and use >> tsearch to index it (with application logic to further refine the >> results) - but a EAV datatype that could be put in tables and >> effectively indexed would be of quite a bit of interest here. >> And yes, a better say to do UDFs would be even cooler. >> > > I think that's what hstore does. Honestly I'm not entirely certain though. > > Does hstore nest? My impression is that it doesn't. Which might well not matter, of course. cheers andrew
Re: My honours project - databases using dynamically attached entity-properties
From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, Mar 15, 2007 at 12:49:10PM -0400, Andrew Dunstan wrote: > Gregory Stark wrote: > >"Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes: > > > > > >>I have a system with many essentially user-defined fields, and was > >>thinking of creating something similar to an Array type and writing > >>some GIST indexes for it. [...] > >I think that's what hstore does. Honestly I'm not entirely certain though. > Does hstore nest? My impression is that it doesn't. Which might well not > matter, of course. If what you mean is to have "mappings of mappings" then no. Hstore implements a data type for a (finite) mapping (a set of key -> value pairs, think "hash" for perl folks), with operations like "H1 contains H2" (in the sense that all key-value pairs in H2 are also in H1) supported by an index. Keys and values are strings. But I don't see why it couldn't be extended to more structured data types (one of the good things of hstore at the moment is its surprising simplicity, because it provides a good example of what GiST is good for, therefore I'd understand if the authors don't wanted to add that much hair without need). In my view it's one of the cutest applications of GiST. Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFF+i5OBcgs9XrR2kYRAprYAJ46ecAnSNWnh+rizGdhWLutU7BphACdHweH k3rdsxmUzuxthwldXvJ3UKw= =+sT5 -----END PGP SIGNATURE-----
Re: My honours project - databases using dynamically attached entity-properties
From
Andrew Dunstan
Date:
tomas@tuxteam.de wrote: > >> Does hstore nest? My impression is that it doesn't. Which might well not >> matter, of course. >> > > If what you mean is to have "mappings of mappings" then no. > > Hstore implements a data type for a (finite) mapping (a set of key -> value > pairs, think "hash" for perl folks), with operations like "H1 contains > H2" (in the sense that all key-value pairs in H2 are also in H1) > supported by an index. Keys and values are strings. > > As a perl folk I think of hashes as nestable :-). Unlike hstore, the keys are strings but the values can be anything, including a hashref or arrayref. Anyway, this means that you can't use hstore to cover the same field as YAML or JSON. That doesn't mean it's not useful - far from it. cheers andrew
Re: My honours project - databases using dynamically attached entity-properties
From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, Mar 16, 2007 at 09:56:23AM -0400, Andrew Dunstan wrote: > tomas@tuxteam.de wrote: > > > >>Does hstore nest? [...] > > > >If what you mean is to have "mappings of mappings" then no. > > [...] think "hash" for perl folks [...] > As a perl folk I think of hashes as nestable :-) Heh. Point taken :-) > Anyway, this means that you can't use hstore to cover the same field as > YAML or JSON. That doesn't mean it's not useful - far from it. I think that hstore might be extensible in this direction. After all, the keys and (key, value) combinations just get hashed into a bitmap. There is no reason why one shouldn't be able to hash more complex data structures (unless, of course, the bit map gets so densely poked with ones that it stops being useful). The difficult part might be to reach a consensus on what a "complex data structure" might look like. The purists among the relational folks are sharpening their knives already, I can hear that ;-) Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFF++GaBcgs9XrR2kYRAs/hAJ9vfRy36T23vJ6eIYj6efrQVk9roQCeMkqQ kWqzNbgbkMOXKIDQuzwDeFY= =fCiG -----END PGP SIGNATURE-----