Thread: My honours project - databases using dynamically attached entity-properties

My honours project - databases using dynamically attached entity-properties

From
Edward Stanley
Date:
Hi,


Was wondering if people would mind having a read over what I plan to do for my 
undergraduate honours project - you can get the proposal here:

http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf

What I'd basically like to know is 
a) Is this problem worth solving? 
b) Is there already a good solution (particularly, within PostgreSQL)?
c) Any other feedback or comments you may have.


Regards
Eddie Stanley



Re: My honours project - databases using dynamically attached entity-properties

From
Heikki Linnakangas
Date:
Edward Stanley wrote:
> Was wondering if people would mind having a read over what I plan to do for my 
> undergraduate honours project - you can get the proposal here:
> 
> http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf

ISTM that what you're calling "dynamically attached properties" are 
generally called Entity-Attribute-Value schemas. In my experience, EAV 
schemas are usually result of improper database design by someone not 
understanding the relational theory and the principles of normalization. 
EAV offers a false sense of flexibility: sure, you can store whatever 
you want in an EAV database, but because the data isn't well structured, 
building queries becomes extremely complicated. No amount of syntactic 
sugar is going to fix that, you just need to normalize your data.

> What I'd basically like to know is 
> a) Is this problem worth solving? 
> b) Is there already a good solution (particularly, within PostgreSQL)?

You could create a view on top of the normalized schema. There's also a 
contrib module called "hstore" which might be relevant.

> c) Any other feedback or comments you may have.

I don't know how wedded you are to your subject or your ambitions, but 
let me suggest a few other ideas anyway:

- Temporal extensions. IIRC, two people/groups have recently come up on 
hackers, planning to do their thesis on that subject. It's a very real 
problem, so more people working on it wouldn't hurt. See 
http://www.amazon.co.uk/Temporal-Data-Relational-Model-Investigation/dp/1558608559 
for the right way to solve it

- Recursive queries.

- Materialized views.



--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Edward, Heikki,

> In my experience, EAV
> schemas are usually result of improper database design by someone not
> understanding the relational theory and the principles of normalization.

Edward, you should be aware that EAV schema are a source of disagreement among 
database designers.  Some developers like Heikki and David Fetter hate them, 
and other like me use them frequently when the application business logic 
warrants it -- primarily cases where each item has a variable number of 
attributes and a large number of the attributes are null for most items, or 
when the application is expected to define new attributes which can't be 
anticipated by the database designer.

No matter how much Heikki hates them, I think he'd agree that EAV tables are 
better than having the application execute DDL at runtime.  ;-)

> > What I'd basically like to know is
> > a) Is this problem worth solving?
> > b) Is there already a good solution (particularly, within PostgreSQL)?

It's pretty easy to do EAV ad-hoc using existing tools.  I'm not clear on what 
you're planning to implement that's not already available.  Could you be more 
specific?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: My honours project - databases using dynamically attached entity-properties

From
"Andrew Hammond"
Date:
On Mar 11, 12:47 pm, j...@agliodbs.com (Josh Berkus) wrote:
> No matter how much Heikki hates them, I think he'd agree that EAV tables are
> better than having the application execute DDL at runtime.

EAV moves the structure that is typically in the design of the tables
into the contents of the tables. With an EAV database you have
effectively destroyed the semantic difference between DML and DDL. I'm
willing to concede that there may be situations where EAV is actually
the right answer to a design problem. I have yet to encounter one, but
Josh has more experience, and more varied experience than I do. To me,
EAV is a perfect example of ignoring the YAGNI principal.

http://c2.com/xp/YouArentGonnaNeedIt.html

> > > What I'd basically like to know is
> > > a) Is this problem worth solving?

I think you're solving a symptom, not the root cause of the problem.
Clarify the root cause, and then you have an interesting project.

Andrew



Amdrew,

> I have yet to encounter one, but
> Josh has more experience, and more varied experience than I do. To me,
> EAV is a perfect example of ignoring the YAGNI principal.

I've done plenty of applications where part of the specification for the 
application was "User Defined Fields" allowing the users to customize the 
data structure at runtime.  This is a very, very common thing; of the 15 
or so commercial applications I implemented from scratch as a consultant 
probably 10 of them had it.

I really don't see any way you could implement UDFs other than EAV that 
wouldn't be immensely awkward, or result in executing DDL at runtime.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: My honours project - databases using dynamically attached entity-properties

From
"Joshua D. Drake"
Date:
Josh Berkus wrote:
> Amdrew,
> 
>> I have yet to encounter one, but
>> Josh has more experience, and more varied experience than I do. To me,
>> EAV is a perfect example of ignoring the YAGNI principal.
> 
> I've done plenty of applications where part of the specification for the 
> application was "User Defined Fields" allowing the users to customize the 
> data structure at runtime.  This is a very, very common thing; of the 15 
> or so commercial applications I implemented from scratch as a consultant 
> probably 10 of them had it.

Just to add a note from YAJ, custom fields are unfortunately a
requirement in most apps I have seen, from a CMS to Quickbooks. The
reality is, you don't know everything the person wants to know about a
particular set of data.

Joshua D. Drake


> 
> I really don't see any way you could implement UDFs other than EAV that 
> wouldn't be immensely awkward, or result in executing DDL at runtime.
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
Josh Berkus wrote:
> I really don't see any way you could implement UDFs other than EAV that 
> wouldn't be immensely awkward, or result in executing DDL at runtime.

What's so horrible about DDL at runtime? Obviously, you're only going to 
allow specific additions to specific schemas/tables, but why not?

--   Richard Huxton  Archonet Ltd


Re: My honours project - databases using dynamically attached entity-properties

From
"Andrew Hammond"
Date:
On 3/12/07, Richard Huxton <dev@archonet.com> wrote:
> Josh Berkus wrote:
> > I really don't see any way you could implement UDFs other than EAV that
> > wouldn't be immensely awkward, or result in executing DDL at runtime.
>
> What's so horrible about DDL at runtime? Obviously, you're only going to
> allow specific additions to specific schemas/tables, but why not?

More to the point, since EAV is effectively smearing the semantics of
DDL with DML, what, if any of the arguments against doing DDL at
runtime don't apply equally to EAV? Well, aside from being able to say
"hey, I'm not executing DDL at runtime". :)

I see the issue as one of cost: it's substantially harder to implement
DDL at runtime than to work around the problem using EAV. If that
analysis is reasonable, then it would be a very interesting research
project to see how to cut down that cost of implementation.

Andrew


Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
Andrew Hammond wrote:
> On 3/12/07, Richard Huxton <dev@archonet.com> wrote:
>> Josh Berkus wrote:
>> > I really don't see any way you could implement UDFs other than EAV that
>> > wouldn't be immensely awkward, or result in executing DDL at runtime.
>>
>> What's so horrible about DDL at runtime? Obviously, you're only going to
>> allow specific additions to specific schemas/tables, but why not?
> 
> More to the point, since EAV is effectively smearing the semantics of
> DDL with DML, what, if any of the arguments against doing DDL at
> runtime don't apply equally to EAV? Well, aside from being able to say
> "hey, I'm not executing DDL at runtime". :)
> 
> I see the issue as one of cost: it's substantially harder to implement
> DDL at runtime than to work around the problem using EAV. If that
> analysis is reasonable, then it would be a very interesting research
> project to see how to cut down that cost of implementation.

Well the cost depends on where/how complex the extra fields are. If 
you're just talking about adding columns usercol01..NN with different 
types and possibly a lookup to a single client_attributes table, it's 
not difficult.

Of course, if inheritence worked fully, you could just have core and 
user versions of relevant tables.

--   Richard Huxton  Archonet Ltd


"Richard Huxton" <dev@archonet.com> writes:

> Well the cost depends on where/how complex the extra fields are. If you're just
> talking about adding columns usercol01..NN with different types and possibly a
> lookup to a single client_attributes table, it's not difficult.

And then what? dynamically construct all your SQL queries? 
Sure, sounds like a simple solution to me...

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


> And then what? dynamically construct all your SQL queries?
> Sure, sounds like a simple solution to me...

Not to mention DB security issues.  How do you secure your database when 
your web client has DDL access?

So, Edward, the really *interesting* idea would be to come up with a 
secure, normalized way to do UDFs *without* EAV tables.  People would be 
very impressed.

BTW, Google Summer of Code opens Wednesday:
http://www.postgresql.org/developer/summerofcode.html

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
Gregory Stark wrote:
> "Richard Huxton" <dev@archonet.com> writes:
> 
>> Well the cost depends on where/how complex the extra fields are. If you're just
>> talking about adding columns usercol01..NN with different types and possibly a
>> lookup to a single client_attributes table, it's not difficult.
> 
> And then what? dynamically construct all your SQL queries? 
> Sure, sounds like a simple solution to me...

No different to dynamically constructing a query for a report. Simpler, 
since in my experience most of these user-defined fields are just slots 
for extra codes/tags/notes rather than something you'd join on.

--   Richard Huxton  Archonet Ltd


And then what? Make the search box on www.postgresql.org able to handle an 
email address as search text without throwing a shoe?

Search for linus@torvalds.com or any other 'email' address from the postgres 
home page. Barfage every time.
Easy for some isn't easy for all, apparently. Left that out as a test case 
did we? Someone searching a mailing list for an email address? Who wudda 
thunk it? It works without the . -- don't know why, but then I also don't 
know why someone hasn't tried that before me.

"Sure, sounds like a simple solution to me..." Richard said sarcastically. 
Would be funnier if the search on the website wasn't broken in a completely 
stupid, almost ironic way. Ah, irony and sarcasm -- the ugly twins.

Yeah, we have to dynamically generate queries day in and day out. But then 
some of us actually work for a living.
Since we already have to do that, maybe someone could make that easier? 
Isn't that really the point here? Someone asked if something would be 
useful, and the people who use the database to do real work said YES, and 
here's how I might use it. Like full text seach and recursive queries, user 
defined (fields|attributes|properties) and the ability to manage them would 
be BUTTER! Is it a difficult problem? YES, but if it wasn't, why should it 
be worth an advanced degree?

Or maybe 'we' only solve the trivial and obvious problems, like searching a 
mailing list for an email address?

Sarcastically yours,
Sean



----- Original Message ----- 
From: "Richard Huxton" <dev@archonet.com>
To: "Gregory Stark" <stark@enterprisedb.com>
Cc: "Andrew Hammond" <andrew.george.hammond@gmail.com>; <josh@agliodbs.com>; 
<pgsql-hackers@postgresql.org>
Sent: Monday, March 12, 2007 7:30 PM
Subject: Re: [HACKERS] My honours project - databases using dynamically 
attached entity-properties


> Gregory Stark wrote:
>> "Richard Huxton" <dev@archonet.com> writes:
>>
>>> Well the cost depends on where/how complex the extra fields are. If 
>>> you're just
>>> talking about adding columns usercol01..NN with different types and 
>>> possibly a
>>> lookup to a single client_attributes table, it's not difficult.
>>
>> And then what? dynamically construct all your SQL queries? Sure, sounds 
>> like a simple solution to me...
>
> No different to dynamically constructing a query for a report. Simpler, 
> since in my experience most of these user-defined fields are just slots 
> for extra codes/tags/notes rather than something you'd join on.
>
> -- 
>   Richard Huxton
>   Archonet Ltd
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>
> 



Re: My honours project - databases using dynamically attached entity-properties

From
Magnus Hagander
Date:
On Mon, Mar 12, 2007 at 10:05:58PM -0700, Sean Utt wrote:
> And then what? Make the search box on www.postgresql.org able to handle an 
> email address as search text without throwing a shoe?
> 
> Search for linus@torvalds.com or any other 'email' address from the 
> postgres home page. Barfage every time.
> Easy for some isn't easy for all, apparently. Left that out as a test case 
> did we? Someone searching a mailing list for an email address? Who wudda 
> thunk it? It works without the . -- don't know why, but then I also don't 
> know why someone hasn't tried that before me.

Obviously not, since nobody has reported it before.
Fixed, insofar that we don't barf on it, but show you the no hits message.
Because all email addresses are obfuscated in the archives, they're not
actually searchable.

Thanks for reporting the issue. You're advised to report issues with the
website to pgsql-www instead for a faster response, since web people
don't always monitor all the other lists.

//Magnus


On Tue, 2007-03-13 at 00:43, Richard Huxton wrote:
> Josh Berkus wrote:
> > I really don't see any way you could implement UDFs other than EAV that 
> > wouldn't be immensely awkward, or result in executing DDL at runtime.
> 
> What's so horrible about DDL at runtime? Obviously, you're only going to 
> allow specific additions to specific schemas/tables, but why not?

Well, exclusively locking the table for DDL is not always possible in
production systems. We also shortly had a solution where we added new
columns on the fly, and we had to ditch it... it was simply not working.
The users were usually adding columns in the peek business hours, and in
the same hours it was simply impossible to take an exclusive lock on
that table. I think DDL will actually also exclusively lock parent
tables of FK relationships (I might be mistaken here, but I think I have
seen such lock), which is even worse.

After it caused extensive perceived downtime for hours, we simply
redesigned the feature so that the users need now to announce beforehand
how many different types of new columns they will need and we just
create a few extra of them, and assign them to the users needs as they
need it... the unused columns stay null and hopefully don't have too big
overhead, but it's the simplest solution we found which actually works.
When the user runs out of spare columns, the admin will create some new
spares in quiet hours.

Cheers,
Csaba.




Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
Sean Utt wrote:
> And then what? Make the search box on www.postgresql.org able to handle 
> an email address as search text without throwing a shoe?
> 
> Search for linus@torvalds.com or any other 'email' address from the 
> postgres home page. Barfage every time.
> Easy for some isn't easy for all, apparently. Left that out as a test 
> case did we? Someone searching a mailing list for an email address? Who 
> wudda thunk it? It works without the . -- don't know why, but then I 
> also don't know why someone hasn't tried that before me.

Had a bad day?

And I'll be damned if you're not right - it doesn't return results for 
dev@archonet.com but does for dev@archonet. Presumably something to do 
with (not ?) splitting the email address on ".". Can't believe no-one 
has noticed this before (me, for example). I guess that even though I 
search a lot, it's not on email addrs.

Have you reported it to the www team?

> "Sure, sounds like a simple solution to me..." Richard said 
> sarcastically. Would be funnier if the search on the website wasn't 
> broken in a completely stupid, almost ironic way. Ah, irony and sarcasm 
> -- the ugly twins.

Actually, it was Greg who said that. And it was *me* the (really very 
gentle) sarcasm was directed at.

> Yeah, we have to dynamically generate queries day in and day out. But 
> then some of us actually work for a living.

Umm - like all of us?

> Since we already have to do that, maybe someone could make that easier? 

Good idea!

> Isn't that really the point here?

Not as I was reading the discussion.
> Someone asked if something would be
> useful, and the people who use the database to do real work said YES, 
> and here's how I might use it. Like full text seach and recursive 
> queries, user defined (fields|attributes|properties) and the ability to 
> manage them would be BUTTER! Is it a difficult problem? YES, but if it 
> wasn't, why should it be worth an advanced degree?

I think the main discussion has been around:
1. Whether Edward's final-year project is basically EAV (in which case 
he'll probably need to work hard to get good marks).
2. Whether dynamically generating DDL is safe/practical in a business 
setting. This seems to split some *very* experienced people about 50:50. 
Half of them agree with me and the other half are inexplicably wrong ;-)

If you read section 0.3 of Edward's project proposal he argues that 
dynamic DDL causes problems for the application because: "However, SQL 
does not provide an easy syntax for querying these properties." (meaning 
the changed structure of the database). I'm not saying this part is 
easy, but I'm not convinced it's harder than doing it the "wrong" way. 
At least not if you do as Edward does and enforce types.

Now, in 0.3.1 he describes a normalised webpage=>tags table pair and 
shows some cumbersome-looking SQL. However, I'd argue this is due to the 
verbose nature of the SQL rather than the underlying expressions.

He then looks at what I think of as the "system settings table"* 
problem, where you have a bunch of configuration-settings you'd tend to 
store in a single table (setting_name, value), except you'd like to have 
different types for each setting (a further wrinkle is that you might 
like lists of settings - do you use arrays or a sub-table?). This is 
your classic EAV scenario.

Now, he proposes a set of tables - one per property, so you can enforce 
type constraints, but he will need to create syntax to make this easier 
to query. Presumably it'll need an automatically-generated view over the 
top. (Edward - tip: read up on Date's thoughts on automatically 
determining what table you can insert a row in based on its type).

This certainly looks like a variant on EAV to me, and I'm not convinced 
that it's gaining much since you'll have possibly huge numbers of joins 
going on in the background while not having any semantics to your table 
definitions.

The other variant (which I've used myself) is to have a "type" column 
and a trigger to enforce the text-value is valid for "type".

Now, if you do it "properly", that is define tables as you would as a 
human designer, then you do have the problem of informing your 
application on the relevant structure. Now, the syntax issues of this 
are largely solved - plenty of ActiveRecord-style classes out there for 
Ruby,Python,Perl,PHP,...

The interesting question then becomes where the practical limits in such 
flexibility are. Simple attribute/value pairs are easy enough. How do 
you feel about changes to existing tables though? How about new tables 
that add relationships between existing tables? At what point does the 
application just become pgAdmin?


* Another good example is the "questionnaire". Users need to be able to 
define their own lists of questions and if "driving_licence=no" then 
don't bother to ask "type of car=?". Edward - if you want to see a 
schema that implements this sort of thing, contact me off list and I'll 
see what I can do. The client for that project will probably be fine 
with sharing it with one student.

--   Richard Huxton  Archonet Ltd


On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> Sean Utt wrote:
> >And then what? Make the search box on www.postgresql.org able to
> >handle an email address as search text without throwing a shoe?
> >
> >Search for linus@torvalds.com or any other 'email' address from the
> >postgres home page. Barfage every time.  Easy for some isn't easy
> >for all, apparently. Left that out as a test case did we? Someone
> >searching a mailing list for an email address? Who wudda thunk it?
> >It works without the . -- don't know why, but then I also don't
> >know why someone hasn't tried that before me.
> 
> Had a bad day?

Today was a good day ;)

> >Since we already have to do that, maybe someone could make that easier? 
> 
> Good idea!

Hear, hear!

> > Someone asked if something would be
> >useful, and the people who use the database to do real work said YES, 
> >and here's how I might use it. Like full text seach and recursive 
> >queries, user defined (fields|attributes|properties) and the ability to 
> >manage them would be BUTTER! Is it a difficult problem? YES, but if it 
> >wasn't, why should it be worth an advanced degree?
> 
> I think the main discussion has been around:
> 1. Whether Edward's final-year project is basically EAV (in which case 
> he'll probably need to work hard to get good marks).

As Josh mentioned, I'm on of the people who says EAV is never
justified.  We do have a way of expressing wide ranges of constraints
not known in advance: it's called SQL, and people need to do some
design using it, however much they may fear that they've left
something out somehow. :)

> 2. Whether dynamically generating DDL is safe/practical in a
> business setting. This seems to split some *very* experienced people
> about 50:50.  Half of them agree with me and the other half are
> inexplicably wrong ;-)

For the record, I'm in the "ag'in" column.

> * Another good example is the "questionnaire".

With all due respect, this is a solved problem *without EAV or
run-time DDL*.  The URL below has one excellent approach to this.

<http://www.varlena.com/GeneralBits/110.php>

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
David Fetter wrote:
> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
>> * Another good example is the "questionnaire".
> 
> With all due respect, this is a solved problem *without EAV or
> run-time DDL*.  The URL below has one excellent approach to this.
> 
> <http://www.varlena.com/GeneralBits/110.php>

Which broadly speaking was the solution I used for my questionnaire, 
except I had a restricted set of types so basically just coerced them to 
text and side-stepped the inheritance issue. To the extent that it's 
dynamic, it's still just EAV though.

It doesn't remove the need for run-time DDL if you allow users to add 
their own questions. If I add a "National Insurance No." (Social 
security No) then either it's:
1. Plain text, and can take clearly invalid codes
2. A user-space construct with regexp matches etc (basically recreating DDL)
3. DDL.

And as the example says, you need to create the table types in advance. 
If you want to add e.g. "Work History" (employer, from_date, to_date) to 
a questionnaire then you'll need dynamic DDL (of form #2 or #3 above).

--   Richard Huxton  Archonet Ltd


Re: My honours project - databases using dynamically attached entity-properties

From
Andrew Dunstan
Date:
David Fetter wrote:
>> I think the main discussion has been around:
>> 1. Whether Edward's final-year project is basically EAV (in which case 
>> he'll probably need to work hard to get good marks).
>>     
>
> As Josh mentioned, I'm on of the people who says EAV is never
> justified.  We do have a way of expressing wide ranges of constraints
> not known in advance: it's called SQL, and people need to do some
> design using it, however much they may fear that they've left
> something out somehow. :)
>
>   

ISTM that the biggest problem with EAV is that is is far too seductive, 
and allows for lazy design (or lack of design). There might be odd cases 
(pace David) that require it, but I strongly suspect in most cases it is 
not necessary.

Perhaps we need some sort of discriminant record type ;-)

cheers

andrew


Hmm, hstore + (optionally) functional indexes. Is it answer?

Edward Stanley wrote:
> Hi,
> 
> 
> Was wondering if people would mind having a read over what I plan to do for my 
> undergraduate honours project - you can get the proposal here:
> 
> http://www.mcs.vuw.ac.nz/~eddie/489_Proposal.pdf
> 
> What I'd basically like to know is 
> a) Is this problem worth solving? 
> b) Is there already a good solution (particularly, within PostgreSQL)?
> c) Any other feedback or comments you may have.
> 
> 
> Regards
> Eddie Stanley
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
> David Fetter wrote:
> >On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> >>* Another good example is the "questionnaire".
> >
> >With all due respect, this is a solved problem *without EAV or
> >run-time DDL*.  The URL below has one excellent approach to this.
> >
> ><http://www.varlena.com/GeneralBits/110.php>
> 
> Which broadly speaking was the solution I used for my questionnaire,
> except I had a restricted set of types so basically just coerced
> them to text and side-stepped the inheritance issue. To the extent
> that it's dynamic, it's still just EAV though.

That's precisely the difference between the above solution and yours,
and it's the difference between a good design and one that will come
up and bit you on the as^Hnkle.

> It doesn't remove the need for run-time DDL if you allow users to add 
> their own questions.

Sure it does.   When a user, who should be talking with you, wants to
ask a new kind of question, that's the start of a discussion about
what new kind(s) of questions would be generally applicable in the
questionnaire schema.  Then, when you come to an agreement, you roll
it into the new schema, and the whole system gets an improvement.

> If I add a "National Insurance No." (Social 
> security No) then either it's:
> 1. Plain text, and can take clearly invalid codes
> 2. A user-space construct with regexp matches etc (basically recreating DDL)
> 3. DDL.

DDL, yes.  Run-time, no.

> And as the example says, you need to create the table types in advance. 
> If you want to add e.g. "Work History" (employer, from_date, to_date) to 
> a questionnaire then you'll need dynamic DDL (of form #2 or #3 above).

Again see above for new types of questions and answers.  You can pay
for it once up front in a re-jigger of the schema, or you will pay a
much larger price when you discover you've got EAV goo all over the
place.  There is a third option, of course, which is, "I'm sorry,
Dave.  I can't do that." ;)

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
David Fetter wrote:
> On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
>> David Fetter wrote:
>>> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
>>>> * Another good example is the "questionnaire".
>>> With all due respect, this is a solved problem *without EAV or
>>> run-time DDL*.  The URL below has one excellent approach to this.
>>>
>>> <http://www.varlena.com/GeneralBits/110.php>
>> Which broadly speaking was the solution I used for my questionnaire,
>> except I had a restricted set of types so basically just coerced
>> them to text and side-stepped the inheritance issue. To the extent
>> that it's dynamic, it's still just EAV though.
> 
> That's precisely the difference between the above solution and yours,
> and it's the difference between a good design and one that will come
> up and bit you on the as^Hnkle.

It's still basically EAV (either approach). The key fault with EAV is 
that the tables have no semantic meaning - answer_int contains number of 
oranges, days since birth and the price of a tube ticket in pennies. 
Now, with a questionnaire that might not matter because everything is an 
"answer" and you're not necessarily going to do much more than 
count/aggregate it.

>> It doesn't remove the need for run-time DDL if you allow users to add 
>> their own questions.
> 
> Sure it does.   When a user, who should be talking with you, wants to
> ask a new kind of question, that's the start of a discussion about
> what new kind(s) of questions would be generally applicable in the
> questionnaire schema.  Then, when you come to an agreement, you roll
> it into the new schema, and the whole system gets an improvement.

Fine, but if you're not letting the user extend the system, then it's 
not really addressing Edward's original posting, is it? If the user's 
talking to me, I might as well just write the DDL myself - it's the talk 
that'll take the time, not writing a dozen lines of SQL.

The interesting part of the problem (from a Comp-Sci point of view) is 
precisely in automating part of that discussion. It's providing an 
abstraction so that you don't end up with a mass of attributes while 
still providing freedom to the user.

--   Richard Huxton  Archonet Ltd


On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:
> David Fetter wrote:
> >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
> >>David Fetter wrote:
> >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> >>>>* Another good example is the "questionnaire".
> >>>With all due respect, this is a solved problem *without EAV or
> >>>run-time DDL*.  The URL below has one excellent approach to this.
> >>>
> >>><http://www.varlena.com/GeneralBits/110.php>
> >>Which broadly speaking was the solution I used for my
> >>questionnaire, except I had a restricted set of types so basically
> >>just coerced them to text and side-stepped the inheritance issue.
> >>To the extent that it's dynamic, it's still just EAV though.
> >
> >That's precisely the difference between the above solution and
> >yours, and it's the difference between a good design and one that
> >will come up and bit you on the as^Hnkle.
> 
> It's still basically EAV (either approach).  The key fault with EAV
> is that the tables have no semantic meaning - answer_int contains
> number of oranges, days since birth and the price of a tube ticket
> in pennies.

Stuffing all of those into an answer_int is *precisely* what the end
user must not do.  That's pilot error.

> Now, with a questionnaire that might not matter because everything
> is an "answer" and you're not necessarily going to do much more than
> count/aggregate it.

See above.

> >>It doesn't remove the need for run-time DDL if you allow users to
> >>add their own questions.
> >
> >Sure it does.   When a user, who should be talking with you, wants
> >to ask a new kind of question, that's the start of a discussion
> >about what new kind(s) of questions would be generally applicable
> >in the questionnaire schema.  Then, when you come to an agreement,
> >you roll it into the new schema, and the whole system gets an
> >improvement.
> 
> Fine, but if you're not letting the user extend the system, then
> it's not really addressing Edward's original posting, is it?

It's my contention that Edward's original idea is ill-posed.  SQL is
just fine for doing this kind of thing, and it's *not that hard*.

> If the user's talking to me, I might as well just write the DDL
> myself - it's the talk that'll take the time, not writing a dozen
> lines of SQL.

It's the talk that's the important part.  Machines are really bad at
seeing the broader picture.  In the attempt to "save" a few minutes'
discussion, he's trying to borrow that time from a system asked to do
things that computers are inherently bad at doing, and every end user
will pay that time back at a very high rate of interest.  This is
precisely the kind of false economy that so plagues software
development and maintenance these days.

> The interesting part of the problem (from a Comp-Sci point of view)
> is precisely in automating part of that discussion.  It's providing
> an abstraction so that you don't end up with a mass of attributes
> while still providing freedom to the user.

This freedom and efficiency you're talking about is better supplied,
IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
or an SF.net.  That way, improvements to the DDL get spread all over
the world, and a very large amount of wheel reinvention gets avoided.
Reusable components are a big chunk of both freedom and efficiency. :)

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


Re: My honours project - databases using dynamically attached entity-properties

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Mar 13, 2007 at 05:39:05PM +0300, Teodor Sigaev wrote:
> Hmm, hstore + (optionally) functional indexes. Is it answer?

I have used it in a (yet) test system. It works surprisingly well.

Thanks
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF96sWBcgs9XrR2kYRApgIAJwJ/dbyHc5gnsgm8ljMvF0RL5qSiwCfXzSz
PVrJkIbi497Ezns2+vXOLM8=
=xGj+
-----END PGP SIGNATURE-----



On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote:
> On Wed, 14 Mar 2007, David Fetter wrote:
> > On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:
> > > David Fetter wrote:
> > > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
> > > >>David Fetter wrote:
> > > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> > > >>>>* Another good example is the "questionnaire".
> > > >>>
> > > >>>With all due respect, this is a solved problem *without EAV or
> > > >>>run-time DDL*.  The URL below has one excellent approach to this.
> > > >>>
> > > >>><http://www.varlena.com/GeneralBits/110.php>
> > > >>
> > > >>Which broadly speaking was the solution I used for my
> > > >>questionnaire, except I had a restricted set of types so basically
> > > >>just coerced them to text and side-stepped the inheritance issue.
> > > >>To the extent that it's dynamic, it's still just EAV though.
> > > >
> > > >That's precisely the difference between the above solution and
> > > >yours, and it's the difference between a good design and one that
> > > >will come up and bit you on the as^Hnkle.
> > >
> > > It's still basically EAV (either approach).  The key fault with EAV
> > > is that the tables have no semantic meaning - answer_int contains
> > > number of oranges, days since birth and the price of a tube ticket
> > > in pennies.
> >
> > Stuffing all of those into an answer_int is *precisely* what the end
> > user must not do.  That's pilot error.
> >
> > > Now, with a questionnaire that might not matter because everything
> > > is an "answer" and you're not necessarily going to do much more than
> > > count/aggregate it.
> >
> > See above.
> >
> > > >>It doesn't remove the need for run-time DDL if you allow users to
> > > >>add their own questions.
> > > >
> > > >Sure it does.   When a user, who should be talking with you, wants
> > > >to ask a new kind of question, that's the start of a discussion
> > > >about what new kind(s) of questions would be generally applicable
> > > >in the questionnaire schema.  Then, when you come to an agreement,
> > > >you roll it into the new schema, and the whole system gets an
> > > >improvement.
> > >
> > > Fine, but if you're not letting the user extend the system, then
> > > it's not really addressing Edward's original posting, is it?
> >
> > It's my contention that Edward's original idea is ill-posed.  SQL is
> > just fine for doing this kind of thing, and it's *not that hard*.
> >
> > > If the user's talking to me, I might as well just write the DDL
> > > myself - it's the talk that'll take the time, not writing a dozen
> > > lines of SQL.
> >
> > It's the talk that's the important part.  Machines are really bad at
> > seeing the broader picture.  In the attempt to "save" a few minutes'
> > discussion, he's trying to borrow that time from a system asked to do
> > things that computers are inherently bad at doing, and every end user
> > will pay that time back at a very high rate of interest.  This is
> > precisely the kind of false economy that so plagues software
> > development and maintenance these days.
> >
> > > The interesting part of the problem (from a Comp-Sci point of view)
> > > is precisely in automating part of that discussion.  It's providing
> > > an abstraction so that you don't end up with a mass of attributes
> > > while still providing freedom to the user.
> >
> > This freedom and efficiency you're talking about is better supplied,
> > IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
> > or an SF.net.  That way, improvements to the DDL get spread all over
> > the world, and a very large amount of wheel reinvention gets avoided.
> > Reusable components are a big chunk of both freedom and efficiency. :)
> >
> > Cheers,
> > D
> 
> Maybe I should rethink the problem a bit - from the very brief
> initial research I've done, it seems EAV schemas have two common
> uses: 

> 1) When new attributes have to be created on-the-fly 
> 2) When the number of possible properties for an entity greatly (orders of 
> magnitude) exceeds the number of properties any one entity is likely to have. 

Um, no.  The first use case is bad coding practice, and the second is
a classic case for a join table, which is the standard way to handle
M:N relationships.

> I'm not sure about solving the first problem - there seems to be a lot of 
> debate around this. I can see reasons for and against allowing this. However 
> I think the second is a very real problem. One such example is a patient 
> record system.
> 
> For each patient we have a table of common data (dob, sex, height, weight etc) 
> but as well as this a patient can present with many symptoms. This might be a 
> table of 40,000 possible symptoms. 

Here's how I'd do that:

CREATE TABLE patient (   patient_id SERIAL PRIMARY KEY, /* for simplicity.  Some
combinationof columns in the                                     table would also have a UNIQUE
           NOT NULL constraint on it.                                   */   ...
 
);

CREATE TABLE symptom (   symptom_id SERIAL PRIMARY KEY, /* See above. */   ...
);

CREATE TABLE patient_presents_with (   patient_id INTEGER NOT NULL REFERENCES patient(patient_id),   symptom_id INTEGER
NOTNULL REFERENCES symptom(symptom_id),   UNIQUE(patient_id, symptom_id)
 
);

> Lets say we want to run a query on these symptoms (using a boolean expression) 

I'd use something like the following:

SELECT   p.patient_id,   p.f_name,   p.l_name,   s.symptom_name,   s.symptom_desc
FROM   patient p
JOIN   patient_presents_with ppw   USING (patient_id)
JOIN   symptom s   USING (symptom_id)
WHERE   s.symptom_name = ALL('foo','bar','baz')
AND   s.symptom_name = ANY('quux','fleeg');

> to return the patient records which match the query string on the symptoms.
> 
> (This turns out to be a very similar problem to the 'tags' example I first 
> presented) - assume a similar schema. With more than a couple of symptoms and 
> a complex tree, the resulting SQL can span pages. 

Not really.  See above :)

> When I first started thinking about this project I believed the two problems 
> essentially to be the same class of problem, but this may not be the case.

EAV will bite you.  It's not *that* much work to keep its from biting
you. :)

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


"David Fetter" <david@fetter.org> writes:

> CREATE TABLE symptom (
>     symptom_id SERIAL PRIMARY KEY, /* See above. */
>     ...
> );
>
> CREATE TABLE patient_presents_with (
>     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>     UNIQUE(patient_id, symptom_id)
> );

I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
all boolean values.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: My honours project - databases using dynamically attached entity-properties

From
mark@mark.mielke.cc
Date:
On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> "David Fetter" <david@fetter.org> writes:
> > CREATE TABLE symptom (
> >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> >     ...
> > );
> >
> > CREATE TABLE patient_presents_with (
> >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> >     UNIQUE(patient_id, symptom_id)
> > );
> 
> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
> all boolean values.

Where is the boolean above? It is M:N, with each having whatever data
is required.

The issue I have with the above is that it seems unnecessarily
inefficient.  Whenever mapping from a patient to a symptom, or a
symptom to a patient, it requires searching indexes for three tables.
Perhaps this would work well if there was heavy overlap of symptoms
for different patients. For the cases I have hit this problem,
however, there may be overlap, but it is not easy to detect, and even
if it was detected, we would end with some sort of garbage collection
requirements where symptoms are removed once all references to the
symptoms are removed.

The case most familiar to me, is a set of viewing preferences for web
pages. Some users specify no preferences, while others have dozens of
preferences. As I have no requirements to search for users with a
particular preference, I chose to solve this by packing many of the
preferences together into a TEXT field, and having the application
pack/unpack the data. I still have tables that map object id to
attribute/value, but they are used for the data that can require
longer queries. Without clustering the data, searching for a dozen
of these attributes requires either querying all attributes, where
the attributes could be scattered throughout the table, or querying
them one by one, which is worse.

If there was an efficient way to do this for both of my use cases,
I would be strongly tempted to use it. :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: My honours project - databases using dynamically attached entity-properties

From
"Joshua D. Drake"
Date:
Gregory Stark wrote:
> "David Fetter" <david@fetter.org> writes:
> 
>> CREATE TABLE symptom (
>>     symptom_id SERIAL PRIMARY KEY, /* See above. */
>>     ...
>> );
>>
>> CREATE TABLE patient_presents_with (
>>     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>>     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>>     UNIQUE(patient_id, symptom_id)
>> );
> 
> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
> all boolean values.

With a unique across (patient_id,symptom_id) you could have a single
patient with as many unique symptoms as could be listed.



Joshua D. Drake




-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



<mark@mark.mielke.cc> writes:

> On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
>> "David Fetter" <david@fetter.org> writes:
>> > CREATE TABLE symptom (
>> >     symptom_id SERIAL PRIMARY KEY, /* See above. */
>> >     ...
>> > );
>> >
>> > CREATE TABLE patient_presents_with (
>> >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>> >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>> >     UNIQUE(patient_id, symptom_id)
>> > );
>> 
>> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
>> all boolean values.
>
> Where is the boolean above? It is M:N, with each having whatever data
> is required.

No, the above schema can only show whether a patient has or doesn't have a
symptom. There is nowhere to store *where* the pain, inflammation, swelling,
aneurism, etc is, or how severe it is, or when it occurred, etc.

In any case the above arguably *is* an EA schema anyways. Your "symptom" is
just as much an abstract meaningless concept from a database point of view as
the questionnaire's "answer" or the bug tracker's "tag". Especially once you
start actually having to record information *about* the symptom.

This is a silly argument. The only reasonable conclusion is that any dogmatic
principle that doesn't take into account the application requirements is
wrong. In some cases you want a flexible abstract schema because the
application is flexible and abstract, in others you need the database schema
to understand your specific data structures so it can help you manipulate it.
You have to pick which is more useful for your application, you can't have
your cake and eat it too.

And all of the sudden I have a craving for cake...

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


On Wed, 2007-03-14 at 16:08, mark@mark.mielke.cc wrote:
> On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> > "David Fetter" <david@fetter.org> writes:
> > > CREATE TABLE symptom (
> > >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> > >     ...
> > > );
> > >
> > > CREATE TABLE patient_presents_with (
> > >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> > >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> > >     UNIQUE(patient_id, symptom_id)
> > > );
> > 
> > I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
> > all boolean values.
> 
> Where is the boolean above? It is M:N, with each having whatever data
> is required.

The boolean is assumed in the symptoms table. In any case, even if it's
not a boolean value, even if maybe the symptoms table is a complex one
on it's own, it still is one single type for all symptoms of all
patients. The real problem is that in some real world applications you
have a mix of wildly varying types of attributes a user might want to
use, and you can't know what those will be beforehand... the symptoms
thing is simple to solve in the way David did it, but there really are
other situations which a simple m:n can't easily cover. How would you
handle a data base of user settings for 10K different applications and
100M different users where each application must be able to store it's
own (type safe !!) settings in the same structure, and applications come
and go with their own settings ? Come up with a good solution to this
combined with queries like "give me all the users who have this set of
settings set to these values" running fast, and then you're talking.

Cheers,
Csaba.




Re: My honours project - databases using dynamically attached entity-properties

From
Andrew Dunstan
Date:
Joshua D. Drake wrote:
> Gregory Stark wrote:
>   
>> "David Fetter" <david@fetter.org> writes:
>>
>>     
>>> CREATE TABLE symptom (
>>>     symptom_id SERIAL PRIMARY KEY, /* See above. */
>>>     ...
>>> );
>>>
>>> CREATE TABLE patient_presents_with (
>>>     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>>>     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>>>     UNIQUE(patient_id, symptom_id)
>>> );
>>>       
>> I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
>> all boolean values.
>>     
>
> With a unique across (patient_id,symptom_id) you could have a single
> patient with as many unique symptoms as could be listed.
>
>
>
>   

I think Greg is suggesting that the table needs to be augmented with, 
say, a severity field.

cheers

andrew


Re: My honours project - databases using dynamically attached entity-properties

From
mark@mark.mielke.cc
Date:
On Wed, Mar 14, 2007 at 03:25:48PM +0000, Gregory Stark wrote:
> <mark@mark.mielke.cc> writes:
> > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> >> "David Fetter" <david@fetter.org> writes:
> >> > CREATE TABLE symptom (
> >> >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> >> >     ...
> >> > );
> >> > CREATE TABLE patient_presents_with (
> >> >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> >> >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> >> >     UNIQUE(patient_id, symptom_id)
> >> > );
> >> I'm just glad I don't have your doctor. I hope mine doesn't think 
> >> symptoms are all boolean values.
> > Where is the boolean above? It is M:N, with each having whatever data
> > is required.
> No, the above schema can only show whether a patient has or doesn't have a
> symptom. There is nowhere to store *where* the pain, inflammation, swelling,
> aneurism, etc is, or how severe it is, or when it occurred, etc.

What do you think the '...' is in the symptom table? :-)

I'm not saying I would do it that way - but it seems to me that this
is a bit of a religious debate, without people listening to each
other. For one person to suggest a workable solution, and another
person to write it off as quickly as you did, misunderstanding, or
misrepresenting it, is what I would call "not listening". :-)

> In any case the above arguably *is* an EA schema anyways. Your "symptom" is
> just as much an abstract meaningless concept from a database point of view as
> the questionnaire's "answer" or the bug tracker's "tag". Especially once you
> start actually having to record information *about* the symptom.

> This is a silly argument. The only reasonable conclusion is that any dogmatic
> principle that doesn't take into account the application requirements is
> wrong. In some cases you want a flexible abstract schema because the
> application is flexible and abstract, in others you need the database schema
> to understand your specific data structures so it can help you manipulate it.
> You have to pick which is more useful for your application, you can't have
> your cake and eat it too.

No. Another reasonable conclusion is that the answer is not simple. This
doesn't mean the answer is undesirable. It means that people need more
time. :-)

I prefer abstractions, especially if they are more efficient than if I were
to roll my own each time. Relational databases do a good job today. They can
do a better job tomorrow.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



<mark@mark.mielke.cc> writes:

> On Wed, Mar 14, 2007 at 03:25:48PM +0000, Gregory Stark wrote:
>> <mark@mark.mielke.cc> writes:
>> > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
>> >> "David Fetter" <david@fetter.org> writes:
>> >> > CREATE TABLE symptom (
>> >> >     symptom_id SERIAL PRIMARY KEY, /* See above. */
>> >> >     ...
>> >> > );
>> >> > CREATE TABLE patient_presents_with (
>> >> >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>> >> >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>> >> >     UNIQUE(patient_id, symptom_id)
>> >> > );
>> >> I'm just glad I don't have your doctor. I hope mine doesn't think 
>> >> symptoms are all boolean values.
>> > Where is the boolean above? It is M:N, with each having whatever data
>> > is required.
>> No, the above schema can only show whether a patient has or doesn't have a
>> symptom. There is nowhere to store *where* the pain, inflammation, swelling,
>> aneurism, etc is, or how severe it is, or when it occurred, etc.
>
> What do you think the '...' is in the symptom table? :-)

Ah, I did in fact miss that. So then this is just a standard EA schema. I
thought you were on the other side of the debate?


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> "David Fetter" <david@fetter.org> writes:
> 
> > CREATE TABLE symptom (
> >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> >     ...
> > );
> >
> > CREATE TABLE patient_presents_with (
> >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> >     UNIQUE(patient_id, symptom_id)
> > );
> 
> I'm just glad I don't have your doctor.  I hope mine doesn't think
> symptoms are all boolean values.

What's in the symptom table is up to the doctor.

Cheers,
D
-- 
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666                             Skype: davidfetter

Remember to vote!
Consider donating to PostgreSQL: http://www.postgresql.org/about/donate


Re: My honours project - databases using dynamically attached entity-properties

From
Richard Huxton
Date:
David Fetter wrote:
> On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
>> "David Fetter" <david@fetter.org> writes:
>>
>>> CREATE TABLE symptom (
>>>     symptom_id SERIAL PRIMARY KEY, /* See above. */
>>>     ...
>>> );
>>>
>>> CREATE TABLE patient_presents_with (
>>>     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>>>     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>>>     UNIQUE(patient_id, symptom_id)
>>> );
>> I'm just glad I don't have your doctor.  I hope mine doesn't think
>> symptoms are all boolean values.
> 
> What's in the symptom table is up to the doctor.

Surely the point is that for "high temperature" I need a temperature 
value, whereas for "tooth decay" I'll want a tooth number (or whatever 
they use). Which brings us back to where we started...

--   Richard Huxton  Archonet Ltd


On Wed, 2007-03-14 at 16:50, David Fetter wrote:
> On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> > "David Fetter" <david@fetter.org> writes:
> > 
> > > CREATE TABLE symptom (
> > >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> > >     ...
> > > );
> > >
> > > CREATE TABLE patient_presents_with (
> > >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> > >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> > >     UNIQUE(patient_id, symptom_id)
> > > );
> > 
> > I'm just glad I don't have your doctor.  I hope mine doesn't think
> > symptoms are all boolean values.
> 
> What's in the symptom table is up to the doctor.

OK, and here's the problem: each doctor might want to put something else
in the symptom table. Each doctor might want to do it in a type safe
way, e.g. so that the application enforces an enumeration of
"high/moderate/low" for the symptom fever (or maybe another doctor wants
it in exact degrees)... you can all stuff it in a string field, but you
know how reliable that will be.

Cheers,
Csaba.




Re: My honours project - databases using dynamically attached entity-properties

From
Edward Stanley
Date:
On Wed, 14 Mar 2007, David Fetter wrote:
> On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:
> > David Fetter wrote:
> > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
> > >>David Fetter wrote:
> > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> > >>>>* Another good example is the "questionnaire".
> > >>>
> > >>>With all due respect, this is a solved problem *without EAV or
> > >>>run-time DDL*.  The URL below has one excellent approach to this.
> > >>>
> > >>><http://www.varlena.com/GeneralBits/110.php>
> > >>
> > >>Which broadly speaking was the solution I used for my
> > >>questionnaire, except I had a restricted set of types so basically
> > >>just coerced them to text and side-stepped the inheritance issue.
> > >>To the extent that it's dynamic, it's still just EAV though.
> > >
> > >That's precisely the difference between the above solution and
> > >yours, and it's the difference between a good design and one that
> > >will come up and bit you on the as^Hnkle.
> >
> > It's still basically EAV (either approach).  The key fault with EAV
> > is that the tables have no semantic meaning - answer_int contains
> > number of oranges, days since birth and the price of a tube ticket
> > in pennies.
>
> Stuffing all of those into an answer_int is *precisely* what the end
> user must not do.  That's pilot error.
>
> > Now, with a questionnaire that might not matter because everything
> > is an "answer" and you're not necessarily going to do much more than
> > count/aggregate it.
>
> See above.
>
> > >>It doesn't remove the need for run-time DDL if you allow users to
> > >>add their own questions.
> > >
> > >Sure it does.   When a user, who should be talking with you, wants
> > >to ask a new kind of question, that's the start of a discussion
> > >about what new kind(s) of questions would be generally applicable
> > >in the questionnaire schema.  Then, when you come to an agreement,
> > >you roll it into the new schema, and the whole system gets an
> > >improvement.
> >
> > Fine, but if you're not letting the user extend the system, then
> > it's not really addressing Edward's original posting, is it?
>
> It's my contention that Edward's original idea is ill-posed.  SQL is
> just fine for doing this kind of thing, and it's *not that hard*.
>
> > If the user's talking to me, I might as well just write the DDL
> > myself - it's the talk that'll take the time, not writing a dozen
> > lines of SQL.
>
> It's the talk that's the important part.  Machines are really bad at
> seeing the broader picture.  In the attempt to "save" a few minutes'
> discussion, he's trying to borrow that time from a system asked to do
> things that computers are inherently bad at doing, and every end user
> will pay that time back at a very high rate of interest.  This is
> precisely the kind of false economy that so plagues software
> development and maintenance these days.
>
> > The interesting part of the problem (from a Comp-Sci point of view)
> > is precisely in automating part of that discussion.  It's providing
> > an abstraction so that you don't end up with a mass of attributes
> > while still providing freedom to the user.
>
> This freedom and efficiency you're talking about is better supplied,
> IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
> or an SF.net.  That way, improvements to the DDL get spread all over
> the world, and a very large amount of wheel reinvention gets avoided.
> Reusable components are a big chunk of both freedom and efficiency. :)
>
> Cheers,
> D

Maybe I should rethink the problem a bit - from the very brief initial 
research I've done, it seems EAV schemas have two common uses: 

1) When new attributes have to be created on-the-fly 
2) When the number of possible properties for an entity greatly (orders of 
magnitude) exceeds the number of properties any one entity is likely to have. 

I'm not sure about solving the first problem - there seems to be a lot of 
debate around this. I can see reasons for and against allowing this. However 
I think the second is a very real problem. One such example is a patient 
record system.

For each patient we have a table of common data (dob, sex, height, weight etc) 
but as well as this a patient can present with many symptoms. This might be a 
table of 40,000 possible symptoms. 

Lets say we want to run a query on these symptoms (using a boolean expression) 
to return the patient records which match the query string on the symptoms.

(This turns out to be a very similar problem to the 'tags' example I first 
presented) - assume a similar schema. With more than a couple of symptoms and 
a complex tree, the resulting SQL can span pages. 

When I first started thinking about this project I believed the two problems 
essentially to be the same class of problem, but this may not be the case.

What do people think?
- Also, thanks everyone for your input thus far. It has been very valuable.


Eddie Stanley


On Wednesday 2007-03-14 08:26, Csaba Nagy wrote:
> On Wed, 2007-03-14 at 16:08, mark@mark.mielke.cc wrote:
> > On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> > > "David Fetter" <david@fetter.org> writes:
> > > > CREATE TABLE symptom (
> > > >     symptom_id SERIAL PRIMARY KEY, /* See above. */
> > > >     ...
> > > > );
> > > >
> > > > CREATE TABLE patient_presents_with (
> > > >     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> > > >     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> > > >     UNIQUE(patient_id, symptom_id)
> > > > );
> > >
> > > I'm just glad I don't have your doctor. I hope mine doesn't think
> > > symptoms are all boolean values.
> >
> > Where is the boolean above? It is M:N, with each having whatever data
> > is required.
>
> The boolean is assumed in the symptoms table. In any case, even if it's
> not a boolean value, even if maybe the symptoms table is a complex one
> on it's own, it still is one single type for all symptoms of all
> patients. The real problem is that in some real world applications you
> have a mix of wildly varying types of attributes a user might want to
> use, and you can't know what those will be beforehand... the symptoms
> thing is simple to solve in the way David did it, but there really are
> other situations which a simple m:n can't easily cover. How would you
> handle a data base of user settings for 10K different applications and
> 100M different users where each application must be able to store it's
> own (type safe !!) settings in the same structure, and applications come
> and go with their own settings ? Come up with a good solution to this
> combined with queries like "give me all the users who have this set of
> settings set to these values" running fast, and then you're talking.
>
> Cheers,
> Csaba.

At that point traditional referential database design starts to break down.  
If you need THAT MUCH flexibility it's probably time to look at network 
databases, knowledge bases, extensible knowledge bases, and ad hoc knowledge 
bases (OWL, RDF, etc).  Flexibility, friendliness to marketeers or 
accountants, extesiblity, none are really the strong points of relational 
database.  Databases as they exist today do best with finite domains that can 
be formally organized.


David Fetter wrote:
> On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote:
>   
>> On Wed, 14 Mar 2007, David Fetter wrote:
>>     
>>> On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:
>>>       
>>>> David Fetter wrote:
>>>>         
>>>>> On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
>>>>>           
>>>>>> David Fetter wrote:
>>>>>>             
>>>>>>> On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
>>>>>>>               
>>>>>>>> * Another good example is the "questionnaire".
>>>>>>>>                 
>>>>>>> With all due respect, this is a solved problem *without EAV or
>>>>>>> run-time DDL*.  The URL below has one excellent approach to this.
>>>>>>>
>>>>>>> <http://www.varlena.com/GeneralBits/110.php>
>>>>>>>               
>>>>>> Which broadly speaking was the solution I used for my
>>>>>> questionnaire, except I had a restricted set of types so basically
>>>>>> just coerced them to text and side-stepped the inheritance issue.
>>>>>> To the extent that it's dynamic, it's still just EAV though.
>>>>>>             
>>>>> That's precisely the difference between the above solution and
>>>>> yours, and it's the difference between a good design and one that
>>>>> will come up and bit you on the as^Hnkle.
>>>>>           
>>>> It's still basically EAV (either approach).  The key fault with EAV
>>>> is that the tables have no semantic meaning - answer_int contains
>>>> number of oranges, days since birth and the price of a tube ticket
>>>> in pennies.
>>>>         
>>> Stuffing all of those into an answer_int is *precisely* what the end
>>> user must not do.  That's pilot error.
>>>
>>>       
>>>> Now, with a questionnaire that might not matter because everything
>>>> is an "answer" and you're not necessarily going to do much more than
>>>> count/aggregate it.
>>>>         
>>> See above.
>>>
>>>       
>>>>>> It doesn't remove the need for run-time DDL if you allow users to
>>>>>> add their own questions.
>>>>>>             
>>>>> Sure it does.   When a user, who should be talking with you, wants
>>>>> to ask a new kind of question, that's the start of a discussion
>>>>> about what new kind(s) of questions would be generally applicable
>>>>> in the questionnaire schema.  Then, when you come to an agreement,
>>>>> you roll it into the new schema, and the whole system gets an
>>>>> improvement.
>>>>>           
>>>> Fine, but if you're not letting the user extend the system, then
>>>> it's not really addressing Edward's original posting, is it?
>>>>         
>>> It's my contention that Edward's original idea is ill-posed.  SQL is
>>> just fine for doing this kind of thing, and it's *not that hard*.
>>>
>>>       
>>>> If the user's talking to me, I might as well just write the DDL
>>>> myself - it's the talk that'll take the time, not writing a dozen
>>>> lines of SQL.
>>>>         
>>> It's the talk that's the important part.  Machines are really bad at
>>> seeing the broader picture.  In the attempt to "save" a few minutes'
>>> discussion, he's trying to borrow that time from a system asked to do
>>> things that computers are inherently bad at doing, and every end user
>>> will pay that time back at a very high rate of interest.  This is
>>> precisely the kind of false economy that so plagues software
>>> development and maintenance these days.
>>>
>>>       
>>>> The interesting part of the problem (from a Comp-Sci point of view)
>>>> is precisely in automating part of that discussion.  It's providing
>>>> an abstraction so that you don't end up with a mass of attributes
>>>> while still providing freedom to the user.
>>>>         
>>> This freedom and efficiency you're talking about is better supplied,
>>> IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
>>> or an SF.net.  That way, improvements to the DDL get spread all over
>>> the world, and a very large amount of wheel reinvention gets avoided.
>>> Reusable components are a big chunk of both freedom and efficiency. :)
>>>
>>> Cheers,
>>> D
>>>       
>> Maybe I should rethink the problem a bit - from the very brief
>> initial research I've done, it seems EAV schemas have two common
>> uses: 
>>     
>
>   
>> 1) When new attributes have to be created on-the-fly 
>> 2) When the number of possible properties for an entity greatly (orders of 
>> magnitude) exceeds the number of properties any one entity is likely to have. 
>>     
>
> Um, no.  The first use case is bad coding practice, and the second is
> a classic case for a join table, which is the standard way to handle
> M:N relationships.
>
>   
>> I'm not sure about solving the first problem - there seems to be a lot of 
>> debate around this. I can see reasons for and against allowing this. However 
>> I think the second is a very real problem. One such example is a patient 
>> record system.
>>
>> For each patient we have a table of common data (dob, sex, height, weight etc) 
>> but as well as this a patient can present with many symptoms. This might be a 
>> table of 40,000 possible symptoms. 
>>     
>
> Here's how I'd do that:
>
> CREATE TABLE patient (
>     patient_id SERIAL PRIMARY KEY, /* for simplicity.  Some
>                                       combination of columns in the
>                                       table would also have a UNIQUE
>                                       NOT NULL constraint on it.
>                                     */
>     ...
> );
>
> CREATE TABLE symptom (
>     symptom_id SERIAL PRIMARY KEY, /* See above. */
>     ...
> );
>
> CREATE TABLE patient_presents_with (
>     patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
>     symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
>     UNIQUE(patient_id, symptom_id)
> );
>
>   
>> Lets say we want to run a query on these symptoms (using a boolean expression) 
>>     
>
> I'd use something like the following:
>
> SELECT
>     p.patient_id,
>     p.f_name,
>     p.l_name,
>     s.symptom_name,
>     s.symptom_desc
> FROM
>     patient p
> JOIN
>     patient_presents_with ppw
>     USING (patient_id)
> JOIN
>     symptom s
>     USING (symptom_id)
> WHERE
>     s.symptom_name = ALL('foo','bar','baz')
> AND
>     s.symptom_name = ANY('quux','fleeg');
>
>   
Are the ALL and ANY functions new to Postgresql 8? I haven't met them 
before.
Anyway this will work for some queries but not others.

What about the following expression? ('foo' && 'bar') | ('baz' && ! 
'quxx') | 'fleeg'
Maybe I have misunderstood how these functions work, but I don't think 
they will handle anything but trivial examples of this problem.

>> to return the patient records which match the query string on the symptoms.
>>
>> (This turns out to be a very similar problem to the 'tags' example I first 
>> presented) - assume a similar schema. With more than a couple of symptoms and 
>> a complex tree, the resulting SQL can span pages. 
>>     
>
> Not really.  See above :)
>
>   
>> When I first started thinking about this project I believed the two problems 
>> essentially to be the same class of problem, but this may not be the case.
>>     
>
> EAV will bite you.  It's not *that* much work to keep its from biting
> you. :)
>
> Cheers,
> D
>   



Josh Berkus wrote:
>> And then what? dynamically construct all your SQL queries?
>> Sure, sounds like a simple solution to me...
> 
> Not to mention DB security issues.  How do you secure your database when 
> your web client has DDL access?
> 
> So, Edward, the really *interesting* idea would be to come up with a 
> secure, normalized way to do UDFs *without* EAV tables.  People would be 
> very impressed.
> 

I have a system with many essentially user-defined fields, and was
thinking of creating something similar to an Array type and writing
some GIST indexes for it.

My current workaround is to store them as a YAML document and use
tsearch to index it (with application logic to further refine the
results) - but a EAV datatype that could be put in tables and
effectively indexed would be of quite a bit of interest here.
And yes, a better say to do UDFs would be even cooler.


Re: My honours project - databases using dynamically attached entity-properties

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 15, 2007 at 08:31:24AM -0700, Ron Mayer wrote:
> Josh Berkus wrote:
> >> And then what? dynamically construct all your SQL queries?
> >> Sure, sounds like a simple solution to me...
> > 
> > Not to mention DB security issues.  How do you secure your database when 
> > your web client has DDL access?
> > 
> > So, Edward, the really *interesting* idea would be to come up with a 
> > secure, normalized way to do UDFs *without* EAV tables.  People would be 
> > very impressed.
> > 
> 
> I have a system with many essentially user-defined fields, and was
> thinking of creating something similar to an Array type and writing
> some GIST indexes for it.

Do give hstore a try. It indexes over name/value mappings (the same
authors have intarray, indexing over arrays). If it doesn't solve your
problem it'll be at least a very good starting point for what you
mention above (basically it's a lossy GiST index based on a hash bitmap.
I was at least quite impressed).

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF+WumBcgs9XrR2kYRAukhAJ9mPk89FfLX4E3dIBGlhfVTwqup0QCdE8Tx
nh05Y1WpyX36uKI+9qspO60=
=C/5K
-----END PGP SIGNATURE-----



On Mar 15, 2007, at 11:31 , Ron Mayer wrote:

> Josh Berkus wrote:
>>> And then what? dynamically construct all your SQL queries?
>>> Sure, sounds like a simple solution to me...
>>
>> Not to mention DB security issues.  How do you secure your  
>> database when
>> your web client has DDL access?
>>
>> So, Edward, the really *interesting* idea would be to come up with a
>> secure, normalized way to do UDFs *without* EAV tables.  People  
>> would be
>> very impressed.
>>
>
> I have a system with many essentially user-defined fields, and was
> thinking of creating something similar to an Array type and writing
> some GIST indexes for it.
>
> My current workaround is to store them as a YAML document and use
> tsearch to index it (with application logic to further refine the
> results) - but a EAV datatype that could be put in tables and
> effectively indexed would be of quite a bit of interest here.
> And yes, a better say to do UDFs would be even cooler.

Out of all the databases that I have used, postgresql offers the most  
flexible DDL- mostly for one reason: they can operate within  
transactions.

To handle arbitrary strings as column identifiers, the column names  
could actually be stripped down to lower-case letters and the "real  
title" could be stored in a separate table or as column comments.

Mr. Berkus' concern regarding the security implications is already  
handled by privilege separation or security-definer functions.

The OP's concern about the difficulty about querying a schema  
structure is alleviated via any number of APIs in Perl, JDBC, etc.

It seems to me that postgresql is especially well-suited to run DDL  
at runtime, so what's the issue?


-M


"Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes:

> I have a system with many essentially user-defined fields, and was
> thinking of creating something similar to an Array type and writing
> some GIST indexes for it.
>
> My current workaround is to store them as a YAML document and use
> tsearch to index it (with application logic to further refine the
> results) - but a EAV datatype that could be put in tables and
> effectively indexed would be of quite a bit of interest here.
> And yes, a better say to do UDFs would be even cooler.

I think that's what hstore does. Honestly I'm not entirely certain though.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


On Thu, 2007-03-15 at 17:01, A.M. wrote:
> It seems to me that postgresql is especially well-suited to run DDL  
> at runtime, so what's the issue?

The issue is that some applications are not well suited to run DDL at
runtime :-)

As I already mentioned in another post in this thread, our application
also has a requirement of user defined fields in one table. Problem is,
that table is so accessed in peak hours, that it is simply impossible to
take an exclusive lock on it without causing an extended perceived
downtime of the application. And guess what, users will always want to
add new fields in peak hours...

We did solve this in our case with some application logic, but a generic
solution would be nice ;-)

Cheers,
Csaba.




Re: My honours project - databases using dynamically attached entity-properties

From
Andrew Dunstan
Date:
Gregory Stark wrote:
> "Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes:
>
>   
>> I have a system with many essentially user-defined fields, and was
>> thinking of creating something similar to an Array type and writing
>> some GIST indexes for it.
>>
>> My current workaround is to store them as a YAML document and use
>> tsearch to index it (with application logic to further refine the
>> results) - but a EAV datatype that could be put in tables and
>> effectively indexed would be of quite a bit of interest here.
>> And yes, a better say to do UDFs would be even cooler.
>>     
>
> I think that's what hstore does. Honestly I'm not entirely certain though.
>
>   

Does hstore nest? My impression is that it doesn't. Which might well not 
matter, of course.

cheers

andrew


Re: My honours project - databases using dynamically attached entity-properties

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 15, 2007 at 12:49:10PM -0400, Andrew Dunstan wrote:
> Gregory Stark wrote:
> >"Ron Mayer" <rm_pg@cheapcomplexdevices.com> writes:
> >
> >  
> >>I have a system with many essentially user-defined fields, and was
> >>thinking of creating something similar to an Array type and writing
> >>some GIST indexes for it.

[...]

> >I think that's what hstore does. Honestly I'm not entirely certain though.

> Does hstore nest? My impression is that it doesn't. Which might well not 
> matter, of course.

If what you mean is to have "mappings of mappings" then no.

Hstore implements a data type for a (finite) mapping (a set of key -> value
pairs, think "hash" for perl folks), with operations like "H1 contains
H2" (in the sense that all key-value pairs in H2 are also in H1)
supported by an index. Keys and values are strings.

But I don't see why it couldn't be extended to more structured data
types (one of the good things of hstore at the moment is its surprising
simplicity, because it provides a good example of what GiST is good for,
therefore I'd understand if the authors don't wanted to add that much
hair without need).

In my view it's one of the cutest applications of GiST.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF+i5OBcgs9XrR2kYRAprYAJ46ecAnSNWnh+rizGdhWLutU7BphACdHweH
k3rdsxmUzuxthwldXvJ3UKw=
=+sT5
-----END PGP SIGNATURE-----



Re: My honours project - databases using dynamically attached entity-properties

From
Andrew Dunstan
Date:
tomas@tuxteam.de wrote:
>
>> Does hstore nest? My impression is that it doesn't. Which might well not 
>> matter, of course.
>>     
>
> If what you mean is to have "mappings of mappings" then no.
>
> Hstore implements a data type for a (finite) mapping (a set of key -> value
> pairs, think "hash" for perl folks), with operations like "H1 contains
> H2" (in the sense that all key-value pairs in H2 are also in H1)
> supported by an index. Keys and values are strings.
>
>   

As a perl folk I think of hashes as nestable :-). Unlike hstore, the 
keys are strings but the values can be anything, including a hashref or 
arrayref.

Anyway, this means that you can't use hstore to cover the same field as 
YAML or JSON. That doesn't mean it's not useful - far from it.

cheers

andrew


Re: My honours project - databases using dynamically attached entity-properties

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Mar 16, 2007 at 09:56:23AM -0400, Andrew Dunstan wrote:
> tomas@tuxteam.de wrote:
> >
> >>Does hstore nest? [...]
> >
> >If what you mean is to have "mappings of mappings" then no.
> >  [...] think "hash" for perl folks [...]

> As a perl folk I think of hashes as nestable :-)

Heh. Point taken :-)

> Anyway, this means that you can't use hstore to cover the same field as 
> YAML or JSON. That doesn't mean it's not useful - far from it.

I think that hstore might be extensible in this direction. After all,
the keys and (key, value) combinations just get hashed into a bitmap.
There is no reason why one shouldn't be able to hash more complex data
structures (unless, of course, the bit map gets so densely poked with
ones that it stops being useful). The difficult part might be to reach a
consensus on what a "complex data structure" might look like.

The purists among the relational folks are sharpening their knives
already, I can hear that ;-)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFF++GaBcgs9XrR2kYRAs/hAJ9vfRy36T23vJ6eIYj6efrQVk9roQCeMkqQ
kWqzNbgbkMOXKIDQuzwDeFY=
=fCiG
-----END PGP SIGNATURE-----