Thread: Question regarding keyword checkboxes in database design

Question regarding keyword checkboxes in database design

From
Lynna Landstreet
Date:
Hello,

Apologies in advance, because this post might be long, but I'm not sure how
to explain what I need to for this question to make sense otherwise.

I'm a total PostgreSQL beginner, working in a contract job that initially
involved developing a database in FileMaker Pro and then evolved into
porting it to the web using PostgreSQL because FMP web hosting was too
expensive. So now I'm working on all kinds of thorny conversion issues. I've
more or less given up on getting the two to interact the way I'd initially
hoped, using a third-party SQL plug-in for FMP, because there are too many
differences in the way they store data. So it's probably going to end up
being only in Pg, and with people who update it after my contract ends using
a web form to do so. But in the meantime, I need to get the existing FMP
database totally converted over to Pg.

So today's conversion question, relates to the several lists of checkboxes
in the current FMP database, for keywords under which the images, artists
and exhibitions (it's for an art gallery) are categorized. In FMP, these are
handled by using one field per *set* of checkboxes, with a value list
associated with it that provides the different checkboxes or radio buttons
in the layouts, but stores the data as one long string delimited by some
weird character that doesn't display properly when you export the content as
text.

This works fine in FMP, but my impression from the book I'm currently
reading ("Beginning Databases with PostgreSQL" by Richard Stones and Neil
Matthew) is that in Pg only one type of information should be stored in each
column, so each keyword checkbox should probably be one column. So keeping
the current structure where the keywords in any given set (i.e. subject,
medium, etc.) are all lumped together into one long string for each record
would be considered bad database design and probably make scripting
trickier. But if I make each keyword into a boolean field, I'm not sure how
to display them. From what I've read thus far, the select statements that
you use to display data in Pg usually return the data in the columns, not
the column names. When people ultimately view the database records on the
web, I want them to be able to see something like:

Medium: black and white photograph
Subject: landscape, nature
Processes: hand-tinting, photocollage

Or something to that effect, but NOT:

black and white photograph: yes
landscape: yes
nature: yes
hand-tinting: yes
photocollage: yes

So I'm thinking that probably each set of keywords (medium, subject, etc.)
probably needs to be a table and each keyword a boolean column within it,
but I'm not sure if there's a way to modify a select statement so that it
outputs the names of all columns with a yes/true/1 in them, rather than
outputting the values themselves.

Or even, for that matter, if this really is the best way of doing it or not.
All three of the main data files - artists, exhibitions and images - need to
reference these keywords, so it wouldn't really make sense to include them
within each of those tables (despite the fact that that's currently the way
it's set up in FMP). It seems more logical to make each set a table and then
use a combination of the id number from the artist/image/etc. record plus a
code like "a" for artist, "i" for image  and "e" for exhibition as a primary
key to relate each record in the keyword tables back to record the keywords
are describing. Does that make sense? Or is there another way that would be
better?

Sorry for the length of this, but like I said, I'm new to Pg and I wasn't
sure how else to explain it. BTW, I'll be using PHP to create the front end
of this for the web site, if that makes a difference.

Thanks,

Lynna Landstreet
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database design

From
Jean-Christian Imbeault
Date:
Lynna Landstreet wrote:
 >
 > the current structure where the keywords in any given set (i.e. subject,
 > medium, etc.) are all lumped together into one long string for each
record
 > would be considered bad database design

Again, I'm no expert either but I would definitely say that is bad
design. Strangely enough my current job is also converting a FMP DB to
postgres and I had the same issue. Many data clumped into one field.

 > But if I make each keyword into a boolean field, I'm not sure how
 > to display them.

That's not a DB issue. In your case that's a web browser issue and
related to whatever web programming language you will use.

 > Pg usually return the data in the columns, not
 > the column names.

The data returned is associated with a column name so you always know
where your data came from ;)

 > When people ultimately view the database records on the
 > web, I want them to be able to see something like:
 >
 > Medium: black and white photograph
 > Subject: landscape, nature
 > Processes: hand-tinting, photocollage
 >
 > Or something to that effect, but NOT:
 >
 > black and white photograph: yes
 > landscape: yes
 > nature: yes
 > hand-tinting: yes
 > photocollage: yes

The only difference between the two versions you show here is that in
the second one you don't show the column names, but of course you (the
programmer) know what they are since you did the SELECT that fetched the
data ...

 > So I'm thinking that probably each set of keywords (medium, subject,
etc.)
 > probably needs to be a table {...]

Hum, database design. A thorny issue for the newbie. The way you decide
to go will affect many things so plan carefully.

I don't know enough about your data or database design to suggest
anything but from what I can gather of your data you would have an
exhibition/art piece (?) table in it.

Then you would have a table for, say b/w photography and in that table
you would have one entry for each art piece that fell into that
category. And so on for all the other kinds of possible ways to
categorize the art piece.

So the art piece table would no contain any information at all on what
kind of art it is. To find that out you would need to search all the
possible category tables to see if there was a matching entry for that
art piece.

I think someone may suggest that a view would make you life easier if
you did decide to go that route ...

 > but I'm not sure if there's a way to modify a select statement so that it
 > outputs the names of all columns with a yes/true/1 in them, rather than
 > outputting the values themselves.

If you go with my design you don't knew to output anything. If there is
an entry for that art piece in a particular category table then it's of
that category.

 > BTW, I'll be using PHP to create the front end
 > of this for the web site, if that makes a difference.

I'm using PHP too. So far so good ... I did have to dump out all the FMP
data into one big text file and create a custom PHP script to parse the
data and insert it into PG tables and make the necessary relationships
though.

In my case I did have the same serial issue you mentioned in your last
question but since the primary keys in FMP weren't used for anything
else but keys into tables I didn't need to actually keep the same key
id. I just found all the data in the text file that matched on that key,
parsed it, inserted it into PG, let PG assign a new serial and used the
assigned serial to create the relationships in all the other tables.

HTH,

--

Jean-Christian Imbeault


Re: Question regarding keyword checkboxes in database design

From
Bruno Wolff III
Date:
On Thu, Jun 05, 2003 at 14:00:16 -0400,
  Lynna Landstreet <lynna@gallery44.org> wrote:
>
> So I'm thinking that probably each set of keywords (medium, subject, etc.)
> probably needs to be a table and each keyword a boolean column within it,
> but I'm not sure if there's a way to modify a select statement so that it
> outputs the names of all columns with a yes/true/1 in them, rather than
> outputting the values themselves.

If the keywords are fixed then this is probably what you want to do.
If new ones can get added more than once in a great while, then you
may want to store the keywords and primary key of the parent record
they belong to in a separate table.

You can get the names of the columns programatically if you want to try
to make the web application relatively immune from changes when a new
keyword gets added. How you do this depends on what interface you are
using. It should be covered in the documentation for that interface.

Re: Question regarding keyword checkboxes in database design

From
Dani Oderbolz
Date:
Hi Lynna,

Lynna Landstreet wrote:

>So today's conversion question, relates to the several lists of checkboxes
>in the current FMP database, for keywords under which the images, artists
>and exhibitions (it's for an art gallery) are categorized. In FMP, these are
>handled by using one field per *set* of checkboxes, with a value list
>associated with it that provides the different checkboxes or radio buttons
>in the layouts, but stores the data as one long string delimited by some
>weird character that doesn't display properly when you export the content as
>text.
>
So if I get this right, you want to state something like
"This and that Keyword is associated with this record".
If this is the case, I would propose this design:

We have a table "items", which contains the actual stuff,
then a table "keywords" which contains all the keywords.
Then you need a third table for example called "items_keywords".
This third table has at least 2 fields: item_id and keyword_id
(for simplicity I assume that the item table has a Primary Key
field called item_id).
So every entry in items_keywords describes an assiciation of an item
with a given keyword.
You could use this table  to specify even more information about this
association (you could, for example put a descriptive column there,
which lets you say that a given keyword is NOT there).

I guess this would solve your problems and leave a lot of
flexibility.

Cheers, Dani



Re: Question regarding keyword checkboxes in database

From
Lynna Landstreet
Date:
Many thanks for the replies to my question - I think a lot of it is more an
interface issue than a database structure issue, and needs to be addressed
with PHP when I start creating the front end. The important thing now is
having a structure that makes it as easy as possible to get to all the
information.

Given that the keywords are grouped into sets (Medium, Subject, Theme,
Processes, etc.), and that each image, artist or exhibition can have more
than one keyword checked even within each set, I think I've narrowed down my
choices to two:

1. Separate tables for each keyword set, with the tables being the set name,
the columns being the individual keywords (as boolean fields) plus a foreign
key for the ID number of the artist/image/etc. as well as a one-letter
identifier as to whether that ID belongs to an artist, and image or an
exhibition, and the rows being the individual artists/images/etc. The
primary key would be the combination of the ID and type columns, because
while there might be two items with, say ID number 101, they'd have to be of
different types - one an artist and one an exhibition, or something like
that. So there would never be two instances of 101-A or 101-E.

Advantages: Fairly clear and easy to understand. Keeps tables smaller, as no
keyword table could have more entries than the total number of artists,
exhibitions and images combined, and most would have a lot less.

Disadvantages: More tables. Harder to add new keywords if needed (though new
ones won't need to be added very often).

2. A many-to-many relationship, with one big table of keywords, with the
keywords being the rows rather than the columns, and the columns being the
keyword name, which set it falls into, and an ID number as a primary key.
Plus a second join table, with the rows being instances of keyword per item.

Advantages: Fewer tables. Easier to add new keywords.

Disadvantages: Coding for the way I want to display the keyword info
(grouped by set) probably a bit trickier. Tables much larger, especially the
join table, which could easily be five to ten times the total number of
artists, exhibitions and images combined (and there are already several
hundred items in each of those three files, with more coming, so this would
be many thousands of rows).

I suppose one of the issues I need to think of in deciding between these two
options is performance: I don't know PostgreSQL well enough to know whether
it's quicker to access several small tables, or one or two very large ones.
Anyone know? And can anyone see any other issues or problems I haven't
thought of here?

Thanks,

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database

From
Heath Tanner
Date:
On Wednesday, June 11, 2003, at 02:19  PM, Lynna Landstreet wrote:

[shortened]

> Given that the keywords are grouped into sets (Medium, Subject, Theme,
> Processes, etc.)...I think I've narrowed down my choices to two:
>
> 1. Separate tables for each keyword set, with the tables being the
> set name,
> the columns being the individual keywords...

> Disadvantages: More tables. Harder to add new keywords if needed...

> 2. A many-to-many relationship, with one big table of keywords,
> with the
> keywords being the rows rather than the columns...

> Advantages: Fewer tables. Easier to add new keywords.
> Disadvantages: Coding for the way I want to display the keyword info
> (grouped by set) probably a bit trickier. Tables much larger,
> especially the
> join table...

> I suppose one of the issues I need to think of in deciding between
> these two
> options is performance: I don't know PostgreSQL well enough to
> know whether
> it's quicker to access several small tables, or one or two very
> large ones.

I don't know about PostgreSQL to tell you whether performance would
be ultimately better with several small tables or two larger
tables, but I can tell you that the "keywords" table and its
accompanying join table are not going to be large enough to make
this an issue at all (assuming your SQL is reasonable and the join
column are indexed, of course). Solution #2 will definitely provide
adequate (or better) performance.

Maybe there are some people who have used an approach like #1 with
some success, but I generally try avoid creating database
structures that must change when new data (new keywords) is
introduced.

-heath


Re: Question regarding keyword checkboxes in database

From
Lynna Landstreet
Date:
on 6/11/03 2:43 PM, Heath Tanner at heath@inligo.com wrote:

> I don't know about PostgreSQL to tell you whether performance would
> be ultimately better with several small tables or two larger
> tables, but I can tell you that the "keywords" table and its
> accompanying join table are not going to be large enough to make
> this an issue at all (assuming your SQL is reasonable and the join
> column are indexed, of course). Solution #2 will definitely provide
> adequate (or better) performance.

So even if the join table ends up with something like 100,000 rows in it,
it's not going to cause much deterioration in performance as long as it's
indexed? Because right now, there are 1357 records in the three core tables
combined, and there are presently 99 keywords they can potentially be
categorized under, grouped into 9 sets. So there's a theoretical maximum of
134,343 entries that could go into the join table right now - in practice,
there won't be nearly that many, because nothing is going to have *all* the
keywords checked, but it's still going to be a pretty impressively large
table. And there's still 15 years worth of past exhibitions to enter, plus
new exhibitions will be entered on an ongoing basis. That's why I was
nervous about this option.


> Maybe there are some people who have used an approach like #1 with
> some success, but I generally try avoid creating database
> structures that must change when new data (new keywords) is
> introduced.

I guess I've been a bit spoiled because in FileMaker Pro, with its value
lists, new keywords are quite easy to add, but they won't necessarily be in
PostgreSQL, or at least not with this structure. I don't anticipate that new
keywords will be added very often - I've already entered ten years worth of
exhibitions and the current keyword list is the result of that, so I don't
think there will be too many themes or formats in the back catalogue that
won't have occurred during that time, but there could well be a few. Plus we
want this to be pretty much infinitely extensible into the future, so new
things could well be added then (given the frequency with which new ideas
crop up in the art world).

So maybe the join table idea is the better one of the two. It's just the
size issue that worries me. I know there's theoretically no limit to the
number of rows in a table, but in practice there's got to be a point when
performance degrades noticeably...


Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database

From
Josh Berkus
Date:
Lynna,

> So maybe the join table idea is the better one of the two. It's just the
> size issue that worries me. I know there's theoretically no limit to the
> number of rows in a table, but in practice there's got to be a point when
> performance degrades noticeably...

Sure.  But 130,000 rows in a two-INT-column table is not going to strain even
a workstation converted to server usage.  I have an application of similar
size that does not just keyword matching, but relative scoring and weighting
based on data in external tables, and its performance is perfectly acceptable
( < 15 seconds including HTML rendering) on an IMac.

Second, remember that raw query performance is not the only kind of
performance you need to worry about.  Any not-normalized database structure
carries with it a maintainence penalty of extra DBA time, down time, and
troubleshooting; often these issues far overshadow (especially in terms of
cost) any extra query time required for the normalized structure -- provided
that there is any extra, which often there's not.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Question regarding keyword checkboxes in database

From
Dani Oderbolz
Date:
Lynna Landstreet wrote:

>Given that the keywords are grouped into sets (Medium, Subject, Theme,
>Processes, etc.), and that each image, artist or exhibition can have more
>than one keyword checked even within each set, I think I've narrowed down my
>choices to two:
>
>1. Separate tables for each keyword set, with the tables being the set name,
>the columns being the individual keywords (as boolean fields) plus a foreign
>key for the ID number of the artist/image/etc. as well as a one-letter
>identifier as to whether that ID belongs to an artist, and image or an
>exhibition, and the rows being the individual artists/images/etc. The
>primary key would be the combination of the ID and type columns, because
>while there might be two items with, say ID number 101, they'd have to be of
>different types - one an artist and one an exhibition, or something like
>that. So there would never be two instances of 101-A or 101-E.
>
>
[snip]

>
>2. A many-to-many relationship, with one big table of keywords, with the
>keywords being the rows rather than the columns, and the columns being the
>keyword name, which set it falls into, and an ID number as a primary key.
>Plus a second join table, with the rows being instances of keyword per item.
>
Actually I dont see how your first solution is easier to code.
Imagine Maintenance - you would have to change your DB structure to add
new keyword sets,
as well as change the front end!
I would keep things as general as possible, so that your structrue
imposes as little as possible
on your data.
In other words: you might need to think a bit more about solution 2,
but when you have it, you got many things at a time:
- less code than solution 1
- stable code (in slution 1 it will potentially grow)
- easy maintenance (a change of data is only a change of data and not a
change of structure)
- the good feeling to have come to a general solution.

I mean, its up to you, but if you want to do real programming, choose
the second solution :-)

BTW: Maybe you have a look in one of these very nice books about
programming:
"Code Complete"

(http://www.amazon.com/exec/obidos/tg/detail/-/1556154844/qid=1055404159/sr=8-2/ref=sr_8_2/002-7605764-8176851?v=glance&s=books&n=507846)
"The Pragmatic Programmer"

http://www.amazon.com/exec/obidos/tg/detail/-/020161622X/qid=1055404193/sr=1-2/ref=sr_1_2/002-7605764-8176851?v=glance&s=books

Cheers,
Dani


Encoding names?

From
"M. Bastin"
Date:
Hi all,

pg_database gives me a number for the encoding.  Where do I find the
name that goes with that number?

I've read through the chapter about system catalogs, and done many
queries on them, but didn't find the encoding names.  (By creating a
db in unicode, I've found that '6' stands for 'unicode', but where do
I find that in the system catalogs?)

Thanks,

Marc

Re: Question regarding keyword checkboxes in database

From
Lynna Landstreet
Date:
on 6/12/03 3:50 AM, Dani Oderbolz at oderbolz@ecologic.de wrote:

> I mean, its up to you, but if you want to do real programming, choose
> the second solution :-)

Yes, you're probably quite right. So, I just executed the script I wrote to
create all the tables, and I went with option 2, with one slight
modification - I split the keywords into two tables, one for format-oriented
keyword sets (medium, techniques, etc.) and one for content-oriented ones
(subject, theme, etc.). I think that may make coding the front end somewhat
easier, since they need to be separated that way when they're presented in
search results and what not.

WRT "if you want to do real programming" - that made me smile, because the
fundamental problem behind this and nearly all my questions on this list and
pgsql-general is that I'm *not* a real programmer! I'm a designer who has
had to expand into more technical areas because the current state of the web
industry makes it a necessity - work is scarce enough that you have to be
able to do just about anything if you want to keep the rent paid. And while
I did do well in computer science in high school (way back in the stone age)
and seem to be picking things up reasonably well, there are probably some
basic principles I'm not as familiar with as I should be.


> BTW: Maybe you have a look in one of these very nice books about
> programming:
> "Code Complete"
> (http://www.amazon.com/exec/obidos/tg/detail/-/1556154844/qid=1055404159/sr=8-
> 2/ref=sr_8_2/002-7605764-8176851?v=glance&s=books&n=507846)
> "The Pragmatic Programmer"
> http://www.amazon.com/exec/obidos/tg/detail/-/020161622X/qid=1055404193/sr=1-2
> /ref=sr_1_2/002-7605764-8176851?v=glance&s=books

Many thanks -  those look like exactly the sort of thing I need. Once I
finish Stones & Matthew's "Beginning Databases with PostgreSQL", anyway.
Though the fact the first one is published by Microsoft Press kind of scares
me... :-)


Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database

From
Josh Berkus
Date:
Lynna,

> Yes, you're probably quite right. So, I just executed the script I wrote to
> create all the tables, and I went with option 2, with one slight
> modification - I split the keywords into two tables, one for format-oriented
> keyword sets (medium, techniques, etc.) and one for content-oriented ones
> (subject, theme, etc.). I think that may make coding the front end somewhat
> easier, since they need to be separated that way when they're presented in
> search results and what not.

Not real sure of your schema, but you might also consider a single keyword
table with a "keyword_type" column.  Not only would this keep your joins
simple, but it would allow you to add an new keyword type in the future.

Either way, another victory for normalization!   Wahoo!

> WRT "if you want to do real programming" - that made me smile, because the
> fundamental problem behind this and nearly all my questions on this list and
> pgsql-general is that I'm *not* a real programmer! I'm a designer who has
> had to expand into more technical areas because the current state of the web
> industry makes it a necessity -

<grin>  At least you're facing up to reality ... I have several friends who
keep whining "but why can't I just do Flash the way I did in 1999?"

It's not just Web work, btw.  I get *lots* more work than most of my
programmer friends because I know networks, accounting and litigiation
support as well as code; it's just not enough to have just one skill anymore.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Question regarding keyword checkboxes in database

From
Lynna Landstreet
Date:
on 6/12/03 4:43 PM, Josh Berkus at josh@agliodbs.com wrote:

> Not real sure of your schema, but you might also consider a single keyword
> table with a "keyword_type" column.  Not only would this keep your joins
> simple, but it would allow you to add an new keyword type in the future.

Hmmm, yes, that might be better. Haven't started populating it yet (it'll
take me a while to export and clean up all the data from FileMaker Pro), so
I can still make changes to the structure pretty easily.

But I've just run into a new wrinkle -- up until now I was mainly
concentrating on how to organize the keywords themselves, and now I'm
realizing I overlooked something when adding in the items (artist,
exhibitions, etc.) that the keywords pertain to. I had used two columns to
indicate that in the join table, one for item_id (the id number from the
artists, images or exhibitions table) and one for item_type, a single
character that indicated which of the three it was. But when I was checking
to make sure I'd added all the right constraints, I realized that I couldn't
add a foreign key constraint to item_id if it might be referring to an id
number in any of *three* other tables rather than one specific one. And
without that it isn't going to be able to check referential integrity.

Is there some way to indicate that a foreign key can apply to any of more
than one table? The book I'm mostly working from doesn't say anything about
that, but it does say that foreign key constraints are an advanced topic and
they're only covering the basics of them. Or should I be using three join
tables, one each for artists, exhibitions and images?


> Either way, another victory for normalization!   Wahoo!

About the only circumstance under which I'd consider being normal a good
thing. :-)


>> WRT "if you want to do real programming" - that made me smile, because the
>> fundamental problem behind this and nearly all my questions on this list and
>> pgsql-general is that I'm *not* a real programmer! I'm a designer who has
>> had to expand into more technical areas because the current state of the web
>> industry makes it a necessity -
>
> <grin>  At least you're facing up to reality ... I have several friends who
> keep whining "but why can't I just do Flash the way I did in 1999?"

Oh, I don't know, maybe because -- no one's hiring Flash developers any
more? *rolls eyes* At least half the web developers I used to know aren't
even the field any more.


> It's not just Web work, btw.  I get *lots* more work than most of my
> programmer friends because I know networks, accounting and litigiation
> support as well as code; it's just not enough to have just one skill anymore.

Yes, the latest revision of my resume isn't as web-focussed as it used to
be, and highlights non-tech skills like writing, editing and research as
well as the technical stuff. And I suppose when this contract ends, I'll be
able to not only add what I've learned about PostgreSQL and PHP, but also
arranging veggies and crackers artistically for gallery openings. :-)


Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database

From
Josh Berkus
Date:
Lynna,

> Is there some way to indicate that a foreign key can apply to any of more
> than one table? The book I'm mostly working from doesn't say anything about
> that, but it does say that foreign key constraints are an advanced topic and
> they're only covering the basics of them. Or should I be using three join
> tables, one each for artists, exhibitions and images?

Congratulations!  You've just run into one of the failures of the SQL
Standard.  What you want is called a "distributed key", and it is a concept
well-supported in Relational Calculus, but for some reason omitted from the
SQL standard.

(And, BTW, your book is WRONG.  Foriegn Keys are *not* and "advanced topic";
they are fundamental and nobody should design a database without
understanding them).

I've had to handle this before.   The approach is to do a "do it yourself"
key, consisting of:

---optional, but a good idea-------------------------------------
1) create a sequence called, for example "aie_sq"
2) Alter the Artists, Exhibitions, and Images tables so they all draw on this
same sequence for their unique ids (NEXTVAL('aie_sq')).  This will ensure
that all IDs are unique between the 3 tables.
-----------------------------------------------------------------------
3)  Create a unique 3-column index in the join table.
4)  Create BEFORE INSERT OR UPDATE triggers on the join table which checks
    that the id exists in one of the 3 tables.
5) Create BEFORE or AFTER UPDATE OR DELETE triggers on each of the 3 tables
which check dependant records in the join table and take approprate action.

If this is too much for you, then just create 3 seperate join tables.   In
your situation, I'm not sure there's that much advantage in doing it the more
sophisticated way.

> About the only circumstance under which I'd consider being normal a good
> thing. :-)

<chuckle>

> Oh, I don't know, maybe because -- no one's hiring Flash developers any
> more? *rolls eyes* At least half the web developers I used to know aren't
> even the field any more.

Yeah.

> Yes, the latest revision of my resume isn't as web-focussed as it used to
> be, and highlights non-tech skills like writing, editing and research as
> well as the technical stuff. And I suppose when this contract ends, I'll be
> able to not only add what I've learned about PostgreSQL and PHP, but also
> arranging veggies and crackers artistically for gallery openings. :-)

Well, contact me when you're done; I sometimes get non-profit referrals that I
don't have time for.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: Question regarding keyword checkboxes in database

From
Lynna Landstreet
Date:
on 6/12/03 7:16 PM, Josh Berkus at josh@agliodbs.com wrote:

>> Is there some way to indicate that a foreign key can apply to any of more
>> than one table? The book I'm mostly working from doesn't say anything about
>> that, but it does say that foreign key constraints are an advanced topic and
>> they're only covering the basics of them. Or should I be using three join
>> tables, one each for artists, exhibitions and images?
>
> Congratulations!  You've just run into one of the failures of the SQL
> Standard.  What you want is called a "distributed key", and it is a concept
> well-supported in Relational Calculus, but for some reason omitted from the
> SQL standard.

Damn. Calculus was a long, long time ago for me, but I thought this sounded
like something that *should* be possible. However, "should" does not mean
"is". :-/


> (And, BTW, your book is WRONG.  Foriegn Keys are *not* and "advanced topic";
> they are fundamental and nobody should design a database without
> understanding them).

They're fundamental to using the referential integrity feature of
PostgreSQL, right? Or so it would appear to me, anyway... FWIW, the book
("Beginning Databases in PostgreSQL" by Richard Stones & Neil Matthew) does
state that they're important, but they only go over the basic aspects of
them (adding the REFERENCES constraint to tables or columns and a couple of
options you can include with it) and say that their *full* potential is
advanced, so I did at least get to learn a little bit about them. But more
would have been better.


> I've had to handle this before.   The approach is to do a "do it yourself"
> key, consisting of:
>
> ---optional, but a good idea-------------------------------------
> 1) create a sequence called, for example "aie_sq"
> 2) Alter the Artists, Exhibitions, and Images tables so they all draw on this
> same sequence for their unique ids (NEXTVAL('aie_sq')).  This will ensure
> that all IDs are unique between the 3 tables.
> -----------------------------------------------------------------------
> 3)  Create a unique 3-column index in the join table.
> 4)  Create BEFORE INSERT OR UPDATE triggers on the join table which checks
> that the id exists in one of the 3 tables.
> 5) Create BEFORE or AFTER UPDATE OR DELETE triggers on each of the 3 tables
> which check dependant records in the join table and take approprate action.
>
> If this is too much for you, then just create 3 seperate join tables.   In
> your situation, I'm not sure there's that much advantage in doing it the more
> sophisticated way.

It's not necessarily too much for me -- I generally like doing things the
harder but more elegant way rather than the easier but sloppier way. But in
this particular case, the decision is complicated by the fact that there are
hundreds of existing records needing to be imported into the database once I
get the structure finalized, all of which have existing ID numbers and
existing relationships between them. So if I let PostgreSQL renumber all the
records according to a new sequence, I'd have to manually recreate all the
links between them, which given there are something like 1300 records in
total, would seriously suck. So I think I'm stuck with separate join tables.

However, this does sound like an excellent way of dealing with this sort of
problem if one is creating a new database from scratch, and I will
definitely save this message for future reference because I could run into
this situation again on future projects.


>> Yes, the latest revision of my resume isn't as web-focussed as it used to
>> be, and highlights non-tech skills like writing, editing and research as
>> well as the technical stuff. And I suppose when this contract ends, I'll be
>> able to not only add what I've learned about PostgreSQL and PHP, but also
>> arranging veggies and crackers artistically for gallery openings. :-)
>
> Well, contact me when you're done; I sometimes get non-profit referrals that I
> don't have time for.

Cool - thanks!


Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org


Re: Question regarding keyword checkboxes in database

From
Josh Berkus
Date:
Lynna,

> Damn. Calculus was a long, long time ago for me, but I thought this sounded
> like something that *should* be possible. However, "should" does not mean
> "is". :-/

Check out www.dbdebunk.com.   Fabian Pascal, CJ Date and others are developing
a "post-SQL" relational database; maybe someday it will be commercially
viable.

> They're fundamental to using the referential integrity feature of
> PostgreSQL, right? Or so it would appear to me, anyway... FWIW, the book
> ("Beginning Databases in PostgreSQL" by Richard Stones & Neil Matthew) does
> state that they're important, but they only go over the basic aspects of
> them (adding the REFERENCES constraint to tables or columns and a couple of
> options you can include with it) and say that their *full* potential is
> advanced, so I did at least get to learn a little bit about them. But more
> would have been better.

Oh, that makes sense then.

> It's not necessarily too much for me -- I generally like doing things the
> harder but more elegant way rather than the easier but sloppier way. But in
> this particular case, the decision is complicated by the fact that there
> are hundreds of existing records needing to be imported into the database
> once I get the structure finalized, all of which have existing ID numbers
> and existing relationships between them. So if I let PostgreSQL renumber
> all the records according to a new sequence, I'd have to manually recreate
> all the links between them, which given there are something like 1300
> records in total, would seriously suck. So I think I'm stuck with separate
> join tables.

Not necessarily:
-- The re-keying is optional.   That is, it makes your solution more elegant
and foolproof, but is not absolutely required to do a distributed key.  You
can handle the distributed key with overlapping ids by having your index on
the "id" and "type" fields combined.

-- You can re-key during conversion using a "was-is" table:
    1. make a table with old_key INT, new_key INT;
    2. select all of the current ids into old_key;
    3. use procedural code to populate new_key with sequential values;
    4. reference this was-is table for converting all child tables.
    5. When done SETVAL your sequence to the highest value in any of the was-is
tables.


--
Josh Berkus
Aglio Database Solutions
San Francisco