Thread: Question regarding keyword checkboxes in database design
Hello,

Apologies in advance, because this post might be long, but I'm not sure how to explain what I need to for this question to make sense otherwise. I'm a total PostgreSQL beginner, working in a contract job that initially involved developing a database in FileMaker Pro and then evolved into porting it to the web using PostgreSQL because FMP web hosting was too expensive. So now I'm working on all kinds of thorny conversion issues. I've more or less given up on getting the two to interact the way I'd initially hoped, using a third-party SQL plug-in for FMP, because there are too many differences in the way they store data. So it's probably going to end up being only in Pg, with people who update it after my contract ends using a web form to do so. But in the meantime, I need to get the existing FMP database totally converted over to Pg.

So today's conversion question relates to the several lists of checkboxes in the current FMP database, for keywords under which the images, artists and exhibitions (it's for an art gallery) are categorized. In FMP, these are handled by using one field per *set* of checkboxes, with a value list associated with it that provides the different checkboxes or radio buttons in the layouts, but stores the data as one long string delimited by some weird character that doesn't display properly when you export the content as text.

This works fine in FMP, but my impression from the book I'm currently reading ("Beginning Databases with PostgreSQL" by Richard Stones and Neil Matthew) is that in Pg only one type of information should be stored in each column, so each keyword checkbox should probably be one column. So keeping the current structure, where the keywords in any given set (i.e. subject, medium, etc.) are all lumped together into one long string for each record, would be considered bad database design and would probably make scripting trickier. But if I make each keyword into a boolean field, I'm not sure how to display them.
From what I've read thus far, the select statements that you use to display data in Pg usually return the data in the columns, not the column names. When people ultimately view the database records on the web, I want them to be able to see something like:

Medium: black and white photograph
Subject: landscape, nature
Processes: hand-tinting, photocollage

Or something to that effect, but NOT:

black and white photograph: yes
landscape: yes
nature: yes
hand-tinting: yes
photocollage: yes

So I'm thinking that each set of keywords (medium, subject, etc.) probably needs to be a table and each keyword a boolean column within it, but I'm not sure if there's a way to modify a select statement so that it outputs the names of all columns with a yes/true/1 in them, rather than outputting the values themselves. Or even, for that matter, whether this really is the best way of doing it or not.

All three of the main data files - artists, exhibitions and images - need to reference these keywords, so it wouldn't really make sense to include them within each of those tables (despite the fact that that's currently the way it's set up in FMP). It seems more logical to make each set a table and then use a combination of the id number from the artist/image/etc. record plus a code like "a" for artist, "i" for image and "e" for exhibition as a primary key to relate each record in the keyword tables back to the record the keywords are describing.

Does that make sense? Or is there another way that would be better? Sorry for the length of this, but like I said, I'm new to Pg and I wasn't sure how else to explain it. BTW, I'll be using PHP to create the front end of this for the web site, if that makes a difference.

Thanks,

Lynna Landstreet
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
Lynna Landstreet wrote:
> the current structure where the keywords in any given set (i.e. subject,
> medium, etc.) are all lumped together into one long string for each record
> would be considered bad database design

Again, I'm no expert either, but I would definitely say that is bad design. Strangely enough, my current job is also converting an FMP DB to Postgres, and I had the same issue: lots of data clumped into one field.

> But if I make each keyword into a boolean field, I'm not sure how
> to display them.

That's not a DB issue. In your case that's a web browser issue, related to whatever web programming language you will use.

> Pg usually return the data in the columns, not
> the column names.

The data returned is associated with a column name, so you always know where your data came from ;)

> When people ultimately view the database records on the
> web, I want them to be able to see something like:
>
> Medium: black and white photograph
> Subject: landscape, nature
> Processes: hand-tinting, photocollage
>
> Or something to that effect, but NOT:
>
> black and white photograph: yes
> landscape: yes
> nature: yes
> hand-tinting: yes
> photocollage: yes

The only difference between the two versions you show here is that in the second one you don't show the column names, but of course you (the programmer) know what they are, since you did the SELECT that fetched the data ...

> So I'm thinking that probably each set of keywords (medium, subject, etc.)
> probably needs to be a table [...]

Hmm, database design. A thorny issue for the newbie. The way you decide to go will affect many things, so plan carefully. I don't know enough about your data or database design to suggest anything, but from what I can gather you would have an exhibition/art piece (?) table. Then you would have a table for, say, b/w photography, and in that table you would have one entry for each art piece that fell into that category.
And so on for all the other possible ways to categorize the art piece. So the art piece table would not contain any information at all on what kind of art it is. To find that out, you would need to search all the possible category tables to see if there was a matching entry for that art piece. I think someone may suggest that a view would make your life easier if you did decide to go that route ...

> but I'm not sure if there's a way to modify a select statement so that it
> outputs the names of all columns with a yes/true/1 in them, rather than
> outputting the values themselves.

If you go with my design you don't need to output anything. If there is an entry for that art piece in a particular category table, then it's of that category.

> BTW, I'll be using PHP to create the front end
> of this for the web site, if that makes a difference.

I'm using PHP too. So far so good ... I did have to dump out all the FMP data into one big text file and create a custom PHP script to parse the data, insert it into PG tables and make the necessary relationships, though. In my case I had the same serial issue you mentioned in your last question, but since the primary keys in FMP weren't used for anything but keys into tables, I didn't need to keep the same key ids. I just found all the data in the text file that matched on a given key, parsed it, inserted it into PG, let PG assign a new serial, and used the assigned serial to create the relationships in all the other tables.

HTH,

--
Jean-Christian Imbeault
On Thu, Jun 05, 2003 at 14:00:16 -0400, Lynna Landstreet <lynna@gallery44.org> wrote:
> So I'm thinking that probably each set of keywords (medium, subject, etc.)
> probably needs to be a table and each keyword a boolean column within it,
> but I'm not sure if there's a way to modify a select statement so that it
> outputs the names of all columns with a yes/true/1 in them, rather than
> outputting the values themselves.

If the keywords are fixed, then this is probably what you want to do. If new ones can get added more than once in a great while, then you may want to store the keywords, along with the primary key of the parent record they belong to, in a separate table.

You can get the names of the columns programmatically if you want to make the web application relatively immune to changes when a new keyword gets added. How you do this depends on what interface you are using; it should be covered in the documentation for that interface.
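For reference, the column names Bruno mentions can be pulled straight from the system catalogs; a sketch, where the table name "subject_keywords" is purely illustrative:

```sql
-- List the user columns of a table by querying the system catalogs.
SELECT a.attname
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relname = 'subject_keywords'
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped;   -- skip dropped columns
```

Most client interfaces, PHP's PostgreSQL functions included, also expose the column names of a result set directly, which is usually simpler than going to the catalogs.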
Hi Lynna,

Lynna Landstreet wrote:
> So today's conversion question relates to the several lists of checkboxes
> in the current FMP database, for keywords under which the images, artists
> and exhibitions (it's for an art gallery) are categorized. In FMP, these are
> handled by using one field per *set* of checkboxes, with a value list
> associated with it that provides the different checkboxes or radio buttons
> in the layouts, but stores the data as one long string delimited by some
> weird character that doesn't display properly when you export the content as
> text.

So if I get this right, you want to state something like "this and that keyword is associated with this record". If this is the case, I would propose this design: we have a table "items", which contains the actual stuff, and a table "keywords", which contains all the keywords. Then you need a third table, for example called "items_keywords". This third table has at least 2 fields: item_id and keyword_id (for simplicity I assume that the items table has a primary key field called item_id). So every entry in items_keywords describes an association of an item with a given keyword.

You could use this table to specify even more information about the association (you could, for example, put a descriptive column there, which lets you say that a given keyword is NOT there). I guess this would solve your problems and leave a lot of flexibility.

Cheers,
Dani
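In SQL terms, Dani's three-table proposal might look something like this (a sketch only; every table and column name besides the ones he gives is illustrative):

```sql
-- The actual stuff (art pieces, in this case).
CREATE TABLE items (
    item_id serial PRIMARY KEY,
    title   text
);

-- All the keywords, one per row.
CREATE TABLE keywords (
    keyword_id serial PRIMARY KEY,
    name       text NOT NULL
);

-- One row per association of an item with a keyword.
CREATE TABLE items_keywords (
    item_id    integer NOT NULL REFERENCES items,
    keyword_id integer NOT NULL REFERENCES keywords,
    PRIMARY KEY (item_id, keyword_id)
);
```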
Many thanks for the replies to my question - I think a lot of it is more an interface issue than a database structure issue, and needs to be addressed with PHP when I start creating the front end. The important thing now is having a structure that makes it as easy as possible to get at all the information.

Given that the keywords are grouped into sets (Medium, Subject, Theme, Processes, etc.), and that each image, artist or exhibition can have more than one keyword checked even within each set, I think I've narrowed down my choices to two:

1. Separate tables for each keyword set, with the tables being the set name, the columns being the individual keywords (as boolean fields) plus a foreign key for the ID number of the artist/image/etc. as well as a one-letter identifier as to whether that ID belongs to an artist, an image or an exhibition, and the rows being the individual artists/images/etc. The primary key would be the combination of the ID and type columns, because while there might be two items with, say, ID number 101, they'd have to be of different types - one an artist and one an exhibition, or something like that. So there would never be two instances of 101-A or 101-E.

Advantages: Fairly clear and easy to understand. Keeps tables smaller, as no keyword table could have more entries than the total number of artists, exhibitions and images combined, and most would have a lot fewer.

Disadvantages: More tables. Harder to add new keywords if needed (though new ones won't need to be added very often).

2. A many-to-many relationship, with one big table of keywords, with the keywords being the rows rather than the columns, and the columns being the keyword name, which set it falls into, and an ID number as a primary key. Plus a second join table, with the rows being instances of keyword per item.

Advantages: Fewer tables. Easier to add new keywords.

Disadvantages: Coding for the way I want to display the keyword info (grouped by set) is probably a bit trickier.
Tables much larger, especially the join table, which could easily be five to ten times the total number of artists, exhibitions and images combined (and there are already several hundred items in each of those three files, with more coming, so this would be many thousands of rows).

I suppose one of the issues I need to think of in deciding between these two options is performance: I don't know PostgreSQL well enough to know whether it's quicker to access several small tables, or one or two very large ones. Anyone know? And can anyone see any other issues or problems I haven't thought of here?

Thanks,

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
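With option 2, the grouped-by-set display Lynna wants can come out of a single query. A sketch, assuming a keywords table with a set_name column and a join table called item_keywords (both names hypothetical); note that string_agg() only exists in newer PostgreSQL releases, so at the time of this thread the concatenation would have been done in PHP instead:

```sql
-- For one item, list each keyword set alongside its checked keywords,
-- producing rows like:  Medium  | black and white photograph
--                       Subject | landscape, nature
SELECT k.set_name,
       string_agg(k.name, ', ') AS keyword_list
FROM item_keywords ik
JOIN keywords k ON k.keyword_id = ik.keyword_id
WHERE ik.item_id = 101        -- the item being displayed
GROUP BY k.set_name
ORDER BY k.set_name;
```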
On Wednesday, June 11, 2003, at 02:19 PM, Lynna Landstreet wrote:

[shortened]
> Given that the keywords are grouped into sets (Medium, Subject, Theme,
> Processes, etc.)...I think I've narrowed down my choices to two:
>
> 1. Separate tables for each keyword set, with the tables being the set name,
> the columns being the individual keywords...
> Disadvantages: More tables. Harder to add new keywords if needed...
>
> 2. A many-to-many relationship, with one big table of keywords, with the
> keywords being the rows rather than the columns...
> Advantages: Fewer tables. Easier to add new keywords.
> Disadvantages: Coding for the way I want to display the keyword info
> (grouped by set) probably a bit trickier. Tables much larger, especially the
> join table...
>
> I suppose one of the issues I need to think of in deciding between these two
> options is performance: I don't know PostgreSQL well enough to know whether
> it's quicker to access several small tables, or one or two very large ones.

I don't know enough about PostgreSQL to tell you whether performance would be ultimately better with several small tables or two larger tables, but I can tell you that the "keywords" table and its accompanying join table are not going to be large enough to make this an issue at all (assuming your SQL is reasonable and the join columns are indexed, of course). Solution #2 will definitely provide adequate (or better) performance.

Maybe there are some people who have used an approach like #1 with some success, but I generally try to avoid creating database structures that must change when new data (new keywords) is introduced.

-heath
on 6/11/03 2:43 PM, Heath Tanner at heath@inligo.com wrote:

> I don't know enough about PostgreSQL to tell you whether performance
> would be ultimately better with several small tables or two larger
> tables, but I can tell you that the "keywords" table and its
> accompanying join table are not going to be large enough to make
> this an issue at all (assuming your SQL is reasonable and the join
> columns are indexed, of course). Solution #2 will definitely provide
> adequate (or better) performance.

So even if the join table ends up with something like 100,000 rows in it, it's not going to cause much deterioration in performance as long as it's indexed? Because right now, there are 1357 records in the three core tables combined, and there are presently 99 keywords they can potentially be categorized under, grouped into 9 sets. So there's a theoretical maximum of 134,343 entries that could go into the join table right now - in practice, there won't be nearly that many, because nothing is going to have *all* the keywords checked, but it's still going to be a pretty impressively large table. And there's still 15 years' worth of past exhibitions to enter, plus new exhibitions will be entered on an ongoing basis. That's why I was nervous about this option.

> Maybe there are some people who have used an approach like #1 with
> some success, but I generally try to avoid creating database
> structures that must change when new data (new keywords) is
> introduced.

I guess I've been a bit spoiled, because in FileMaker Pro, with its value lists, new keywords are quite easy to add, but they won't necessarily be in PostgreSQL, or at least not with this structure. I don't anticipate that new keywords will be added very often - I've already entered ten years' worth of exhibitions and the current keyword list is the result of that, so I don't think there will be too many themes or formats in the back catalogue that won't have occurred during that time, but there could well be a few.
Plus we want this to be pretty much infinitely extensible into the future, so new things could well be added then (given the frequency with which new ideas crop up in the art world). So maybe the join table idea is the better one of the two. It's just the size issue that worries me. I know there's theoretically no limit to the number of rows in a table, but in practice there's got to be a point where performance degrades noticeably...

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
Lynna,

> So maybe the join table idea is the better one of the two. It's just the
> size issue that worries me. I know there's theoretically no limit to the
> number of rows in a table, but in practice there's got to be a point when
> performance degrades noticeably...

Sure. But 130,000 rows in a two-INT-column table is not going to strain even a workstation converted to server usage. I have an application of similar size that does not just keyword matching, but relative scoring and weighting based on data in external tables, and its performance is perfectly acceptable (< 15 seconds including HTML rendering) on an iMac.

Second, remember that raw query performance is not the only kind of performance you need to worry about. Any non-normalized database structure carries with it a maintenance penalty of extra DBA time, down time, and troubleshooting; often these issues far overshadow (especially in terms of cost) any extra query time required for the normalized structure -- provided that there is any extra, which often there's not.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
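Heath's caveat about indexed join columns is cheap to satisfy here. If the join table's primary key is the (item_id, keyword_id) pair, that composite key already serves lookups by item, and one extra index covers lookups from the keyword side. A sketch with hypothetical names:

```sql
-- The composite primary key indexes (item_id, keyword_id) automatically.
-- One extra index speeds up "which items carry this keyword?" queries:
CREATE INDEX item_keywords_keyword_idx ON item_keywords (keyword_id);
```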
Lynna Landstreet wrote:
>Given that the keywords are grouped into sets (Medium, Subject, Theme,
>Processes, etc.), and that each image, artist or exhibition can have more
>than one keyword checked even within each set, I think I've narrowed down my
>choices to two:
>
>1. Separate tables for each keyword set, with the tables being the set name,
>the columns being the individual keywords (as boolean fields) plus a foreign
>key for the ID number of the artist/image/etc. as well as a one-letter
>identifier as to whether that ID belongs to an artist, an image or an
>exhibition, and the rows being the individual artists/images/etc. The
>primary key would be the combination of the ID and type columns, because
>while there might be two items with, say ID number 101, they'd have to be of
>different types - one an artist and one an exhibition, or something like
>that. So there would never be two instances of 101-A or 101-E.
>
[snip]
>
>2. A many-to-many relationship, with one big table of keywords, with the
>keywords being the rows rather than the columns, and the columns being the
>keyword name, which set it falls into, and an ID number as a primary key.
>Plus a second join table, with the rows being instances of keyword per item.
>

Actually I don't see how your first solution is easier to code. Imagine maintenance - you would have to change your DB structure to add new keyword sets, as well as change the front end! I would keep things as general as possible, so that your structure imposes as little as possible on your data. In other words: you might need to think a bit more about solution 2, but when you have it, you get many things at once:

- less code than solution 1
- stable code (in solution 1 it will potentially grow)
- easy maintenance (a change of data is only a change of data, not a change of structure)
- the good feeling of having come to a general solution.
I mean, it's up to you, but if you want to do real programming, choose the second solution :-)

BTW: Maybe have a look at one of these very nice books about programming:

"Code Complete" (http://www.amazon.com/exec/obidos/tg/detail/-/1556154844/qid=1055404159/sr=8-2/ref=sr_8_2/002-7605764-8176851?v=glance&s=books&n=507846)
"The Pragmatic Programmer" (http://www.amazon.com/exec/obidos/tg/detail/-/020161622X/qid=1055404193/sr=1-2/ref=sr_1_2/002-7605764-8176851?v=glance&s=books)

Cheers,
Dani
Hi all,

pg_database gives me a number for the encoding. Where do I find the name that goes with that number? I've read through the chapter about system catalogs, and done many queries on them, but didn't find the encoding names. (By creating a db in unicode, I've found that '6' stands for 'unicode', but where do I find that in the system catalogs?)

Thanks,
Marc
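For what it's worth, the encoding number in pg_database can be translated to a name with a built-in conversion function rather than a catalog lookup; a quick sketch:

```sql
-- pg_encoding_to_char() maps the numeric encoding id to its name.
SELECT datname, pg_encoding_to_char(encoding) AS encoding_name
FROM pg_database;
```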
on 6/12/03 3:50 AM, Dani Oderbolz at oderbolz@ecologic.de wrote:

> I mean, its up to you, but if you want to do real programming, choose
> the second solution :-)

Yes, you're probably quite right. So, I just executed the script I wrote to create all the tables, and I went with option 2, with one slight modification - I split the keywords into two tables, one for format-oriented keyword sets (medium, techniques, etc.) and one for content-oriented ones (subject, theme, etc.). I think that may make coding the front end somewhat easier, since they need to be separated that way when they're presented in search results and whatnot.

WRT "if you want to do real programming" - that made me smile, because the fundamental problem behind this and nearly all my questions on this list and pgsql-general is that I'm *not* a real programmer! I'm a designer who has had to expand into more technical areas because the current state of the web industry makes it a necessity - work is scarce enough that you have to be able to do just about anything if you want to keep the rent paid. And while I did do well in computer science in high school (way back in the stone age) and seem to be picking things up reasonably well, there are probably some basic principles I'm not as familiar with as I should be.

> BTW: Maybe you have a look in one of these very nice books about
> programming:
> "Code Complete"
> (http://www.amazon.com/exec/obidos/tg/detail/-/1556154844/qid=1055404159/sr=8-2/ref=sr_8_2/002-7605764-8176851?v=glance&s=books&n=507846)
> "The Pragmatic Programmer"
> http://www.amazon.com/exec/obidos/tg/detail/-/020161622X/qid=1055404193/sr=1-2/ref=sr_1_2/002-7605764-8176851?v=glance&s=books

Many thanks - those look like exactly the sort of thing I need. Once I finish Stones & Matthew's "Beginning Databases with PostgreSQL", anyway. Though the fact that the first one is published by Microsoft Press kind of scares me...
:-)

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
Lynna,

> Yes, you're probably quite right. So, I just executed the script I wrote to
> create all the tables, and I went with option 2, with one slight
> modification - I split the keywords into two tables, one for format-oriented
> keyword sets (medium, techniques, etc.) and one for content-oriented ones
> (subject, theme, etc.). I think that may make coding the front end somewhat
> easier, since they need to be separated that way when they're presented in
> search results and what not.

Not real sure of your schema, but you might also consider a single keyword table with a "keyword_type" column. Not only would this keep your joins simple, but it would allow you to add a new keyword type in the future. Either way, another victory for normalization! Wahoo!

> WRT "if you want to do real programming" - that made me smile, because the
> fundamental problem behind this and nearly all my questions on this list and
> pgsql-general is that I'm *not* a real programmer! I'm a designer who has
> had to expand into more technical areas because the current state of the web
> industry makes it a necessity -

<grin> At least you're facing up to reality ... I have several friends who keep whining "but why can't I just do Flash the way I did in 1999?"

It's not just Web work, btw. I get *lots* more work than most of my programmer friends because I know networks, accounting and litigation support as well as code; it's just not enough to have one skill anymore.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
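The single-table variant Josh suggests is a small change to the many-to-many schema; a sketch, with all names illustrative:

```sql
-- One keyword table for everything; keyword_type distinguishes
-- format-oriented from content-oriented sets, so adding a new type
-- is a data change, not a schema change.
CREATE TABLE keywords (
    keyword_id   serial PRIMARY KEY,
    name         text NOT NULL,
    set_name     text NOT NULL,   -- e.g. 'Medium', 'Subject'
    keyword_type text NOT NULL    -- e.g. 'format' or 'content'
);
```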
on 6/12/03 4:43 PM, Josh Berkus at josh@agliodbs.com wrote:

> Not real sure of your schema, but you might also consider a single keyword
> table with a "keyword_type" column. Not only would this keep your joins
> simple, but it would allow you to add a new keyword type in the future.

Hmmm, yes, that might be better. Haven't started populating it yet (it'll take me a while to export and clean up all the data from FileMaker Pro), so I can still make changes to the structure pretty easily.

But I've just run into a new wrinkle -- up until now I was mainly concentrating on how to organize the keywords themselves, and now I'm realizing I overlooked something when adding in the items (artists, exhibitions, etc.) that the keywords pertain to. I had used two columns to indicate that in the join table, one for item_id (the id number from the artists, images or exhibitions table) and one for item_type, a single character that indicated which of the three it was. But when I was checking to make sure I'd added all the right constraints, I realized that I couldn't add a foreign key constraint to item_id if it might be referring to an id number in any of *three* other tables rather than one specific one. And without that it isn't going to be able to check referential integrity.

Is there some way to indicate that a foreign key can apply to any of more than one table? The book I'm mostly working from doesn't say anything about that, but it does say that foreign key constraints are an advanced topic and they're only covering the basics of them. Or should I be using three join tables, one each for artists, exhibitions and images?

> Either way, another victory for normalization! Wahoo!

About the only circumstance under which I'd consider being normal a good thing. :-)

>> WRT "if you want to do real programming" - that made me smile, because the
>> fundamental problem behind this and nearly all my questions on this list and
>> pgsql-general is that I'm *not* a real programmer!
>> I'm a designer who has
>> had to expand into more technical areas because the current state of the web
>> industry makes it a necessity -
>
> <grin> At least you're facing up to reality ... I have several friends who
> keep whining "but why can't I just do Flash the way I did in 1999?"

Oh, I don't know, maybe because -- no one's hiring Flash developers any more? *rolls eyes* At least half the web developers I used to know aren't even in the field any more.

> It's not just Web work, btw. I get *lots* more work than most of my
> programmer friends because I know networks, accounting and litigation
> support as well as code; it's just not enough to have one skill anymore.

Yes, the latest revision of my resume isn't as web-focussed as it used to be, and highlights non-tech skills like writing, editing and research as well as the technical stuff. And I suppose when this contract ends, I'll be able to not only add what I've learned about PostgreSQL and PHP, but also arranging veggies and crackers artistically for gallery openings. :-)

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
Lynna,

> Is there some way to indicate that a foreign key can apply to any of more
> than one table? The book I'm mostly working from doesn't say anything about
> that, but it does say that foreign key constraints are an advanced topic and
> they're only covering the basics of them. Or should I be using three join
> tables, one each for artists, exhibitions and images?

Congratulations! You've just run into one of the failures of the SQL standard. What you want is called a "distributed key", and it is a concept well-supported in relational calculus, but for some reason omitted from the SQL standard. (And, BTW, your book is WRONG. Foreign keys are *not* an "advanced topic"; they are fundamental, and nobody should design a database without understanding them.)

I've had to handle this before. The approach is to build a "do it yourself" key, consisting of:

---optional, but a good idea-------------------------------------
1) Create a sequence called, for example, "aie_sq".
2) Alter the Artists, Exhibitions, and Images tables so they all draw on this same sequence for their unique ids (NEXTVAL('aie_sq')). This will ensure that all IDs are unique between the 3 tables.
-----------------------------------------------------------------------
3) Create a unique 3-column index in the join table.
4) Create BEFORE INSERT OR UPDATE triggers on the join table which check that the id exists in one of the 3 tables.
5) Create BEFORE or AFTER UPDATE OR DELETE triggers on each of the 3 tables which check dependent records in the join table and take appropriate action.

If this is too much for you, then just create 3 separate join tables. In your situation, I'm not sure there's that much advantage in doing it the more sophisticated way.

> About the only circumstance under which I'd consider being normal a good
> thing. :-)

<chuckle>

> Oh, I don't know, maybe because -- no one's hiring Flash developers any
> more?
> *rolls eyes* At least half the web developers I used to know aren't
> even in the field any more.

Yeah.

> Yes, the latest revision of my resume isn't as web-focussed as it used to
> be, and highlights non-tech skills like writing, editing and research as
> well as the technical stuff. And I suppose when this contract ends, I'll be
> able to not only add what I've learned about PostgreSQL and PHP, but also
> arranging veggies and crackers artistically for gallery openings. :-)

Well, contact me when you're done; I sometimes get non-profit referrals that I don't have time for.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
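Josh's five-step recipe might be sketched like this in PL/pgSQL (a sketch under assumptions: all names are illustrative, and only the check against the artists table is spelled out -- the exhibitions and images checks would be OR'd in the same way):

```sql
-- Steps 1-2: one shared sequence keeps ids unique across all three tables.
CREATE SEQUENCE aie_sq;

CREATE TABLE artists (
    artist_id integer PRIMARY KEY DEFAULT nextval('aie_sq'),
    name      text
);
-- exhibitions and images would take their id defaults from aie_sq too.

-- Step 4: a trigger on the join table verifying the id exists in a parent.
CREATE FUNCTION check_item_exists() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM artists WHERE artist_id = NEW.item_id)
       -- ... AND NOT EXISTS checks against exhibitions and images ...
    THEN
        RAISE EXCEPTION 'item id % not found in any parent table', NEW.item_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_keywords_parent_check
    BEFORE INSERT OR UPDATE ON item_keywords
    FOR EACH ROW EXECUTE PROCEDURE check_item_exists();
```

Step 5's triggers on the three parent tables would follow the same pattern in reverse, rejecting or cascading deletes that would orphan join rows.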
on 6/12/03 7:16 PM, Josh Berkus at josh@agliodbs.com wrote:

>> Is there some way to indicate that a foreign key can apply to any of more
>> than one table? The book I'm mostly working from doesn't say anything about
>> that, but it does say that foreign key constraints are an advanced topic and
>> they're only covering the basics of them. Or should I be using three join
>> tables, one each for artists, exhibitions and images?
>
> Congratulations! You've just run into one of the failures of the SQL
> standard. What you want is called a "distributed key", and it is a concept
> well-supported in relational calculus, but for some reason omitted from the
> SQL standard.

Damn. Calculus was a long, long time ago for me, but I thought this sounded like something that *should* be possible. However, "should" does not mean "is". :-/

> (And, BTW, your book is WRONG. Foreign keys are *not* an "advanced topic";
> they are fundamental, and nobody should design a database without
> understanding them.)

They're fundamental to using the referential integrity feature of PostgreSQL, right? Or so it would appear to me, anyway... FWIW, the book ("Beginning Databases with PostgreSQL" by Richard Stones & Neil Matthew) does state that they're important, but they only go over the basic aspects of them (adding the REFERENCES constraint to tables or columns and a couple of options you can include with it) and say that their *full* potential is advanced, so I did at least get to learn a little bit about them. But more would have been better.

> I've had to handle this before. The approach is to build a "do it yourself"
> key, consisting of:
>
> ---optional, but a good idea-------------------------------------
> 1) Create a sequence called, for example, "aie_sq".
> 2) Alter the Artists, Exhibitions, and Images tables so they all draw on this
> same sequence for their unique ids (NEXTVAL('aie_sq')). This will ensure
> that all IDs are unique between the 3 tables.
> -----------------------------------------------------------------------
> 3) Create a unique 3-column index in the join table.
> 4) Create BEFORE INSERT OR UPDATE triggers on the join table which check
> that the id exists in one of the 3 tables.
> 5) Create BEFORE or AFTER UPDATE OR DELETE triggers on each of the 3 tables
> which check dependent records in the join table and take appropriate action.
>
> If this is too much for you, then just create 3 separate join tables. In
> your situation, I'm not sure there's that much advantage in doing it the more
> sophisticated way.

It's not necessarily too much for me -- I generally like doing things the harder but more elegant way rather than the easier but sloppier way. But in this particular case, the decision is complicated by the fact that there are hundreds of existing records needing to be imported into the database once I get the structure finalized, all of which have existing ID numbers and existing relationships between them. So if I let PostgreSQL renumber all the records according to a new sequence, I'd have to manually recreate all the links between them, which, given there are something like 1300 records in total, would seriously suck. So I think I'm stuck with separate join tables.

However, this does sound like an excellent way of dealing with this sort of problem if one is creating a new database from scratch, and I will definitely save this message for future reference, because I could run into this situation again on future projects.

>> Yes, the latest revision of my resume isn't as web-focussed as it used to
>> be, and highlights non-tech skills like writing, editing and research as
>> well as the technical stuff. And I suppose when this contract ends, I'll be
>> able to not only add what I've learned about PostgreSQL and PHP, but also
>> arranging veggies and crackers artistically for gallery openings.
>> :-)
>
> Well, contact me when you're done; I sometimes get non-profit referrals that I
> don't have time for.

Cool - thanks!

Lynna
--
Resource Centre Database Coordinator
Gallery 44
www.gallery44.org
Lynna,

> Damn. Calculus was a long, long time ago for me, but I thought this sounded
> like something that *should* be possible. However, "should" does not mean
> "is". :-/

Check out www.dbdebunk.com. Fabian Pascal, CJ Date and others are developing a "post-SQL" relational database; maybe someday it will be commercially viable.

> They're fundamental to using the referential integrity feature of
> PostgreSQL, right? Or so it would appear to me, anyway... FWIW, the book
> ("Beginning Databases with PostgreSQL" by Richard Stones & Neil Matthew) does
> state that they're important, but they only go over the basic aspects of
> them (adding the REFERENCES constraint to tables or columns and a couple of
> options you can include with it) and say that their *full* potential is
> advanced, so I did at least get to learn a little bit about them. But more
> would have been better.

Oh, that makes sense then.

> It's not necessarily too much for me -- I generally like doing things the
> harder but more elegant way rather than the easier but sloppier way. But in
> this particular case, the decision is complicated by the fact that there
> are hundreds of existing records needing to be imported into the database
> once I get the structure finalized, all of which have existing ID numbers
> and existing relationships between them. So if I let PostgreSQL renumber
> all the records according to a new sequence, I'd have to manually recreate
> all the links between them, which given there are something like 1300
> records in total, would seriously suck. So I think I'm stuck with separate
> join tables.

Not necessarily:

-- The re-keying is optional. That is, it makes your solution more elegant and foolproof, but is not absolutely required to do a distributed key. You can handle the distributed key with overlapping ids by having your index on the "id" and "type" fields combined.

-- You can re-key during conversion using a "was-is" table:
1. Make a table with old_key INT, new_key INT.
2. Select all of the current ids into old_key.
3. Use procedural code to populate new_key with sequential values.
4. Reference this was-is table for converting all child tables.
5. When done, SETVAL your sequence to the highest value in any of the was-is tables.

--
Josh Berkus
Aglio Database Solutions
San Francisco
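The "was-is" conversion Josh outlines could be sketched as follows (all names are illustrative; a serial column on new_key stands in for the procedural code in step 3, and the sequence name assumes the shared "aie_sq" from his earlier message):

```sql
-- Step 1: the was-is table. The serial column hands out sequential
-- new keys automatically as old keys are inserted (step 3).
CREATE TABLE artist_was_is (
    old_key integer PRIMARY KEY,
    new_key serial
);

-- Step 2: capture the existing FileMaker ids.
INSERT INTO artist_was_is (old_key)
SELECT artist_id FROM artists;

-- Step 4: translate the keys in a child table via the was-is table.
UPDATE item_keywords ik
SET item_id = wi.new_key
FROM artist_was_is wi
WHERE ik.item_id = wi.old_key;

-- Step 5: push the shared sequence past the highest key handed out.
SELECT setval('aie_sq', (SELECT max(new_key) FROM artist_was_is));
```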