Thread: db design question
Hello everyone, I'm currently designing the structure of our new pg database, and I have a question. In our current platform, addresses are stored in several places, like this (in semicode): company(name varchar(100), street varchar(100), state varchar(100)) employee(code int, street varchar(100), state varchar(100)) consultant(name varchar(50), street varchar(100), state varchar(100)) The colums street and state (and a lot more in real life) are the same. My idea for the new db was someting like this: company(name varchar(100)) employee(code int) consultant(name varchar(50)) address(ref_oid OID, street varchar(100), state varchar(100)) In this way, I can store all the addresses together and find them with. SELECT * WHERE addres.ref_oid = company.oid; Sort of the same manner as the storeage of BLOBs. Is this approach common practice? Is it safe? I know I have to make backups with the OIDs etc., but besides that, any more pitfalls? TIA!
Jules, > My idea for the new db was someting like this: > > company(name varchar(100)) > employee(code int) > consultant(name varchar(50)) > address(ref_oid OID, street varchar(100), state varchar(100)) > > In this way, I can store all the addresses together and find them > with. > SELECT * WHERE addres.ref_oid = company.oid; That's a fine idea, except that you have the referential integrity backward: Company(name varchar(100), address_id INT) employee(code int, address_id INT) consultant(name varchar(50), address_id INT) address(address_id INT PRIMARY KEY, street varchar(100), state varchar(100)) While there are reasons to do the kind of multi-table join that you propose, the standard relational model (above) works better. You can even automate the creation and relationship of addresses to companies, employees, etc. through VIEWS and RULES. I heartily reccomend "Practical Issues in Database Management" to you. Fabian Pascal, the author, treats extensively some of the pitfalls of getting unneccessarily creative with the relational model. BTW, don't use the OID. The OID, as of 7.2.0, is for *system purposes only* and should not be used for queries, joins, indexes, or keys. If you need a table-indepentant unique ID, use a sequence. -Josh Berkus
On 15 Oct 2002 at 9:38, Josh Berkus wrote: > Jules, > > > My idea for the new db was someting like this: > > > > company(name varchar(100)) > > employee(code int) > > consultant(name varchar(50)) > > address(ref_oid OID, street varchar(100), state varchar(100)) > > > > In this way, I can store all the addresses together and find them > > with. > > SELECT * WHERE addres.ref_oid = company.oid; > > That's a fine idea, except that you have the referential integrity > backward: > > Company(name varchar(100), address_id INT) > employee(code int, address_id INT) > consultant(name varchar(50), address_id INT) > address(address_id INT PRIMARY KEY, street varchar(100), state > varchar(100)) > > While there are reasons to do the kind of multi-table join that you > propose, the standard relational model (above) works better. You can > even automate the creation and relationship of addresses to companies, > employees, etc. through VIEWS and RULES. Thanks, great advice! > I heartily reccomend "Practical Issues in Database Management" to you. > Fabian Pascal, the author, treats extensively some of the pitfalls of > getting unneccessarily creative with the relational model. OK, I will have a look. > BTW, don't use the OID. The OID, as of 7.2.0, is for *system purposes > only* and should not be used for queries, joins, indexes, or keys. If > you need a table-indepentant unique ID, use a sequence. I'll drop it in this case, your approach "feels" a lot safer. However, I plan to store BLOBs in my db, and is this case I'm afraid I will _have_ to use OIDs. The idea is to be able to "attach" a blob to _any_ row in the db. An example of how I planned to use it (this works BTW) -- import a BLOB insert into blobs (ref_oid, blob_oid, blob_name, description) values ( (select oid from employee where name='Jules'), lo_import('/usr/share/pixmaps/gimp.png'), '/usr/share/pixmaps/gimp.png', 'test: imported picture'); -- retrieve the BLOB select lo_export((select blob_oid from blobs where ref_oid= (select oid from employee where name='Jules')), '/tmp/gimp.png'); One pitfall I was already warned for: allways use the datatype OID, or get in trouble when dumping / restoring, as the actual values of the OIDs change.
On 15 Oct 2002 at 9:38, Josh Berkus wrote: > Jules, > > > My idea for the new db was someting like this: > > > > company(name varchar(100)) > > employee(code int) > > consultant(name varchar(50)) > > address(ref_oid OID, street varchar(100), state varchar(100)) > > > > In this way, I can store all the addresses together and find them > > with. > > SELECT * WHERE addres.ref_oid = company.oid; > > That's a fine idea, except that you have the referential integrity > backward: > > Company(name varchar(100), address_id INT) > employee(code int, address_id INT) > consultant(name varchar(50), address_id INT) > address(address_id INT PRIMARY KEY, street varchar(100), state > varchar(100)) > > While there are reasons to do the kind of multi-table join that you > propose, the standard relational model (above) works better. <snip> I just thought of something... If I wanted the possibility to have more than one address per employee / company / consultant the OID approach would be better. A way (other than using OIDs) to solve this "(several tables):N" issue (if you know what I mean) might be using an array for the address references: company(name varchar(50), address_id INT[]) but somehow I don't like the idea of an array datatype -a table within a table- in the relational model.
Jules, > I'll drop it in this case, your approach "feels" a lot safer. > However, > I plan to store BLOBs in my db, and is this case I'm afraid I will > _have_ to use OIDs. The idea is to be able to "attach" a blob to > _any_ > row in the db. An example of how I planned to use it (this works BTW) Yeah, you need to use OIDs for lo_export. I'm not personally familiar with the issues on referencing these OIDs in tables. Instead of using the row_oid for the standard tables, though, consider doing this: 1) Create a table-independant sequence, "global_seq" 2) Make the primary key of each significant table DEFAULT NEXTVAL ('global_seq') instead of SERIAL. 3) Reference the primary keys in your blobs table, instead of the OID. The above will work as well as using the OID, without the potential headaches and with a greater degree of control. I used this scheme, in fact, to collectivize modification timestamps and journaled notes for 5 tables in one DB design. Keep in mind one other thing, though: while collectivizing your BLOBS in the fashion above simplifies your database schema (almost always a good thing) it can come at a substantial performance penalty if your database contains many large tables. For example, I did *not* merge my modification timestamps into a single table for my latest database effort, as it contains 6 significant tables totalling 2.5 million rows. And a single, 2.5 million row mod_data table searches and sorts very much slower than 6 sets of columns with an average of 400,000 rows each. -Josh Berkus
On 16 Oct 2002 at 9:19, Josh Berkus wrote: <snip> > Instead of using the row_oid for the standard tables, though, consider > doing this: > > 1) Create a table-independant sequence, "global_seq" > 2) Make the primary key of each significant table DEFAULT NEXTVAL > ('global_seq') instead of SERIAL. > 3) Reference the primary keys in your blobs table, instead of the OID. > > The above will work as well as using the OID, without the potential > headaches and with a greater degree of control. <snip> I hadn't thought of that approach. Sounds like a workable solution, but then again, so does the OID way. I allways prefer to keep things as simple as possible, using the most commonly applicated method. Both OIDs and sequences exist and should do the job, but the question is, which one is more reliable? Which one will keep it's current syntax, functionality etc. longest? The db I'm designing will be the core of my companies IT and will be in use for at least 5 years (the current one is into its 7th) and a lot can change in that time. That's why these decisions are so important now, on what horse do I put my money? I'll have a closer look at sequences, thanks for sharing your solution!
Hi Jules What will happen to oid if you dump the table, upgrade PostgreSQL and recreate tables? In case a company has more than one consultent, company is going to feature in more than one tuples with different oid's. It might be a better idea to have two tables. One table assigns serial id number to company. Other table could have this serial id as one attribute , along with other things like consultant name etc. This should avert problem of repeating data. It is quite possible that one consultant gives consultations to more than one companies. In that case the data needs to be stored in three tables. Vijay Jules Alberts wrote: > > Hello everyone, > > The colums street and state (and a lot more in real life) are the same. > My idea for the new db was someting like this: > > company(name varchar(100)) > employee(code int) > consultant(name varchar(50)) > address(ref_oid OID, street varchar(100), state varchar(100)) > > In this way, I can store all the addresses together and find them with. > SELECT * WHERE addres.ref_oid = company.oid;
On 20 Oct 2002 at 19:16, Vijay Deval wrote: > Hi Jules Hello Vijay, > What will happen to oid if you dump the table, upgrade PostgreSQL and > recreate tables? A dump / restore (you don't even need an upgrade) will change the value of the OIDs. However, it will keep the relations working. Like this: create table person(name varchar); insert into person values ('Joe'); select oid from person where name = 'Joe'; -- e.g. 123456 create table address (ref_oid oid, street varchar); insert into address values (123456, 'Penny Lane'); dump + restore select oid from person where name = 'Joe'; -- new value! 888888 select stree from address where ref_oid = 888888 -- 'Penny Lane' I tested this and it works. You just have to avoid hardcoded OID's in your app, allways works with a subselect. Still, there is a translation going on in the dump / restore mechanism. That's one step extra that could cause a problem. > In case a company has more than one consultent, company is going to > feature in more than one tuples with different oid's. > > It might be a better idea to have two tables. One table assigns serial > id number to company. Other table could have this serial id as one > attribute , along with other things like consultant name etc. This > should avert problem of repeating data. I already do that. > It is quite possible that one consultant gives consultations to more > than one companies. In that case the data needs to be stored in three > tables. That too. The actual problem was that I have several tables (address, BLOBs, actions etc.) that are possibly related to _any_ table in my db. A solution could be to create a referring column from every table to an INT primary key in address, action etc., but that would limit the link to one at most. For "address" this is probably OK, but for "actions" an "BLOBs" surely not. I guess I could work with intermediate tables (between "any table" and address, BLOB's and actions), but that would create an overhead of factor 2. Quite a bit IMO. That's why I'm looking at the possibilities of using OID for this. Advantage: - OID is global (througout the db) unique, even if I don't care from which table a column comes, I will find it. Disadvantages: - more coding in select stataments. Each select that involves one of the address / BLOBs / actions tables will be about twice the size - the changing of OID value with dump and restore. This makes me a bit nervous, allthough the longer select statements should cover this There are many people saying "don't do it". :-) OIDs are considered an internal system mechanism that shouldn't be used in everyday life. If this is the common view on OIDs the developers may well change their behaviour, or replace them with some other mechanism ("why not? nobody's using them anyway"). A workaround for the address and actions tables may be using a global sequence and use nextval() primary keys for every table. Someone recommended a book (Pascal: practical issues in db management). I've ordered it and will read it, hope it will help me make up my mind.
Jules, > sequence and use nextval() primary keys for every table. Someone > recommended a book (Pascal: practical issues in db management). I've > ordered it and will read it, hope it will help me make up my mind. Me. Unfortunately, it won't help you make up your mind on the OID vs. Sequence issue. The reason I reccomended it is that Pascal is a real fanatic about the relational database model, and gives you a) detailed explanations of why it's dangerous* to get creative with the relational model, and b) suggestions for how to do a lot of complex things *within* the relational model. OID vs. sequence is strictly a PostgreSQL implementation issue, and even the core developers are divided on the issue, though it's 3 to 1 against the last time I saw this argued out. *not to say that you should *never* get creative -- though Pascal thinks you shouldn't -- just that it's very important to know the penalties involved. -Josh Berkus
On Thu, 2002-10-17 at 19:11, Jules Alberts wrote: > Both OIDs and sequences exist and should do the job, but the question > is, which one is more reliable? Which one will keep it's current > syntax, functionality etc. longest? The db I'm designing will be the > core of my companies IT and will be in use for at least 5 years (the > current one is into its 7th) and a lot can change in that time. That's > why these decisions are so important now, on what horse do I put my > money? Don't put your money on OID. These have changed in the past, are known to change between dump/reload and offer no advantages. In some databases the equivalent of the OID can provide fast access to a record, but this is not the case in PostgreSQL where you will still need to take normal measures (i.e. index on that column), just as you would a SERIAL column, which would be invariant between dump/restore. In more recent PostgreSQL versions you can create tables without OIDs, so there is no longer even space savings involved. For the large-object interface you are stuck with OID for now, of course. Regards, Andrew. -- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Survey for free with http://survey.net.nz/
On 21 Oct 2002 at 8:44, Josh Berkus wrote: Hello Josh, > > sequence and use nextval() primary keys for every table. Someone > > recommended a book (Pascal: practical issues in db management). I've > > ordered it and will read it, hope it will help me make up my mind. > > Me. Unfortunately, it won't help you make up your mind on the OID vs. > Sequence issue. That's not a problem. I saw some articles from Pascal on the web, and I'm sure his book will be valuable anyway. > The reason I reccomended it is that Pascal is a real > fanatic about the relational database model, and gives you > a) detailed explanations of why it's dangerous* to get creative with > the relational model, and > b) suggestions for how to do a lot of complex things *within* the > relational model. > OID vs. sequence is strictly a PostgreSQL implementation issue, and > even the core developers are divided on the issue, though it's 3 to 1 > against the last time I saw this argued out. I guess that's 3:1 against OID? Well, I'm beginning to be convinced not to use OID (Andrew, thanks for your posting too). So let's say I don't use OID. Is a SEQUENCE bound to several tables the best solution? Any other suggestions? > *not to say that you should *never* get creative -- though Pascal > thinks you shouldn't -- just that it's very important to know the > penalties involved. I don't _want_ to be creative, not in this db anyway :). I want everything to be as standard as possible, as solid as possible. The way I'm struggling to solve this issue makes me wonder if I'm not making a huge design mistake. OTOH I cant imaging that I'm the first one to encounter this. I guess the issue is a one to many relationship where "one" can be any table in the db How does one solve this in a relational model? Sorry if I keep going on about this, but I have a hunch that this is _very_ important for my db. If I don't solve it correctly, I'm sure I will be in a lot of trouble later on. TIA!
On Tue, 2002-10-22 at 20:14, Jules Alberts wrote: > On 21 Oct 2002 at 8:44, Josh Berkus wrote: > > So let's say I don't use OID. Is a SEQUENCE bound to several tables the > best solution? Any other suggestions? The sequence doesn't have to be bound to several tables - just your address table: Josh originally said: > That's a fine idea, except that you have the referential integrity > backward: > > Company(name varchar(100), address_id INT) > employee(code int, address_id INT) > consultant(name varchar(50), address_id INT) > address(address_id INT PRIMARY KEY, street varchar(100), state > varchar(100)) My only quibble with this would be to change the address table thus: address( address_id SERIAL PRIMARY KEY, street TEXT, state TEXT ); This will create a sequence for you called address_address_id_seq and set the default to nextval('address_address_id_seq') so that whenever you create a new record without specifically assigning a sequence, it will get handed one. When writing a company record you do something like: BEGIN; addr_id = "SELECT nextval('address_address_id_seq'); insert into company (name, address_id ) values('company name', addr_id ); insert into address( address_id, street, state ) values( addr_id, 'George Street', 'New South Wales'); COMMIT; Obviously similar things happen writing an employee record or whatever. > a one to many relationship where "one" can be any table in the db > > How does one solve this in a relational model? Sorry if I keep going on > about this, but I have a hunch that this is _very_ important for my db. > If I don't solve it correctly, I'm sure I will be in a lot of trouble > later on. It isn't at all unusual, I'm afraid. Codes tables do this sort of thing all the time - where you have a table that contains: Code Value M Male F Female X Unknown And you want to refer to that in all sorts of places in the system. Well, in the places you want to refer to it you store the unique identifier, i.e. the "Code". For your case, you want to store the unique identifier (i.e. address_id). > In this way, I can store all the addresses together and find them > with. > SELECT * WHERE addres.ref_oid = company.oid; This turns around and becomes something like: SELECT * FROM address a, company c WHERE a.address_id = c.address_id ; Cheers, Andrew. -- --------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Survey for free with http://survey.net.nz/ ---------------------------------------------------------------------
On 22 Oct 2002 at 21:30, Andrew McMillan wrote: > On Tue, 2002-10-22 at 20:14, Jules Alberts wrote: > > On 21 Oct 2002 at 8:44, Josh Berkus wrote: > > > > So let's say I don't use OID. Is a SEQUENCE bound to several tables the > > best solution? Any other suggestions? > > The sequence doesn't have to be bound to several tables - just your > address table: > > Josh originally said: > > That's a fine idea, except that you have the referential integrity > > backward: > > > > Company(name varchar(100), address_id INT) > > employee(code int, address_id INT) > > consultant(name varchar(50), address_id INT) > > address(address_id INT PRIMARY KEY, street varchar(100), state > > varchar(100)) Hello Andrew, That wouldn't solve my problem, because I want to be able to store more addresses per company / client / consultant. Also, I have other tables (action and blobs) that do the same. I had a conversation with an Oracle DBA this morning. After explaining, which took some time :-), he told me that what I wanted is in Oracle called an "arc" relationship. Knowing this word, Google was a lot friendlier, I found a good explanation here: http://www.stormloader.com/yonghuang/computer/SemanticallyDependentAttri butes.txt Apparently there are several solutions: 1. create an intermediate table for every table you want address / action / blobs to be related to. This would create a big overhead, I don't like that. 2. in address / action / blobs create a column for each table it is related to. Enforce that per row only one of these columns is filled. The value would be the primary key of the related table. The other columns would be NULL, which could cause problems. 3. in address / action / blobs create a column that contains the unique identifier of the row in the related table. This is the OID / sequence approach I was thinking about. 4. same as 3., but in address / action / blobs also create a column that contains the name of the related table. This will makes a backward search (which is the company / client / consultant this address belongs to) easier. I think number 4 is the best solution. Create a sequence from which all tables get their primary key and include an integer row in address etc that points to the primary key of the related table and store the name of that table. Things to keep an eye on: - don't change table names without changing the values in address etc - set a huge maximum value for the global sequence Thanks for your reactions everyone! If there is more feedback / remarks, please let me know. TIA!
On Tue, 2002-10-22 at 22:34, Jules Alberts wrote: > That wouldn't solve my problem, because I want to be able to store more > addresses per company / client / consultant. Also, I have other tables > (action and blobs) that do the same. Oh, I see. Yes, the standard way to deal with this is a to 'normalise' the data through intermediate tables. In my past I have usually used an 'address_type' or 'contact_type' field to additionally identify these relationships, but that is probably irrelevant to this question. Basic form is: company ( company_id, company_data ); address ( address_id, address_data); employee ( employee_id, employee_data ); company_address( company_id, address_id ); employee_address( employee_id, address_id ); However, if you can guarantee that company_id and employee_id are non-colliding (e.g. you use the same sequence to generate them) then company_address and employee_address can be combined into a single table. Alternatively, a defining attribute can be added to identify the source, so you can have individual sequences within the source tables: contact_address( contact_source, contact_id, address_id ) Where a 'contact_source' of 'E' might identify that contact_id is an employee_id or 'C' might indicate a company_id and so on. > I had a conversation with an Oracle DBA this morning. After explaining, > which took some time :-), he told me that what I wanted is in Oracle > called an "arc" relationship. Yes. I've heard them called other things too (polite ones, even :-). > 4. same as 3., but in address / action / blobs also create a column > that contains the name of the related table. This will makes a backward > search (which is the company / client / consultant this address belongs > to) easier. > > I think number 4 is the best solution. Create a sequence from which all > tables get their primary key and include an integer row in address etc > that points to the primary key of the related table and store the name > of that table. Things to keep an eye on: > > - don't change table names without changing the values in address etc > - set a huge maximum value for the global sequence Note that the 'contact_source' can be anything you want - it doesn't have to be the full table name from the source table. It is probably better if it isn't, in fact, because then you are restricted to a single reference field in your source table. Consider a company record that required two signatories to be recorded, each with their address. This could potentially (easily) be handled as two fields on the company record, but you would need your 'contact_source' linking to their address to indicate not just the table, but the field as well. Usually it is easier to do this symbolically, than to use the full table.field sort of notation (which _still_ might not support arrays, or whatever). Cheers, Andrew. -- --------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Survey for free with http://survey.net.nz/ ---------------------------------------------------------------------
Folks, Getting back to Jules' original question about having a universal "blob" table, as I said, I've done this. An example: Table Clients(usq PK NEXTVAL('universal_sq'), client_name, etc ...) Table Orders(usq PK NEXTVAL('universal_sq'), order_date, etc ...) Table Invoices(usq PK NEXTVAL('universal_sq'), invoice_no, etc ...) And the multi-relational tables: Table mod_data (ref_usq PK INT, mod_user, mod_date, create_user, create_date); Table notes (note_id SERIAL, ref_usq INT, note_type, note_date, note_user, note_text); A simplified query: SELECT clients.*, mod_data.* FROM clients JOIN mod_data ON clients.usq = mod_data.ref_usq; Both of the above tables, through the ref_usq, are related to any of the tables possessing a USQ as the primary key. This is in use in a production system, and has been quite reliable. Advantages of the above approach: 1) It allows you to "attach" the "flexible child" tables to any qualified parent table without changing the schema. 2) It reduces the number of tables in your schema, simplifying and reducing clutter and confusion. Disadvantages of the above approach: 1) The relationships are not enforcable through the standard SQL foreign key constraints. As a result, you need to write your own custom triggers and rules to enforce the desired relationships. This can get annoying, with up to 3 triggers per relationship. 2) Any "flexible child" table has, of necessity, as many rows as the sum total of the rows in the parent tables, or the requisite multiple for one-to-many relations. This results in a child table that is much, much larger than the standard model of having several different child tables would be. As I have previously mentioned, this is a performance drag as one large table is, in practice, slower than several small tables. 3) Any auto-journaling system or auto-archiving system of record changes will have difficulty working around the above ambiguous relationships. My conclusion is that the above is a fine approach for small databases (the production one in which I'm using it has about 1,000-2,000 records in each of the 6 main tables) but a poor approach for very large databases for performance reasons. -Josh Berkus
On 22 Oct 2002 at 9:36, Josh Berkus wrote: Hello Josh, > Getting back to Jules' original question about having a universal > "blob" table, as I said, I've done this. An example: <snip> > Both of the above tables, through the ref_usq, are related to any of > the tables possessing a USQ as the primary key. This is in use in a > production system, and has been quite reliable. That's good to hear! <snip> > 2) Any "flexible child" table has, of necessity, as many rows as the > sum total of the rows in the parent tables, or the requisite multiple > for one-to-many relations. This results in a child table that is much, > much larger than the standard model of having several different child > tables would be. As I have previously mentioned, this is a > performance drag as one large table is, in practice, slower than > several small tables. Why the necessity to have a row in the child table for each row in the parent? Only when customer has an address, I will insert a row in the child. So the child table will be only as large as the sum of addresses required for all parents together, not as large as the sum of all rows in all parents together. Maybe I didn't understand you correctly (English isn't my native language) > 3) Any auto-journaling system or auto-archiving system of record > changes will have difficulty working around the above ambiguous > relationships. <snip> Do you have an URL to such systems? I'm not familiar with them, I guess you don't mean journalling filesystems? TIA! With the info I have so far, I plan to work like this: ------------------------------------------------------ -- sequence for global primary key create sequence glob_sq; -- the parents create table customer ( glob_id int unique default nextval('glob_sq'), id serial primary key, name varchar); create table employee ( glob_id int unique default nextval('glob_sq'), id serial primary key, name varchar); -- one of the children create table address ( -- tables that have no children need no global ID id serial primary key, ref_table varchar, ref_id int not null, street varchar); -- some testing, create some parents insert into employee (name) values ('Paul'); insert into employee (name) values ('John'); insert into customer (name) values ('Apple Inc.'); select * from employee; select * from customer; -- create some children (that's the nice part :-) insert into address (ref_table, ref_id, street) values ( 'employee', (select glob_id from employee where name = 'Paul'), 'Penny Lane'); insert into address (ref_table, ref_id, street) values ( 'customer', (select glob_id from customer where name = 'Apple Inc.'), 'Abbey Road'); select * from address; -- search Paul's address select street from address where ref_id = (select glob_id from employee where name = 'Paul'); -- now backward, find the employee that lives on Penny Lane. If done -- in real life, the FROM table would be generated in runtime with a -- SELECT REF_TABLE FROM ADDRES WHERE ...Thas' where the ref_table -- column comes in handy select name from employee where glob_id = (select glob_id from address where street = 'Penny Lane'); ------------------------------------------------------
Jules, > Why the necessity to have a row in the child table for each row in > the > parent? Only when customer has an address, I will insert a row in the > > child. So the child table will be only as large as the sum of > addresses > required for all parents together, not as large as the sum of all > rows > in all parents together. Maybe I didn't understand you correctly > (English isn't my native language) 1) I am not reccomending that you use this approach for addresses -- the standard relational model will serve your purposes, so there's not reason to get creative. I was reccomending that you try the "flexible child" approach *only* for the BLOB reference table. 2) In answer to your question: Imagine that I have 5 tables, clients, employees, invoices, orders, and payments. Imagine that each table has roughly 20,000 rows. Each table also has a row in the "mod_data" table. If I want to query the mod_data for a particular client, then the database has to search 100,000 rows, not the 20,000 it would search if the mod_data were directly in the clients table. get it? > Do you have an URL to such systems? I'm not familiar with them, I > guess > you don't mean journalling filesystems? TIA! I'm talking about triggers or other mechanisms that record each change to the database records into a permanent archive for auditing purposes. > With the info I have so far, I plan to work like this: See above. As I said before, I feel that the "flexible child" approach is a *bad* approach for storing the addresses. I just suggested it for the BOLBs. For the addresses, see my first e-mail to you on the topic. -Josh Berkus
On 23 Oct 2002 at 9:39, Josh Berkus wrote: <snip> > 1) I am not reccomending that you use this approach for addresses -- > the standard relational model will serve your purposes, so there's not > reason to get creative. I was reccomending that you try the "flexible > child" approach *only* for the BLOB reference table. OK, I'm convinced. I will apply the "arc" only when there is no other way. For the addresses I will use the relational method. > 2) In answer to your question: Imagine that I have 5 tables, clients, > employees, invoices, orders, and payments. Imagine that each table > has roughly 20,000 rows. Each table also has a row in the "mod_data" > table. If I want to query the mod_data for a particular client, then > the database has to search 100,000 rows, not the 20,000 it would search > if the mod_data were directly in the clients table. get it? > > > Do you have an URL to such systems? I'm not familiar with them, I > > guess > > you don't mean journalling filesystems? TIA! > > I'm talking about triggers or other mechanisms that record each change > to the database records into a permanent archive for auditing purposes. Oh, I see. > > With the info I have so far, I plan to work like this: > > See above. As I said before, I feel that the "flexible child" approach > is a *bad* approach for storing the addresses. I just suggested it > for the BOLBs. For the addresses, see my first e-mail to you on the > topic. I will, thanks again for all your help!