Thread: unreferenced primary keys: garbage collection
I have a database in which five separate tables may (or may not) reference any given row in a table of postal addresses. I am using the primary / foreign key support in postgres 7 to represent these references. My problem is that, any time a reference is removed (either by deleting or updating a row in one of the five referencing tables), no garbage collection is being performed on the address table. That is, when the last reference to an address record goes away, the record is not removed from the address table. Over time, my database will fill up with abandoned address records. I suppose I could write procedural code in my client application, to check for abandonment when a reference is removed, but that would require examining each of the five referencing tables. I consider this a messy option, and I expect it would be rather inefficient. I thought of attempting to delete the address record any time a reference to it is removed, and relying on foreign key constraints to prevent the deletion if it is referenced elsewhere. However, I believe postgres will force the entire transaction block to be rolled back in such cases, thus nullifying all the other work done in the transaction. This is clearly undesirable. Isn't there some way to tell postgres *not* to roll back my transaction if a particular DELETE fails due to referential integrity? Are there any other options that might help me? Regards, Forest Wilkinson
Forest Wilkinson wrote: > I have a database in which five separate tables may (or may not) reference > any given row in a table of postal addresses. I am using the primary / > foreign key support in postgres 7 to represent these references. > > My problem is that, any time a reference is removed (either by deleting or > updating a row in one of the five referencing tables), no garbage > collection is being performed on the address table. That is, when the > last reference to an address record goes away, the record is not removed > from the address table. Over time, my database will fill up with > abandoned address records. While this behaviour makes sense in your case, it's not subject to referential integrity constraints. You could arrange for it with custom trigger procedures, checking all the five tables on DELETE or UPDATE on one ofthem. I'll make up a little example and post it the other day - need to take a nap now and tomorrow will be one ofthese 30-hour days (from MET to EST), so don't expect anything before Monday afternoon (EST). Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan Wieck wrote: > Forest Wilkinson wrote: > > I have a database in which five separate tables may (or may not) reference > > any given row in a table of postal addresses. I am using the primary / > > foreign key support in postgres 7 to represent these references. > > > > My problem is that, any time a reference is removed (either by deleting or > > updating a row in one of the five referencing tables), no garbage > > collection is being performed on the address table. That is, when the > > last reference to an address record goes away, the record is not removed > > from the address table. Over time, my database will fill up with > > abandoned address records. > > While this behaviour makes sense in your case, it's not > subject to referential integrity constraints. You could > arrange for it with custom trigger procedures, checking all > the five tables on DELETE or UPDATE on one of them. > > I'll make up a little example and post it the other day - > need to take a nap now and tomorrow will be one of these > 30-hour days (from MET to EST), so don't expect anything > before Monday afternoon (EST). Here it is: CREATE TABLE t_addr ( a_id integer PRIMARY KEY, a_name text ); CREATE TABLE t_customer ( c_id integer PRIMARY KEY, c_address integer REFERENCES t_addr ); CREATE TABLE t_order ( o_id integer PRIMARY KEY, o_customer integer REFERENCES t_customer ON DELETE CASCADE, o_shipaddr integer REFERENCES t_addr ); CREATE FUNCTION tidy_up_addr(integer) RETURNS bool AS ' DECLARE chk_addr ALIAS FOR $1; BEGIN -- -- Check if address is still referenced from t_customer -- IF count(c_address)> 0 FROM t_customer WHERE c_address = chk_addr THEN RETURN''f''; END IF; -- -- Check if address is still referenced from t_order -- IF count(o_shipaddr) > 0 FROM t_order WHERE o_shipaddr = chk_addr THEN RETURN ''f''; END IF; -- -- Address not required any more - get rid of it. -- DELETE FROM t_addr WHERE a_id = chk_addr; RETURN ''t''; END; ' LANGUAGE'plpgsql'; CREATE FUNCTION customer_upd_or_del() RETURNS opaque AS ' BEGIN PERFORM tidy_up_addr(old.c_address); RETURN NULL; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER customer_upd_or_del AFTER UPDATE OR DELETE ON t_customer FOR EACH ROW EXECUTEPROCEDURE customer_upd_or_del(); CREATE FUNCTION order_upd_or_del() RETURNS opaque AS ' BEGIN PERFORM tidy_up_addr(old.o_shipaddr); RETURN NULL; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER order_upd_or_del AFTER UPDATE OR DELETE ON t_order FOR EACH ROW EXECUTE PROCEDUREorder_upd_or_del(); We have two tables referencing the address table. Each of get's it's own trigger, simply calling the tidy-up function that removes the address if it's not referenced any more. Thus, adding a 3rd referencing table to the schema needs to add the check for reference to one central function, plus a very simple trigger on the new table. Hope this works for you. Have fun, Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
Jan, Thanks for the reply, but your solution is rather unattractive to me. It requires that, any time a reference to an address id is changed, five tables be searched for the address id. This will create unwanted overhead every time a change is made. In order to make those searches even remotely fast, I'd have to add indexes to every one of those tables, which will mean an additional performance hit on table inserts. Moreover, if a new table is created that references address ids, and the maintainer at the time forgets to rewrite those trigger functions, the system will break. I'd much rather be able to simply attempt a delete of any given address, relying on referential integrity to prevent the delete if the address is still being referenced. I don't see why postgres has to treat such a situation as a fatal error. If postgres issued (for example) a warning instead of an error here, I'd be home free! Hasn't there been some talk on the lists about this lately? Forest Jan Wieck wrote: >> While this behaviour makes sense in your case, it's not >> subject to referential integrity constraints. You could >> arrange for it with custom trigger procedures, checking all >> the five tables on DELETE or UPDATE on one of them. Forest Wilkinson wrote: >> > I have a database in which five separate tables may (or may not) reference >> > any given row in a table of postal addresses. I am using the primary / >> > foreign key support in postgres 7 to represent these references. >> > >> > My problem is that, any time a reference is removed (either by deleting or >> > updating a row in one of the five referencing tables), no garbage >> > collection is being performed on the address table. That is, when the >> > last reference to an address record goes away, the record is not removed >> > from the address table. Over time, my database will fill up with >> > abandoned address records.
One other method is to setup up the foreign keys as ON DELETE RESTRICT, then outside of your transaction block issue a DELETE FROM address WHERE add_id = 1; If there are still records in the other tables referencing this record, it will error out and nothing will happen, however if no related records are left, the delete will succeed (you have to do it outside of transaction, otherwise I belive it will rollback on the error if other rows are found to be referencing the primary key).... Michael Fork - CCNA - MCP - A+ Network Support - Toledo Internet Access - Toledo Ohio On Tue, 23 Jan 2001, Forest Wilkinson wrote: > Jan, > > Thanks for the reply, but your solution is rather unattractive to me. It > requires that, any time a reference to an address id is changed, five > tables be searched for the address id. This will create unwanted overhead > every time a change is made. In order to make those searches even > remotely fast, I'd have to add indexes to every one of those tables, which > will mean an additional performance hit on table inserts. Moreover, if a > new table is created that references address ids, and the maintainer at > the time forgets to rewrite those trigger functions, the system will > break. > > I'd much rather be able to simply attempt a delete of any given address, > relying on referential integrity to prevent the delete if the address is > still being referenced. I don't see why postgres has to treat such a > situation as a fatal error. If postgres issued (for example) a warning > instead of an error here, I'd be home free! Hasn't there been some talk > on the lists about this lately? > > Forest > > Jan Wieck wrote: > >> While this behaviour makes sense in your case, it's not > >> subject to referential integrity constraints. You could > >> arrange for it with custom trigger procedures, checking all > >> the five tables on DELETE or UPDATE on one of them. > > Forest Wilkinson wrote: > >> > I have a database in which five separate tables may (or may not) reference > >> > any given row in a table of postal addresses. I am using the primary / > >> > foreign key support in postgres 7 to represent these references. > >> > > >> > My problem is that, any time a reference is removed (either by deleting or > >> > updating a row in one of the five referencing tables), no garbage > >> > collection is being performed on the address table. That is, when the > >> > last reference to an address record goes away, the record is not removed > >> > from the address table. Over time, my database will fill up with > >> > abandoned address records. >
On Wed, 24 Jan 2001 00:26:58 -0500 (EST), Michael Fork wrote: >One other method is to setup up the foreign keys as ON DELETE RESTRICT, >then outside of your transaction block issue a DELETE FROM address WHERE >add_id = 1; If there are still records in the other tables referencing >this record, it will error out and nothing will happen, however if no >related records are left, the delete will succeed (you have to do it >outside of transaction, otherwise I belive it will rollback on the >error if other rows are found to be referencing the primary key).... Yes, that's the approach I originally posted. The rollback issue is the thing I'm complaining about. The code in question gets called from within a parent function, which uses a single transaction block for all of its operations. This means that executing a query outside a transaction block (or within a separate one) is not an option. I want to be able to tell postgres not to rollback the whole transaction just because my delete attempt fails. I can think of 3 ways to do this: 1. Allow the delete to fail without throwing a fatal error. (Perhaps a warning would suffice.) 2. Allow the client to tell postgres not to roll back if a specified query produces an error. 3. Implement nested transactions. Forest
> On Tue, 23 Jan 2001, Forest Wilkinson wrote: > > > Jan, > > > > Thanks for the reply, but your solution is rather unattractive to me. It > > requires that, any time a reference to an address id is changed, five > > tables be searched for the address id. This will create unwanted overhead If - and I think this is the case for you - it is no problem for you to have some superfluous adresses in your tables, but you only want to avoid that those adresses remain there for a long time, you could simply run the function Jan sent from a cron job. That seems to be likely to be more efficient not only than the triggers but also to the ON DELETE RESTRICT solution, I guess. Trivial, but I HTH - Albert. > > Forest Wilkinson wrote: > > >> > I have a database in which five separate tables may (or may not) reference > > >> > any given row in a table of postal addresses. I am using the primary / > > >> > foreign key support in postgres 7 to represent these references. > > >> > > > >> > My problem is that, any time a reference is removed (either by deleting or > > >> > updating a row in one of the five referencing tables), no garbage > > >> > collection is being performed on the address table. That is, when the > > >> > last reference to an address record goes away, the record is not removed > > >> > from the address table. Over time, my database will fill up with > > >> > abandoned address records. -- -------------------------------------------------------------------------- Albert Reiner <areiner@tph.tuwien.ac.at> Deutsch * English * Esperanto * Latine --------------------------------------------------------------------------