Thread: unreferenced primary keys: garbage collection

unreferenced primary keys: garbage collection

From

Forest Wilkinson

Date:

19 January 2001, 23:53:12

I have a database in which five separate tables may (or may not) reference
any given row in a table of postal addresses.  I am using the primary /
foreign key support in postgres 7 to represent these references.

My problem is that, any time a reference is removed (either by deleting or
updating a row in one of the five referencing tables), no garbage
collection is being performed on the address table.  That is, when the
last reference to an address record goes away, the record is not removed
from the address table.  Over time, my database will fill up with
abandoned address records.

I suppose I could write procedural code in my client application, to check
for abandonment when a reference is removed, but that would require
examining each of the five referencing tables.  I consider this a messy
option, and I expect it would be rather inefficient.

I thought of attempting to delete the address record any time a reference
to it is removed, and relying on foreign key constraints to prevent the
deletion if it is referenced elsewhere.  However, I believe postgres will
force the entire transaction block to be rolled back in such cases, thus
nullifying all the other work done in the transaction.  This is clearly
undesirable.

Isn't there some way to tell postgres *not* to roll back my transaction if
a particular DELETE fails due to referential integrity?  Are there any
other options that might help me?

Regards,

Forest Wilkinson

Re: unreferenced primary keys: garbage collection

From

Jan Wieck

Date:

20 January 2001, 18:00:30

Forest Wilkinson wrote:
> I have a database in which five separate tables may (or may not) reference
> any given row in a table of postal addresses.  I am using the primary /
> foreign key support in postgres 7 to represent these references.
>
> My problem is that, any time a reference is removed (either by deleting or
> updating a row in one of the five referencing tables), no garbage
> collection is being performed on the address table.  That is, when the
> last reference to an address record goes away, the record is not removed
> from the address table.  Over time, my database will fill up with
> abandoned address records.
   While  this  behaviour  makes  sense  in  your case, it's not   subject  to  referential  integrity  constraints.
You could   arrange  for  it with custom trigger procedures, checking all   the five tables on DELETE or UPDATE on one
ofthem.

   I'll make up a little example and post it  the  other  day  -   need  to  take  a  nap  now and tomorrow will be one
ofthese   30-hour days (from MET to  EST),  so  don't  expect  anything   before Monday afternoon (EST).

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Re: unreferenced primary keys: garbage collection

From

Jan Wieck

Date:

22 January 2001, 17:17:43

Jan Wieck wrote:
> Forest Wilkinson wrote:
> > I have a database in which five separate tables may (or may not) reference
> > any given row in a table of postal addresses.  I am using the primary /
> > foreign key support in postgres 7 to represent these references.
> >
> > My problem is that, any time a reference is removed (either by deleting or
> > updating a row in one of the five referencing tables), no garbage
> > collection is being performed on the address table.  That is, when the
> > last reference to an address record goes away, the record is not removed
> > from the address table.  Over time, my database will fill up with
> > abandoned address records.
>
>     While  this  behaviour  makes  sense  in  your case, it's not
>     subject  to  referential  integrity  constraints.  You  could
>     arrange  for  it with custom trigger procedures, checking all
>     the five tables on DELETE or UPDATE on one of them.
>
>     I'll make up a little example and post it  the  other  day  -
>     need  to  take  a  nap  now and tomorrow will be one of these
>     30-hour days (from MET to  EST),  so  don't  expect  anything
>     before Monday afternoon (EST).
   Here it is:
        CREATE TABLE t_addr (            a_id        integer PRIMARY KEY,            a_name      text        );
        CREATE TABLE t_customer (            c_id        integer PRIMARY KEY,            c_address   integer REFERENCES
t_addr       );
 
        CREATE TABLE t_order (            o_id        integer PRIMARY KEY,            o_customer  integer REFERENCES
t_customer                               ON DELETE CASCADE,            o_shipaddr  integer REFERENCES t_addr        );
 
        CREATE FUNCTION tidy_up_addr(integer) RETURNS bool AS '        DECLARE            chk_addr    ALIAS FOR $1;
  BEGIN            --            -- Check if address is still referenced from t_customer            --            IF
count(c_address)> 0 FROM t_customer                        WHERE c_address = chk_addr            THEN
RETURN''f'';            END IF;            --            -- Check if address is still referenced from t_order
--            IF count(o_shipaddr) > 0 FROM t_order                        WHERE o_shipaddr = chk_addr            THEN
             RETURN ''f'';            END IF;            --            -- Address not required any more - get rid of
it.           --            DELETE FROM t_addr WHERE a_id = chk_addr;            RETURN ''t'';        END;        '
LANGUAGE'plpgsql';
 
        CREATE FUNCTION customer_upd_or_del() RETURNS opaque AS '        BEGIN            PERFORM
tidy_up_addr(old.c_address);           RETURN NULL;        END;        ' LANGUAGE 'plpgsql';
 
        CREATE TRIGGER customer_upd_or_del            AFTER UPDATE OR DELETE ON t_customer            FOR EACH ROW
EXECUTEPROCEDURE customer_upd_or_del();
 
        CREATE FUNCTION order_upd_or_del() RETURNS opaque AS '        BEGIN            PERFORM
tidy_up_addr(old.o_shipaddr);           RETURN NULL;        END;        ' LANGUAGE 'plpgsql';
 
        CREATE TRIGGER order_upd_or_del            AFTER UPDATE OR DELETE ON t_order            FOR EACH ROW EXECUTE
PROCEDUREorder_upd_or_del();
 
   We  have  two  tables  referencing the address table. Each of   get's it's own trigger, simply calling the  tidy-up
function  that  removes  the  address  if it's not referenced any more.   Thus, adding a 3rd referencing table to the
schema needs  to   add  the  check for reference to one central function, plus a   very simple trigger on the new
table. Hope  this  works  for   you.
 


Have fun, Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Re: unreferenced primary keys: garbage collection

From

Forest Wilkinson

Date:

23 January 2001, 23:58:37

Jan,

Thanks for the reply, but your solution is rather unattractive to me.  It
requires that, any time a reference to an address id is changed, five
tables be searched for the address id.  This will create unwanted overhead
every time a change is made.  In order to make those searches even
remotely fast, I'd have to add indexes to every one of those tables, which
will mean an additional performance hit on table inserts.  Moreover, if a
new table is created that references address ids, and the maintainer at
the time forgets to rewrite those trigger functions, the system will
break.

I'd much rather be able to simply attempt a delete of any given address,
relying on referential integrity to prevent the delete if the address is
still being referenced.  I don't see why postgres has to treat such a
situation as a fatal error.  If postgres issued (for example) a warning
instead of an error here, I'd be home free!  Hasn't there been some talk
on the lists about this lately?

Forest

Jan Wieck wrote:
>>     While  this  behaviour  makes  sense  in  your case, it's not
>>     subject  to  referential  integrity  constraints.  You  could
>>     arrange  for  it with custom trigger procedures, checking all
>>     the five tables on DELETE or UPDATE on one of them.

Forest Wilkinson wrote:
>> > I have a database in which five separate tables may (or may not) reference
>> > any given row in a table of postal addresses.  I am using the primary /
>> > foreign key support in postgres 7 to represent these references.
>> >
>> > My problem is that, any time a reference is removed (either by deleting or
>> > updating a row in one of the five referencing tables), no garbage
>> > collection is being performed on the address table.  That is, when the
>> > last reference to an address record goes away, the record is not removed
>> > from the address table.  Over time, my database will fill up with
>> > abandoned address records.

Re: unreferenced primary keys: garbage collection

From

Michael Fork

Date:

24 January 2001, 00:33:43

One other method is to setup up the foreign keys as ON DELETE RESTRICT,
then outside of your transaction block issue a DELETE FROM address WHERE
add_id = 1;  If there are still records in the other tables referencing
this record, it will error out and nothing will happen, however if no
related records are left, the delete will succeed (you have to do it
outside of transaction, otherwise I belive it will rollback on the
error if other rows are found to be referencing the primary key)....

Michael Fork - CCNA - MCP - A+
Network Support - Toledo Internet Access - Toledo Ohio

On Tue, 23 Jan 2001, Forest Wilkinson wrote:

> Jan,
> 
> Thanks for the reply, but your solution is rather unattractive to me.  It
> requires that, any time a reference to an address id is changed, five
> tables be searched for the address id.  This will create unwanted overhead
> every time a change is made.  In order to make those searches even
> remotely fast, I'd have to add indexes to every one of those tables, which
> will mean an additional performance hit on table inserts.  Moreover, if a
> new table is created that references address ids, and the maintainer at
> the time forgets to rewrite those trigger functions, the system will
> break.
> 
> I'd much rather be able to simply attempt a delete of any given address,
> relying on referential integrity to prevent the delete if the address is
> still being referenced.  I don't see why postgres has to treat such a
> situation as a fatal error.  If postgres issued (for example) a warning
> instead of an error here, I'd be home free!  Hasn't there been some talk
> on the lists about this lately?
> 
> Forest
> 
> Jan Wieck wrote:
> >>     While  this  behaviour  makes  sense  in  your case, it's not
> >>     subject  to  referential  integrity  constraints.  You  could
> >>     arrange  for  it with custom trigger procedures, checking all
> >>     the five tables on DELETE or UPDATE on one of them.
> 
> Forest Wilkinson wrote:
> >> > I have a database in which five separate tables may (or may not) reference
> >> > any given row in a table of postal addresses.  I am using the primary /
> >> > foreign key support in postgres 7 to represent these references.
> >> >
> >> > My problem is that, any time a reference is removed (either by deleting or
> >> > updating a row in one of the five referencing tables), no garbage
> >> > collection is being performed on the address table.  That is, when the
> >> > last reference to an address record goes away, the record is not removed
> >> > from the address table.  Over time, my database will fill up with
> >> > abandoned address records.
>

Re: unreferenced primary keys: garbage collection

From

Forest Wilkinson

Date:

24 January 2001, 16:07:56

On Wed, 24 Jan 2001 00:26:58 -0500 (EST), Michael Fork wrote:

>One other method is to setup up the foreign keys as ON DELETE RESTRICT,
>then outside of your transaction block issue a DELETE FROM address WHERE
>add_id = 1;  If there are still records in the other tables referencing
>this record, it will error out and nothing will happen, however if no
>related records are left, the delete will succeed (you have to do it
>outside of transaction, otherwise I belive it will rollback on the
>error if other rows are found to be referencing the primary key)....

Yes, that's the approach I originally posted.  The rollback issue is the
thing I'm complaining about.  The code in question gets called from within
a parent function, which uses a single transaction block for all of its
operations.  This means that executing a query outside a transaction block
(or within a separate one) is not an option.

I want to be able to tell postgres not to rollback the whole transaction
just because my delete attempt fails.  I can think of 3 ways to do this:
1. Allow the delete to fail without throwing a fatal error.  (Perhaps a
warning would suffice.)
2. Allow the client to tell postgres not to roll back if a specified query
produces an error.
3. Implement nested transactions.

Forest

Re: unreferenced primary keys: garbage collection

From

"Albert REINER"

Date:

25 January 2001, 12:02:17

> On Tue, 23 Jan 2001, Forest Wilkinson wrote:
> 
> > Jan,
> > 
> > Thanks for the reply, but your solution is rather unattractive to me.  It
> > requires that, any time a reference to an address id is changed, five
> > tables be searched for the address id.  This will create unwanted overhead

If - and I think this is the case for you - it is no problem for you
to have some superfluous adresses in your tables, but you only want to
avoid that those adresses remain there for a long time, you could
simply run the function Jan sent from a cron job. That seems to be
likely to be more efficient not only than the triggers but also to the
ON DELETE RESTRICT solution, I guess.

Trivial, but I HTH -

Albert.


> > Forest Wilkinson wrote:
> > >> > I have a database in which five separate tables may (or may not) reference
> > >> > any given row in a table of postal addresses.  I am using the primary /
> > >> > foreign key support in postgres 7 to represent these references.
> > >> >
> > >> > My problem is that, any time a reference is removed (either by deleting or
> > >> > updating a row in one of the five referencing tables), no garbage
> > >> > collection is being performed on the address table.  That is, when the
> > >> > last reference to an address record goes away, the record is not removed
> > >> > from the address table.  Over time, my database will fill up with
> > >> > abandoned address records.

-- 

--------------------------------------------------------------------------
Albert Reiner                                   <areiner@tph.tuwien.ac.at>
Deutsch       *       English       *       Esperanto       *       Latine
--------------------------------------------------------------------------