Thread: When to use cascading deletes?

When to use cascading deletes?

From
David
Date:
Hi there.

When is a good time to use cascading deletes?

Usually, for safety reasons, I prefer to not ever use cascading
deletes. But that can lead to some complex code that uses topological
sorts etc to manually delete records in the right order, when a
cascading delete is needed.

Ideally, I'd like postgresql to not do cascading deletes, *except*
when I tell it to, and the rest of the time fail when the user didn't
explicitly "opt in" for cascading deletes. When it comes to enabling
cascading deletes, I don't really like the idea that deleting or
updating a row from one table can have a possibly unexpected (to the
app programmer, using the database) chain reaction to other tables.

I don't know, maybe I have the wrong mindset, and cascading is
preferable (in terms of object model) in some cases? I'd like to read
more on this subject (general best practices for what types of
cascading are appropriate to use when).

Any tips?

Thanks,

David.

Re: When to use cascading deletes?

From
Alban Hertroys
Date:
On Jun 11, 2009, at 10:59 AM, David wrote:

> Hi there.
>
> When is a good time to use cascading deletes?

As a general rule of thumb I use cascading deletes if the data in a
record is meaningless without the record that the foreign key points to.

> Ideally, I'd like postgresql to not do cascading deletes, *except*
> when I tell it to, and the rest of the time fail when the user didn't
> explicitly "opt in" for cascading deletes.

That's the default behaviour. If you don't specify what to do on
DELETE (or UPDATE) then postgresql defaults to 'NO ACTION', which
means it will not allow the delete if there are records referencing
the row you try to delete.

> I don't know, maybe I have the wrong mindset, and cascading is
> preferable (in terms of object model) in some cases? I'd like to read
> more on this subject (general best practices for what types of
> cascading are appropriate to use when).

It's not an object model, it's a relational model. In a relational
model cascading makes sense, as it's a property of the relation
between records in two (or more) tables.

I often notice people get confused about the direction of a cascade.
If you define a foreign key with a cascade (whether on UPDATE or
DELETE doesn't matter), then the cascade fires on operations on the
_referred_ record, not on operations on the _referring_ record.

Basically I determine which cascading method to use by how records are
related:

If the referring record only has any meaning in relation to the
referred record, then you use CASCADE on deletes. A typical use case
us a connection table in a many-to-many relationship.

If the referring record has a meaning regardless of whether the
referred record exists, then you use SET NULL. For example, if you
have a relation between an employee and a desk, the desk still exists
after you fire the employee.

If the referring table is important to your data then you don't want
to be allowed to delete the referred records while there are records
referring them. Typically this is the case if most of the meaning of a
relation is in the referring tables. Often these are tables referring
to user records; such a database generally isn't about the users, but
about what we want to store ABOUT them. The users are just
placeholders to be able to make a distinction between them. In this
case your application needs to be able to deal with a foreign key
constraint violation or have some method to prevent those from occuring.

If you don't know what to choose for a given relation it's safe to
stick with the default, but you do need to think about what your
application needs to do in such cases.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4a30d738759151089568354!



Re: When to use cascading deletes?

From
David
Date:
Thanks for the tips, those make sense.

I was thinking through this some more after sending my mail, and came
to similar conclusions.

It would be nice though if this info was more available to people
doing research on the subject. Where did you pick up these ideas? At
least this thread should start turning up in Google at some point :-)

Also, probably some of my confusion comes from being used to
app-centric logic, where the app (and programmer/sysadmin/etc) likes
to be in control of all the database updates, rather than additional
deletes/updates/etc taking place behind the scenes. I guess it's a
kind of trade-off. Handing off more logic to the db to simplify db
interaction, at the expense of the database sometimes doing
unexpected/unwanted things.

For instance, it's possible to make the db rules *too* clever. That
kind of thing can bite you later, when the db makes updates in the
background that you don't want for some reason. Then you need to work
around them, and can't take those rules out permanently, since it
breaks other apps that depend on the db being intelligent. You need to
think really carefully about when to add db intelligence, which could
have unwanted consequences later...

For people like me, I think it helps to think of the data as living in
it's own world inside the DB, with extra rules there that apps don't
always know or care about, and the users & apps need to be aware of
the conventions around each table. As opposed to being a simple data
store for apps (with integrity checks). For similar reasons I've
avoided stored procedures and rules, preferring simpler database
schema, but more complex logic to handle them in apps.

Another consideration, is revision controlling of the extra rules/etc.
My current approach is to have all db schema setup & updates etc in a
Python script during development, which is revision controlled (and
then later re-used for remote db installs/upgrades).

David.

Re: When to use cascading deletes?

From
Steve Clark
Date:
David wrote:
> Thanks for the tips, those make sense.
>
> I was thinking through this some more after sending my mail, and came
> to similar conclusions.
>
> It would be nice though if this info was more available to people
> doing research on the subject. Where did you pick up these ideas? At
> least this thread should start turning up in Google at some point :-)
>
> Also, probably some of my confusion comes from being used to
> app-centric logic, where the app (and programmer/sysadmin/etc) likes
> to be in control of all the database updates, rather than additional
> deletes/updates/etc taking place behind the scenes. I guess it's a
> kind of trade-off. Handing off more logic to the db to simplify db
> interaction, at the expense of the database sometimes doing
> unexpected/unwanted things.
>
> For instance, it's possible to make the db rules *too* clever. That
> kind of thing can bite you later, when the db makes updates in the
> background that you don't want for some reason. Then you need to work
> around them, and can't take those rules out permanently, since it
> breaks other apps that depend on the db being intelligent. You need to
> think really carefully about when to add db intelligence, which could
> have unwanted consequences later...
>
> For people like me, I think it helps to think of the data as living in
> it's own world inside the DB, with extra rules there that apps don't
> always know or care about, and the users & apps need to be aware of
> the conventions around each table. As opposed to being a simple data
> store for apps (with integrity checks). For similar reasons I've
> avoided stored procedures and rules, preferring simpler database
> schema, but more complex logic to handle them in apps.
>
> Another consideration, is revision controlling of the extra rules/etc.
> My current approach is to have all db schema setup & updates etc in a
> Python script during development, which is revision controlled (and
> then later re-used for remote db installs/upgrades).
>
> David.
>
Hi David,

The advantage of putting business logic in the database means it is in only one place.
If you leave it to the apps then you have no consistency unless you have a library
of the more complex logic that all the apps use. The problem with this is I find
programmers that always want to reinvent the wheel and do their own thing so
they ignore library and do it "better" themselves.

My $.02
Steve

Re: When to use cascading deletes?

From
Sim Zacks
Date:
I use cascading deletes as per business rule.
For example, my customer record has multiple orders and each order can
have multiple shipments and multiple payments.
My business rule is not to erase a customer with orders, but orders
should be erased even if they have shipments or payments.


The business logic behind this was that we want to make it a hassle to
erase a customer. On the other hand, erasing an order that has shipments
  or payments is done very regularly, because shipment and payment
records are added before they actually happen (planned shipment and
payments), so order records are erased when the order isn't finalized.
The business rule was to make it less of a hassle.


Sim

David wrote:
> Hi there.
>
> When is a good time to use cascading deletes?
>
> Usually, for safety reasons, I prefer to not ever use cascading
> deletes. But that can lead to some complex code that uses topological
> sorts etc to manually delete records in the right order, when a
> cascading delete is needed.

> I don't know, maybe I have the wrong mindset, and cascading is
> preferable (in terms of object model) in some cases? I'd like to read
> more on this subject (general best practices for what types of
> cascading are appropriate to use when).
>
> Any tips?
>
> Thanks,
>
> David.
>

Re: When to use cascading deletes?

From
Greg Stark
Date:
On Thu, Jun 11, 2009 at 9:59 AM, David<wizzardx@gmail.com> wrote:
>
> Ideally, I'd like postgresql to not do cascading deletes, *except*
> when I tell it to, and the rest of the time fail when the user didn't
> explicitly "opt in" for cascading deletes. When it comes to enabling
> cascading deletes, I don't really like the idea that deleting or
> updating a row from one table can have a possibly unexpected (to the
> app programmer, using the database) chain reaction to other tables.

One option would be to enable cascading deletes but not grant delete
option to the user the application connects as. So either an
administrative interface which is expected to do larger slower
operations sometimes might be allowed but the public-facing user
interface isn't allowed to accidentally delete a record which would
cause massive damage. Generally I find it makes sense to design the
database so that user-facing public interfaces can't delete much of
substance anyways.

I generally leave cascade off except for many-to-many mapping tables
which contain no additional data and are a pain to manage. Which does
sound similar to Alban's rule of thumb.

Incidentally you can avoid the topological sort by deferring
constraints and doing all the deletes in the same transaction.

--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf

Re: When to use cascading deletes?

From
Richard Broersma
Date:
On Thu, Jun 11, 2009 at 4:44 AM, Greg Stark<stark@enterprisedb.com> wrote:

> I generally leave cascade off except for many-to-many mapping tables
> which contain no additional data and are a pain to manage. Which does
> sound similar to Alban's rule of thumb.

Cascading deletes also make sense for vertically partitioned tables
with a 1 to 1 relationship.


--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: When to use cascading deletes?

From
Eric Schwarzenbach
Date:
My rule of thumb for when to use to not use cascading deletes is this:

If the what the record represents can essentially be thought of a "part
of" what the record that it references represents, I use cascading
deletes. If what the record represents has an independent existence,
that it, it does not necessarily have the same life cycle, I prohibit
cascading. (This is more or less the distinction between composition and
aggregation in UML terms, if I remember correctly.)

This amounts to the same advice others have already given, but in other
terms, and may be helpful if you conceive of your data this way.

Eric

David wrote:
> Hi there.
>
> When is a good time to use cascading deletes?
>
> Usually, for safety reasons, I prefer to not ever use cascading
> deletes. But that can lead to some complex code that uses topological
> sorts etc to manually delete records in the right order, when a
> cascading delete is needed.
>
> Ideally, I'd like postgresql to not do cascading deletes, *except*
> when I tell it to, and the rest of the time fail when the user didn't
> explicitly "opt in" for cascading deletes. When it comes to enabling
> cascading deletes, I don't really like the idea that deleting or
> updating a row from one table can have a possibly unexpected (to the
> app programmer, using the database) chain reaction to other tables.
>
> I don't know, maybe I have the wrong mindset, and cascading is
> preferable (in terms of object model) in some cases? I'd like to read
> more on this subject (general best practices for what types of
> cascading are appropriate to use when).
>
> Any tips?
>
> Thanks,
>
> David.
>
>


Re: When to use cascading deletes?

From
Russ Brown
Date:
On 06/11/2009 11:33 AM, Eric Schwarzenbach wrote:
> My rule of thumb for when to use to not use cascading deletes is this:
>
> If the what the record represents can essentially be thought of a "part
> of" what the record that it references represents, I use cascading
> deletes. If what the record represents has an independent existence,
> that it, it does not necessarily have the same life cycle, I prohibit
> cascading. (This is more or less the distinction between composition and
> aggregation in UML terms, if I remember correctly.)
>
> This amounts to the same advice others have already given, but in other
> terms, and may be helpful if you conceive of your data this way.
>
> Eric

"part of" is exactly the term that I was thinking of as well.

Re: When to use cascading deletes?

From
"Leif B. Kristensen"
Date:
On Thursday 11. June 2009, David wrote:

>When is a good time to use cascading deletes?

As a real world example, I've got a data model that consists of three
major entities: Persons, Events, and Sources. The Events table is
linked to Persons through the junction table Participants, and to the
Sources through the junction table Citations. In both the Participants
and the Citations tables I have declared the foreign key events_fk as
ON DELETE CASCADE. That is because they don't carry extra information
that is usable outside the context of the Event to which they are
pointing. So, if I delete an Event, the Participants as well as the
Citations to that event are hosed. But the Persons and the Sources will
of course remain in the database.
--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/