Thread: When to use cascading deletes?
Hi there. When is a good time to use cascading deletes? Usually, for safety reasons, I prefer to not ever use cascading deletes. But that can lead to some complex code that uses topological sorts etc to manually delete records in the right order, when a cascading delete is needed. Ideally, I'd like postgresql to not do cascading deletes, *except* when I tell it to, and the rest of the time fail when the user didn't explicitly "opt in" for cascading deletes. When it comes to enabling cascading deletes, I don't really like the idea that deleting or updating a row from one table can have a possibly unexpected (to the app programmer, using the database) chain reaction to other tables. I don't know, maybe I have the wrong mindset, and cascading is preferable (in terms of object model) in some cases? I'd like to read more on this subject (general best practices for what types of cascading are appropriate to use when). Any tips? Thanks, David.
On Jun 11, 2009, at 10:59 AM, David wrote: > Hi there. > > When is a good time to use cascading deletes? As a general rule of thumb I use cascading deletes if the data in a record is meaningless without the record that the foreign key points to. > Ideally, I'd like postgresql to not do cascading deletes, *except* > when I tell it to, and the rest of the time fail when the user didn't > explicitly "opt in" for cascading deletes. That's the default behaviour. If you don't specify what to do on DELETE (or UPDATE) then postgresql defaults to 'NO ACTION', which means it will not allow the delete if there are records referencing the row you try to delete. > I don't know, maybe I have the wrong mindset, and cascading is > preferable (in terms of object model) in some cases? I'd like to read > more on this subject (general best practices for what types of > cascading are appropriate to use when). It's not an object model, it's a relational model. In a relational model cascading makes sense, as it's a property of the relation between records in two (or more) tables. I often notice people get confused about the direction of a cascade. If you define a foreign key with a cascade (whether on UPDATE or DELETE doesn't matter), then the cascade fires on operations on the _referred_ record, not on operations on the _referring_ record. Basically I determine which cascading method to use by how records are related: If the referring record only has any meaning in relation to the referred record, then you use CASCADE on deletes. A typical use case us a connection table in a many-to-many relationship. If the referring record has a meaning regardless of whether the referred record exists, then you use SET NULL. For example, if you have a relation between an employee and a desk, the desk still exists after you fire the employee. If the referring table is important to your data then you don't want to be allowed to delete the referred records while there are records referring them. Typically this is the case if most of the meaning of a relation is in the referring tables. Often these are tables referring to user records; such a database generally isn't about the users, but about what we want to store ABOUT them. The users are just placeholders to be able to make a distinction between them. In this case your application needs to be able to deal with a foreign key constraint violation or have some method to prevent those from occuring. If you don't know what to choose for a given relation it's safe to stick with the default, but you do need to think about what your application needs to do in such cases. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4a30d738759151089568354!
Thanks for the tips, those make sense. I was thinking through this some more after sending my mail, and came to similar conclusions. It would be nice though if this info was more available to people doing research on the subject. Where did you pick up these ideas? At least this thread should start turning up in Google at some point :-) Also, probably some of my confusion comes from being used to app-centric logic, where the app (and programmer/sysadmin/etc) likes to be in control of all the database updates, rather than additional deletes/updates/etc taking place behind the scenes. I guess it's a kind of trade-off. Handing off more logic to the db to simplify db interaction, at the expense of the database sometimes doing unexpected/unwanted things. For instance, it's possible to make the db rules *too* clever. That kind of thing can bite you later, when the db makes updates in the background that you don't want for some reason. Then you need to work around them, and can't take those rules out permanently, since it breaks other apps that depend on the db being intelligent. You need to think really carefully about when to add db intelligence, which could have unwanted consequences later... For people like me, I think it helps to think of the data as living in it's own world inside the DB, with extra rules there that apps don't always know or care about, and the users & apps need to be aware of the conventions around each table. As opposed to being a simple data store for apps (with integrity checks). For similar reasons I've avoided stored procedures and rules, preferring simpler database schema, but more complex logic to handle them in apps. Another consideration, is revision controlling of the extra rules/etc. My current approach is to have all db schema setup & updates etc in a Python script during development, which is revision controlled (and then later re-used for remote db installs/upgrades). David.
David wrote: > Thanks for the tips, those make sense. > > I was thinking through this some more after sending my mail, and came > to similar conclusions. > > It would be nice though if this info was more available to people > doing research on the subject. Where did you pick up these ideas? At > least this thread should start turning up in Google at some point :-) > > Also, probably some of my confusion comes from being used to > app-centric logic, where the app (and programmer/sysadmin/etc) likes > to be in control of all the database updates, rather than additional > deletes/updates/etc taking place behind the scenes. I guess it's a > kind of trade-off. Handing off more logic to the db to simplify db > interaction, at the expense of the database sometimes doing > unexpected/unwanted things. > > For instance, it's possible to make the db rules *too* clever. That > kind of thing can bite you later, when the db makes updates in the > background that you don't want for some reason. Then you need to work > around them, and can't take those rules out permanently, since it > breaks other apps that depend on the db being intelligent. You need to > think really carefully about when to add db intelligence, which could > have unwanted consequences later... > > For people like me, I think it helps to think of the data as living in > it's own world inside the DB, with extra rules there that apps don't > always know or care about, and the users & apps need to be aware of > the conventions around each table. As opposed to being a simple data > store for apps (with integrity checks). For similar reasons I've > avoided stored procedures and rules, preferring simpler database > schema, but more complex logic to handle them in apps. > > Another consideration, is revision controlling of the extra rules/etc. > My current approach is to have all db schema setup & updates etc in a > Python script during development, which is revision controlled (and > then later re-used for remote db installs/upgrades). > > David. > Hi David, The advantage of putting business logic in the database means it is in only one place. If you leave it to the apps then you have no consistency unless you have a library of the more complex logic that all the apps use. The problem with this is I find programmers that always want to reinvent the wheel and do their own thing so they ignore library and do it "better" themselves. My $.02 Steve
I use cascading deletes as per business rule. For example, my customer record has multiple orders and each order can have multiple shipments and multiple payments. My business rule is not to erase a customer with orders, but orders should be erased even if they have shipments or payments. The business logic behind this was that we want to make it a hassle to erase a customer. On the other hand, erasing an order that has shipments or payments is done very regularly, because shipment and payment records are added before they actually happen (planned shipment and payments), so order records are erased when the order isn't finalized. The business rule was to make it less of a hassle. Sim David wrote: > Hi there. > > When is a good time to use cascading deletes? > > Usually, for safety reasons, I prefer to not ever use cascading > deletes. But that can lead to some complex code that uses topological > sorts etc to manually delete records in the right order, when a > cascading delete is needed. > I don't know, maybe I have the wrong mindset, and cascading is > preferable (in terms of object model) in some cases? I'd like to read > more on this subject (general best practices for what types of > cascading are appropriate to use when). > > Any tips? > > Thanks, > > David. >
On Thu, Jun 11, 2009 at 9:59 AM, David<wizzardx@gmail.com> wrote: > > Ideally, I'd like postgresql to not do cascading deletes, *except* > when I tell it to, and the rest of the time fail when the user didn't > explicitly "opt in" for cascading deletes. When it comes to enabling > cascading deletes, I don't really like the idea that deleting or > updating a row from one table can have a possibly unexpected (to the > app programmer, using the database) chain reaction to other tables. One option would be to enable cascading deletes but not grant delete option to the user the application connects as. So either an administrative interface which is expected to do larger slower operations sometimes might be allowed but the public-facing user interface isn't allowed to accidentally delete a record which would cause massive damage. Generally I find it makes sense to design the database so that user-facing public interfaces can't delete much of substance anyways. I generally leave cascade off except for many-to-many mapping tables which contain no additional data and are a pain to manage. Which does sound similar to Alban's rule of thumb. Incidentally you can avoid the topological sort by deferring constraints and doing all the deletes in the same transaction. -- Gregory Stark http://mit.edu/~gsstark/resume.pdf
On Thu, Jun 11, 2009 at 4:44 AM, Greg Stark<stark@enterprisedb.com> wrote: > I generally leave cascade off except for many-to-many mapping tables > which contain no additional data and are a pain to manage. Which does > sound similar to Alban's rule of thumb. Cascading deletes also make sense for vertically partitioned tables with a 1 to 1 relationship. -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug
My rule of thumb for when to use to not use cascading deletes is this: If the what the record represents can essentially be thought of a "part of" what the record that it references represents, I use cascading deletes. If what the record represents has an independent existence, that it, it does not necessarily have the same life cycle, I prohibit cascading. (This is more or less the distinction between composition and aggregation in UML terms, if I remember correctly.) This amounts to the same advice others have already given, but in other terms, and may be helpful if you conceive of your data this way. Eric David wrote: > Hi there. > > When is a good time to use cascading deletes? > > Usually, for safety reasons, I prefer to not ever use cascading > deletes. But that can lead to some complex code that uses topological > sorts etc to manually delete records in the right order, when a > cascading delete is needed. > > Ideally, I'd like postgresql to not do cascading deletes, *except* > when I tell it to, and the rest of the time fail when the user didn't > explicitly "opt in" for cascading deletes. When it comes to enabling > cascading deletes, I don't really like the idea that deleting or > updating a row from one table can have a possibly unexpected (to the > app programmer, using the database) chain reaction to other tables. > > I don't know, maybe I have the wrong mindset, and cascading is > preferable (in terms of object model) in some cases? I'd like to read > more on this subject (general best practices for what types of > cascading are appropriate to use when). > > Any tips? > > Thanks, > > David. > >
On 06/11/2009 11:33 AM, Eric Schwarzenbach wrote: > My rule of thumb for when to use to not use cascading deletes is this: > > If the what the record represents can essentially be thought of a "part > of" what the record that it references represents, I use cascading > deletes. If what the record represents has an independent existence, > that it, it does not necessarily have the same life cycle, I prohibit > cascading. (This is more or less the distinction between composition and > aggregation in UML terms, if I remember correctly.) > > This amounts to the same advice others have already given, but in other > terms, and may be helpful if you conceive of your data this way. > > Eric "part of" is exactly the term that I was thinking of as well.
On Thursday 11. June 2009, David wrote: >When is a good time to use cascading deletes? As a real world example, I've got a data model that consists of three major entities: Persons, Events, and Sources. The Events table is linked to Persons through the junction table Participants, and to the Sources through the junction table Citations. In both the Participants and the Citations tables I have declared the foreign key events_fk as ON DELETE CASCADE. That is because they don't carry extra information that is usable outside the context of the Event to which they are pointing. So, if I delete an Event, the Participants as well as the Citations to that event are hosed. But the Persons and the Sources will of course remain in the database. -- Leif Biberg Kristensen | Registered Linux User #338009 Me And My Database: http://solumslekt.org/blog/