Thread: FOREIGN KEYS vs PERFORMANCE
Hi,
I think this is an old question, but I want to know if it really is well worth to not create some foreign keys an deal with the referential integrity at application-level?????
Specifically, the system we are developing is a server/cliente architecture that the server is the database and the fat client is an application developed in DELPHI!!!
Thanks in advance!!
On Tue, 2006-04-11 at 14:13, Rodrigo Sakai wrote: > Hi, > > I think this is an old question, but I want to know if it really is > well worth to not create some foreign keys an deal with the > referential integrity at application-level????? > Specifically, the system we are developing is a server/cliente > architecture that the server is the database and the fat client is an > application developed in DELPHI!!! > If ref integrity is important, you'll have to do it either in the app or the database. Almost always, it's faster to let the database do it, as there's less traffic across the wire required to maintain ref integrity, plus, the guys who wrote the database have spent years making sure race conditions won't scram your data. For simple, straight forward FK->PK relationships, you will likely NOT be able to beat the database in terms of either reliability or performance with your own code.
On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote: > I think this is an old question, but I want to know if it really > is well worth to not create some foreign keys an deal with the > referential integrity at application-level????? If I had to choose between one or the other, I'd leave all referential integrity in the database and deal with the errors thrown when referential integrity is violated in the application. PostgreSQL is designed to handle these kinds of issues. Anything you code in your application is more likely to contain bugs or miss corner cases that would allow referential integrity to be violated. PostgreSQL has been pounded on for years by a great many users and developers, making the likelihood of bugs still remaining much smaller. Of course, you can add some referential integrity checks in your application code, but those should be in addition to your database- level checks. Michael Glaesemann grzm myrealbox com
On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote: > > On Apr 12, 2006, at 4:13 , Rodrigo Sakai wrote: > > > I think this is an old question, but I want to know if it really > >is well worth to not create some foreign keys an deal with the > >referential integrity at application-level????? > > If I had to choose between one or the other, I'd leave all > referential integrity in the database and deal with the errors thrown > when referential integrity is violated in the application. PostgreSQL > is designed to handle these kinds of issues. Anything you code in > your application is more likely to contain bugs or miss corner cases > that would allow referential integrity to be violated. PostgreSQL has > been pounded on for years by a great many users and developers, > making the likelihood of bugs still remaining much smaller. It's also pretty unlikely that you can make RI in the application perform better than in the database. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > On Wed, Apr 12, 2006 at 08:06:17AM +0900, Michael Glaesemann wrote: >> ... Anything you code in >> your application is more likely to contain bugs or miss corner cases >> that would allow referential integrity to be violated. PostgreSQL has >> been pounded on for years by a great many users and developers, >> making the likelihood of bugs still remaining much smaller. > It's also pretty unlikely that you can make RI in the application > perform better than in the database. I think the traditional assumption among the "you should do RI in the application" crowd is that the application has higher-level knowledge that lets it understand when it can skip doing an RI check entirely. Skipping an RI check is always faster than doing it --- so that's right, it's faster. As long as you don't make any mistakes. The question you have to ask yourself is whether you are really that smart ... not just today, but every single time. To quote Clint Eastwood: "Do you feel lucky punk? Well, do you?" regards, tom lane
>> I think this is an old question, but I want to know if it really is >> well worth to not create some foreign keys an deal with the referential >> integrity at application-level????? Trust me : do it in the application and you'll enter a world of hurt. I'm doing it with some mysql apps, and it's a nightmare ; doing cascaded delete's by hand, etc, you always forget something, you have to modify a million places in your code everytime you add a new table, your ORM bloats, you get to write cleanup cron scripts which take forever to run, your website crashes etc.
Hi, Michael, Hi, Rodrigo, Michael Glaesemann wrote: > If I had to choose between one or the other, I'd leave all referential > integrity in the database and deal with the errors thrown when > referential integrity is violated in the application. PostgreSQL is > designed to handle these kinds of issues. Anything you code in your > application is more likely to contain bugs or miss corner cases that > would allow referential integrity to be violated. PostgreSQL has been > pounded on for years by a great many users and developers, making the > likelihood of bugs still remaining much smaller. I strictly agree with Michael here. > Of course, you can add some referential integrity checks in your > application code, but those should be in addition to your database- > level checks. Agree. It does make sense to have reference checks in the UI or application level for the sake of better error handling, but the database should be the mandatory judge. There's another advantage of database based checking: Should there ever be the need of a different application working on the same database (e. G. an "expert level UI", or some connector that connects / synchronizes to another software, or a data import tool), database based constraints cannot be broken opposed to application based ones. HTH, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
>>> I think this is an old question, but I want to know if it really >>> is well worth to not create some foreign keys an deal with the >>> referential integrity at application-level????? > > > Trust me : do it in the application and you'll enter a world of > hurt. I'm doing it with some mysql apps, and it's a nightmare ; doing > cascaded delete's by hand, etc, you always forget something, you have > to modify a million places in your code everytime you add a new table, > your ORM bloats, you get to write cleanup cron scripts which take > forever to run, your website crashes etc. All good advice, but... there are no absolutes in this world. Application-enforced referential integrity makes sense if(and probably ONLY if): 1. You have only one application that modifies the data. (Otherwise, you have to duplicate the rules across many applications,leading to a code-maintenance nightmare). 2. If your application crashes and leaves a mess, it's not a catastrophe, and you have a good way to clean it up. For example,a bank shouldn't do this, but it might be OK for a computer-aided-design application, or the backend of a news website. 3. You have application-specific knowledge about when you can skip referential integrity and thereby greatly improve performance. For example, you may have batch operations where large numbers of rows are temporarily inconsistent. If your application doesn't meet ALL of these criteria, you probably should use the database for referential integrity. Craig
Thanks for all help!! But my problem is with performance, I agree with all of you, the RI must be maintained by the database, because a bunch of reasons that everyone knows! But, I'm dealing with a very huge database that servers more than 200 clientes at the same time, and because of it, each manipulation (delete, insert, update, select) on the database have a poor performance. So, if we deal with RI in each client station, we take this work off the database! The application is an ERP developed with DELPHI + (postgresql or oracle or sql server)!! Thanks again!! ----- Original Message ----- From: "Markus Schaber" <schabi@logix-tt.com> To: <pgsql-performance@postgresql.org> Cc: "Rodrigo Sakai" <rodrigo.sakai@zanthus.com.br> Sent: Wednesday, April 12, 2006 10:18 AM Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE > Hi, Michael, > Hi, Rodrigo, > > Michael Glaesemann wrote: > > > If I had to choose between one or the other, I'd leave all referential > > integrity in the database and deal with the errors thrown when > > referential integrity is violated in the application. PostgreSQL is > > designed to handle these kinds of issues. Anything you code in your > > application is more likely to contain bugs or miss corner cases that > > would allow referential integrity to be violated. PostgreSQL has been > > pounded on for years by a great many users and developers, making the > > likelihood of bugs still remaining much smaller. > > I strictly agree with Michael here. > > > Of course, you can add some referential integrity checks in your > > application code, but those should be in addition to your database- > > level checks. > > Agree. It does make sense to have reference checks in the UI or > application level for the sake of better error handling, but the > database should be the mandatory judge. > > There's another advantage of database based checking: Should there ever > be the need of a different application working on the same database (e. > G. an "expert level UI", or some connector that connects / synchronizes > to another software, or a data import tool), database based constraints > cannot be broken opposed to application based ones. > > HTH, > Markus > -- > Markus Schaber | Logical Tracking&Tracing International AG > Dipl. Inf. | Software Development GIS > > Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org >
On 4/11/06, Rodrigo Sakai <rodrigo.sakai@zanthus.com.br> wrote: > > Hi, > > I think this is an old question, but I want to know if it really is well > worth to not create some foreign keys an deal with the referential integrity > at application-level????? > Specifically, the system we are developing is a server/cliente > architecture that the server is the database and the fat client is an > application developed in DELPHI!!! > > Thanks in advance!! Delphi IMO is the best RAD win32 IDE ever invented (especially when paired the very excellent Zeos connection objects). However, for purposes of data management ,imperative languages, Delphi included, simply suck. Great form editor though. merlin
On Apr 12, 2006, at 23:49 , Rodrigo Sakai wrote: > Thanks for all help!! But my problem is with performance, I agree > with all > of you, the RI must be maintained by the database, because a bunch of > reasons that everyone knows! You've gotten a variety of good advice from a number of people. For more specific advice (e.g., for your particular situation), it would be very helpful if you could provide examples of queries that aren't performing well for you (including table schema and explain analyze output). Michael Glaesemann grzm myrealbox com
What kinds of operations are made slow by foreign key checks ? Is it : - Simple checks on INSERT ? - Simple checks on UPDATE ? - Cascaded deletes ? - Cascaded updates ? - Locks ? - Something else ? Foreign keys are to ensure that the value in a column is always part of a specific set (the referenced table). If this set changes very rarely, like the list of countries in the world, or the list of states in a country, or the various possible states an order can be in (received, processed, shipped...) then it's alright to skip the check if you're sure your application will insert a valid value. For some other situations, doing the check in the application will need some queries and could be slower (and certainly will be more complicated...) Are you sure you're not missing a few indexes, which would then force fkey checks to use sequential scans ? > Thanks for all help!! But my problem is with performance, I agree with > all > of you, the RI must be maintained by the database, because a bunch of > reasons that everyone knows! > But, I'm dealing with a very huge database that servers more than 200 > clientes at the same time, and because of it, each manipulation (delete, > insert, update, select) on the database have a poor performance. So, if > we > deal with RI in each client station, we take this work off the database! > The application is an ERP developed with DELPHI + (postgresql or > oracle or > sql server)!!
On Wed, 2006-04-12 at 09:49, Rodrigo Sakai wrote: > Thanks for all help!! But my problem is with performance, I agree with all > of you, the RI must be maintained by the database, because a bunch of > reasons that everyone knows! > But, I'm dealing with a very huge database that servers more than 200 > clientes at the same time, and because of it, each manipulation (delete, > insert, update, select) on the database have a poor performance. So, if we > deal with RI in each client station, we take this work off the database! > The application is an ERP developed with DELPHI + (postgresql or oracle or > sql server)!! These are separate issues. One is performance of PostgreSQL handling FK->PK relationships. PostgreSQL, in my experience, is quite fast at this. However, there are ways you can set up FK->PK relationships that are non-optimal and will result in poor performance. FK->PK relationships are generally fastest when they are 1-1 and based on integer types. If there's a type mismatch, or you use slower types, like large text fields, or numerics, you may have poor performance. Give us a sample of your schema where you're having problems, let us help you troubleshoot your performance. High parallel load is another issue. No matter where you put your FK->PK relationship handling, having 200+ users connected at the same time and manipulating your database is a heavy load. Handling FK->PK relationships in software often is vulnerable to race conditions. Like so: (T1 and T2 are different "threads) T1: select id from mastertable where id=99; -- check for row T2: delete from mastertable where id=99; -- delete a row T1: insert into slavetable values (....); -- whoops! No master If we change the T1 to select for update, we now have the overhead that most FK->PK relationships have. What version of PostgreSQL are you running. Older versions had much poorer performance than newer versions when updating FK->PK relationships. Don't assume that application level FK->PK relationships will be faster AND as good as the ones at database level. It's quite possible that they're faster for you because you're cutting corners, referentially speaking, and your data will wind up incoherent over time. Also, you may be dealing with a database that is IO bound, and moving the FK checks to software is only a short stop gap, and as the machine hits the IO performance ceiling, you'll have the same problem again, need a bigger machine, and have incoherent data. I.e. the same problem, plus a few more, and have spent a lot of time spinning your wheels going a short distance.
On Wed, Apr 12, 2006 at 09:22:52AM +0200, PFC wrote: > > >> I think this is an old question, but I want to know if it really is > >>well worth to not create some foreign keys an deal with the referential > >>integrity at application-level????? > > Trust me : do it in the application and you'll enter a world of > hurt. I'm doing it with some mysql apps, and it's a nightmare ; doing > cascaded delete's by hand, etc, you always forget something, you have to > modify a million places in your code everytime you add a new table, your > ORM bloats, you get to write cleanup cron scripts which take forever to > run, your website crashes etc. Well, yeah, thats typical for MySQL sites, but what's that have to do with RI? Sorry, couldn't resist. :P
On Wed, Apr 12, 2006 at 07:45:17AM -0700, Craig A. James wrote: > All good advice, but... there are no absolutes in this world. > Application-enforced referential integrity makes sense if (and probably > ONLY if): > > 1. You have only one application that modifies the data. (Otherwise, you > have to duplicate the rules across many applications, leading to a > code-maintenance nightmare). You forgot something: 1a: You know that there will never, ever, ever, ever, be any other application that wants to talk to the database. I know tons of people that get burned because they go with something that's "good enough for now", and then regret that decision for years to come. > 2. If your application crashes and leaves a mess, it's not a catastrophe, > and you have a good way to clean it up. For example, a bank shouldn't do > this, but it might be OK for a computer-aided-design application, or the > backend of a news web site. > > 3. You have application-specific knowledge about when you can skip > referential integrity and thereby greatly improve performance. For > example, you may have batch operations where large numbers of rows are > temporarily inconsistent. > > If your application doesn't meet ALL of these criteria, you probably should > use the database for referential integrity. > > Craig > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: >>1. You have only one application that modifies the data. (Otherwise, you >>have to duplicate the rules across many applications, leading to a >>code-maintenance nightmare). > > You forgot something: > > 1a: You know that there will never, ever, ever, ever, be any other > application that wants to talk to the database. > > I know tons of people that get burned because they go with something > that's "good enough for now", and then regret that decision for years to > come. No, I don't agree with this. Too many people waste time designing for "what if..." scenarios that never happen. You don'twant to be dumb and design something that locks out a foreseeable and likely future need, but referential integritydoesn't meet this criterion. There's nothing to keep you from changing from app-managed to database-managed referentialintegrity if your needs change. Design for your current requirements. Let us be of good cheer, remembering that the misfortunes hardest to bear are those which never happen. - James Russell Lowell (1819-1891) Therefore do not be anxious about tomorrow, for tomorrow will be anxious for itself. Let the day's own trouble be sufficient for the day. - Matthew 6:34 Craig
On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote: > Jim C. Nasby wrote: > >>1. You have only one application that modifies the data. (Otherwise, you > >>have to duplicate the rules across many applications, leading to a > >>code-maintenance nightmare). > > > >You forgot something: > > > >1a: You know that there will never, ever, ever, ever, be any other > >application that wants to talk to the database. > > > >I know tons of people that get burned because they go with something > >that's "good enough for now", and then regret that decision for years to > >come. > > No, I don't agree with this. Too many people waste time designing for > "what if..." scenarios that never happen. You don't want to be dumb and > design something that locks out a foreseeable and likely future need, but > referential integrity doesn't meet this criterion. There's nothing to keep > you from changing from app-managed to database-managed referential > integrity if your needs change. In this case your argument makes no sense, because you will spend far more time re-creating RI capability inside an application than if you just use what the database offers natively. It's certainly true that you don't want to over-engineer for no reason, but many times choices are made to save a very small amount of time or hassle up-front, and those choices become extremely painful later. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: >>No, I don't agree with this. Too many people waste time designing for >>"what if..." scenarios that never happen. You don't want to be dumb and >>design something that locks out a foreseeable and likely future need, but >>referential integrity doesn't meet this criterion. There's nothing to keep >>you from changing from app-managed to database-managed referential >>integrity if your needs change. > > In this case your argument makes no sense, because you will spend far > more time re-creating RI capability inside an application than if you > just use what the database offers natively. But one of the specific conditions in my original response was, "You have application-specific knowledge about when you canskip referential integrity and thereby greatly improve performance." If you can't do that, I agree with you. Anyway, this discussion is probably going on too long, and I'm partly to blame. I think we all agree that in almost allsituations, using the database to do referential integrity is the right choice, and that you should only violate thisrule if you have a really, really good reason, and you've thought out the implications carefully, and you know you mayhave to pay a steep price later if your requirements change. Craig
Thanks for all responses! I agree with most of you, and say that the RI is best maintened by Database ! Performance must be improved in other ways (indexes, hardware, etc)! ----- Original Message ----- From: "Jim C. Nasby" <jnasby@pervasive.com> To: "Craig A. James" <cjames@modgraph-usa.com> Cc: "PFC" <lists@peufeu.com>; "Michael Glaesemann" <grzm@myrealbox.com>; "Rodrigo Sakai" <rodrigo.sakai@zanthus.com.br>; <pgsql-performance@postgresql.org> Sent: Wednesday, April 12, 2006 5:59 PM Subject: Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE > On Wed, Apr 12, 2006 at 10:36:28AM -0700, Craig A. James wrote: > > Jim C. Nasby wrote: > > >>1. You have only one application that modifies the data. (Otherwise, you > > >>have to duplicate the rules across many applications, leading to a > > >>code-maintenance nightmare). > > > > > >You forgot something: > > > > > >1a: You know that there will never, ever, ever, ever, be any other > > >application that wants to talk to the database. > > > > > >I know tons of people that get burned because they go with something > > >that's "good enough for now", and then regret that decision for years to > > >come. > > > > No, I don't agree with this. Too many people waste time designing for > > "what if..." scenarios that never happen. You don't want to be dumb and > > design something that locks out a foreseeable and likely future need, but > > referential integrity doesn't meet this criterion. There's nothing to keep > > you from changing from app-managed to database-managed referential > > integrity if your needs change. > > In this case your argument makes no sense, because you will spend far > more time re-creating RI capability inside an application than if you > just use what the database offers natively. > > It's certainly true that you don't want to over-engineer for no reason, > but many times choices are made to save a very small amount of time or > hassle up-front, and those choices become extremely painful later. > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >