Thread: Foreign Key Constraints, performance analysis
CREATE TABLE person (
id integer DEFAULT nextval('person_id_seq'),
name TEXT
);
CREATE UNIQUE INDEX person_id_key ON person(id);
CREATE TABLE married_fkc (
id integer DEFAULT nextval('married_fkc_id_seq'),
person1ID integer NOT NULL REFERENCES person ( id ) ON DELETE CASCADE,
person2ID integer NOT NULL REFERENCES person ( id ) ON DELETE CASCADE,
UNIQUE ( person1ID ),
UNIQUE ( person2ID )
);
CREATE UNIQUE INDEX married_fkc_id_key ON married_fkc(id);
CREATE TABLE married (
id integer DEFAULT nextval('married_id_seq'),
person1ID integer NOT NULL,
person2ID integer NOT NULL,
UNIQUE ( person1ID ),
UNIQUE ( person2ID )
);
CREATE UNIQUE INDEX married_id_key ON married(id);
CREATE TABLE child_fkc (
id integer DEFAULT nextval('child_fkc_id_seq'),
marriedID integer NOT NULL REFERENCES married_fkc ( id ) ON DELETE CASCADE,
name TEXT
);
CREATE UNIQUE INDEX child_fkc_id_key ON child_fkc(id);
CREATE TABLE child (
id integer DEFAULT nextval('child_id_seq'),
marriedID integer NOT NULL,
name TEXT
);
1. First, with no measuring of time, I fill the person table with 2*N persons.
2. Filling the married tables with N tuples ( [1, 2] [3, 4] [5, 6] ... ). Measuring time.
3. Filling the child tables with 2*N tuples ( [1] [1] [2] [2] ... ) (two children per married couple). Measuring time.
4. Emptying the tables. For the *_fkc tables this means deleting only from the person table; for the other tables, manually deleting from all of them.
(this is of course run in two rounds, as step four deletes a table common to both sets; a rough sketch of one round follows below)
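For concreteness, here is a rough sketch of the statements one FKC round boils down to (an illustration, not the actual benchmark script; the real driver loops the INSERTs N or 2*N times, times each block, and generates the values, which are only placeholders below):

    -- Step 1 (not timed): 2*N persons; ids come from person_id_seq, starting at 0
    BEGIN;
    INSERT INTO person (name) VALUES ('p0');
    INSERT INTO person (name) VALUES ('p1');   -- ... repeated 2*N times
    COMMIT;

    -- Step 2 (timed): N couples
    BEGIN;
    INSERT INTO married_fkc (person1ID, person2ID) VALUES (0, 1);   -- ... repeated N times
    COMMIT;

    -- Step 3 (timed): two children per couple
    BEGIN;
    INSERT INTO child_fkc (marriedID, name) VALUES (0, 'c0');
    INSERT INTO child_fkc (marriedID, name) VALUES (0, 'c1');       -- ... 2*N rows in total
    COMMIT;

    -- Step 4 (timed): the cascades remove the married_fkc and child_fkc rows;
    -- the plain round instead runs DELETE FROM person/married/child explicitly
    BEGIN;
    DELETE FROM person;
    COMMIT;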
I was not very surprised to see how little difference it made when inserting into the married_fkc table compared to the married table (< 3%). I was VERY surprised, though, to see that the difference between inserting into child and child_fkc was more than 5 times the difference between married and married_fkc (25% slower).
Deleting really showed what the MySQL team means: the deletion sometimes took 30 seconds in one case versus < 1 second in the other.
I would really appreciate it if someone could tell me why the child/child_fkc difference was so much bigger than the married/married_fkc difference...
I doubt it was because of the lack of VACUUM ANALYSE; it was quite a big difference. What is strange is that married_fkc has TWO foreign keys, while child_fkc has only ONE.
Thanks.
Daniel Åkerud
Daniel Åkerud <zilch@home.se> writes:
> Deleting really showed what the MySQL team means: the deletion sometimes
> took 30 seconds in one case versus < 1 second in the other.

Well, if I understand your rather vague description, you're comparing a
simple bulk delete of all the tuples in the tables, versus a case where
one table sees a bulk delete but the other ones see retail deletion (one
tuple deleted per triggered query, and that tuple has to be searched for
via an indexscan). Not surprising that it's much slower. The real
question is what this scenario has to do with production activities.

> I would really appreciate it if someone could tell me why the
> child/child_fkc difference was so much bigger than the
> married/married_fkc difference...

That strikes me as odd too, since the one case has only one FK reference
and the other has two ... seems like it should have been the other way
'round. Experimental noise maybe? Did you repeat the test to make sure
the numbers were reproducible? Do you care to post all the details
(scripts etc) so that others can try to reproduce it?

> I doubt it was because of the lack of VACUUM ANALYSE.

You *should* be worried about that. The queries triggered by
foreign-key checks are planned by the regular planner.

			regards, tom lane
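To make the planner point concrete: the ON DELETE CASCADE trigger has to locate the referencing rows with an ordinary query, roughly of the shape sketched below (a paraphrase, not the exact statement the trigger issues), and that query is planned like any other, so statistics matter:

    VACUUM ANALYZE married_fkc;
    -- roughly what the cascade has to do for each deleted person row;
    -- with fresh statistics the planner can use the unique indexes on
    -- person1ID/person2ID, without them it may settle for a worse plan
    EXPLAIN DELETE FROM married_fkc WHERE person1ID = 0;
    EXPLAIN DELETE FROM married_fkc WHERE person2ID = 0;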
> > Deleting really showed what the MySQL team means: the deletion sometimes
> > took 30 seconds in one case versus < 1 second in the other.
>
> Well, if I understand your rather vague description, you're comparing a
> simple bulk delete of all the tuples in the tables, versus a case where
> one table sees a bulk delete but the other ones see retail deletion (one
> tuple deleted per triggered query, and that tuple has to be searched for
> via an indexscan). Not surprising that it's much slower. The real
> question is what this scenario has to do with production activities.

It has nothing to do with production activities. I just want to know how,
and how much, Foreign Key Constraints affect performance. I compare
(1) manual deletion of person, married and child versus (2) deletion of
person, which implies automatic deletion of married_fkc and child_fkc
using ON DELETE CASCADE.

> > I would really appreciate it if someone could tell me why the
> > child/child_fkc difference was so much bigger than the
> > married/married_fkc difference...
>
> That strikes me as odd too, since the one case has only one FK reference
> and the other has two ... seems like it should have been the other way
> 'round. Experimental noise maybe? Did you repeat the test to make sure
> the numbers were reproducible? Do you care to post all the details
> (scripts etc) so that others can try to reproduce it?

Well, the tests were run with quite high values and took quite some time,
so I doubt it was experimental noise. And that was what I thought too,
2 FK versus 1.

> > I doubt it was because of the lack of VACUUM ANALYSE.
>
> You *should* be worried about that. The queries triggered by
> foreign-key checks are planned by the regular planner.

I'll rerun the test using VACUUM ANALYSE in between inserting into
married/married_fkc and child/child_fkc, and post the results!

> 			regards, tom lane

This whole thing is about making myself aware of the performance impact
of Foreign Key Constraints.
Daniel Åkerud <zilch@home.se> writes:
>> ... Not surprising that it's much slower. The real
>> question is what this scenario has to do with production activities.

> It has nothing to do with production activities. I just want to know how,
> and how much, Foreign Key Constraints affect performance.

My point is that unless bulk delete is an operation you do a lot,
this measurement has little to do with everyday performance. A more
reasonable test (I think) would be to time deletion of a *single* person
record --- and the associated implicit deletion of a small number of
dependent records --- against deletion of the same person record and
explicit deletion of the same number of dependent records. That
actually has something to do with performance of real-world applications
that delete individual records. As is, you are measuring (in effect)

    DELETE FROM married;

against

    FOR akey IN (SELECT key FROM married) DO
        DELETE FROM married WHERE key = akey;

and then blaming the speed difference on foreign keys. It's got nothing
to do with foreign keys and everything to do with number of queries
issued.

			regards, tom lane
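A sketch of the single-record comparison being suggested, for one person id (the id 0 below is just a stand-in for whichever person is being deleted; in a real test you would repeat this over many ids and compare the totals):

    -- FK version: one statement, the cascades delete the dependents implicitly
    DELETE FROM person WHERE id = 0;

    -- manual version: the application deletes the dependents itself
    DELETE FROM child WHERE marriedID IN
        (SELECT id FROM married WHERE person1ID = 0 OR person2ID = 0);
    DELETE FROM married WHERE person1ID = 0 OR person2ID = 0;
    DELETE FROM person WHERE id = 0;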
> Daniel Åkerud <zilch@home.se> writes:
> >> ... Not surprising that it's much slower. The real
> >> question is what this scenario has to do with production activities.
>
> > It has nothing to do with production activities. I just want to know how,
> > and how much, Foreign Key Constraints affect performance.
>
> My point is that unless bulk delete is an operation you do a lot,
> this measurement has little to do with everyday performance. A more
> reasonable test (I think) would be to time deletion of a *single* person
> record --- and the associated implicit deletion of a small number of
> dependent records --- against deletion of the same person record and
> explicit deletion of the same number of dependent records. That
> actually has something to do with performance of real-world applications
> that delete individual records. As is, you are measuring (in effect)
>     DELETE FROM married;
> against
>     FOR akey IN (SELECT key FROM married) DO
>         DELETE FROM married WHERE key = akey;
> and then blaming the speed difference on foreign keys. It's got nothing
> to do with foreign keys and everything to do with number of queries
> issued.

No, I compare

    DELETE FROM person;

against

    DELETE FROM person;
    DELETE FROM married;
    DELETE FROM child;

Which, I think, has very much to do with the performance of real-world
applications. I often think of accounts, where there are numerous records
stored for an account - records that should be deleted when the account is
deleted.

> 			regards, tom lane

Daniel Åkerud
> No, I compare
>
>     DELETE FROM person;
>
> against
>
>     DELETE FROM person;
>     DELETE FROM married;
>     DELETE FROM child;
>
> Which, I think, has very much to do with the performance of real-world
> applications. I often think of accounts, where there are numerous records
> stored for an account - records that should be deleted when the account is
> deleted.

It doesn't unless you delete all your people a lot (as Tom said).

There's a BIG difference between

    delete from person where name='foo';
compared to
    delete from person where name='foo'; delete from married where ... ;
    delete from child where ...;

and

    delete from person;
compared to
    delete from person; delete from married; delete from child;

In the first case, the system sees either 1 statement that expands into 3
statements effectively versus 3 statements. Not too different.

In the second case the system sees 1 statement + 1 statement per row versus
3 statements. Very different, because it doesn't know it's going to be
deleting all of the rows, so it's probably going to choose an index scan to
find the matching rows for each row in person, versus knowing beforehand to
delete them all.

In addition, with match unspecified, these two behaviors are also not
guaranteed to be the same. With NULLs in the FK fields, you can have rows
that shouldn't get deleted when you delete all of the PK rows. ("At least
one of the values of the referencing columns in R1 shall be a null value, or
the value of each referencing column in R1 shall be equal to the value of
the corresponding referenced column in some row of the referenced table....
let matching rows be all rows in the referencing table whose referencing
column values equal the corresponding referenced column values for the
referential constraint")

There are problems, and it would be nice to figure out a way to combine
actions and checks when a large number of changes are seen (of course, how
do you define a large number, but...) to get around some of these bulk
cases.
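A tiny illustration of the NULL point, using a throwaway pair of tables (hypothetical, not part of the benchmark schema, which declares its FK columns NOT NULL):

    CREATE TABLE pk_t (id integer PRIMARY KEY);
    CREATE TABLE fk_t (pk_id integer REFERENCES pk_t ( id ) ON DELETE CASCADE);
    INSERT INTO pk_t VALUES (1);
    INSERT INTO fk_t VALUES (1);     -- matches the pk_t row
    INSERT INTO fk_t VALUES (NULL);  -- allowed: a NULL referencing value satisfies the constraint
    DELETE FROM pk_t;                -- cascades away the (1) row in fk_t ...
    SELECT * FROM fk_t;              -- ... but the NULL row is still there

With the benchmark's FK columns declared NOT NULL this case cannot arise there; the point is about the general semantics of bulk versus cascaded deletes.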
> > No, I compare
> >
> >     DELETE FROM person;
> >
> > against
> >
> >     DELETE FROM person;
> >     DELETE FROM married;
> >     DELETE FROM child;
> >
> > Which, I think, has very much to do with the performance of real-world
> > applications. I often think of accounts, where there are numerous records
> > stored for an account - records that should be deleted when the account
> > is deleted.
>
> It doesn't unless you delete all your people a lot (as Tom said).

Agreed, but I don't want to measure the performance of a real-world
application anyway; I just want to isolate how much you lose having the
database manager handle the deletion for you, as the ON DELETE CASCADE
foreign key constraint does.

> There's a BIG difference between
>
>     delete from person where name='foo';
> compared to
>     delete from person where name='foo'; delete from married where ... ;
>     delete from child where ...;
>
> and
>
>     delete from person;
> compared to
>     delete from person; delete from married; delete from child;

I can see that; in the first case there is a hell of a lot of overhead
sending the queries.

> In the first case, the system sees either 1 statement that expands into 3
> statements effectively versus 3 statements. Not too different.

ok...

> In the second case the system sees 1 statement + 1 statement per row versus
> 3 statements.

I can't see what you mean here... "+ 1 statement per row"... there is only
one row?

> Very different, because it doesn't know it's going to be deleting all of
> the rows, so it's probably going to choose an index scan to find the
> matching rows for each row in person, versus knowing beforehand to delete
> them all.

OK... hmm... *confused* :)

What is the difference between these two (only comparing the tables with
foreign key constraints now):

    DELETE FROM person;

and

    DELETE FROM person WHERE id = 1;
    DELETE FROM person WHERE id = 2;

The only thing I can see (which I assume is what I do wrong here), is that
there is a lot of overhead sending the queries. If we ignore the overhead
in our conversation, what is the difference?

> In addition, with match unspecified, these two behaviors are also not
> guaranteed to be the same. With NULLs in the FK fields, you can have rows
> that shouldn't get deleted when you delete all of the PK rows. ("At least
> one of the values of the referencing columns in R1 shall be a null value,
> or the value of each referencing column in R1 shall be equal to the value
> of the corresponding referenced column in some row of the referenced
> table.... let matching rows be all rows in the referencing table whose
> referencing column values equal the corresponding referenced column values
> for the referential constraint")

OK, but this is just a test I am writing. I _am_ sure there are no NULLs
there. I just want to make myself aware of what it costs in performance to
have foreign key constraints.

> There are problems, and it would be nice to figure out a way to combine
> actions and checks when a large number of changes are seen (of course, how
> do you define a large number, but...) to get around some of these bulk
> cases.

Before I send this message I just gotta say thanks! I appreciate your input
more than you think :)

Anyway, what I do is, in pseudocode:

    FOR (i = 1 to N*2)
        insert into person

    FOR (i = 1 to N)
        insert into married or married_fkc

    FOR (i = 1 to 2*N)
        insert into child or child_fkc

    if (fkc)
        delete from person;
    else
        delete from person, delete from married, delete from child;

I guess this last example shows quite well what I do. Doesn't this change
your minds?

Daniel Åkerud
>
>     FOR (i = 1 to N*2)
>         insert into person
>
>     FOR (i = 1 to N)
>         insert into married or married_fkc
>
>     FOR (i = 1 to 2*N)
>         insert into child or child_fkc
>
>     if (fkc)
>         delete from person;
>     else
>         delete from person, delete from married, delete from child;
>

Forgot to say that these 4 sections are in four transactions, with VACUUM
ANALYSE in between all of them.

NOW, why is it that the difference between married and married_fkc (which
is about 50% longer per insert) is the same for child/child_fkc? Of course
child/child_fkc should take roughly twice the time of married/married_fkc,
ignoring the fact that there are FK constraints. But considering the double
foreign key constraints in married_fkc, that is quite strange... Is there
something wrong with the tables?

DROP SEQUENCE person_id_seq;
DROP SEQUENCE married_fkc_id_seq;
DROP SEQUENCE married_id_seq;
DROP SEQUENCE child_fkc_id_seq;
DROP SEQUENCE child_id_seq;

CREATE SEQUENCE person_id_seq MINVALUE 0;
CREATE SEQUENCE married_fkc_id_seq MINVALUE 0;
CREATE SEQUENCE married_id_seq MINVALUE 0;
CREATE SEQUENCE child_fkc_id_seq MINVALUE 0;
CREATE SEQUENCE child_id_seq MINVALUE 0;

DROP TABLE person;
DROP TABLE married_fkc;
DROP TABLE married;
DROP TABLE child_fkc;
DROP TABLE child;

CREATE TABLE person (
id integer DEFAULT nextval('person_id_seq'),
name TEXT
);
CREATE UNIQUE INDEX person_id_key ON person(id);

CREATE TABLE married_fkc (
id integer DEFAULT nextval('married_fkc_id_seq'),
person1ID integer NOT NULL REFERENCES person ( id ) ON DELETE CASCADE,
person2ID integer NOT NULL REFERENCES person ( id ) ON DELETE CASCADE,
UNIQUE ( person1ID ),
UNIQUE ( person2ID )
);
CREATE UNIQUE INDEX married_fkc_id_key ON married_fkc(id);

CREATE TABLE married (
id integer DEFAULT nextval('married_id_seq'),
person1ID integer NOT NULL,
person2ID integer NOT NULL,
UNIQUE ( person1ID ),
UNIQUE ( person2ID )
);
CREATE UNIQUE INDEX married_id_key ON married(id);

CREATE TABLE child_fkc (
id integer DEFAULT nextval('child_fkc_id_seq'),
marriedID integer NOT NULL REFERENCES married_fkc ( id ) ON DELETE CASCADE,
name TEXT
);
CREATE UNIQUE INDEX child_fkc_id_key ON child_fkc(id);

CREATE TABLE child (
id integer DEFAULT nextval('child_id_seq'),
marriedID integer NOT NULL,
name TEXT
);
CREATE UNIQUE INDEX child_id_key ON child(id);

Daniel Åkerud
> > There's a BIG difference between
> >
> >     delete from person where name='foo';
> > compared to
> >     delete from person where name='foo'; delete from married where ... ;
> >     delete from child where ...;
> >
> > and
> >
> >     delete from person;
> > compared to
> >     delete from person; delete from married; delete from child;
>
> I can see that; in the first case there is a hell of a lot of overhead
> sending the queries.
>
> > In the first case, the system sees either 1 statement that expands into 3
> > statements effectively versus 3 statements. Not too different.
>
> ok...
>
> > In the second case the system sees 1 statement + 1 statement per row
> > versus 3 statements.
>
> I can't see what you mean here... "+ 1 statement per row"... there is only
> one row?

Well, if the parent table has 100 rows, the "delete from table" turns into
101 statements, pretty much. It basically turns into:

    delete from parent;
    delete from married where <keys from first person row>
    delete from married where <keys from second person row>
    ... etc...

In your example, with N*2 rows in person, N rows in married_fkc and N*2
rows in child_fkc, the bulk delete in person is going to cause N*2 deletes
from married_fkc and N deletes in child_fkc.

> What is the difference between these two (only comparing the tables with
> foreign key constraints now):
>
>     DELETE FROM person;
>
> and
>
>     DELETE FROM person WHERE id = 1;
>     DELETE FROM person WHERE id = 2;
>
> The only thing I can see (which I assume is what I do wrong here), is that
> there is a lot of overhead sending the queries. If we ignore the overhead
> in our conversation, what is the difference?

It's not the cost of the delete from person, it's the cost of the delete
from married; and it's not the sending overhead, it's the processing
overhead of finding the particular row you want to delete over and over
again. It costs more to find a row by index value than to just get the next
row in the table (generally). So, in the first case you can get all the
rows sequentially. In the second you have to pay the find-a-row-by-index
cost for every row. [Either delete 100 rows sequentially or delete 1 row by
index 100 times. The latter is more expensive.]

The difference for purposes of "real world" testing is that you generally
don't do bulk deletes like that very often, and it isn't at all what's
being optimized for, and the cost of (taking only the first level):

    delete from person where id=1;
    delete from married_fkc where person1ID=1;
    delete from married_fkc where person2ID=1;
    (or even where person1ID=1 or person2ID=1)

and the fk version as:

    delete from person where id=1;

*should* be reasonably similar. If it's not, that's a definite performance
problem.
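A rough way to see the sequential-scan-versus-index-lookup difference described above (the plans you actually get will depend on statistics and version):

    EXPLAIN DELETE FROM married;                        -- one pass over the whole table
    EXPLAIN DELETE FROM married WHERE person1ID = 0;    -- the per-row form: one lookup per statement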
OK, I've been discussing this with a colleague of mine... and I'm starting
to see the light here...

I will, first of all, make a new, simpler, 1<->1 relationship to test FK
constraints... no 2<->1<->2 relationship here...

Person -> Item/item_fkc

And I will do no bulk-delete. Instead these tests:

Fill person, no time measuring.
Fill item, no foreign key constraints, measuring time.
Fill item, foreign key constraints.
Compare the last two measurements. How much do you lose in performance
having the foreign key check?

For all persons {
    if (fkc)
        delete from person where id = $id
    else
        delete from person where id = $id; delete from item where personid = $id
}

Compare measurements. How much do you lose having the foreign key
constraints delete the item?

This, I think, is a fairer comparison.

Can you call the FK itself a foreign key constraint, as it actually is
constraining something?

Thanks Stephan and Tom for the help! Greatly appreciated!!
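A sketch of what the simpler schema might look like, keeping the existing person table (the item/item_fkc names and columns are only guesses at what the new test script would use):

    CREATE TABLE item (
        id integer PRIMARY KEY,
        personID integer NOT NULL,
        name TEXT
    );
    CREATE TABLE item_fkc (
        id integer PRIMARY KEY,
        personID integer NOT NULL REFERENCES person ( id ) ON DELETE CASCADE,
        name TEXT
    );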
Daniel Åkerud wrote:
>
> OK, I've been discussing this with a colleague of mine... and I'm starting
> to see the light here...
>
> I will, first of all, make a new, simpler, 1<->1 relationship to test FK
> constraints... no 2<->1<->2 relationship here...

One thing you'd have to take into account is the fact that the MySQL
documentation is totally misleading. If you intend to compare apples to
apples, you'd have to take a more real-world scenario.

Every business application has to reflect the business rules it implements.
Using a relational database system, it's possible to move some of the
business logic into the database itself. So we always have to look at the
middleware (e.g. PHP) and the database (e.g. Postgres) as a unit. If the
business model now requires some referential integrity, like there can
never be an invoice referencing a non-existent customer, this unit has to
ensure it, no matter what. There are two possible solutions:

1. Set up a foreign key constraint, so that the database will not allow
   the insertion of the invoice or deletion of the customer, however
   concurrent the DB access might be.

2. Implement the required checks with appropriate locking and transaction
   coverage at every place in the application where these two relations
   are modified WRT the logical requirement of the business model.

IMHO solution #1 has a major advantage. Only the DB designer really must
understand the entire business model to ensure that there will never be
logically inconsistent data in the database. The worst thing that can
happen if a WEB developer doesn't honor the business model is an aborted
transaction and maybe an error message the user doesn't understand. For
#2, every WEB programmer, at any time editing a single PHP code snippet,
has to have the business model in mind. Here the worst possible consequence
is violation of the business model. So we might have invoices where we
don't know anymore who the customer is.

Thus, to compare MySQL vs. Postgres WRT referential integrity, create a
sample application where the Postgres version checks for possible errors
(can be optimized using deferred constraints and checking just at the
COMMIT). The MySQL version instead implements the constraints on the
middleware level including transaction handling and locking, so you'd have
to use BDB tables only, for example.

Just my $0.02


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
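The deferred-constraint optimization mentioned above, sketched with a hypothetical customer/invoice pair (illustration only; the point is that the FK check is postponed until COMMIT instead of being run per statement):

    CREATE TABLE customer (id integer PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id integer PRIMARY KEY,
        customer_id integer NOT NULL
            REFERENCES customer ( id ) DEFERRABLE INITIALLY DEFERRED
    );

    BEGIN;
    INSERT INTO invoice VALUES (1, 42);        -- no customer 42 yet: the check is deferred
    INSERT INTO customer VALUES (42, 'ACME');
    COMMIT;                                    -- the foreign key is verified here

If customer 42 never shows up before COMMIT, the check fails at that point and the whole transaction is rolled back.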
> > OK, I've been discussing this with a colleague of mine... and I'm
> > starting to see the light here...
> >
> > I will, first of all, make a new, simpler, 1<->1 relationship to test FK
> > constraints... no 2<->1<->2 relationship here...
>
> One thing you'd have to take into account is the fact that the MySQL
> documentation is totally misleading. If you intend to compare apples to
> apples, you'd have to take a more real-world scenario.
>
> Every business application has to reflect the business rules it
> implements. Using a relational database system, it's possible to move
> some of the business logic into the database itself. So we always have to
> look at the middleware (e.g. PHP) and the database (e.g. Postgres) as a
> unit. If the business model now requires some referential integrity, like
> there can never be an invoice referencing a non-existent customer, this
> unit has to ensure it, no matter what. There are two possible solutions:
>
> 1. Set up a foreign key constraint, so that the database will not allow
>    the insertion of the invoice or deletion of the customer, however
>    concurrent the DB access might be.
>
> 2. Implement the required checks with appropriate locking and transaction
>    coverage at every place in the application where these two relations
>    are modified WRT the logical requirement of the business model.
>
> IMHO solution #1 has a major advantage. Only the DB designer really must
> understand the entire business model to ensure that there will never be
> logically inconsistent data in the database. The worst thing that can
> happen if a WEB developer doesn't honor the business model is an aborted
> transaction and maybe an error message the user doesn't understand. For
> #2, every WEB programmer, at any time editing a single PHP code snippet,
> has to have the business model in mind. Here the worst possible
> consequence is violation of the business model. So we might have invoices
> where we don't know anymore who the customer is.
>
> Thus, to compare MySQL vs. Postgres WRT referential integrity, create a
> sample application where the Postgres version checks for possible errors
> (can be optimized using deferred constraints and checking just at the
> COMMIT). The MySQL version instead implements the constraints on the
> middleware level including transaction handling and locking, so you'd
> have to use BDB tables only, for example.
>
> Just my $0.02
>
>
> Jan
>
> --

Thanks for the input! ... anyway, I never had in mind to compare MySQL to
PostgreSQL... just to see what performance impact the little check implied
by "foreign key references" has on an insert... and what performance impact
ON DELETE CASCADE has when deleting (if you choose to let the db manager
handle it)...

Very good points, and well explained!

Daniel Åkerud
> > Agreed, but I don't want to measure the performance of a real-world
> > application anyway; I just want to isolate how much you lose having the
> > database manager handle the deletion for you, as the ON DELETE CASCADE
> > foreign key constraint does.
>
> That doesn't make any sense. There's no performance in the abstract,
> only with regard to actual cases.
>
> You might as well say, "I don't want to measure the performance of
> any real CPU, but only the theoretical performance of various CPU
> designs using imaginary data." You'll probably get a number, but it
> won't refer to anything.

You have a good point there, but it has nothing to do with what I am
doing... You assume I will conclude that there is X loss in performance and
then that's it. However, that is not the case here. The results will be
generalised in time, and then the numbers have no meaning in the big case.
I'm writing a big chapter about foreign keys and foreign key constraints in
general (thanks Jan Wieck) and I just want to show that there _is_ actually
a performance loss. That's all. Most of the chapter will explain what you
GAIN by having them! :)

Thanks.

Daniel Åkerud
On Mon, 25 Jun 2001, Daniel Åkerud wrote:

> OK, I've been discussing this with a colleague of mine... and I'm starting
> to see the light here...
>
> I will, first of all, make a new, simpler, 1<->1 relationship to test FK
> constraints... no 2<->1<->2 relationship here...
>
> Person -> Item/item_fkc
>
> And I will do no bulk-delete. Instead these tests:
>
> Fill person, no time measuring.
> Fill item, no foreign key constraints, measuring time.
> Fill item, foreign key constraints.
> Compare the last two measurements. How much do you lose in performance
> having the foreign key check?
>
> For all persons {
>     if (fkc)
>         delete from person where id = $id
>     else
>         delete from person where id = $id; delete from item where personid = $id
> }
>
> Compare measurements. How much do you lose having the foreign key
> constraints delete the item?
>
> This, I think, is a fairer comparison.
>
> Can you call the FK itself a foreign key constraint, as it actually is
> constraining something?

Yeah, that's a more even test. Since it sounds like you're writing
something on foreign key constraints (presumably about Postgres), you might
want to mention the differences with table-clearing deletes, but point out
that foreign key constraints aren't generally used in schemas where you're
doing that :)

As for naming, I'd think so. It is a constraint, just that
"foreign key constraint" is much longer to type than FK :)
> > OK, I've been discussing this with a colleague of mine... and I'm
> > starting to see the light here...
> >
> > I will, first of all, make a new, simpler, 1<->1 relationship to test FK
> > constraints... no 2<->1<->2 relationship here...
> >
> > Person -> Item/item_fkc
> >
> > And I will do no bulk-delete. Instead these tests:
> >
> > Fill person, no time measuring.
> > Fill item, no foreign key constraints, measuring time.
> > Fill item, foreign key constraints.
> > Compare the last two measurements. How much do you lose in performance
> > having the foreign key check?
> >
> > For all persons {
> >     if (fkc)
> >         delete from person where id = $id
> >     else
> >         delete from person where id = $id; delete from item where personid = $id
> > }
> >
> > Compare measurements. How much do you lose having the foreign key
> > constraints delete the item?
> >
> > This, I think, is a fairer comparison.
> >
> > Can you call the FK itself a foreign key constraint, as it actually is
> > constraining something?
>
> Yeah, that's a more even test. Since it sounds like you're writing
> something on foreign key constraints (presumably about Postgres), you
> might want to mention the differences with table-clearing deletes, but
> point out that foreign key constraints aren't generally used in schemas
> where you're doing that :)
>
> As for naming, I'd think so. It is a constraint, just that
> "foreign key constraint" is much longer to type than FK :)

Thanks :)

Daniel Åkerud