Thread: Foreign key performance
I'm using 7.3.2 on Linux, with a decent amount of muscle behind it (1.5 GHz PPro CPU, 1G mem, 20M/sec disks, xlog on different disk than data). I've got a database that has several foreign keys, and I'm copying a bunch of data from an MS-SQL server into it via Perl DBI. I noticed that inserts into this database are very slow, on the order of 100 per second on this hardware. All the inserts are happening in a single transaction. The postmaster I'm connected to appears to be CPU limited, as it's pegging the CPU at a constant 85 percent or more. I have no problem with that under normal circumstances (i.e., the foreign key constraints are actively being enforced): it may well be the nature of foreign keys, but the problem is this: all the keys are DEFERRABLE INITIALLY DEFERRED and, on top of that, the Perl program will SET CONSTRAINTS ALL DEFERRED at the beginning of the transaction. If I remove all the foreign key constraints, my performance goes up to 700 inserts per second! Why isn't the insert performance with all the constraints deferred approximating that of the performance I get without the foreign keys?? If anything, I should get a big delay at transaction commit time while all the foreign key constraints are checked (and, indeed, I get that too), but the performance during the transaction prior to the commit should be the same as it is without the foreign key constraints. It's almost as if the foreign key constraints are being invoked and the results ignored during the inserts... In essence, this smells like a bug to me, but I don't know enough about the internals to really call it that. Any ideas on what can be done about this? -- Kevin Brown kevin@sysexperts.com
On Thu, 17 Apr 2003, Kevin Brown wrote: > I have no problem with that under normal circumstances (i.e., the > foreign key constraints are actively being enforced): it may well be > the nature of foreign keys, but the problem is this: all the keys are > DEFERRABLE INITIALLY DEFERRED and, on top of that, the Perl program > will SET CONSTRAINTS ALL DEFERRED at the beginning of the transaction. > > If I remove all the foreign key constraints, my performance goes up to > 700 inserts per second! > > Why isn't the insert performance with all the constraints deferred > approximating that of the performance I get without the foreign keys?? It appears (from some not terribly scientific experiments - see below) that it's likely to be related to managing the deferred trigger queue given that in my case at least running the constraints non-deferred was negligible in comparison. On batch inserts to three tables each with a foreign key to a table containing one row (and inserts of lots of that value), I saw a ratio of approximately 1:1.7:7 for normal inserts:non-deferred fk:deferred fk on my 7.4 dev server.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > It appears (from some not terribly scientific experiments - see below) > that it's likely to be related to managing the deferred trigger queue > given that in my case at least running the constraints non-deferred was > negligible in comparison. At one time the deferred-trigger queue had an O(N^2) behavioral problem for large N = number of pending trigger events. But I thought we'd fixed that. What's the test case exactly? Can you get a profile with gprof? regards, tom lane
On Fri, 18 Apr 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > It appears (from some not terribly scientific experiments - see below) > > that it's likely to be related to managing the deferred trigger queue > > given that in my case at least running the constraints non-deferred was > > negligible in comparison. > > At one time the deferred-trigger queue had an O(N^2) behavioral problem > for large N = number of pending trigger events. But I thought we'd > fixed that. What's the test case exactly? Can you get a profile with > gprof? I'm going to tomorrow hopefully - but it looks to me that we fixed one, but possibly not another place where we read through the list unnecessarily AFAICS. I think deferredTriggerInvokeEvents (when called with immediate_only = true) is going to go through the entire list looking for immediate triggers to fire after each statement. However, excepting set constraints, any immediate triggers for any event added prior to this statement will by necessity have already been run unless I'm missing something, which means that we're often looking through entries that aren't going to have any triggers to run now in any case. Keeping a pointer to the end of the list as of last statement and going through the list from there cut the time for the deferred case in half in my simple test (about 3.3x the no fk and just under 2x the immediate).
[Not sure this really is relevant for -performance at this point] On Thu, 17 Apr 2003, Stephan Szabo wrote: > On Fri, 18 Apr 2003, Tom Lane wrote: > > > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > > It appears (from some not terribly scientific experiments - see below) > > > that it's likely to be related to managing the deferred trigger queue > > > given that in my case at least running the constraints non-deferred was > > > negligible in comparison. > > > > At one time the deferred-trigger queue had an O(N^2) behavioral problem > > for large N = number of pending trigger events. But I thought we'd > > fixed that. What's the test case exactly? Can you get a profile with > > gprof? > > I'm going to tomorrow hopefully - but it looks to me that we fixed one, but Argh. I'm getting that state where gprof returns all 0s for times. I'm pretty sure this has come up before along with how to get it to work, but I couldn't find it in the archives. Someday I'll learn how to use gprof. :( In any case, the call list seemed reasonable. It's currently doing O(n^2) calls to MemoryContextReset and deferredTriggerCheckState in InvokeEvents I don't see anything else that's at that kind of number of calls (50 million calls for a backend that's only done 10000 inserts stands out a bit). Going only from last statement seems to make it linear (I think my attempt is checking 1 too many trigger values, need to change that probably).
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Argh. I'm getting that state where gprof returns all 0s for times. I'm > pretty sure this has come up before along with how to get it to work, but > I couldn't find it in the archives. Someday I'll learn how to use gprof. :( You're on Linux? You need to compile postmaster.c with -DLINUX_PROFILE. But the call counts do sound pretty damning. regards, tom lane
On Fri, 18 Apr 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > Argh. I'm getting that state where gprof returns all 0s for times. I'm > > pretty sure this has come up before along with how to get it to work, but > > I couldn't find it in the archives. Someday I'll learn how to use gprof. :( > > You're on Linux? You need to compile postmaster.c with -DLINUX_PROFILE. Yep, thanks. :) > But the call counts do sound pretty damning. Yeah, but even with my hack last night it was still appreciably slower than immediate constraints. Comparing the call counts in that function for the immediate versus deferred(hacked) weren't giving me a good idea of where that time was going.
On Fri, 18 Apr 2003, Stephan Szabo wrote: > On Fri, 18 Apr 2003, Tom Lane wrote: > > > But the call counts do sound pretty damning. > > Yeah, but even with my hack last night it was still appreciably slower > than immediate constraints. Comparing the call counts in that function > for the immediate versus deferred(hacked) weren't giving me a good idea of > where that time was going. This last was due to assert checking I think. AllocSetCheck was the big time waster on the hacked deferred case. Turning off assert checking I get: Median over 3 100000 inserts in one transaction (excepting the original code which took a really long time so I ran it once) from time psql ... No Fk 24.14s Immediate FK 42.80s Original Deferred FK 1862.06s Hacked Deferred FK 35.30s The hack was just the keeping around the list pointer from the last run through (see attached - passed simple fk tests and regression, but there might be problems I don't see). Looking at the code, I also wonder if we would get some gain by not allocating the per_tuple_context at the beginning but only when a non-deferred constraint is found since otherwise we're creating and destroying the context and possibly never using it. The cost would presumably be some boolean tests inside the inner loop.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > The hack was just the keeping around the list pointer from the last run > through (see attached - passed simple fk tests and regression, but there > might be problems I don't see). Shouldn't this patch update the comment in deferredTriggerInvokeEvents (c. line 1860 in cvs tip)? > Looking at the code, I also wonder if we > would get some gain by not allocating the per_tuple_context at the > beginning but only when a non-deferred constraint is found since otherwise > we're creating and destroying the context and possibly never using it. I doubt it's worth worrying over. Creation/destruction of a never-used memory context is pretty cheap, I think. regards, tom lane
On Sat, 19 Apr 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > The hack was just the keeping around the list pointer from the last run > > through (see attached - passed simple fk tests and regression, but there > > might be problems I don't see). > > Shouldn't this patch update the comment in deferredTriggerInvokeEvents > (c. line 1860 in cvs tip)? Probably, since the second part of that is basically what this is. I'll update and send updated patch tomorrow. > > Looking at the code, I also wonder if we > > would get some gain by not allocating the per_tuple_context at the > > beginning but only when a non-deferred constraint is found since otherwise > > we're creating and destroying the context and possibly never using it. > > I doubt it's worth worrying over. Creation/destruction of a never-used > memory context is pretty cheap, I think. Okay, sounds good enough for me. :)
On Sat, 19 Apr 2003, Stephan Szabo wrote: > On Sat, 19 Apr 2003, Tom Lane wrote: > > > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > > The hack was just the keeping around the list pointer from the last run > > > through (see attached - passed simple fk tests and regression, but there > > > might be problems I don't see). > > > > Shouldn't this patch update the comment in deferredTriggerInvokeEvents > > (c. line 1860 in cvs tip)? > > Probably, since the second part of that is basically what this is. I'll > update and send updated patch tomorrow. Okay, this changes the second paragraph of that comment. I left in the comment that's really similar next to where I actually do the selection of which start point to use.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Okay, this changes the second paragraph of that comment. I left in the > comment that's really similar next to where I actually do the selection of > which start point to use. This had a bit of a problem yet: the loop in deferredTriggerInvokeEvents expects 'prev_event' to point to the list entry just before 'event'. A nice byproduct of fixing that is we don't uselessly rescan the last list entry. I also tried to improve the comments a little. You can see what I actually applied at http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/trigger.c regards, tom lane
Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > Okay, this changes the second paragraph of that comment. I left in the > > comment that's really similar next to where I actually do the selection of > > which start point to use. > > This had a bit of a problem yet: the loop in deferredTriggerInvokeEvents > expects 'prev_event' to point to the list entry just before 'event'. > A nice byproduct of fixing that is we don't uselessly rescan the last list > entry. I also tried to improve the comments a little. You can see what > I actually applied at > > http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/trigger.c Any chance of backporting these changes to 7_3_STABLE (when you're satisfied they don't break anything)? Just looking at the CVS log for trigger.c, it appears there have been enough changes since then that it might not be easy to do (and since it's not necessarily a "bug fix" as such, it might not qualify for backporting to a stable version). Even if it's not something that can be put into another release of 7.3, it would certainly be useful to me. It might be useful to enough people to justify releasing it as a patch on -patches, if nothing else. I'd do it myself but I don't understand the trigger code at all (and if there's any documentation you can point me to that describes the various functions and supporting data structures in trigger.c, that would help a lot), and I'd rather not touch something like that until I understand it thoroughly. -- Kevin Brown kevin@sysexperts.com
Kevin Brown <kevin@sysexperts.com> writes: > Any chance of backporting these changes to 7_3_STABLE (when you're > satisfied they don't break anything)? Just looking at the CVS log for > trigger.c, it appears there have been enough changes since then that > it might not be easy to do (and since it's not necessarily a "bug fix" > as such, it might not qualify for backporting to a stable version). I'd be pretty hesitant to make such a change in the stable branch --- at least not without a lot of testing. If you and others want to provide such testing, go to it. The patch appears to apply cleanly enough to 7.3, but here's an adjusted patch if fuzz makes you nervous... regards, tom lane *** trigger.c~ Sun Apr 20 20:28:55 2003 --- trigger.c Sun Apr 20 20:29:13 2003 *************** *** 1461,1472 **** --- 1461,1478 ---- * Because this can grow pretty large, we don't use separate List nodes, * but instead thread the listthrough the dte_next fields of the member * nodes. Saves just a few bytes per entry, but that adds up. + * + * deftrig_events_imm holds the tail pointer as of the last + * deferredTriggerInvokeEvents call; we can use this to avoid rescanning + * entries unnecessarily. It is NULL if deferredTriggerInvokeEvents + * hasn't run since the last state change. * * XXX Need to be able to shove this data out to a file if it grows too * large... * ---------- */ static DeferredTriggerEvent deftrig_events; + static DeferredTriggerEvent deftrig_events_imm; static DeferredTriggerEvent deftrig_event_tail; *************** *** 1680,1686 **** deferredTriggerInvokeEvents(bool immediate_only) { DeferredTriggerEvent event, ! prev_event = NULL; MemoryContext per_tuple_context; Relation rel = NULL; TriggerDesc *trigdesc= NULL; --- 1686,1692 ---- deferredTriggerInvokeEvents(bool immediate_only) { DeferredTriggerEvent event, ! prev_event; MemoryContext per_tuple_context; Relation rel = NULL; TriggerDesc *trigdesc= NULL; *************** *** 1692,1704 **** * are going to discard the whole event queue on return anyway, so no * need to bother with "retail"pfree's. * ! * In a scenario with many commands in a transaction and many ! * deferred-to-end-of-transaction triggers, it could get annoying to ! * rescan all the deferred triggers at each command end. To speed this ! * up, we could remember the actual end of the queue at EndQuery and ! * examine only events that are newer. On state changes we simply ! * reset the saved position to the beginning of the queue and process ! * all events once with the new states. */ /* Make a per-tuple memory context for trigger function calls*/ --- 1698,1709 ---- * are going to discard the whole event queue on return anyway, so no * need to bother with "retail"pfree's. * ! * If immediate_only is true, we need only scan from where the end of ! * the queue was at the previous deferredTriggerInvokeEvents call; ! * any non-deferred events before that point are already fired. ! * (But if the deferral state changes, we must reset the saved position ! * to the beginning of the queue, so as to process all events once with ! * the new states. See DeferredTriggerSetState.) */ /* Make a per-tuple memory context for trigger functioncalls */ *************** *** 1709,1715 **** ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); ! event = deftrig_events; while (event != NULL) { bool still_deferred_ones = false; --- 1714,1735 ---- ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); ! /* ! * If immediate_only is true, then the only events that could need firing ! * are those since deftrig_events_imm. (But if deftrig_events_imm is ! * NULL, we must scan the entire list.) ! */ ! if (immediate_only && deftrig_events_imm != NULL) ! { ! prev_event = deftrig_events_imm; ! event = prev_event->dte_next; ! } ! else ! { ! prev_event = NULL; ! event = deftrig_events; ! } ! while (event != NULL) { bool still_deferred_ones = false; *************** *** 1830,1835 **** --- 1850,1858 ---- /* Update list tail pointer in case we just deleted tail event */ deftrig_event_tail = prev_event; + /* Set the immediate event pointer for next time */ + deftrig_events_imm = prev_event; + /* Release working resources */ if (rel) heap_close(rel, NoLock); *************** *** 1917,1922 **** --- 1940,1946 ---- MemoryContextSwitchTo(oldcxt); deftrig_events = NULL; + deftrig_events_imm = NULL; deftrig_event_tail = NULL; } *************** *** 2146,2153 **** * CONSTRAINTS command applies retroactively. This happens "for free" * since we have alreadymade the necessary modifications to the * constraints, and deferredTriggerEndQuery() is called by ! * finish_xact_command(). */ } --- 2170,2180 ---- * CONSTRAINTS command applies retroactively. This happens "for free" * since we have alreadymade the necessary modifications to the * constraints, and deferredTriggerEndQuery() is called by ! * finish_xact_command(). But we must reset deferredTriggerInvokeEvents' ! * tail pointer to make it rescan the entire list, in case some deferred ! * events are now immediately invokable. */ + deftrig_events_imm = NULL; }
Tom Lane wrote: > Kevin Brown <kevin@sysexperts.com> writes: > > Any chance of backporting these changes to 7_3_STABLE (when you're > > satisfied they don't break anything)? Just looking at the CVS log for > > trigger.c, it appears there have been enough changes since then that > > it might not be easy to do (and since it's not necessarily a "bug fix" > > as such, it might not qualify for backporting to a stable version). > > I'd be pretty hesitant to make such a change in the stable branch --- > at least not without a lot of testing. If you and others want to > provide such testing, go to it. The patch appears to apply cleanly > enough to 7.3, but here's an adjusted patch if fuzz makes you > nervous... Thanks, Tom. I've applied the patch to my server and it has so far passed the few tests I've thrown at it so far (it has detected foreign key violations in both immediate and deferred trigger mode). And just so you know, it performs FAR better than the pre-patched version does -- in the overall transaction I'm doing, I see very little difference now between deferred triggers and no triggers! -- Kevin Brown kevin@sysexperts.com
Stephan Szabo wrote: > > [Not sure this really is relevant for -performance at this point] > > On Thu, 17 Apr 2003, Stephan Szabo wrote: > > > On Fri, 18 Apr 2003, Tom Lane wrote: > > > > > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > > > It appears (from some not terribly scientific experiments - see below) > > > > that it's likely to be related to managing the deferred trigger queue > > > > given that in my case at least running the constraints non-deferred was > > > > negligible in comparison. > > > > > > At one time the deferred-trigger queue had an O(N^2) behavioral problem > > > for large N = number of pending trigger events. But I thought we'd > > > fixed that. What's the test case exactly? Can you get a profile with > > > gprof? > > > > I'm going to tomorrow hopefully - but it looks to me that we fixed one, but > > Argh. I'm getting that state where gprof returns all 0s for times. I'm > pretty sure this has come up before along with how to get it to work, but > I couldn't find it in the archives. Someday I'll learn how to use gprof. :( You have to save and restore the timers around the fork() under Linux. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #