Thread: Scaling up deferred unique checks and the after trigger queue
I've started looking at the following TODO item: "Improve deferrable unique constraints for cases with many conflicts" and Tom's suggestion that the rows to be checked can be stored in a bitmap, which would become lossy when the number of rows becomes large enough. There is also another TODO item: "Add deferred trigger queue file" to prevent the trigger queue from exhausting backend memory. I've got some prototype code which attempts to replace all the after-triggers-queue stuff with TID bitmaps (not just for deferred constraint triggers). This would solve the memory-usage problem without resorting to file storage, and makes it easier to then optimise constraint checks by doing a bulk check if the number of rows is large enough. The initial results are encouraging, but I'm still pretty new to a lot of this code, so I wanted to check that this is a sane thing to try to do. For UPDATEs, I'm storing the old tid in the bitmap and relying on its ctid pointer to retrieve the new tuple for the trigger function. AFAICS heap_update() always links the old and new tuples in this way. I'm aware that the "triggers on columns" patch is going to be a problem for this. I haven't looked at it in any detail, but I suspect that it won't work with a lossy queue, because the information about exactly which rows to trigger on is only known at update time. So maybe I could fall back on a tuplestore, spilling to disk in that case? Thoughts? - Dean
This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. Part of the motivation for this is to scale up the deferrable unique constraints support. I've not done anything to optimise that case directly yet, but this patch will make it easier to switch over to doing a bulk check in that case, and even though it is still executing the trigger for each potential violation, performance for large updates is already greatly improved because of the reduction in the queue's memory footprint. It passes all the regression tests, except for the copy2 test, which sometimes fails, with rows being ordered differently. I believe that the reason for this failure is that during the bitmap heap scan, pages are being pruned, and so the UPDATEs inside later trigger executions start re-using space at the start of the page, whereas the old code moves onto the next page. This could be fixed by changing the test, grouping the COPYs inside a single transaction, to prevent page pruning. - Dean
Attachment
On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed <dean.a.rasheed@googlemail.com> wrote: > This is a WIP patch to replace the after-trigger queues with TID bitmaps > to prevent them from using excessive amounts of memory. Each round of > trigger executions is a modified bitmap heap scan. If the bitmap becomes lossy, how do you preserve the correct semantics? ...Robert
2009/10/19 Robert Haas <robertmhaas@gmail.com>: > On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed > <dean.a.rasheed@googlemail.com> wrote: >> This is a WIP patch to replace the after-trigger queues with TID bitmaps >> to prevent them from using excessive amounts of memory. Each round of >> trigger executions is a modified bitmap heap scan. > > If the bitmap becomes lossy, how do you preserve the correct semantics? > > ...Robert > The idea is that it filters by the transaction ID and command ID of modified rows to see what's been updated in the command(s) the trigger is for... - Dean
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: > This is a WIP patch to replace the after-trigger queues with TID bitmaps > to prevent them from using excessive amounts of memory. Each round of > trigger executions is a modified bitmap heap scan. This is an interesting patch. The justification is fine, the idea is good, though I'd like to see more analysis of the technique, what other options exist and some thought about when we should use the technique. We have a bitmap for each UPDATE statement, I think, but there's no docs or readme. Why just UPDATE? Is the cost of starting up the bitmap higher than the existing mechanism? Do we need to look at starting with an existing mechanism and then switching over to new mechanism? Is the TID bitmap always a win for large numbers of rows? The technique relies on these assumptions * Trigger functions are idempotent * Trigger execution order is not important (in terms of rows) * Multiple trigger execution order is not important All of those seem false in the general case. What will you do? -- Simon Riggs www.2ndQuadrant.com
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: > This is a WIP patch to replace the after-trigger queues with TID bitmaps > to prevent them from using excessive amounts of memory. Each round of > trigger executions is a modified bitmap heap scan. Can you please take a look at my patch here: http://archives.postgresql.org/message-id/1256499249.12775.20.camel@jdavis to make sure that we're not interfering with eachother? I implemented deferred constraint checking in my operator exclusion constraints patch (formerly "generalized index constraints"). After looking very briefly at your approach, I think that it's entirely orthogonal, so I don't expect a problem. I have a git repo here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints which may be helpful if you just want to look at the commit for deferred constraint checking. Any comments welcome. I'll also take a look at your patch in the next few days. Regards,Jeff Davis
2009/10/25 Simon Riggs <simon@2ndquadrant.com>: > On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: > >> This is a WIP patch to replace the after-trigger queues with TID bitmaps >> to prevent them from using excessive amounts of memory. Each round of >> trigger executions is a modified bitmap heap scan. > > This is an interesting patch. The justification is fine, the idea is > good, though I'd like to see more analysis of the technique, what other > options exist and some thought about when we should use the technique. > > We have a bitmap for each UPDATE statement, I think, but there's no docs > or readme. Why just UPDATE? Is the cost of starting up the bitmap higher > than the existing mechanism? Do we need to look at starting with an > existing mechanism and then switching over to new mechanism? Is the TID > bitmap always a win for large numbers of rows? > Thanks for looking at this. It works for all kinds of trigger events, and is intended as a complete drop-in replacement for the after triggers queue. I admit that I haven't yet done very much performance testing. As it stands, there does appear to be a small performance penalty associated with the bitmaps, but I need to do more testing to be more specific about that. I had thought that, for relatively small numbers of rows, I could use something like a small list of CTID arrays of increasing size, and then switch over to the new mechanism when this becomes too large. But first, I wanted to get feedback on whether this TID bitmap approach is actually valid for general trigger operation. > The technique relies on these assumptions > * Trigger functions are idempotent I don't understand what you're saying here. It should execute the triggers in exactly the same way as the current code (but possibly in a different order). Idempotentence isn't required. > * Trigger execution order is not important (in terms of rows) It is true that the order in which the rows are processed will change. As far as I can tell from the spec, there is nothing to say that the rows for a given statement should be processed in any particular order. I guess that I'm looking for feedback from people on this list as to whether that will be a problem for existing apps. > * Multiple trigger execution order is not important This patch does not change the order of execution in the case where there are multiple triggers (at least not for regular non-constraint triggers). They should still be fired in name order, for each row. All such triggers share a single TID bitmap, and are processed together. This is in line with the spec. Deferrable constraint triggers are a different matter, and these will be fired in a different order (each set of triggers for a given constraint will be fired together, rather than being interleaved). This is not covered by the spec, but if they are genuinely being used to enforce constraints, the order shouldn't matter. > > All of those seem false in the general case. What will you do? At this point I'm looking for more feedback as to whether any of this is a show-stopper, before I expend more effort on this patch. - Dean
2009/10/25 Jeff Davis <pgsql@j-davis.com>: > On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: >> This is a WIP patch to replace the after-trigger queues with TID bitmaps >> to prevent them from using excessive amounts of memory. Each round of >> trigger executions is a modified bitmap heap scan. > > Can you please take a look at my patch here: > http://archives.postgresql.org/message-id/1256499249.12775.20.camel@jdavis > > to make sure that we're not interfering with eachother? I implemented > deferred constraint checking in my operator exclusion constraints patch > (formerly "generalized index constraints"). > Yes, I've been following this, and I'm looking forward to this new functionality. > After looking very briefly at your approach, I think that it's entirely > orthogonal, so I don't expect a problem. > I agree. I think that the 2 are orthogonal. Possibly they could both share some common bulk checking code, but I've not thought much about how to do that yet. > I have a git repo here: > http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints > > which may be helpful if you just want to look at the commit for deferred > constraint checking. Any comments welcome. > I did a quick bit of testing, and I think that there is a locking/concurrency problem :-( Attached is a (rather crappy) python script (using PyGreSQL) that I used to test consistency while I was working on the deferrable uniqueness constraints patch. Basically it just spawns a bunch of threads, each of which does random CRUD, with heavy contention and lots of constraint violations and deadlocks, which are rolled back. I modified the script to enforce uniqueness with an exclusion constraint, and the script is able to break the constraint, forcing invalid data into the table. I haven't looked at your code in depth, but I hope that this is not a difficult problem to fix. It seems like it ought to be similar to the btree code. - Dean
Attachment
On Mon, 2009-10-26 at 13:28 +0000, Dean Rasheed wrote: > It works for all kinds of trigger events, > and is intended as a complete drop-in replacement for the after > triggers queue. > > All of those seem false in the general case. What will you do? > > At this point I'm looking for more feedback as to whether any of this > is a show-stopper, before I expend more effort on this patch. I see no show stoppers, only for you to look at ways of specifying that this optimization is possible for particular cases. I think we might be able to make the general statement that it will work for all after triggers that execute STABLE or IMMUTABLE functions. I don't think we can assume that firing order is irrelevant for some cases, e.g. message queues. -- Simon Riggs www.2ndQuadrant.com
On Mon, 2009-10-26 at 13:41 +0000, Dean Rasheed wrote: > I did a quick bit of testing, and I think that there is a > locking/concurrency problem :-( Unfortunately I can't reproduce the problem on my machine; it always passes. If you have a minute, can you try to determine if the problem can happen with a non-deferrable constraint? I'll keep looking into it. Thanks,Jeff Davis
2009/10/26 Simon Riggs <simon@2ndquadrant.com>: > On Mon, 2009-10-26 at 13:28 +0000, Dean Rasheed wrote: > >> It works for all kinds of trigger events, >> and is intended as a complete drop-in replacement for the after >> triggers queue. > >> > All of those seem false in the general case. What will you do? >> >> At this point I'm looking for more feedback as to whether any of this >> is a show-stopper, before I expend more effort on this patch. > > I see no show stoppers, only for you to look at ways of specifying that > this optimization is possible for particular cases. I think we might be > able to make the general statement that it will work for all after > triggers that execute STABLE or IMMUTABLE functions. I don't think we > can assume that firing order is irrelevant for some cases, e.g. message > queues. > Hmm, thinking about this some more... one thing this patch does is to separate out the queues for "regular" triggers from those for RI triggers and deferrable constraint checks. ITSM that row-order only really matters for the former. It's also the case that for these triggers there will never be any other choice but to execute them one at a time, so they may as well just spool to a file rather than using a TID bitmap. The bitmaps are probably only useful for constraint triggers, where a bulk check can be used instead of executing individual triggers for each row, if enough rows are modified. - Dean
2009/10/26 Jeff Davis <pgsql@j-davis.com>: > On Mon, 2009-10-26 at 13:41 +0000, Dean Rasheed wrote: >> I did a quick bit of testing, and I think that there is a >> locking/concurrency problem :-( > > Unfortunately I can't reproduce the problem on my machine; it always > passes. > That's odd. It happens every time on my machine (10 threads, 1000 loops). > If you have a minute, can you try to determine if the problem can happen > with a non-deferrable constraint? > If anything, that seems to make it fail more quickly. If it's of any relevance, I'm currently using an optimised build, with assert checking off. [Linux x86_64, 2 core Intel Core2] - Dean
On Mon, Oct 26, 2009 at 9:46 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2009-10-26 at 13:28 +0000, Dean Rasheed wrote: > >> It works for all kinds of trigger events, >> and is intended as a complete drop-in replacement for the after >> triggers queue. > >> > All of those seem false in the general case. What will you do? >> >> At this point I'm looking for more feedback as to whether any of this >> is a show-stopper, before I expend more effort on this patch. > > I see no show stoppers, only for you to look at ways of specifying that > this optimization is possible for particular cases. I think we might be > able to make the general statement that it will work for all after > triggers that execute STABLE or IMMUTABLE functions. I don't think we > can assume that firing order is irrelevant for some cases, e.g. message > queues. Hmm. After-trigger functions are very unlikely to really be STABLE or IMMUTABLE, though. Almost by definition, they'd better be modifying some data somewhere, or there's no point. ...Robert
On Mon, 2009-10-26 at 17:23 +0000, Dean Rasheed wrote: > If it's of any relevance, I'm currently using an optimised build, with > assert checking off. > [Linux x86_64, 2 core Intel Core2] Ok, I'm able to reproduce it now. Thanks for looking into it! Regards,Jeff Davis