Thread: WIP: Deferrable unique constraints
This is a feature I would find useful, and I wanted to learn more about the database internals, so I thought I would give it a go as an exercise.

I added 2 new index insert modes to support deferrable constraints. During the initial index insert, if the constraint is deferrable, it does a "partial" check of uniqueness. This is a non-blocking check which allows duplicates in, but can distinguish between definitely unique and potentially non-unique. For potential uniqueness violations a deferred trigger is queued to do a full check at the end of the statement or transaction, or when SET CONSTRAINTS is called. The trigger then replays the insert in a "fake insert" mode, which doesn't actually insert anything - it just checks that what is already there is unique, waiting for other transactions if necessary.

This approach works well if the number of potential conflicts is small. For example, if uniqueness is not actually violated, there is no significant penalty for the deferred check, since no triggers are queued:

pgdevel=# CREATE TABLE foo(a int UNIQUE DEFERRABLE INITIALLY DEFERRED);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo"
CREATE TABLE
Time: 79.131 ms
pgdevel=# INSERT INTO foo (SELECT * FROM generate_series(0, 1999999, 2)); -- 1M rows
INSERT 0 1000000
Time: 11403.906 ms
pgdevel=# BEGIN;
BEGIN
Time: 0.145 ms
pgdevel=# UPDATE foo SET a=a+1; -- Uniqueness never violated
UPDATE 1000000
Time: 21258.245 ms
pgdevel=# COMMIT;
COMMIT
Time: 83.267 ms

compared with:

pgdevel=# CREATE TABLE foo(a int UNIQUE);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo"
CREATE TABLE
Time: 110.277 ms
pgdevel=# INSERT INTO foo (SELECT * FROM generate_series(0, 1999999, 2)); -- 1M rows
INSERT 0 1000000
Time: 11196.600 ms
pgdevel=# BEGIN;
BEGIN
Time: 0.146 ms
pgdevel=# UPDATE foo SET a=a+1; -- Uniqueness never violated
UPDATE 1000000
Time: 21054.430 ms
pgdevel=# COMMIT;
COMMIT
Time: 186.174 ms

However, if there are lots of temporarily non-unique values, the performance impact is much greater, since the unique check is repeated for each row at commit time:

pgdevel=# CREATE TABLE foo(a int UNIQUE DEFERRABLE INITIALLY DEFERRED);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo"
CREATE TABLE
Time: 64.244 ms
pgdevel=# INSERT INTO foo (SELECT * FROM generate_series(0, 1999999, 2)); -- 1M rows
INSERT 0 1000000
Time: 11621.438 ms
pgdevel=# BEGIN;
BEGIN
Time: 0.146 ms
pgdevel=# UPDATE foo SET a=a+2; -- Uniqueness temporarily violated
UPDATE 1000000
Time: 21807.128 ms
pgdevel=# COMMIT;
COMMIT
Time: 14974.916 ms

And similarly for an "immediate" check:

pgdevel=# CREATE TABLE foo(a int UNIQUE DEFERRABLE INITIALLY IMMEDIATE);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" for table "foo"
CREATE TABLE
Time: 44.477 ms
pgdevel=# INSERT INTO foo (SELECT * FROM generate_series(0, 1999999, 2)); -- 1M rows
INSERT 0 1000000
Time: 12083.089 ms
pgdevel=# UPDATE foo SET a=a+2;
UPDATE 1000000
Time: 37173.210 ms

Also, as for FKs and other queued triggers, this doesn't scale because the queue is held in memory. Curing the scalability problem by spooling the queue to disk shouldn't be too hard to do, but that doesn't address the problem that if a significant proportion of rows from the table need to be checked, it is far quicker to scan the whole index once than to check row by row. In fact, with this data on my machine, a single scan takes around 0.5 sec, so on that basis a full scan would be quicker if more than around 3% of the data is flagged as potentially non-unique, although the planner might be a better judge of that.

I can't see a way of coding something that can switch over to a full scan without first recording all the potential violations, and then I'm not entirely sure how best to do the switch-over logic.

- Dean
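The two-phase control flow described above can be sketched as a toy Python model. This is purely illustrative, not the patch's C code: the names (IndexModel, recheck_queue, etc.) are invented for the sketch, and real blocking/MVCC behaviour is omitted.

```python
# Toy model of the deferred unique check: a non-blocking "partial"
# check at insert time, plus a deferred recheck of queued potential
# conflicts at commit. All names here are invented for illustration.

class UniqueViolation(Exception):
    pass

class IndexModel:
    def __init__(self):
        self.entries = {}        # key -> number of live index entries
        self.recheck_queue = []  # keys flagged as potentially non-unique

    def insert(self, key, deferred=True):
        # Partial check: never blocks, lets the duplicate in, but
        # queues a recheck (the patch queues a deferred trigger).
        count = self.entries.get(key, 0)
        if count > 0:
            if not deferred:
                raise UniqueViolation(key)  # classic immediate check
            self.recheck_queue.append(key)
        self.entries[key] = count + 1

    def delete(self, key):
        self.entries[key] -= 1

    def commit(self):
        # The "fake insert" recheck: verify that what is actually
        # there now is unique; nothing is inserted at this point.
        for key in self.recheck_queue:
            if self.entries.get(key, 0) > 1:
                raise UniqueViolation(key)
        self.recheck_queue.clear()
```

For instance, the UPDATE foo SET a=a+2 case over keys (0, 2, 4) temporarily duplicates 2 and 4, both land on the recheck queue, and the commit-time recheck passes because the old row versions are gone by then. The queue length tracks the number of temporary violations, which is exactly the scaling problem the email raises.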
Attachment
First, I'm happy that you're working on this; I think it's important. I am working on another index constraints feature that may have some interaction:

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php

Let me know if you see any potential conflicts between our work.

On Tue, 2009-07-07 at 19:38 +0100, Dean Rasheed wrote:
> For potential uniqueness violations a deferred trigger is queued to
> do a full check at the end of the statement or transaction, or when
> SET CONSTRAINTS is called. The trigger then replays the insert in a
> "fake insert" mode, which doesn't actually insert anything - it just
> checks that what is already there is unique, waiting for other
> transactions if necessary.

What does the deferred trigger do? Do you need a "fake insert" mode or can you use an index search instead?

I'm thinking that you could just store the TID of the tuple that causes the potential violation in your list. Then, when you recheck the list, for each potential violation, find the tuple from the TID, do a search using the appropriate attributes, and if you get multiple results there's a conflict.

Regards,
Jeff Davis
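The TID-based recheck suggested here could look something like this toy model. The dictionaries standing in for the heap and the index, and the function name, are invented for the sketch; a real implementation would work against the AM and heap-access APIs.

```python
# Toy model of the TID-based recheck: remember only the TID of the
# tuple that caused the potential violation, then at check time fetch
# its key from the heap and count live index matches.

class UniqueViolation(Exception):
    pass

def recheck_tids(heap, index, pending_tids):
    """heap: tid -> key for live tuples; index: key -> set of tids."""
    for tid in pending_tids:
        if tid not in heap:
            continue  # the offending tuple was deleted: no conflict
        key = heap[tid]
        live_matches = [t for t in index[key] if t in heap]
        if len(live_matches) > 1:
            raise UniqueViolation(key)
```

As the reply downthread notes, the hard part in a real implementation is concurrency: between scanning the index and deciding there is a conflict, other transactions may commit or roll back, which the sketch ignores entirely.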
On Tue, 2009-07-07 at 19:38 +0100, Dean Rasheed wrote:
> This approach works well if the number of potential conflicts is
> small.

[...]

> Curing the scalability problem by spooling the queue to disk shouldn't
> be too hard to do, but that doesn't address the problem that if a
> significant proportion of rows from the table need to be checked, it
> is far quicker to scan the whole index once than check row by row.

Another approach that might be worth considering is to build a temporary index and try to merge them at constraint-checking time. That might work well for unique.

However, there are some potential issues. I didn't think this through yet, but here is a quick list just to get some thoughts down:

1. It would be tricky to merge while checking constraints if we are supporting more general constraints like in my proposal ( http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php ).

2. Which indexes can be merged efficiently, and how much effort would it take to make this work?

3. A related issue: making indexes mergeable would be useful for bulk inserts as well.

4. At the end of the command, the index needs to work, meaning that queries would have to search two indexes. That may be difficult (but check the GIN fast insert code, which does something similar).

5. The temporary index still can't be enforcing constraints if they are deferred, so it won't solve all the issues here.

Regards,
Jeff Davis
Jeff Davis wrote:
> First, I'm happy that you're working on this; I think it's important.
> I am working on another index constraints feature that may have some
> interaction:
>
> http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php
>
> Let me know if you see any potential conflicts between our work.

Hi Jeff,

Yes, I've been following that other thread with interest. I don't think that there is any conflict between the 2 patches, although there may be some interaction. My stuff is specific to UNIQUE, and my triggers make the assumption that this is enforced by an index. The triggers don't actually care what kind of index, although of course currently it can only be a btree. I guess that your generalised constraints could be made deferrable in a similar way, and would use different triggers to do the final check, so no problem there.

> On Tue, 2009-07-07 at 19:38 +0100, Dean Rasheed wrote:
>> For potential uniqueness violations a deferred trigger is queued to
>> do a full check at the end of the statement or transaction, or when
>> SET CONSTRAINTS is called. The trigger then replays the insert in a
>> "fake insert" mode, which doesn't actually insert anything - it just
>> checks that what is already there is unique, waiting for other
>> transactions if necessary.
>
> What does the deferred trigger do? Do you need a "fake insert" mode or
> can you use an index search instead?

Well the point about the "fake insert" mode is that it has to be able to block on another transaction, to see if it commits or rolls back a potentially conflicting value. ISTM that trying to do this with an index search would open up the potential for all sorts of race conditions.

- Dean

> I'm thinking that you could just store the TID of the tuple that
> causes the potential violation in your list. Then, when you recheck
> the list, for each potential violation, find the tuple from the TID,
> do a search using the appropriate attributes, and if you get multiple
> results there's a conflict.
>
> Regards,
> Jeff Davis
Jeff Davis wrote:
> On Tue, 2009-07-07 at 19:38 +0100, Dean Rasheed wrote:
>> This approach works well if the number of potential conflicts is
>> small.
>
> [...]
>
>> Curing the scalability problem by spooling the queue to disk shouldn't
>> be too hard to do, but that doesn't address the problem that if a
>> significant proportion of rows from the table need to be checked, it
>> is far quicker to scan the whole index once than check row by row.
>
> Another approach that might be worth considering is to build a
> temporary index and try to merge them at constraint-checking time.
> That might work well for unique.

I'm not really sure what you mean by a "temporary index". Do you mean one that you would just throw away at the end of the statement? That seems a bit heavy-weight. Also it seems too specific to unique constraints.

I think it would be better to cure the scalability issues for all constraints and triggers in one place, in the after triggers queue code. I had hoped that after doing deferrable unique constraints, I might apply a similar approach to other constraints, eg. a deferrable check constraint. In that case, an index doesn't help, and there is no choice but to check the rows one at a time.

Unique (and also FK) are special, in that they have potentially more optimal ways of checking them in bulk. ISTM that this is an orthogonal concept to the issue of making the trigger queue scalable, except that there ought to be an efficient way of discarding all the queued entries for a particular constraint, if we decide to check it en masse (perhaps a separate queue per constraint, or per trigger).

- Dean

> However, there are some potential issues. I didn't think this through
> yet, but here is a quick list just to get some thoughts down:
>
> 1. It would be tricky to merge while checking constraints if we are
> supporting more general constraints like in my proposal
> ( http://archives.postgresql.org/pgsql-hackers/2009-07/msg00302.php ).
>
> 2. Which indexes can be merged efficiently, and how much effort would
> it take to make this work?
>
> 3. A related issue: making indexes mergeable would be useful for bulk
> inserts as well.
>
> 4. At the end of the command, the index needs to work, meaning that
> queries would have to search two indexes. That may be difficult (but
> check the GIN fast insert code, which does something similar).
>
> 5. The temporary index still can't be enforcing constraints if they
> are deferred, so it won't solve all the issues here.
>
> Regards,
> Jeff Davis
Here is an updated version of this patch which should apply to HEAD, with updated docs, regression tests, pg_dump and psql \d.

It works well for small numbers of temporary uniqueness violations, and at least as well (in fact about twice as fast) as deferred FK checks for large numbers of deferred checks.

I propose trying to improve performance and scalability for large numbers of deferred checks in a separate patch.

- Dean
Attachment
On Sun, 2009-07-12 at 14:14 +0100, Dean Rasheed wrote:
> Here is an updated version of this patch which should apply to HEAD,
> with updated docs, regression tests, pg_dump and psql \d.
>
> It works well for small numbers of temporary uniqueness violations,
> and at least as well (in fact about twice as fast) as deferred FK
> checks for large numbers of deferred checks.

I took a brief look at this. You're extending the index AM, and that might not be necessary. It might be fine, but usually there is a lot of discussion around the changing of important APIs, so it might be worth looking at alternatives.

With the patch I'm working on for generalized index constraints, there would be no need to extend the index AM. However, I don't expect my mechanism to replace the existing unique btree constraints, because I would expect the existing unique constraints to be faster (I haven't tested yet, though).

Perhaps we could instead use the TRY/CATCH mechanism. It's generally difficult to figure out from the code exactly what happened, but in this case we have the error code ERRCODE_UNIQUE_VIOLATION. So, just check for that error code rather than passing back a boolean. You might want to change the signature of _bt_check_unique() so that it doesn't have to raise the error inside, and you can raise the error from _bt_doinsert().

The only problem there is telling the btree AM whether or not to do the insert (i.e. fake versus real insert). Perhaps you can just do that with careful use of a global variable?

Sure, all of this is a little ugly, but we've already acknowledged that there is some ugliness around the existing unique constraint and the btree code that supports it (for one, the btree AM accesses the heap).

> I propose trying to improve performance and scalability for large
> numbers of deferred checks in a separate patch.

Would it be possible to just check how long the list of potential conflicts is growing, and if it gets too big, just replace them all with a "bulk check" event?

Regards,
Jeff Davis
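The "bulk check" idea could be sketched like this. Everything here is hypothetical: the class and its names are invented, and the 3% default merely echoes Dean's back-of-envelope estimate upthread, not anything in the patch.

```python
# Sketch: watch the list of potential conflicts grow, and once it
# crosses a threshold, discard the per-row recheck events in favour
# of a single "bulk check" event (one full index scan at check time).

class DeferredConflictList:
    def __init__(self, table_rows, threshold=0.03):
        self.limit = max(1, int(table_rows * threshold))
        self.pending = []        # per-row recheck events (e.g. TIDs)
        self.bulk_check = False  # degrade to one full index scan

    def add(self, tid):
        if self.bulk_check:
            return  # a full scan covers everything; nothing to record
        self.pending.append(tid)
        if len(self.pending) > self.limit:
            self.pending.clear()     # drop the retail events...
            self.bulk_check = True   # ...in favour of one index scan
```

At constraint-check time the caller would either recheck `pending` row by row or, if `bulk_check` is set, scan the whole index once; either way, memory use stays bounded, which is the scalability concern raised earlier in the thread.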
On Tue, Jul 14, 2009 at 09:56:48AM -0700, Jeff Davis wrote:
> On Sun, 2009-07-12 at 14:14 +0100, Dean Rasheed wrote:
>> Here is an updated version of this patch which should apply to HEAD,
>> with updated docs, regression tests, pg_dump and psql \d.
>>
>> It works well for small numbers of temporary uniqueness violations,
>> and at least as well (in fact about twice as fast) as deferred FK
>> checks for large numbers of deferred checks.
>
> I took a brief look at this. You're extending the index AM, and that
> might not be necessary. It might be fine, but usually there is a lot
> of discussion around the changing of important APIs, so it might be
> worth looking at alternatives.
>
> With the patch I'm working on for generalized index constraints, there
> would be no need to extend the index AM. However, I don't expect my
> mechanism to replace the existing unique btree constraints, because I
> would expect the existing unique constraints to be faster (I haven't
> tested yet, though).
>
> Perhaps we could instead use the TRY/CATCH mechanism. It's generally
> difficult to figure out from the code exactly what happened, but in
> this case we have the error code ERRCODE_UNIQUE_VIOLATION. So, just
> check for that error code rather than passing back a boolean. You
> might want to change the signature of _bt_check_unique() so that it
> doesn't have to raise the error inside, and you can raise the error
> from _bt_doinsert().
>
> The only problem there is telling the btree AM whether or not to do
> the insert (i.e. fake versus real insert). Perhaps you can just do
> that with careful use of a global variable?
>
> Sure, all of this is a little ugly, but we've already acknowledged
> that there is some ugliness around the existing unique constraint and
> the btree code that supports it (for one, the btree AM accesses the
> heap).

I am looking at adding unique support to hash indexes for 8.5 and they will definitely need to visit the heap.

Regards,
Ken

>> I propose trying to improve performance and scalability for large
>> numbers of deferred checks in a separate patch.
>
> Would it be possible to just check how long the list of potential
> conflicts is growing, and if it gets too big, just replace them all
> with a "bulk check" event?
>
> Regards,
> Jeff Davis
Jeff Davis wrote:
> The only problem there is telling the btree AM whether or not to do
> the insert (i.e. fake versus real insert). Perhaps you can just do
> that with careful use of a global variable?
>
> Sure, all of this is a little ugly, but we've already acknowledged
> that there is some ugliness around the existing unique constraint and
> the btree code that supports it (for one, the btree AM accesses the
> heap).

My 2c on this issue: if this is ugly (and it is) and needs revisiting to extend it, please by all means let's make it not ugly instead of moving the ugliness around. I didn't read the original proposal in detail so IMBFOS, but it doesn't seem like using our existing deferred constraints to handle uniqueness checks unuglifies this code enough ... For example I think we'd like to support stuff like "UPDATE ... SET a = -a" where the table is large.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Tue, 2009-07-14 at 13:29 -0500, Kenneth Marshall wrote:
> I am looking at adding unique support to hash indexes for 8.5 and
> they will definitely need to visit the heap.

Have you seen this patch?

http://archives.postgresql.org/message-id/1246840119.19547.126.camel@jdavis

This patch will support unique constraints for hash indexes as well. There may still be a use-case for specialized hash index unique constraints, similar to btree, but please follow the work to make sure that no work is wasted.

Also, I don't see a problem with using the same hacks in the hash index code as are used in the btree index code. If you see a better way, or if you think index AM changes would be useful to you as well, you should probably open that discussion.

I was trying to provide an alternative to an index AM API change, because I thought there might be some resistance to that. However, if there are multiple index AMs that can make use of it, there is a stronger case for an API change.

Regards,
Jeff Davis
On Tue, 2009-07-14 at 15:00 -0400, Alvaro Herrera wrote:
> My 2c on this issue: if this is ugly (and it is) and needs revisiting
> to extend it, please by all means let's make it not ugly instead of
> moving the ugliness around. I didn't read the original proposal in
> detail so IMBFOS, but it doesn't seem like using our existing deferred
> constraints to handle uniqueness checks unuglifies this code enough
> ... For example I think we'd like to support stuff like "UPDATE ...
> SET a = -a" where the table is large.

I don't entirely understand what you're suggesting.

1. Are you saying that an AM API change is the best route? If so, we should probably start a discussion along those lines. Heikki is already changing the API for index-only scans, and Dean's API change proposal may be useful for Kenneth's unique hash indexes. You might as well all attack the current API in unison ;)

2. Even if we allow some kind of bulk constraint check to optimize the "set a = a + 1" case, we should still allow some much cheaper mechanism to defer retail constraint violations. For that, why not make use of the existing constraint trigger mechanism?

Regards,
Jeff Davis
2009/7/14 Alvaro Herrera <alvherre@commandprompt.com>:
> Jeff Davis wrote:
>
>> The only problem there is telling the btree AM whether or not to do
>> the insert (i.e. fake versus real insert). Perhaps you can just do
>> that with careful use of a global variable?
>>
>> Sure, all of this is a little ugly, but we've already acknowledged
>> that there is some ugliness around the existing unique constraint and
>> the btree code that supports it (for one, the btree AM accesses the
>> heap).

Well the ugliness referred to here (btree accessing the heap) seems like a necessary evil. I don't think I want to add to it by introducing global variables.

> My 2c on this issue: if this is ugly (and it is) and needs revisiting
> to extend it, please by all means let's make it not ugly instead of
> moving the ugliness around. I didn't read the original proposal in
> detail so IMBFOS, but it doesn't seem like using our existing deferred
> constraints to handle uniqueness checks unuglifies this code enough
> ... For example I think we'd like to support stuff like "UPDATE ...
> SET a = -a" where the table is large.

This patch works OK for around 1M rows. 10M is a real stretch (for me it took around 1.7GB of backend memory). Any larger than that is not going to be feasible.

There is a separate TODO item to tackle this scalability limit for deferred triggers, and I'd like to tackle that in a separate patch. I think more discussion needs to be had on ways to fix this (and hopefully unuglify that code in the process). ISTM that it is not simply a matter of spooling the current queues to disk. There is code in there which scans whole queues shuffling things around. So perhaps a queue per trigger would help optimise this, allowing us to move a whole queue cheaply, or drop it in favour of a bulk check. I've not thought it through much more than that so far.

- Dean
Jeff Davis wrote:
> 1. Are you saying that an AM API change is the best route? If so, we
> should probably start a discussion along those lines. Heikki is
> already changing the API for index-only scans, and Dean's API change
> proposal may be useful for Kenneth's unique hash indexes. You might as
> well all attack the current API in unison ;)

Yeah, I don't think there's any point in keeping the API stable just for the sake of keeping it stable. I mean surely we don't want to break it for no reason, but if we can clean up the uniqueness check situation somehow by breaking the API, by all means let's explore that ...

(I don't think anybody likes the way btree currently abuses the heap API; it's just that it's so damn fast to do it that way. Certainly we don't want to make it slower!)

> 2. Even if we allow some kind of bulk constraint check to optimize the
> "set a = a + 1" case, we should still allow some much cheaper
> mechanism to defer retail constraint violations. For that, why not
> make use of the existing constraint trigger mechanism?

Sure, perhaps you're right, which is why I added the disclaimer that I hadn't actually read the patch in any detail ...

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, 2009-07-14 at 20:32 +0100, Dean Rasheed wrote:
> Well the ugliness referred to here (btree accessing the heap) seems
> like a necessary evil. I don't think I want to add to it by
> introducing global variables.

Ok, try to coordinate with Kenneth to make sure that the API change satisfies deferrable uniques for both btree and hash indexes. I don't have a strong opinion one way or another about the API change.

Regards,
Jeff Davis
On Tue, Jul 14, 2009 at 12:13:33PM -0700, Jeff Davis wrote:
> On Tue, 2009-07-14 at 13:29 -0500, Kenneth Marshall wrote:
>> I am looking at adding unique support to hash indexes for 8.5 and
>> they will definitely need to visit the heap.
>
> Have you seen this patch?
>
> http://archives.postgresql.org/message-id/1246840119.19547.126.camel@jdavis
>
> This patch will support unique constraints for hash indexes as well.
> There may still be a use-case for specialized hash index unique
> constraints, similar to btree, but please follow the work to make sure
> that no work is wasted.
>
> Also, I don't see a problem with using the same hacks in the hash
> index code as are used in the btree index code. If you see a better
> way, or if you think index AM changes would be useful to you as well,
> you should probably open that discussion.
>
> I was trying to provide an alternative to an index AM API change,
> because I thought there might be some resistance to that. However, if
> there are multiple index AMs that can make use of it, there is a
> stronger case for an API change.
>
> Regards,
> Jeff Davis

I will take a look at that patch. My thought was to use the same process as the btree support for unique indexes since it has been well tested and optimized.

Thanks,
Ken
Review feedback:

1. Compiler warning:

   index.c: In function ‘index_create’:
   index.c:784: warning: implicit declaration of function ‘SystemFuncName’

2. I know that the GIN, GiST, and Hash AMs don't use the uniqueness_check argument at all -- in fact it is #ifdef'ed out. However, for the sake of documentation it would be good to change those unused arguments (in, e.g., gininsert()) to be IndexUniqueCheck enums rather than bools.

3. The unique constraint no longer waits for concurrent transactions to finish if the unique constraint is deferrable anyway (and it's not time to actually check the constraint yet). That makes sense, because the whole point is to defer the constraint. However, that means there are a few degenerate situations that were OK before, but can now get us into trouble.

   For instance, if you have a big delete and a concurrent big insert, the current code will just block at the first conflicting tuple and wait for the delete to finish. With deferrable constraints, it would save all of those tuples up as potential conflicts, using a lot of memory, when it is not strictly necessary because the tuples will be gone anyway. I'm not particularly worried about this situation -- because I think it's a reasonable thing to expect when using deferred constraints -- but I'd like to bring it up.

4. Requiring DEFERRABLE after UNIQUE in order to get commands like "UPDATE ... SET i = i + 1" to work isn't following the spec. I'm not sure what we should do here, because the 8.4 behavior is not following the spec at all, but people may still want it.

5. In the docs, 50.2: "This is the only situation ...": it's a little unclear what "this" is.

6. Missing support for CREATE INDEX CONCURRENTLY is unfortunate. What would be involved in adding support?

7. It looks like deferrable unique indexes can only be created by adding a constraint, not as part of the CREATE INDEX syntax. One consequence is that CIC can't be supported (right?). If you don't plan to support CIC, then maybe that's OK.

8. Code like the following:

   is_vacuum ? UNIQUE_CHECK_NO :
   deferUniqueCheck ? UNIQUE_CHECK_PARTIAL :
   relationDescs[i]->rd_index->indisunique ?
   UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

   is a little difficult to read, at least for me. It's a style thing, so you don't have to agree with me about this.

9. Passing problemIndexList to AfterTriggerSaveEvent seems a little awkward. I don't see an obvious way to make it cleaner, but I thought it's worth mentioning.

10. You're overloading tgconstrrelid to hold the constraint's index's oid, when normally it holds the referenced table. You should probably document this a little better, because I don't think that field is used to hold an index oid anywhere else.

The rest of the patch seems fairly complete. Tests, documentation, psql, and pg_dump support look good. And the patch works, of course. Code and comments look good to me as well.

I like the patch. It solves a problem, brings us closer to the SQL standard, and the approach seems reasonable to me.

Regards,
Jeff Davis
Thanks for the thorough review. I attach a new version of the patch, updated to HEAD, and with the index AM change discussed.

2009/7/18 Jeff Davis <pgsql@j-davis.com>:
>
> Review feedback:
>
> 1. Compiler warning:
>
> index.c: In function ‘index_create’:
> index.c:784: warning: implicit declaration of function ‘SystemFuncName’
>
> 2. I know that the GIN, GiST, and Hash AMs don't use the
> uniqueness_check argument at all -- in fact it is #ifdef'ed out.
> However, for the sake of documentation it would be good to change
> those unused arguments (in, e.g., gininsert()) to be IndexUniqueCheck
> enums rather than bools.

Fixed.

> 3. The unique constraint no longer waits for concurrent transactions
> to finish if the unique constraint is deferrable anyway (and it's not
> time to actually check the constraint yet). That makes sense, because
> the whole point is to defer the constraint. However, that means there
> are a few degenerate situations that were OK before, but can now get
> us into trouble.
>
> For instance, if you have a big delete and a concurrent big insert,
> the current code will just block at the first conflicting tuple and
> wait for the delete to finish. With deferrable constraints, it would
> save all of those tuples up as potential conflicts, using a lot of
> memory, when it is not strictly necessary because the tuples will be
> gone anyway. I'm not particularly worried about this situation --
> because I think it's a reasonable thing to expect when using deferred
> constraints -- but I'd like to bring it up.
>
> 4. Requiring DEFERRABLE after UNIQUE in order to get commands like
> "UPDATE ... SET i = i + 1" to work isn't following the spec. I'm not
> sure what we should do here, because the 8.4 behavior is not following
> the spec at all, but people may still want it.

So basically NOT DEFERRABLE should behave the same as DEFERRABLE-always-IMMEDIATE? I guess that you'd want something that blocks at the first conflict from another transaction but otherwise queues up potential conflicts to be checked at the end. It seems plausible to build that on top of what I've done so far, but I'm not brave enough to try in this patch (which is after all about implementing deferrable).

> 5. In the docs, 50.2: "This is the only situation ...": it's a little
> unclear what "this" is.
>
> 6. Missing support for CREATE INDEX CONCURRENTLY is unfortunate. What
> would be involved in adding support?

Not too hard in the btree code, just a minor reshuffle. It seems to have no noticeable performance hit, so I've done it there, but...

> 7. It looks like deferrable unique indexes can only be created by
> adding a constraint, not as part of the CREATE INDEX syntax. One
> consequence is that CIC can't be supported (right?). If you don't plan
> to support CIC, then maybe that's OK.

... that's right, the current constraint syntax doesn't allow the constraint index to be built concurrently. Perhaps in the future...

> 8. Code like the following:
>
> is_vacuum ? UNIQUE_CHECK_NO :
> deferUniqueCheck ? UNIQUE_CHECK_PARTIAL :
> relationDescs[i]->rd_index->indisunique ?
> UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
>
> Is a little difficult to read, at least for me. It's a style thing, so
> you don't have to agree with me about this.

Yeah, I agree with you on that, so I've tidied it up a bit.

> 9. Passing problemIndexList to AfterTriggerSaveEvent seems a little
> awkward. I don't see an obvious way to make it cleaner, but I thought
> it's worth mentioning.
>
> 10. You're overloading tgconstrrelid to hold the constraint's index's
> oid, when normally it holds the referenced table. You should probably
> document this a little better, because I don't think that field is
> used to hold an index oid anywhere else.

Done. Thanks for the feedback.

- Dean

> The rest of the patch seems fairly complete. Tests, documentation,
> psql, and pg_dump support look good. And the patch works, of course.
> Code and comments look good to me as well.
>
> I like the patch. It solves a problem, brings us closer to the SQL
> standard, and the approach seems reasonable to me.
>
> Regards,
> Jeff Davis
Attachment
Dean Rasheed wrote:
> Thanks for the thorough review. I attach a new version of the patch,
> updated to HEAD, and with the index AM change discussed.

Wow, this is a large patch.

I didn't do a thorough review, but some quickies I noticed:

* Please move the code that says that it should be in a new file to a new file.

* Please don't mimic the silliness of RI_FKey_check of an unused return value. Just make it return void, and have the caller use the proper PG_RETURN_FOO macro. (Unless there's a specific reason to not do things that way)

* I'm not sure about the changes in trigger.h and related elsewhere. Seems related to the new list in AfterTriggerSaveEvent, which is used in ways that seem to conflict with its header comment ... I wonder if we should be doing that in the executor itself instead. In any case it's inconsistent that the list is added to ExecARInsertTriggers but not to ExecARUpdateTrigger et al ...

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
2009/7/20 Alvaro Herrera <alvherre@commandprompt.com>: > Dean Rasheed wrote: >> Thanks for the thorough review. I attach an new version of the patch, >> updated to HEAD, and with the index AM change discussed. > > Wow, this is a large patch. > > I didn't do a thorough review, but some quickies I noticed: > > * Please move the code that says that it should be in a new file to a > new file. > Ah yes, I forgot about that. Will do. utils/adt/constraints.c ? I'm thinking other deferred constraints (eg. CHECK) might get added to it one day. > * Please don't mimic the silliness of RI_FKey_check of an unused return > value. Just make it return void OK. > and have the caller use the proper > PG_RETURN_FOO macro. (Unless there's a specific reason to not do > things that way) > It looks like I can't use PG_RETURN_NULL() because that will break the trigger calling code, but I could use PG_RETURN_VOID(). But is that the recommended style? Section 35.4 of the docs suggests using PointerGetDatum(NULL). > * I'm not sure about the changes in trigger.h and related elsewhere. > Seems related to the new list in AfterTriggerSaveEvent, which is > used in ways that seem to conflict with its header comment ... I > wonder if we should be doing that in the executor itself instead. Yes I suppose the comment is a bit misleading. I put the check there because that's where the similar RI checks are, which already conflict with the header comment. The simplest solution is to just update the comment. This seemed like the least intrusive way to do this. It is also called from copy.c which duplicates some of the executor code. > In any case it's inconsistent that the list is added to > ExecARInsertTriggers but not to ExecARUpdateTrigger et al ... > Yes it is added to ExecARUpdateTriggers(). - Dean
Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
>> * Please move the code that says that it should be in a new file to a
>> new file.

> Ah yes, I forgot about that. Will do.
> utils/adt/constraints.c ?

Please, no. Constraints are not a datatype. Do not copy the brain-dead decision that put ri_triggers there.

regards, tom lane
On Mon, Jul 20, 2009 at 10:21:53AM -0400, Tom Lane wrote:
> Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> >> * Please move the code that says that it should be in a new file to a
> >> new file.
>
> > Ah yes, I forgot about that. Will do.
>
> > utils/adt/constraints.c ?
>
> Please, no. Constraints are not a datatype. Do not copy the brain-dead
> decision that put ri_triggers there.

Does that mean ri_triggers should come out of there?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
2009/7/20 David Fetter <david@fetter.org>: > On Mon, Jul 20, 2009 at 10:21:53AM -0400, Tom Lane wrote: >> Dean Rasheed <dean.a.rasheed@googlemail.com> writes: >> >> * Please move the code that says that it should be in a new file to a >> >> new file. >> >> > Ah yes, I forgot about that. Will do. >> >> > utils/adt/constraints.c ? >> >> Please, no. Constraints are not a datatype. Do not copy the brain-dead >> decision that put ri_triggers there. > > Does that mean ri_triggers should come out of there? > The same argument could be applied to ruleutils.c, trigfuncs.c and perhaps one or two others. And if not there, where then?
David Fetter <david@fetter.org> writes: >> Please, no. Constraints are not a datatype. Do not copy the brain-dead >> decision that put ri_triggers there. > Does that mean ri_triggers should come out of there? Not as long as we're on CVS --- it isn't worth the loss of history. I might think about it when/if we move to git. regards, tom lane
On Mon, Jul 20, 2009 at 01:00:12PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> >> Please, no. Constraints are not a datatype. Do not copy the brain-dead
> >> decision that put ri_triggers there.
>
> > Does that mean ri_triggers should come out of there?
>
> Not as long as we're on CVS --- it isn't worth the loss of history.
> I might think about it when/if we move to git.

As far as you're concerned, what's blocking that?

Cheers,
David.
David Fetter <david@fetter.org> writes: > On Mon, Jul 20, 2009 at 01:00:12PM -0400, Tom Lane wrote: >> I might think about it when/if we move to git. > As far as you're concerned, what's blocking that? Lack of committer familiarity with git, lack of a bulletproof migration process, uncertainty about preferred multi-branch patching techniques, uncertainty about buildfarm integration, etc etc. Do I really need to repeat the previous discussions to you? AFAIR none of the open questions have gotten resolved with any clarity. regards, tom lane
[changing subject, as we seem to have done] Tom Lane wrote: > David Fetter <david@fetter.org> writes: > >> On Mon, Jul 20, 2009 at 01:00:12PM -0400, Tom Lane wrote: >> >>> I might think about it when/if we move to git. >>> > > >> As far as you're concerned, what's blocking that? >> > > Lack of committer familiarity with git, lack of a bulletproof migration > process, uncertainty about preferred multi-branch patching techniques, > uncertainty about buildfarm integration, etc etc. Do I really need to > repeat the previous discussions to you? AFAIR none of the open > questions have gotten resolved with any clarity. > > > I have been derailed slightly by pressure of other work from making the buildfarm client git-capable. But it is still quite high on my TODO list. I'm not sure where we got to with doing some surgery on the CVS repo so that we can replicate all the tags and branches properly. Has someone fully identified what needs to be fixed so we can have all the tags? cheers andrew
2009/7/20 Alvaro Herrera <alvherre@commandprompt.com>:
> Seems related to the new list in AfterTriggerSaveEvent, which is
> used in ways that seem to conflict with its header comment ...

Reading the comment for that function, I think it is quite misleading - mainly because the meaning of the word "event" mutates half-way through the function. A better description might be:

/* ----------
 * AfterTriggerSaveEvent()
 *
 * Called by ExecA[RS]...Triggers() to queue up the triggers that should
 * be fired for an event.
 *
 * NOTE: this is called whenever there are any triggers associated with
 * the event (even if they are disabled). This function decides which
 * triggers actually need to be queued.
 * ----------
 */

- Dean
On Monday 20 July 2009 23:17:30 Andrew Dunstan wrote: > I'm not sure where we got to with doing some surgery on the CVS repo so > that we can replicate all the tags and branches properly. Has someone > fully identified what needs to be fixed so we can have all the tags? I think this depends on which tool we would end up using to do the final conversion. Each tool has its own quirks.
Peter Eisentraut wrote: > On Monday 20 July 2009 23:17:30 Andrew Dunstan wrote: > > I'm not sure where we got to with doing some surgery on the CVS repo so > > that we can replicate all the tags and branches properly. Has someone > > fully identified what needs to be fixed so we can have all the tags? > > I think this depends on which tool we would end up using to do the final > conversion. Each tool has its own quirks. Didn't someone report not long ago that there were files missing in a checkout of some not-so-old tag that made it impossible to compile such a tree? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, 2009-07-20 at 17:24 +0100, Dean Rasheed wrote:
> The same argument could be applied to ruleutils.c, trigfuncs.c and perhaps
> one or two others.
>
> And if not there, where then?

I'm moving the patch status back to "waiting on author" until Alvaro's concerns are addressed. I don't have an opinion about what should happen to the location of the file, but some kind of decision should be made.

Regards,
Jeff Davis
2009/7/21 Jeff Davis <pgsql@j-davis.com>: > On Mon, 2009-07-20 at 17:24 +0100, Dean Rasheed wrote: >> The same argument could be applied to ruleutils.c, trigfuncs.c and perhaps >> one or two others. >> >> And if not there, where then? > > I'm moving the patch status back to "waiting on author" until Alvaro's > concerns are addressed. I don't have an opinion about what should happen > to the location of the file, but some kind of decision should be made. > > Regards, > Jeff Davis > > OK, I'll try to post an updated patch soon. I'm not seeing an obvious alternative location to utils/adt. Any advice would be appreciated. - Dean
On Tue, 2009-07-21 at 22:01 +0100, Dean Rasheed wrote:
> OK, I'll try to post an updated patch soon.
>
> I'm not seeing an obvious alternative location to utils/adt. Any advice would
> be appreciated.

My first reaction is utils/misc.

Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2009-07-21 at 22:01 +0100, Dean Rasheed wrote: >> I'm not seeing an obvious alternative location to utils/adt. Any advice would >> be appreciated. You could make a fair case for either backend/catalog or backend/commands. Maybe the latter since that's where the core trigger support is. > My first reaction is utils/misc. I always hated that directory. *Double* failure to classify. regards, tom lane
2009/7/21 Tom Lane <tgl@sss.pgh.pa.us>: > Jeff Davis <pgsql@j-davis.com> writes: >> On Tue, 2009-07-21 at 22:01 +0100, Dean Rasheed wrote: >>> I'm not seeing an obvious alternative location to utils/adt. Any advice would >>> be appreciated. > > You could make a fair case for either backend/catalog or > backend/commands. Maybe the latter since that's where the core > trigger support is. > >> My first reaction is utils/misc. > > I always hated that directory. *Double* failure to classify. > > regards, tom lane > OK, here's an updated patch. - Dean
2009/7/22 Dean Rasheed <dean.a.rasheed@googlemail.com>: > OK, here's an updated patch. > In case it's not obvious, I've opted for backend/commands for the new file. - Dean
On Wed, 2009-07-22 at 12:25 +0100, Dean Rasheed wrote:
> OK, here's an updated patch.

One thing that Alvaro mentioned that you didn't do yet is use the macro to return from the function (either PG_RETURN_VOID() or PG_RETURN_NULL()). You seem to be following the document here:

http://www.postgresql.org/docs/8.4/static/trigger-example.html

So I'm not going to hold you up on this issue. It's passed my review, and I'm marking it as such on the commitfest page. Thanks!

Regards,
Jeff Davis
Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> [ latest deferrable-unique patch ]

I'm starting to look at this now. I haven't read the patch itself yet, but I went through the discussion thread. I'm not sure whether we have actually decided that we want to commit this, as opposed to treating it as an investigation exercise.

The main thing that is bothering me is something Dean pointed out at the very beginning: the patch will not scale well for large numbers of conflicts. The reason this bothers me is that from what I recall of past complaints about our lack of deferred unique checks, the *primary* use case is things like "update foo set id = id + 1". So I'm afraid that this might be a toy implementation that's not useful in practice.

The three likely responses to this objection seem to be

1. "You're right, we should reject the patch until that's fixed."
2. "You're wrong, the patch is perfectly useful as-is."
3. "You're right, but we should commit anyway because it will be fixed later."

I don't think I'm going to believe #3 though, because there's no concrete design for a fix on the table, much less a commitment to implement it.

Comments?

regards, tom lane
On Mon, 2009-07-27 at 13:14 -0400, Tom Lane wrote:
> The main thing that is bothering me is something Dean pointed out at
> the very beginning: the patch will not scale well for large numbers of
> conflicts.

The way I see it, there are two strategies:

(a) build up a list as you go, and check it later
(b) do a check of the full table at once

Is there another reasonable option?

The patch seems like a reasonable implementation of (a), so what it's missing is the ability to fall back to (b) when the list gets too large (compared with available memory or relative to the table size).

Are you suggesting that we wait until (b) is implemented, or do you envision something else entirely?

Regards,
Jeff Davis
2009/7/27 Jeff Davis <pgsql@j-davis.com>:
> On Mon, 2009-07-27 at 13:14 -0400, Tom Lane wrote:
>> The main thing that is bothering me is something Dean pointed out at
>> the very beginning: the patch will not scale well for large numbers of
>> conflicts.

I'd prefer to take option #3, and I want to try to tackle the scaling issue next. I didn't do it as part of this patch because it has already grown very big, and I think the scaling problem is largely an orthogonal issue which affects all AR triggers.

> The way I see it, there are two strategies:
> (a) build up a list as you go, and check it later
> (b) do a check of the full table at once
>
> Is there another reasonable option?

IMO (a) is the only way to go, since you don't know until the end of an update command what proportion of rows were potential conflicts, and so whether or not to do a row-by-row or a wholesale check. I think that this will be almost entirely a patch to trigger.c, for which there is already a separate TODO item.

Actually I see 2 parts to this:
(1). make trigger queue scalable with easy mechanism to switchover to wholesale check
(2). implement wholesale check for UNIQUE

but (1) seems like the lion's share of the work. (and then maybe (3). wholesale check for RI constraints)

- Dean

> The patch seems like a reasonable implementation of (a), so what it's
> missing is the ability to fall back to (b) when the list gets too large
> (compared with available memory or relative to the table size).
>
> Are you suggesting that we wait until (b) is implemented, or do you
> envision something else entirely?
>
> Regards,
> Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > The way I see it, there are two strategies: > (a) build up a list as you go, and check it later > (b) do a check of the full table at once > The patch seems like a reasonable implementation of (a), so what it's > missing is the ability to fall back to (b) when the list gets too large > (compared with available memory or relative to the table size). > Are you suggesting that we wait until (b) is implemented, or do you > envision something else entirely? What's mainly bothering me is the fear that real use cases are so heavily skewed towards (b) that we shouldn't bother implementing (a) at all. In that case we'd end up throwing away a lot of this patch, or else carrying a lot of not-very-useful code. I don't have a lot of concrete evidence about this, but as I said most of the complaints we've heard are suggesting bulk updates, eg http://archives.postgresql.org/pgsql-bugs/2003-05/msg00052.php http://archives.postgresql.org/pgsql-bugs/2004-09/msg00248.php http://archives.postgresql.org/pgsql-bugs/2007-07/msg00070.php There are some suggesting the other, eg, http://archives.postgresql.org/pgsql-bugs/2008-01/msg00172.php but they're clearly in the minority. It's possible that the first group are just using simple updates to illustrate the bug, rather than as examples of what they really want to do, but I'm unconvinced of that. Anyway, I'd feel a lot better about it if there were a near-term plan to do something about (b). regards, tom lane
Dean Rasheed <dean.a.rasheed@googlemail.com> writes: > Actually I see 2 parts to this: > (1). make trigger queue scalable with easy mechanism to switchover to > wholesale check > (2). implement wholesale check for UNIQUE > but (1) seems like to lion's share of the work. > (and then maybe (3). wholesale check for RI constraints) We already have those wholesale checks. It might take a bit of refactoring to make them conveniently callable in this context, but there shouldn't be a lot of new code there. I think that "scalable" is sort of missing the mark as far as the needed queue behavior goes. "Lossy" might be a better adjective. The main reason nobody has implemented spill-to-disk for the existing trigger queue is that by the time you've filled memory with the queue, performance already sucks, because it's gonna take forever to fire all those trigger events. Being able to make the queue even bigger isn't going to make it suck less, quite the opposite. You need some way to slide into bulk instead of retail checking strategies, just like join strategies vary depending on the number of rows involved. (In fact, in a real sense these ARE join problems ... maybe we should stop thinking of them as fire-a-bunch-of-triggers and instead think of executing a single check query with appropriate WHERE clause ...) regards, tom lane
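[Editorial note] To make the retail-versus-bulk switchover concrete, here is a toy Python sketch - not PostgreSQL code, and every name in it is invented for illustration - of a deferred-conflict queue that discards its per-event list and falls back to one wholesale pass over the table once the queue exceeds a memory-style limit, in the spirit of the "lossy" queue Tom describes:

```python
# Toy model of a lossy deferred-check queue. Hypothetical names throughout;
# this is a sketch of the strategy, not of the actual trigger-queue code.

QUEUE_LIMIT = 4  # stand-in for available trigger-queue memory


class DeferredUniqueCheck:
    def __init__(self, limit=QUEUE_LIMIT):
        self.limit = limit
        self.queue = []         # keys with *potential* conflicts
        self.wholesale = False  # lossy: list discarded, recheck everything

    def note_potential_conflict(self, key):
        if self.wholesale:
            return  # queue already discarded; a full check is pending anyway
        self.queue.append(key)
        if len(self.queue) > self.limit:
            self.queue = []        # stop remembering individual events
            self.wholesale = True  # switch to the bulk strategy

    def check_at_commit(self, table_keys):
        if self.wholesale:
            # Bulk strategy: one pass over the table, akin to a self-join
            # on the key column.
            seen = set()
            for k in table_keys:
                if k in seen:
                    raise ValueError(f"duplicate key {k!r}")
                seen.add(k)
        else:
            # Retail strategy: recheck only the queued keys.
            for k in self.queue:
                if table_keys.count(k) > 1:
                    raise ValueError(f"duplicate key {k!r}")
```

The point of the fallback is the one made above: past a certain queue size, replaying events one by one costs more than a single bulk check, so growing the queue further only makes things worse.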
2009/7/27 Tom Lane <tgl@sss.pgh.pa.us>: > Jeff Davis <pgsql@j-davis.com> writes: >> The way I see it, there are two strategies: >> (a) build up a list as you go, and check it later >> (b) do a check of the full table at once > >> The patch seems like a reasonable implementation of (a), so what it's >> missing is the ability to fall back to (b) when the list gets too large >> (compared with available memory or relative to the table size). > >> Are you suggesting that we wait until (b) is implemented, or do you >> envision something else entirely? > > What's mainly bothering me is the fear that real use cases are so > heavily skewed towards (b) that we shouldn't bother implementing (a) > at all. In that case we'd end up throwing away a lot of this patch, > or else carrying a lot of not-very-useful code. > I have a definite use for (a), which is why I started this. I typically write web apps on top of PG using an ORM, and I've had to write some fairly intricate code to save things in the right order to avoid violating my own unique constraints. I agree that (b) is important too, but typically when I do this kind of bulk update, I drop and re-add the indexes. More importantly, if you implement (b) first, how is it ever going to help with (a)? I thought it best to do (a) first and then think about how to switchover to (b). - Dean > I don't have a lot of concrete evidence about this, but as I said > most of the complaints we've heard are suggesting bulk updates, eg > http://archives.postgresql.org/pgsql-bugs/2003-05/msg00052.php > http://archives.postgresql.org/pgsql-bugs/2004-09/msg00248.php > http://archives.postgresql.org/pgsql-bugs/2007-07/msg00070.php > > There are some suggesting the other, eg, > http://archives.postgresql.org/pgsql-bugs/2008-01/msg00172.php > but they're clearly in the minority. 
> > It's possible that the first group are just using simple updates > to illustrate the bug, rather than as examples of what they > really want to do, but I'm unconvinced of that. > > Anyway, I'd feel a lot better about it if there were a near-term > plan to do something about (b). > > regards, tom lane >
2009/7/27 Tom Lane <tgl@sss.pgh.pa.us>: > (In fact, in a real sense these ARE join problems ... maybe we should > stop thinking of them as fire-a-bunch-of-triggers and instead think of > executing a single check query with appropriate WHERE clause ...) > Hmm. Presumably that is the same WHERE clause as the UPDATE. But it has to execute after the update. How does it avoid re-executing functions, re-incrementing sequences, etc... ? - Dean
[ still poking around in this patch ]

Jeff Davis <pgsql@j-davis.com> writes:
> 10. You're overloading tgconstrrelid to hold the constraint's index's
> oid, when normally it holds the referenced table. You should probably
> document this a little better, because I don't think that field is used
> to hold an index oid anywhere else.

Having documented this kluge doesn't make it any more acceptable. It risks breaking any number of things that expect that field to reference a table, not an index. There seem to be two reasonable alternatives here:

* Add another column to pg_trigger (and hence the Trigger data structure) to carry the index OID.

* Store the index OID as a trigger argument (which means converting it to text form, and then back again for each use).

The former approach would be a lot easier if anyone is trying to query pg_trigger for the triggers associated with an index, but OTOH I'm not sure anyone would really need to do that. The latter approach minimizes catalog changes at the cost of a bit of runtime inefficiency; but considering everything else that goes on to fire a deferred trigger, worrying about a strtoul call is probably silly.

If we did add another column to pg_trigger, I'd be a bit tempted to add one to pg_constraint too. tgconstrrelid for RI triggers is a mirror of a pg_constraint column, and it seems like the index data perhaps should be as well. (Indeed, one of the things that got me annoyed about this kluge in the first place was that it broke that relationship without changing the documentation.)

Comments, opinions?

regards, tom lane
On Mon, 2009-07-27 at 16:33 -0400, Tom Lane wrote:
> If we did add another column to pg_trigger, I'd be a bit tempted to add
> one to pg_constraint too. tgconstrrelid for RI triggers is a mirror of
> a pg_constraint column, and it seems like the index data perhaps should
> be as well. (Indeed, one of the thing that got me annoyed about this
> kluge in the first place was that it broke that relationship without
> changing the documentation.)

That would work great for me, as I was planning to add such a column anyway for my "generalized index constraints" patch.

Regards,
Jeff Davis
2009/7/27 Jeff Davis <pgsql@j-davis.com>: > On Mon, 2009-07-27 at 16:33 -0400, Tom Lane wrote: >> If we did add another column to pg_trigger, I'd be a bit tempted to add >> one to pg_constraint too. tgconstrrelid for RI triggers is a mirror of >> a pg_constraint column, and it seems like the index data perhaps should >> be as well. (Indeed, one of the thing that got me annoyed about this >> kluge in the first place was that it broke that relationship without >> changing the documentation.) > > That would work great for me, as I was planning to add such a column > anyway for my "generalized index constraints" patch. > Yes that seems like the most sensible option to me. - Dean
On Mon, Jul 27, 2009 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (In fact, in a real sense these ARE join problems ... maybe we should
> stop thinking of them as fire-a-bunch-of-triggers and instead think of
> executing a single check query with appropriate WHERE clause ...)

A while back I suggested keeping the deferred trigger list in tuplestore format and executing the trigger check as a query between the materialized tuplestore and the tuples on disk.

I love the idea of doing a full SQL query, but the problem is that there's no particular reason to assume that a deferred trigger list large enough to warrant a wholesale check is actually a significant percentage of the table. It might only take a few hundred or few thousand checks to warrant a bitmap index scan instead of repeated index probes, but a plain SQL query with no reference back to the deferred list would have to check all the millions of rows in the table for no purpose.

For foreign keys I was picturing some way to issue an SQL statement like "SELECT from tabletocheck where ctid in (<magic parameter>) and not exists (select 1 from referenced_table where pk = tabletocheck.fk)" and then somehow pass the list of ctids from the deferred list.

-- 
greg
http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes: > For foreign keys I was picturing some way to issue an SQL statement > like "SELECT from tabletocheck where ctid in (<magic parameter>) and > not exists (select 1 from referenced_table where pk = > tabletocheck.fk)" and then somehow pass the list of ctids from the > deferred list. I have no problem with having some "magic" in there --- FK checks already have to do some things that aren't expressible in standard SQL, because of snapshotting issues. However, the above still presumes that we can afford to store all the CTIDs involved. Which is more or less exactly what the trigger event queue is doing now. We need a different view about that bit, I think. Perhaps we could remember the CTIDs the transaction has touched in a bitmap (which could become lossy under memory pressure)? On a lossy page you'd need to check all the tuples to see which ones have xmins matching your transaction; but that's not too terrible and at least you're not visiting pages you don't need to. regards, tom lane
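[Editorial note] As a rough illustration of the lossy-bitmap idea - a toy Python sketch, not PostgreSQL code; the class and its policy are invented for this example - exact (page, offset) entries collapse to page-only entries under memory pressure, after which every tuple on a lossy page must be rechecked (in the real case, by comparing xmins against the current transaction):

```python
# Toy model of a TID bitmap that goes lossy under memory pressure.
# Hypothetical structure; real bitmaps use far more compact representations.

class LossyTidBitmap:
    def __init__(self, max_exact=4):
        self.max_exact = max_exact
        self.exact = {}     # page number -> set of tuple offsets
        self.lossy = set()  # pages whose offsets have been forgotten

    def add(self, page, offset):
        if page in self.lossy:
            return  # page already lossy; nothing more to remember
        self.exact.setdefault(page, set()).add(offset)
        if sum(len(s) for s in self.exact.values()) > self.max_exact:
            # Degrade the fullest page to lossy, freeing its offset set.
            victim = max(self.exact, key=lambda p: len(self.exact[p]))
            del self.exact[victim]
            self.lossy.add(victim)

    def tuples_to_check(self, page, tuples_on_page):
        """Return the offsets on this page that must be rechecked."""
        if page in self.lossy:
            return set(tuples_on_page)  # recheck every tuple on the page
        return self.exact.get(page, set())
```

The trade-off matches what Tom describes: a lossy page costs extra visibility checks on that page, but the bitmap never visits pages the transaction didn't touch, and its memory use is bounded.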
Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> 2009/7/27 Jeff Davis <pgsql@j-davis.com>:
>> On Mon, 2009-07-27 at 16:33 -0400, Tom Lane wrote:
>>> If we did add another column to pg_trigger, I'd be a bit tempted to add
>>> one to pg_constraint too.
>>
>> That would work great for me, as I was planning to add such a column
>> anyway for my "generalized index constraints" patch.

> Yes that seems like the most sensible option to me.

Okay. I will go off and do that, then come back to Dean's patch. Proposed names:

    pg_trigger.tgconstrindid    analogous to tgconstrrelid
    pg_constraint.conuindid     analogous to confrelid

where the latter will be populated for any unique or pkey constraint. The former will be always 0 for the moment, but we'll start filling it in with Dean's patch.

(thinks...) Actually, u for unique might be a poor choice if Jeff's patch goes in and starts using it for things that aren't exactly unique indexes. Should it be just conindid?

regards, tom lane
On Mon, 2009-07-27 at 19:12 -0400, Tom Lane wrote:
> (thinks...) Actually, u for unique might be a poor choice if Jeff's
> patch goes in and starts using it for things that aren't exactly
> unique indexes. Should it be just conindid?

My thoughts exactly.

Regards,
Jeff Davis
On Tue, Jul 28, 2009 at 12:04 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Greg Stark <gsstark@mit.edu> writes: >> For foreign keys I was picturing some way to issue an SQL statement >> like "SELECT from tabletocheck where ctid in (<magic parameter>) and >> not exists (select 1 from referenced_table where pk = >> tabletocheck.fk)" and then somehow pass the list of ctids from the >> deferred list. > > I have no problem with having some "magic" in there --- FK checks > already have to do some things that aren't expressible in standard SQL, > because of snapshotting issues. However, the above still presumes that > we can afford to store all the CTIDs involved. Which is more or less > exactly what the trigger event queue is doing now. We need a different > view about that bit, I think. It wasn't clear in the SQL example but I described storing them in a tuplestore. The tuplestore would get spilled to disk automatically but the SQL query could (semi)join against it using whatever form of join is most efficient. Now that I look at that query though it's pretty clear that we don't actually have a good join type to handle this. We would need some kind of merge-join which knew that ctids from a sequential scan were in order (and could ensure that they were in fact in order). There might be a better way to write the query above in a way that didn't need anything special like that. The need to check that the inserted tuple is still live is a big part of the headache. If we could check for violations first and then go back and check any violations if there are any to see if they come from live tuples that would save a lot of work. -- greg http://mit.edu/~gsstark/resume.pdf
Jeff Davis <pgsql@j-davis.com> writes: > On Mon, 2009-07-27 at 19:12 -0400, Tom Lane wrote: >> (thinks...) Actually, u for unique might be a poor choice if Jeff's >> patch goes in and starts using it for things that aren't exactly >> unique indexes. Should it be just conindid? > My thoughts exactly. On looking closer, it appears we should populate this column for FKEY constraints too --- for example this would greatly simplify some of the information_schema views (cf _pg_underlying_index). Now those references will also point at unique indexes, but still this seems like another reason to use a relatively generic column name. conindid it is. regards, tom lane
... btw, where in the SQL spec do you read that PRIMARY KEY constraints can't be deferred? I don't see that. regards, tom lane
... speaking of adding catalog columns, I just discovered that the patch adds searches of pg_depend and pg_constraint to BuildIndexInfo. This seems utterly unacceptable on two grounds:

* It's sheer luck that it gets through bootstrap without crashing. Those aren't part of the core set of catalogs that we expect to be accessed by primitive catalog searches. I wouldn't be too surprised if it gets the wrong answer, and the only reason there's not a visible bug is none of the core catalogs have deferrable indexes (so there's no pg_depend entry to be found).

* It increases the runtime of BuildIndexInfo by what seems likely to be orders of magnitude (note that get_index_constraint is not a syscacheable operation). Did anyone do any performance checks on this patch, like seeing if pgbench got slower?

I think we had better add the deferrability state to pg_index to avoid this.

I tried to see if we could dispense with storing the flags in IndexInfo at all, so as not to have to do that. It looks to me like the only place where it's really needed is in ExecInsertIndexTuples:

        if (is_vacuum || !relationDescs[i]->rd_index->indisunique)
            uniqueCheck = UNIQUE_CHECK_NO;
==>     else if (indexInfo->ii_Deferrable)
            uniqueCheck = UNIQUE_CHECK_PARTIAL;
        else
            uniqueCheck = UNIQUE_CHECK_YES;

Since this code has its hands on the pg_index row already, it definitely doesn't need a copy in IndexInfo if the state is in pg_index. However, I also notice that it doesn't particularly care about the deferrability state in the sense that the triggers use (ie, whether to check at end of statement or end of transaction). What we want to know here is whether to force an old-school immediate uniqueness check in the index AM. So I'm thinking that we only need one bool added to pg_index, not two. And I'm further thinking about intentionally calling it something other than "deferred", to emphasize that the semantics aren't quite like regular constraint deferral. Maybe invert the sense and call it "immediate".

Comments?

regards, tom lane
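[Editorial note] The decision being proposed can be sketched as follows - a hypothetical Python rendering, with the flag spelled "indimmediate" here purely for illustration; only the UNIQUE_CHECK_* names come from the patch itself:

```python
# Sketch of the AM-level check-mode decision with a single pg_index bool.
# "indimmediate" is an assumed column name, not (yet) a real catalog column.

UNIQUE_CHECK_NO = "no"            # no uniqueness check at insert
UNIQUE_CHECK_YES = "yes"          # old-school blocking check in the index AM
UNIQUE_CHECK_PARTIAL = "partial"  # non-blocking check; full recheck deferred


def choose_unique_check(is_vacuum, indisunique, indimmediate):
    if is_vacuum or not indisunique:
        return UNIQUE_CHECK_NO
    if not indimmediate:
        # Deferrable: let potential duplicates in, queue a later recheck.
        return UNIQUE_CHECK_PARTIAL
    return UNIQUE_CHECK_YES
```

With both inputs available from the pg_index row itself, nothing here needs the copy in IndexInfo, which is the point of Tom's proposal.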
On Tue, 2009-07-28 at 13:41 -0400, Tom Lane wrote: > * It's sheer luck that it gets through bootstrap without crashing. > Those aren't part of the core set of catalogs that we expect to be > accessed by primitive catalog searches. I wouldn't be too surprised > if it gets the wrong answer, and the only reason there's not a visible > bug is none of the core catalogs have deferrable indexes (so there's > no pg_depend entry to be found). > > * It increases the runtime of BuildIndexInfo by what seems likely to > be orders of magnitude (note that get_index_constraint is not a > syscacheable operation). Did anyone do any performance checks on > this patch, like seeing if pgbench got slower? Not I. I didn't notice anything that made me think it would slow down in the default case, but I'll be more careful in the future. This also impacts my patch. After moving the constraint information from pg_index (where my patch currently has it) to pg_constraint, I will also need access to the index and the constraint at the same time. Perhaps this can all be handled more like CHECK constraints, storing this information in the ResultRelInfo? > I think we had better add the deferrability state to pg_index > to avoid this. This might make it difficult to allow multiple constraints to use the same index. We might have to make some kind of rules about when two constraints can share an index, and when they can't. > I also notice that it doesn't particularly care about the deferrability > state in the sense that the triggers use (ie, whether to check at end of > statement or end of transaction). What we want to know here is whether > to force an old-school immediate uniqueness check in the index AM. [...] > And I'm further thinking about intentionally calling it something other > than "deferred", to emphasize that the semantics aren't quite like > regular constraint deferral. Maybe invert the sense and call it > "immediate". Comments? 
I'm trying to figure out how this fits with the generalized index constraints idea. We may want the generalized index constraints to have the same "immediate" behavior, but that doesn't have much to do with the index.

Regards,
	Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-07-28 at 13:41 -0400, Tom Lane wrote:
>> I think we had better add the deferrability state to pg_index
>> to avoid this.

> This might make it difficult to allow multiple constraints to use the
> same index.

Huh? That hardly seems possible anyway, if some of them want deferred checks and others do not.

> I'm trying to figure out how this fits with the generalized index
> constraints idea. We may want the generalized index constraints to have
> the same "immediate" behavior, but that doesn't have much to do with the
> index.

Sure it does. Whether the check is immediate must be considered a property of the index itself. Any checking you do later could be per-constraint, but the index is either going to fail at insert or not.

			regards, tom lane
On Tue, 2009-07-28 at 15:15 -0400, Tom Lane wrote:
> > This might make it difficult to allow multiple constraints to use the
> > same index.
>
> Huh? That hardly seems possible anyway, if some of them want deferred
> checks and others do not.

I don't see why it's completely impossible. You could have:
 * non-overlapping, with deferred behavior
 * "not completely contained in", with fail-early behavior

Probably not worth supporting, though.

> Sure it does. Whether the check is immediate must be considered a
> property of the index itself. Any checking you do later could be
> per-constraint, but the index is either going to fail at insert or not.

My point is that the "immediate" behavior does not require the index itself to fail early. My original patch for generalized index constraints has the same behavior as UNIQUE currently does (including the fail-early behavior), but can be used over indexes that know nothing about UNIQUE (like GiST).

Regards,
	Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> On Tue, 2009-07-28 at 15:15 -0400, Tom Lane wrote:
>> Sure it does. Whether the check is immediate must be considered a
>> property of the index itself. Any checking you do later could be
>> per-constraint, but the index is either going to fail at insert or not.

> My point is that the "immediate" behavior does not require the index
> itself to fail early. My original patch for generalized index
> constraints has the same behavior as UNIQUE currently does (including
> the fail-early behavior), but can be used over indexes that know nothing
> about UNIQUE (like GiST).

Fail-early still sounds like a property of the index. Whether the property is implemented inside or outside the index AM isn't very relevant. Partial and functional index support are outside the AM, for example, but we have no problem representing those features in pg_index.

In any case, this can be redesigned as needed when and if your other patch gets to the point of being ready for consideration.

			regards, tom lane
Another thought on the index AM API issues: after poking through the code I realized that there is *nobody* paying any attention to the existing bool result of aminsert() (ie, did we insert anything or not). So I think that instead of adding a bool* parameter, we should repurpose the function result, along the lines of this spec:

<para>
 The method's boolean result value is significant only when
 <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</>.
 In this case a TRUE result means the new entry is known unique, whereas
 FALSE means it might be non-unique (and a deferred uniqueness check must
 be scheduled).  For other cases a constant FALSE result is recommended.
</para>

<para>
 For non-unique indexes, it is not required that <function>aminsert</>
 do anything; it might for instance refuse to index NULLs.
</para>

The bool* parameter is fairly ugly in a couple of ways: it's not clear when it's okay to pass a NULL pointer, and the compiler doesn't give you a lot of help in being sure you've set the result in all code paths. So I'd rather not use one if we don't have to.

			regards, tom lane
2009/7/28 Tom Lane <tgl@sss.pgh.pa.us>:
> ... speaking of adding catalog columns, I just discovered that the patch
> adds searches of pg_depend and pg_constraint to BuildIndexInfo.  This
> seems utterly unacceptable on two grounds:
>
> * It's sheer luck that it gets through bootstrap without crashing.
> Those aren't part of the core set of catalogs that we expect to be
> accessed by primitive catalog searches.  I wouldn't be too surprised
> if it gets the wrong answer, and the only reason there's not a visible
> bug is none of the core catalogs have deferrable indexes (so there's
> no pg_depend entry to be found).
>
> * It increases the runtime of BuildIndexInfo by what seems likely to
> be orders of magnitude (note that get_index_constraint is not a
> syscacheable operation).  Did anyone do any performance checks on
> this patch, like seeing if pgbench got slower?
>
> I think we had better add the deferrability state to pg_index
> to avoid this.
>
> I tried to see if we could dispense with storing the flags in IndexInfo
> at all, so as not to have to do that.  It looks to me like the only
> place where it's really needed is in ExecInsertIndexTuples:
>
>         if (is_vacuum || !relationDescs[i]->rd_index->indisunique)
>             uniqueCheck = UNIQUE_CHECK_NO;
> ==>     else if (indexInfo->ii_Deferrable)
>             uniqueCheck = UNIQUE_CHECK_PARTIAL;
>         else
>             uniqueCheck = UNIQUE_CHECK_YES;
>
> Since this code has its hands on the pg_index row already, it definitely
> doesn't need a copy in IndexInfo if the state is in pg_index.  However,
> I also notice that it doesn't particularly care about the deferrability
> state in the sense that the triggers use (ie, whether to check at end of
> statement or end of transaction).  What we want to know here is whether
> to force an old-school immediate uniqueness check in the index AM.  So
> I'm thinking that we only need one bool added to pg_index, not two.
>
> And I'm further thinking about intentionally calling it something other
> than "deferred", to emphasize that the semantics aren't quite like
> regular constraint deferral.  Maybe invert the sense and call it
> "immediate".  Comments?
>
>			regards, tom lane

Yes that makes sense. Sorry I didn't spot this - it was a performance regression, which I should have spotted with pgbench.

- Dean
2009/7/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Another thought on the index AM API issues: after poking through the
> code I realized that there is *nobody* paying any attention to the
> existing bool result of aminsert() (ie, did we insert anything or not).
> So I think that instead of adding a bool* parameter, we should repurpose
> the function result, along the lines of this spec:
>
> <para>
>  The method's boolean result value is significant only when
>  <literal>checkUnique</> is <literal>UNIQUE_CHECK_PARTIAL</>.
>  In this case a TRUE result means the new entry is known unique, whereas
>  FALSE means it might be non-unique (and a deferred uniqueness check must
>  be scheduled).  For other cases a constant FALSE result is recommended.
> </para>

And you'll be moving the ereport() back into the btree code?

Makes sense, provided that nothing is ever going to care whether the index actually inserted an entry. I can see arguments for making the recommended return value for "other cases" either TRUE or FALSE, but I guess it doesn't matter since nothing is going to check it.

> <para>
>  For non-unique indexes, it is not required that <function>aminsert</>
>  do anything; it might for instance refuse to index NULLs.
> </para>

Doesn't this comment apply equally to unique indexes?

- Dean
Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> 2009/7/29 Tom Lane <tgl@sss.pgh.pa.us>:
>> For non-unique indexes, it is not required that <function>aminsert</>
>> do anything; it might for instance refuse to index NULLs.

> Doesn't this comment apply equally to unique indexes?

Hmm, I was thinking that a unique-capable index would have to index all tuples. But I guess if it's restricted to one index column (like hash) it could omit nulls and still enforce uniqueness correctly. I'll change that comment.

			regards, tom lane
Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> [ deferrable unique constraints patch ]

Applied after rather extensive editorialization. Aside from the points we already discussed, I redid the logic in _bt_check_unique ... it didn't look right to me, and I also felt that we need a cross-check to verify that the original tuple's index entry gets found in the UNIQUE_CHECK_EXISTING search.

I'm going to go add the point about better support of bulk updates to TODO.

			regards, tom lane
Tom Lane wrote:
> Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
> > [ deferrable unique constraints patch ]
>
> Applied after rather extensive editorialization.

Kudos!!

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Jul 29, 2009 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
>> [ deferrable unique constraints patch ]
>
> Applied after rather extensive editorialization.  Aside from the points

Wow, cool.

...Robert
2009/7/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Dean Rasheed <dean.a.rasheed@googlemail.com> writes:
>> [ deferrable unique constraints patch ]
>
> Applied after rather extensive editorialization.

Excellent! Thanks for all your work sorting out my rookie mistakes. I haven't had a chance to go through all your "editorializations" in detail yet, but I'm sure that I'll learn a lot from them, so thanks again.

And thanks also to everyone else who reviewed this and provided helpful feedback and advice.

Cheers!

- Dean