Thread: bug w/ cursors and savepoints
Someone at Fujitsu pointed out the following bug in 8.0: begin; savepoint x; create table abc (a int); insert into abc values (5); declare foo cursor for select * from abc; rollback to x; fetch from foo; -- hits an Assert() commit; The stacktrace is: #2 0x0826367b in ExceptionalCondition (conditionName=0x8316544 "!(((bool)((relation)->rd_refcnt == 0)))", errorType=0x8316004 "FailedAssertion", fileName=0x8315f08 "/home/neilc/pgsql/src/backend/utils/cache/relcache.c", lineNumber=2118) at /home/neilc/pgsql/src/backend/utils/error/assert.c:51 #3 0x0825cec0 in AtEOSubXact_RelationCache (isCommit=0 '\0', mySubid=2, parentSubid=1) at /home/neilc/pgsql/src/backend/utils/cache/relcache.c:2118 #4 0x080ade30 in AbortSubTransaction () at /home/neilc/pgsql/src/backend/access/transam/xact.c:3407 #5 0x080ac404 in CommitTransactionCommand () at /home/neilc/pgsql/src/backend/access/transam/xact.c:1982 #6 0x081de4ba in finish_xact_command () at /home/neilc/pgsql/src/backend/tcop/postgres.c:1843 #7 0x081dd102 in exec_simple_query (query_string=0x83b6ad4 "rollback to x;") at /home/neilc/pgsql/src/backend/tcop/postgres.c:950 So what's happening is that the cursor still holds a reference to the newly-created table, so we can't just blow it away. I don't know the subtransaction code too well, so I'm not sure of the right fix. Comments? -Neil
Neil Conway <neilc@samurai.com> writes: > Someone at Fujitsu pointed out the following bug in 8.0: > begin; > savepoint x; > create table abc (a int); > insert into abc values (5); > declare foo cursor for select * from abc; > rollback to x; > fetch from foo; -- hits an Assert() Offhand I'd say this should draw a "no such cursor as foo" error. I'm too tired to look into why foo still exists after the rollback... regards, tom lane
On Tue, 2005-01-25 at 02:40 -0500, Tom Lane wrote: > Offhand I'd say this should draw a "no such cursor as foo" error. > I'm too tired to look into why foo still exists after the rollback... I'm confused; I wasn't involved in the design discussions about portals and subtransactions this summer, but my understanding is that making portals non-transactional was the conclusion. Shouldn't that imply that a DECLARE in an aborted subtransaction should persist? Certainly, it seems AtSubAbort_Portals() makes sure that READY portals created in an aborted subtxn are adopted by the parent transaction. (It might make sense to persist the cursor to the parent transaction, unless the aborted transaction created a database object the cursor depends upon, a la [1] -- but in that case, IMHO we ought to give a different error message than "no such cursor".) If someone can shed some light on what the spec for this behavior ought to be, I'd be happy to fix the bug. -Neil [1] http://archives.postgresql.org/pgsql-hackers/2004-07/msg00349.php
Neil Conway <neilc@samurai.com> writes: > On Tue, 2005-01-25 at 02:40 -0500, Tom Lane wrote: >> Offhand I'd say this should draw a "no such cursor as foo" error. >> I'm too tired to look into why foo still exists after the rollback... > I'm confused; I wasn't involved in the design discussions about portals > and subtransactions this summer, but my understanding is that making > portals non-transactional was the conclusion. Shouldn't that imply that > a DECLARE in an aborted subtransaction should persist? I don't recall the discussions from last summer in detail, but it can't possibly be rational to allow a cursor created in a failed subtransaction to persist beyond that subtransaction... your example in which the cursor uses tables that no longer exist is a fairly egregious example of why not, but there are others. regards, tom lane
On Tue, Jan 25, 2005 at 02:40:51AM -0500, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > Someone at Fujitsu pointed out the following bug in 8.0: > > begin; > > savepoint x; > > create table abc (a int); > > insert into abc values (5); > > declare foo cursor for select * from abc; > > rollback to x; > > fetch from foo; -- hits an Assert() > > Offhand I'd say this should draw a "no such cursor as foo" error. > I'm too tired to look into why foo still exists after the rollback... At this point, gdb says that the portal is in PORTAL_READY state. The code says to keep it open and reassign it to the parent subxact. I don't remember what the rationale for this was ... I'll review the discussion about this. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Tue, Jan 25, 2005 at 02:40:51AM -0500, Tom Lane wrote: >> Offhand I'd say this should draw a "no such cursor as foo" error. >> I'm too tired to look into why foo still exists after the rollback... > At this point, gdb says that the portal is in PORTAL_READY state. The > code says to keep it open and reassign it to the parent subxact. I > don't remember what the rationale for this was ... I'll review the > discussion about this. IIRC, we had agreed that in a sequence like BEGIN; DECLARE c CURSOR ...; SAVEPOINT x; FETCH FROM c; ROLLBACK TO x; FETCH FROM c; ... the second fetch should get the second cursor row, ie, the rollback does not undo the cursor state change caused by the first fetch. This was frankly driven mostly by implementation considerations, ie, we do not have any mechanism that would support undoing changes of Executor state. But I don't think that we really thought about the case of rolling back the *creation* of a cursor. Neil's example seems to prove that we need to do that. Even aside from the possibility that the database objects won't exist any more, the locks taken out during plan startup would get released by the rollback. So I think the rule ought to be that cursors created inside a failed subtransaction go away. I have some recollection that we did that initially and ran into the problem that the portal holding the ROLLBACK command itself goes away too soon :-(. So a bit of care will be needed here. regards, tom lane
I wrote: > So I think the rule ought to be that cursors > created inside a failed subtransaction go away. Other bits of recollection bubbling up: I think that one reason we made portals not go away instantly on error is that the JDBC guys objected, feeling that it made for weird special cases at the protocol level. So the right fix might involve putting the portal into PORTAL_FAILED state rather than just zapping it completely. regards, tom lane
On Tue, Jan 25, 2005 at 12:32:57PM -0500, Tom Lane wrote: > So the right fix might involve putting the portal into PORTAL_FAILED > state rather than just zapping it completely. Strangely, the code comes up simpler after the fix. Patch attached. Regression test pass. Additionaly I tried both cases mentioned in this thread; maybe it's worthy to add tests for them too. alvherre=# begin; BEGIN alvherre=# savepoint x; SAVEPOINT alvherre=# create table abc (a int); CREATE TABLE alvherre=# insert into abc values (5); INSERT 33616 1 alvherre=# declare foo cursor for select * from abc; DECLARE CURSOR alvherre=# rollback to x; ROLLBACK alvherre=# fetch from foo; -- hits an Assert() ERROR: no existe el cursor «foo» alvherre=# commit; ROLLBACK alvherre=# begin; BEGIN alvherre=# declare c cursor for select 1 union all select 2; DECLARE CURSOR alvherre=# savepoint x; SAVEPOINT alvherre=# fetch from c; ?column? ---------- 1 (1 fila) alvherre=# rollback to x; ROLLBACK alvherre=# fetch from c; ?column? ---------- 2 (1 fila) alvherre=# commit; COMMIT -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "The ability to monopolize a planet is insignificant next to the power of the source"
Attachment
On Tue, Jan 25, 2005 at 03:14:07PM -0300, Alvaro Herrera wrote: > On Tue, Jan 25, 2005 at 12:32:57PM -0500, Tom Lane wrote: > > > So the right fix might involve putting the portal into PORTAL_FAILED > > state rather than just zapping it completely. > > Strangely, the code comes up simpler after the fix. Patch attached. I forgot to mention that I looked at AtAbort_Portals and that while it has the same test to change the state of an active portal, it's probably safe to assume that no cursor will be allowed to run from that point on. Now that I think of it, maybe it's a good idea to add a comment saying why is that the case. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "When the proper man does nothing (wu-wei), his thought is felt ten thousand miles." (Lao Tse)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > ! if (portal->status == PORTAL_ACTIVE) > portal->status = PORTAL_FAILED; > ! if (portal->status == PORTAL_ACTIVE || portal->status == PORTAL_READY) > portal->status = PORTAL_FAILED; I don't think you actually need that change, since you're going to unconditionally close the portal below. The case for PORTAL_ACTIVE is just there to dodge the sanity check in PortalDrop. The routine's comments need a bit of work too. Otherwise it seems OK. Neil or anyone else --- see an issue here? regards, tom lane
Tom Lane wrote: > The routine's comments need a bit of work too. Otherwise it seems OK. > Neil or anyone else --- see an issue here? The policy will now be: cursor creation is transaction, but cursor state modifications (FETCH) are non-transactional -- right? I wonder if it wouldn't be more consistent to make cursor deletion (CLOSE) transactional as well -- so that a CLOSE in an aborted subtransaction would not actually destroy the cursor. Other than that, I think there ought to be some user-level documentation for how cursors and savepoints interact, and some regression tests for this behavior, but I'm happy to add that myself if no one beats me to it. -Neil
On Wed, Jan 26, 2005 at 03:33:07PM +1100, Neil Conway wrote: > Tom Lane wrote: > >The routine's comments need a bit of work too. Otherwise it seems OK. > >Neil or anyone else --- see an issue here? > > The policy will now be: cursor creation is transaction, but cursor state > modifications (FETCH) are non-transactional -- right? I wonder if it > wouldn't be more consistent to make cursor deletion (CLOSE) > transactional as well -- so that a CLOSE in an aborted subtransaction > would not actually destroy the cursor. Hmm ... not sure how hard that is. We left a lot of details for 8.1 though, like trying to save the state of the executor related to the cursor so that FETCH is transactional too. > Other than that, I think there ought to be some user-level documentation > for how cursors and savepoints interact, There is some detail (as of my patch, outdated) in http://developer.postgresql.org/docs/postgres/sql-rollback-to.html If you have a suggestion on where else it should go I'm all ears ... > and some regression tests for this behavior, but I'm happy to add that > myself if no one beats me to it. Please do. I'll post a corrected patch ASAP, including the doc change. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "La espina, desde que nace, ya pincha" (Proverbio africano)
On Tue, Jan 25, 2005 at 02:06:24PM -0300, Alvaro Herrera wrote: Hackers, > At this point, gdb says that the portal is in PORTAL_READY state. The > code says to keep it open and reassign it to the parent subxact. I > don't remember what the rationale for this was ... I'll review the > discussion about this. Our conclusion at the time was that cursors created inside failed subtransactions should remain open. See the proposal and followup discussion starting here: http://archives.postgresql.org/pgsql-hackers/2004-07/msg00700.php If we want to keep this behavior then we should not close all READY cursors as per my previous patch. We should keep them open, and only mark FAILED those cursors that are related to state reversed by the aborting subtransaction. I don't see any way to do this cleanly, until we have some relcache- dependency checking on Portals (maybe anything else?). Not sure what we can do about it right now, with 8.0.1 release tomorrow. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Ciencias políticas es la ciencia de entender por quélos políticos actúan como lo hacen" (netfunny.com)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > Our conclusion at the time was that cursors created inside failed > subtransactions should remain open. See the proposal and followup > discussion starting here: > http://archives.postgresql.org/pgsql-hackers/2004-07/msg00700.php > If we want to keep this behavior then we should not close all READY > cursors as per my previous patch. We should keep them open, and only > mark FAILED those cursors that are related to state reversed by the > aborting subtransaction. > I don't see any way to do this cleanly, until we have some relcache- > dependency checking on Portals (maybe anything else?). Not sure what we > can do about it right now, with 8.0.1 release tomorrow. I don't think we have a lot of choices: we have to destroy (or at least mark FAILED) all such cursors for the time being. The whole question of cursor transaction semantics could stand to be revisited in 8.1, but we can't ship the system with such a trivial crashing bug. Note that dependency tracking would not in itself be enough to solve the problem, since the question is not merely what objects the cursor depends on, but whether their definitions were changed in the failed subtransaction. Talk about messy :-( regards, tom lane
On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote: > I don't think we have a lot of choices: we have to destroy (or at least > mark FAILED) all such cursors for the time being. I don't see a lot of difference between marking the portal FAILED and destroying it (maybe I'm looking at the wrong code). So I just took the simpler approach; patch attached. > Note that dependency tracking would not in itself be enough to solve the > problem, since the question is not merely what objects the cursor > depends on, but whether their definitions were changed in the failed > subtransaction. Talk about messy :-( Maybe we can use the shared cache invalidation mechanism for this. When a message is received that affects a relation marked as referenced by a portal, mark the portal obsolete. I don't recall the details of shared-inval right now, I may be missing something (like the time at which messages are sent. But I believe we send a message to our own backend, no?) -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Always assume the user will do much worse than the stupidest thing you can imagine." (Julien PUYDT)
Attachment
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote: >> I don't think we have a lot of choices: we have to destroy (or at least >> mark FAILED) all such cursors for the time being. > I don't see a lot of difference between marking the portal FAILED and > destroying it (maybe I'm looking at the wrong code). So I just took the > simpler approach; patch attached. Mainly just that it'll be easier to improve the code later if we don't have to put back something we removed. I'll tweak the patch to do that. regards, tom lane
On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote: > > The policy will now be: cursor creation is transaction, but cursor state > > modifications (FETCH) are non-transactional -- right? I wonder if it > > wouldn't be more consistent to make cursor deletion (CLOSE) > > transactional as well -- so that a CLOSE in an aborted subtransaction > > would not actually destroy the cursor. > > Hmm ... not sure how hard that is. Would it work to record the sub XID of the deleting subtxn on CLOSE, and then consider whether to "really" do the deletion when the subtxn commits/aborts? -Neil
Neil Conway <neilc@samurai.com> writes: > On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote: >> Hmm ... not sure how hard that is. > Would it work to record the sub XID of the deleting subtxn on CLOSE, and > then consider whether to "really" do the deletion when the subtxn > commits/aborts? It'd take more than that. Consider BEGIN;DECLARE c CURSOR ...;SAVEPOINT x; CLOSE c; DECLARE c CURSOR ...; -- draws duplicate-cursor error You'd need to do something to "hide" the deleted cursor for the remainder of its subtransaction. regards, tom lane
On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote: > > and some regression tests for this behavior, but I'm happy to add that > > myself if no one beats me to it. > > Please do. Attached is a patch adding regression tests for this change -- I've already applied it to HEAD. -Neil
Attachment
Alvaro Herrera wrote: > On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote: > > >>I don't think we have a lot of choices: we have to destroy (or at least >>mark FAILED) all such cursors for the time being. > > > I don't see a lot of difference between marking the portal FAILED and > destroying it (maybe I'm looking at the wrong code). So I just took the > simpler approach; patch attached. I assume that you can CLOSE a failed portal, but you can't CLOSE a destroyed portal (because it's not there any more)? This is important for the JDBC driver as it creates portals internally, does fetches as the application code demands, then closes the portal at some point after the application is done with it. Having the close fail because of an intervening savepoint rollback isn't great -- the error will cause an unexpected failure of the current transaction. This can happen even if the application doesn't try to use the portal (via ResultSet) after the savepoint rollback at all. It wouldn't be so bad if the driver could track savepoint boundaries, but the current protocol doesn't make that easy.. -O
Oliver Jowett wrote: > Having the close fail > because of an intervening savepoint rollback isn't great -- the error > will cause an unexpected failure of the current transaction. Never mind -- I just reread the protocol docs, and it's safe to close a nonexistant portal. Did this previously issue a warning, or something similar? I'm sure I had seen problems in this area in the past.. -O
On Fri, Jan 28, 2005 at 12:27:05PM +1300, Oliver Jowett wrote: > Oliver Jowett wrote: > >Having the close fail > >because of an intervening savepoint rollback isn't great -- the error > >will cause an unexpected failure of the current transaction. > > Never mind -- I just reread the protocol docs, and it's safe to close a > nonexistant portal. Did this previously issue a warning, or something > similar? I'm sure I had seen problems in this area in the past.. Not sure ... however I remember somebody complained because the SQL level interface differed from the protocol level one on this point. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "In a specialized industrial society, it would be a disaster to have kids running around loose." (Paul Graham)