Thread: bug w/ cursors and savepoints

bug w/ cursors and savepoints

From
Neil Conway
Date:
Someone at Fujitsu pointed out the following bug in 8.0:

begin;
savepoint x;
create table abc (a int);
insert into abc values (5);
declare foo cursor for select * from abc;
rollback to x;
fetch from foo; -- hits an Assert()
commit;

The stacktrace is:

#2  0x0826367b in ExceptionalCondition (conditionName=0x8316544
"!(((bool)((relation)->rd_refcnt == 0)))",   errorType=0x8316004 "FailedAssertion", fileName=0x8315f08
"/home/neilc/pgsql/src/backend/utils/cache/relcache.c", lineNumber=2118)   at
/home/neilc/pgsql/src/backend/utils/error/assert.c:51
#3  0x0825cec0 in AtEOSubXact_RelationCache (isCommit=0 '\0', mySubid=2,
parentSubid=1)   at /home/neilc/pgsql/src/backend/utils/cache/relcache.c:2118
#4  0x080ade30 in AbortSubTransaction ()
at /home/neilc/pgsql/src/backend/access/transam/xact.c:3407
#5  0x080ac404 in CommitTransactionCommand ()
at /home/neilc/pgsql/src/backend/access/transam/xact.c:1982
#6  0x081de4ba in finish_xact_command ()
at /home/neilc/pgsql/src/backend/tcop/postgres.c:1843
#7  0x081dd102 in exec_simple_query (query_string=0x83b6ad4 "rollback to
x;") at /home/neilc/pgsql/src/backend/tcop/postgres.c:950

So what's happening is that the cursor still holds a reference to the
newly-created table, so we can't just blow it away. I don't know the
subtransaction code too well, so I'm not sure of the right fix.
Comments?

-Neil




Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Someone at Fujitsu pointed out the following bug in 8.0:
> begin;
> savepoint x;
> create table abc (a int);
> insert into abc values (5);
> declare foo cursor for select * from abc;
> rollback to x;
> fetch from foo; -- hits an Assert()

Offhand I'd say this should draw a "no such cursor as foo" error.
I'm too tired to look into why foo still exists after the rollback...
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Neil Conway
Date:
On Tue, 2005-01-25 at 02:40 -0500, Tom Lane wrote:
> Offhand I'd say this should draw a "no such cursor as foo" error.
> I'm too tired to look into why foo still exists after the rollback...

I'm confused; I wasn't involved in the design discussions about portals
and subtransactions this summer, but my understanding is that making
portals non-transactional was the conclusion. Shouldn't that imply that
a DECLARE in an aborted subtransaction should persist? Certainly, it
seems AtSubAbort_Portals() makes sure that READY portals created in an
aborted subtxn are adopted by the parent transaction.

(It might make sense to persist the cursor to the parent transaction,
unless the aborted transaction created a database object the cursor
depends upon, a la [1] -- but in that case, IMHO we ought to give a
different error message than "no such cursor".)

If someone can shed some light on what the spec for this behavior ought
to be, I'd be happy to fix the bug.

-Neil

[1] http://archives.postgresql.org/pgsql-hackers/2004-07/msg00349.php



Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2005-01-25 at 02:40 -0500, Tom Lane wrote:
>> Offhand I'd say this should draw a "no such cursor as foo" error.
>> I'm too tired to look into why foo still exists after the rollback...

> I'm confused; I wasn't involved in the design discussions about portals
> and subtransactions this summer, but my understanding is that making
> portals non-transactional was the conclusion. Shouldn't that imply that
> a DECLARE in an aborted subtransaction should persist?

I don't recall the discussions from last summer in detail, but it can't
possibly be rational to allow a cursor created in a failed
subtransaction to persist beyond that subtransaction... your example
in which the cursor uses tables that no longer exist is a fairly
egregious example of why not, but there are others.
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Tue, Jan 25, 2005 at 02:40:51AM -0500, Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > Someone at Fujitsu pointed out the following bug in 8.0:
> > begin;
> > savepoint x;
> > create table abc (a int);
> > insert into abc values (5);
> > declare foo cursor for select * from abc;
> > rollback to x;
> > fetch from foo; -- hits an Assert()
> 
> Offhand I'd say this should draw a "no such cursor as foo" error.
> I'm too tired to look into why foo still exists after the rollback...

At this point, gdb says that the portal is in PORTAL_READY state.  The
code says to keep it open and reassign it to the parent subxact.  I
don't remember what the rationale for this was ... I'll review the
discussion about this.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)


Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Tue, Jan 25, 2005 at 02:40:51AM -0500, Tom Lane wrote:
>> Offhand I'd say this should draw a "no such cursor as foo" error.
>> I'm too tired to look into why foo still exists after the rollback...

> At this point, gdb says that the portal is in PORTAL_READY state.  The
> code says to keep it open and reassign it to the parent subxact.  I
> don't remember what the rationale for this was ... I'll review the
> discussion about this.

IIRC, we had agreed that in a sequence like
BEGIN;    DECLARE c CURSOR ...;    SAVEPOINT x;        FETCH FROM c;    ROLLBACK TO x;    FETCH FROM c;    ...

the second fetch should get the second cursor row, ie, the rollback does
not undo the cursor state change caused by the first fetch.  This was
frankly driven mostly by implementation considerations, ie, we do not
have any mechanism that would support undoing changes of Executor state.
But I don't think that we really thought about the case of rolling back
the *creation* of a cursor.  Neil's example seems to prove that we need
to do that.  Even aside from the possibility that the database objects
won't exist any more, the locks taken out during plan startup would get
released by the rollback.  So I think the rule ought to be that cursors
created inside a failed subtransaction go away.

I have some recollection that we did that initially and ran into the
problem that the portal holding the ROLLBACK command itself goes away
too soon :-(.  So a bit of care will be needed here.
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
I wrote:
> So I think the rule ought to be that cursors
> created inside a failed subtransaction go away.

Other bits of recollection bubbling up: I think that one reason we made
portals not go away instantly on error is that the JDBC guys objected,
feeling that it made for weird special cases at the protocol level.

So the right fix might involve putting the portal into PORTAL_FAILED
state rather than just zapping it completely.
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Tue, Jan 25, 2005 at 12:32:57PM -0500, Tom Lane wrote:

> So the right fix might involve putting the portal into PORTAL_FAILED
> state rather than just zapping it completely.

Strangely, the code comes up simpler after the fix.  Patch attached.
Regression test pass.  Additionaly I tried both cases mentioned in this
thread; maybe it's worthy to add tests for them too.

alvherre=# begin;
BEGIN
alvherre=# savepoint x;
SAVEPOINT
alvherre=# create table abc (a int);
CREATE TABLE
alvherre=# insert into abc values (5);
INSERT 33616 1
alvherre=# declare foo cursor for select * from abc;
DECLARE CURSOR
alvherre=# rollback to x;
ROLLBACK
alvherre=# fetch from foo; -- hits an Assert()
ERROR:  no existe el cursor «foo»
alvherre=# commit;
ROLLBACK
alvherre=# begin;
BEGIN
alvherre=# declare c cursor for select 1 union all select 2;
DECLARE CURSOR
alvherre=# savepoint x;
SAVEPOINT
alvherre=# fetch from c;
 ?column?
----------
        1
(1 fila)

alvherre=# rollback to x;
ROLLBACK
alvherre=# fetch from c;
 ?column?
----------
        2
(1 fila)

alvherre=# commit;
COMMIT

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The ability to monopolize a planet is insignificant
next to the power of the source"

Attachment

Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Tue, Jan 25, 2005 at 03:14:07PM -0300, Alvaro Herrera wrote:
> On Tue, Jan 25, 2005 at 12:32:57PM -0500, Tom Lane wrote:
> 
> > So the right fix might involve putting the portal into PORTAL_FAILED
> > state rather than just zapping it completely.
> 
> Strangely, the code comes up simpler after the fix.  Patch attached.

I forgot to mention that I looked at AtAbort_Portals and that while it
has the same test to change the state of an active portal, it's probably
safe to assume that no cursor will be allowed to run from that point on.

Now that I think of it, maybe it's a good idea to add a comment saying
why is that the case.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"When the proper man does nothing (wu-wei),
his thought is felt ten thousand miles." (Lao Tse)


Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> !         if (portal->status == PORTAL_ACTIVE)
>               portal->status = PORTAL_FAILED;

> !         if (portal->status == PORTAL_ACTIVE || portal->status == PORTAL_READY)
>               portal->status = PORTAL_FAILED;

I don't think you actually need that change, since you're going to
unconditionally close the portal below.  The case for PORTAL_ACTIVE
is just there to dodge the sanity check in PortalDrop.

The routine's comments need a bit of work too.  Otherwise it seems OK.
Neil or anyone else --- see an issue here?
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Neil Conway
Date:
Tom Lane wrote:
> The routine's comments need a bit of work too.  Otherwise it seems OK.
> Neil or anyone else --- see an issue here?

The policy will now be: cursor creation is transaction, but cursor state 
modifications (FETCH) are non-transactional -- right? I wonder if it 
wouldn't be more consistent to make cursor deletion (CLOSE) 
transactional as well -- so that a CLOSE in an aborted subtransaction 
would not actually destroy the cursor.

Other than that, I think there ought to be some user-level documentation 
for how cursors and savepoints interact, and some regression tests for 
this behavior, but I'm happy to add that myself if no one beats me to it.

-Neil


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Wed, Jan 26, 2005 at 03:33:07PM +1100, Neil Conway wrote:
> Tom Lane wrote:
> >The routine's comments need a bit of work too.  Otherwise it seems OK.
> >Neil or anyone else --- see an issue here?
> 
> The policy will now be: cursor creation is transaction, but cursor state 
> modifications (FETCH) are non-transactional -- right? I wonder if it 
> wouldn't be more consistent to make cursor deletion (CLOSE) 
> transactional as well -- so that a CLOSE in an aborted subtransaction 
> would not actually destroy the cursor.

Hmm ... not sure how hard that is.  We left a lot of details for 8.1
though, like trying to save the state of the executor related to the
cursor so that FETCH is transactional too.

> Other than that, I think there ought to be some user-level documentation 
> for how cursors and savepoints interact,

There is some detail (as of my patch, outdated) in
http://developer.postgresql.org/docs/postgres/sql-rollback-to.html
If you have a suggestion on where else it should go I'm all ears ...

> and some regression tests for this behavior, but I'm happy to add that
> myself if no one beats me to it.

Please do.

I'll post a corrected patch ASAP, including the doc change.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La espina, desde que nace, ya pincha" (Proverbio africano)


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Tue, Jan 25, 2005 at 02:06:24PM -0300, Alvaro Herrera wrote:

Hackers,

> At this point, gdb says that the portal is in PORTAL_READY state.  The
> code says to keep it open and reassign it to the parent subxact.  I
> don't remember what the rationale for this was ... I'll review the
> discussion about this.

Our conclusion at the time was that cursors created inside failed
subtransactions should remain open.  See the proposal and followup
discussion starting here:

http://archives.postgresql.org/pgsql-hackers/2004-07/msg00700.php

If we want to keep this behavior then we should not close all READY
cursors as per my previous patch.  We should keep them open, and only
mark FAILED those cursors that are related to state reversed by the
aborting subtransaction.

I don't see any way to do this cleanly, until we have some relcache-
dependency checking on Portals (maybe anything else?).  Not sure what we
can do about it right now, with 8.0.1 release tomorrow.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Ciencias políticas es la ciencia de entender por quélos políticos actúan como lo hacen"  (netfunny.com)


Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Our conclusion at the time was that cursors created inside failed
> subtransactions should remain open.  See the proposal and followup
> discussion starting here:

> http://archives.postgresql.org/pgsql-hackers/2004-07/msg00700.php

> If we want to keep this behavior then we should not close all READY
> cursors as per my previous patch.  We should keep them open, and only
> mark FAILED those cursors that are related to state reversed by the
> aborting subtransaction.

> I don't see any way to do this cleanly, until we have some relcache-
> dependency checking on Portals (maybe anything else?).  Not sure what we
> can do about it right now, with 8.0.1 release tomorrow.

I don't think we have a lot of choices: we have to destroy (or at least
mark FAILED) all such cursors for the time being.  The whole question of
cursor transaction semantics could stand to be revisited in 8.1, but we
can't ship the system with such a trivial crashing bug.

Note that dependency tracking would not in itself be enough to solve the
problem, since the question is not merely what objects the cursor
depends on, but whether their definitions were changed in the failed
subtransaction.  Talk about messy :-(
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote:

> I don't think we have a lot of choices: we have to destroy (or at least
> mark FAILED) all such cursors for the time being.

I don't see a lot of difference between marking the portal FAILED and
destroying it (maybe I'm looking at the wrong code).  So I just took the
simpler approach; patch attached.

> Note that dependency tracking would not in itself be enough to solve the
> problem, since the question is not merely what objects the cursor
> depends on, but whether their definitions were changed in the failed
> subtransaction.  Talk about messy :-(

Maybe we can use the shared cache invalidation mechanism for this.  When
a message is received that affects a relation marked as referenced by a
portal, mark the portal obsolete.  I don't recall the details of
shared-inval right now, I may be missing something (like the time at
which messages are sent.  But I believe we send a message to our own
backend, no?)

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Always assume the user will do much worse than the stupidest thing
you can imagine."                                (Julien PUYDT)

Attachment

Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote:
>> I don't think we have a lot of choices: we have to destroy (or at least
>> mark FAILED) all such cursors for the time being.

> I don't see a lot of difference between marking the portal FAILED and
> destroying it (maybe I'm looking at the wrong code).  So I just took the
> simpler approach; patch attached.

Mainly just that it'll be easier to improve the code later if we don't
have to put back something we removed.  I'll tweak the patch to do that.
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Neil Conway
Date:
On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote:
> > The policy will now be: cursor creation is transaction, but cursor state 
> > modifications (FETCH) are non-transactional -- right? I wonder if it 
> > wouldn't be more consistent to make cursor deletion (CLOSE) 
> > transactional as well -- so that a CLOSE in an aborted subtransaction 
> > would not actually destroy the cursor.
> 
> Hmm ... not sure how hard that is.

Would it work to record the sub XID of the deleting subtxn on CLOSE, and
then consider whether to "really" do the deletion when the subtxn
commits/aborts?

-Neil




Re: bug w/ cursors and savepoints

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote:
>> Hmm ... not sure how hard that is.

> Would it work to record the sub XID of the deleting subtxn on CLOSE, and
> then consider whether to "really" do the deletion when the subtxn
> commits/aborts?

It'd take more than that.  Consider
BEGIN;DECLARE c CURSOR ...;SAVEPOINT x;    CLOSE c;    DECLARE c CURSOR ...;    -- draws duplicate-cursor error

You'd need to do something to "hide" the deleted cursor for the
remainder of its subtransaction.
        regards, tom lane


Re: bug w/ cursors and savepoints

From
Neil Conway
Date:
On Wed, 2005-01-26 at 12:02 -0300, Alvaro Herrera wrote:
> > and some regression tests for this behavior, but I'm happy to add that
> > myself if no one beats me to it.
>
> Please do.

Attached is a patch adding regression tests for this change -- I've
already applied it to HEAD.

-Neil


Attachment

Re: bug w/ cursors and savepoints

From
Oliver Jowett
Date:
Alvaro Herrera wrote:
> On Wed, Jan 26, 2005 at 05:10:09PM -0500, Tom Lane wrote:
> 
> 
>>I don't think we have a lot of choices: we have to destroy (or at least
>>mark FAILED) all such cursors for the time being.
> 
> 
> I don't see a lot of difference between marking the portal FAILED and
> destroying it (maybe I'm looking at the wrong code).  So I just took the
> simpler approach; patch attached.

I assume that you can CLOSE a failed portal, but you can't CLOSE a 
destroyed portal (because it's not there any more)?

This is important for the JDBC driver as it creates portals internally, 
does fetches as the application code demands, then closes the portal at 
some point after the application is done with it. Having the close fail 
because of an intervening savepoint rollback isn't great -- the error 
will cause an unexpected failure of the current transaction. This can 
happen even if the application doesn't try to use the portal (via 
ResultSet) after the savepoint rollback at all.

It wouldn't be so bad if the driver could track savepoint boundaries, 
but the current protocol doesn't make that easy..

-O


Re: bug w/ cursors and savepoints

From
Oliver Jowett
Date:
Oliver Jowett wrote:
> Having the close fail 
> because of an intervening savepoint rollback isn't great -- the error 
> will cause an unexpected failure of the current transaction.

Never mind -- I just reread the protocol docs, and it's safe to close a 
nonexistant portal. Did this previously issue a warning, or something 
similar? I'm sure I had seen problems in this area in the past..

-O


Re: bug w/ cursors and savepoints

From
Alvaro Herrera
Date:
On Fri, Jan 28, 2005 at 12:27:05PM +1300, Oliver Jowett wrote:
> Oliver Jowett wrote:
> >Having the close fail 
> >because of an intervening savepoint rollback isn't great -- the error 
> >will cause an unexpected failure of the current transaction.
> 
> Never mind -- I just reread the protocol docs, and it's safe to close a 
> nonexistant portal. Did this previously issue a warning, or something 
> similar? I'm sure I had seen problems in this area in the past..

Not sure ... however I remember somebody complained because the SQL
level interface differed from the protocol level one on this point.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"In a specialized industrial society, it would be a disaster
to have kids running around loose." (Paul Graham)