Thread: BUG #5566: High levels of savepoint nesting trigger stack overflow in AssignTransactionId

The following bug has been logged online:

Bug reference:      5566
Logged by:          Hans van Kranenburg
Email address:      hans.van.kranenburg@mendix.com
PostgreSQL version: 8.3.11
Operating system:   Debian GNU/Linux 5.0
Description:        High levels of savepoint nesting trigger stack overflow
in AssignTransactionId
Details:

Version: PostgreSQL 8.3.11 on x86_64-pc-linux-gnu, compiled by GCC
gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2

When issuing an UPDATE statement in a transaction with ~30800 levels of
savepoint nesting (which is insane, but possible), PostgreSQL segfaults due
to a stack overflow in the AssignTransactionId function, which recursively
assigns transaction ids to parent transactions.

After noticing the segfaults, we tried to reproduce it, dumped core, and got
the following stack trace (name/IP address munged):

Core was generated by `postgres: dummy dummy 82.94.x.y(50442) UPDATE
                   '.
Program terminated with signal 11, Segmentation fault.
[New process 9454]
#0  0x0000000000483e5a in ?? ()
(gdb) bt
#0  0x0000000000483e5a in ?? ()
#1  0x0000000000483f4d in ?? ()
#2  0x0000000000483f4d in ?? ()
[...]
#30802 0x0000000000483f4d in ?? ()
#30803 0x0000000000483f4d in ?? ()
#30804 0x0000000000483f4d in ?? ()
#30805 0x0000000000483fa8 in GetCurrentTransactionId ()
#30806 0x000000000046ccc9 in heap_update ()
#30807 0x0000000000534a90 in ExecutorRun ()
#30808 0x00000000005d7ce2 in ?? ()
#30809 0x00000000005d7f38 in ?? ()
#30810 0x00000000005d883b in PortalRun ()
#30811 0x00000000005d54c1 in PostgresMain ()
#30812 0x00000000005a7098 in ?? ()
#30813 0x00000000005a7f60 in PostmasterMain ()
#30814 0x000000000055adae in main ()

This UPDATE statement triggered the AssignTransactionId function. Up to this
point, only SELECT statements had been issued in the transaction. The
creation of this number of savepoints was a bug in non-PostgreSQL software
that uses the database. Nonetheless, a database user should not be able to
segfault the entire server process by behaving like this.

Greetings,
Hans van Kranenburg
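For anyone trying to reproduce this, a script with the required nesting can be generated along these lines. This is a hypothetical sketch, not the test program attached later in the thread; `tstack` is an assumed one-column table, and the level count comes from the report:

```c
#include <stdio.h>

/* Emit a psql script that opens `levels` nested savepoints and then runs
 * an UPDATE -- the first statement in the transaction that needs an XID,
 * and therefore the one that triggers AssignTransactionId().  Returns the
 * number of statements emitted. */
static int emit_script(FILE *out, int levels)
{
    int count = 0;

    fprintf(out, "BEGIN;\n");
    count++;
    for (int i = 0; i < levels; i++)
    {
        fprintf(out, "SAVEPOINT s%d;\n", i);
        count++;
    }
    fprintf(out, "UPDATE tstack SET i = i + 1;\n");
    count++;
    fprintf(out, "COMMIT;\n");
    count++;
    return count;
}
```

Writing `emit_script(stdout, 31000)` from a `main()` and piping the output into psql against a database that has a `tstack(i int)` table with one row should put the nesting in the reported ~30800 range.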
On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> When issuing an update statement in a transaction with ~30800 levels of
> savepoint nesting, (which is insane, but possible), postgresql segfaults
> due to a stack overflow in the AssignTransactionId function, which
> recursively assign transaction ids to parent transactions.
It seems easy enough to throw a check_stack_depth() in there - survives make
check here.

It would be nice to check this earlier on, though - or simply impose an
artificial limit on nested transactions. I severely doubt that there are non-
bug situations with nesting above 1k (so maybe set the limit to 3k).

That's independent from checking stack depth there, though - it sounds possible
to get there after already relatively deep stack usage (deeply nested
functions or such).

Andres
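The guard being discussed can be illustrated with a generic sketch. This is not the server's actual code - PostgreSQL's real `check_stack_depth()` raises an ERROR via ereport() and derives its limit from the `max_stack_depth` GUC - but the mechanism is the same: compare the address of a local variable against a reference point recorded near the bottom of the stack:

```c
#include <stddef.h>

/* Reference point near the stack bottom; the real server records this
 * early during startup.  Here it is recorded lazily on first use. */
static char *stack_base = NULL;

/* Illustrative limit only; the server's limit comes from max_stack_depth. */
#define STACK_LIMIT_BYTES (256 * 1024)

static int stack_too_deep(void)
{
    char here;
    ptrdiff_t used;

    if (stack_base == NULL)
        stack_base = &here;        /* lazily record a reference point */

    used = stack_base - &here;
    if (used < 0)
        used = -used;              /* the stack may grow in either direction */
    return used > STACK_LIMIT_BYTES;
}

/* Recursion shaped like AssignTransactionId()'s walk up the parent
 * transactions.  Returns the depth reached, or -1 where the real code
 * would raise "stack depth limit exceeded" instead of crashing. */
static long nest(long levels)
{
    volatile char pad[128];        /* keep the frame from being optimized away */
    long below;

    pad[0] = 0;
    if (stack_too_deep())
        return -1;
    if (levels == 0)
        return 0;
    below = nest(levels - 1);
    return below < 0 ? below : below + 1;
}
```

With the guard in place, an absurd nesting depth fails cleanly instead of overflowing the stack and taking the process down with SIGSEGV.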
On Monday 19 July 2010 17:58:06 Andres Freund wrote:
> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> > When issuing an update statement in a transaction with ~30800 levels of
> > savepoint nesting, (which is insane, but possible), postgresql segfaults
> > due to a stack overflow in the AssignTransactionId function, which
> > recursively assign transaction ids to parent transactions.
>
> It seems easy enough to throw a check_stack_depth() in there - survives
> make check here.
>
> It would be nice to check this earlier on though - or simply impose a
> artificial limit of nested transactions. I severely doubt that there are
> non- bug situations with a nesting of above 1k (so maybe set the limit to
> 3k).
>
> Thats independent from checking stack depth there though - it sounds
> possible to get there after an already relatively deep stack usage (deeply
> nested functions or such).
A patch doing both is attached. No idea whether 3k is a good limit.

A small test program is also attached.

Andres
Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> > When issuing an update statement in a transaction with ~30800 levels of
> > savepoint nesting, (which is insane, but possible), postgresql segfaults
> > due to a stack overflow in the AssignTransactionId function, which
> > recursively assign transaction ids to parent transactions.
> It seems easy enough to throw a check_stack_depth() in there - survives make
> check here.

I wonder if it would work to deal with the problem non-recursively
instead.  We don't impose subxact depth restrictions elsewhere, why
start now?
Hi,

On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> > On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> > > When issuing an update statement in a transaction with ~30800 levels of
> > > savepoint nesting, (which is insane, but possible), postgresql
> > > segfaults due to a stack overflow in the AssignTransactionId function,
> > > which recursively assign transaction ids to parent transactions.
> >
> > It seems easy enough to throw a check_stack_depth() in there - survives
> > make check here.
>
> I wonder if it would work to deal with the problem non-recursively
> instead.  We don't impose subxact depth restrictions elsewhere, why
> start now?
It looks trivial enough, but what's the point?

Andres
On 19/07/10 20:58, Andres Freund wrote:
> On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
>> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
>>> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
>>>> When issuing an update statement in a transaction with ~30800 levels of
>>>> savepoint nesting, (which is insane, but possible), postgresql
>>>> segfaults due to a stack overflow in the AssignTransactionId function,
>>>> which recursively assign transaction ids to parent transactions.
>>>
>>> It seems easy enough to throw a check_stack_depth() in there - survives
>>> make check here.
>>
>> I wonder if it would work to deal with the problem non-recursively
>> instead.  We don't impose subxact depth restrictions elsewhere, why
>> start now?
> It looks trivial enough, but whats the point?

To support more than <insert arbitrary limit here> subtransactions,
obviously.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Excerpts from Andres Freund's message of lun jul 19 13:58:30 -0400 2010:
> Hi,
>
> On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> > Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> > > On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> > > > When issuing an update statement in a transaction with ~30800 levels of
> > > > savepoint nesting, (which is insane, but possible), postgresql
> > > > segfaults due to a stack overflow in the AssignTransactionId function,
> > > > which recursively assign transaction ids to parent transactions.
> > >
> > > It seems easy enough to throw a check_stack_depth() in there - survives
> > > make check here.
> >
> > I wonder if it would work to deal with the problem non-recursively
> > instead.  We don't impose subxact depth restrictions elsewhere, why
> > start now?
> It looks trivial enough, but whats the point?

Avoid imposing unnecessary restrictions.
On Monday 19 July 2010 20:19:35 Heikki Linnakangas wrote:
> On 19/07/10 20:58, Andres Freund wrote:
> > On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> >> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> >>> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> >>>> When issuing an update statement in a transaction with ~30800 levels
> >>>> of savepoint nesting, (which is insane, but possible), postgresql
> >>>> segfaults due to a stack overflow in the AssignTransactionId
> >>>> function, which recursively assign transaction ids to parent
> >>>> transactions.
> >>>
> >>> It seems easy enough to throw a check_stack_depth() in there - survives
> >>> make check here.
> >>
> >> I wonder if it would work to deal with the problem non-recursively
> >> instead.  We don't impose subxact depth restrictions elsewhere, why
> >> start now?
> >
> > It looks trivial enough, but whats the point?
>
> To support more than <insert abitrary limit here> subtransactions,
> obviously.
Well, I got that far. But why is that something worthy of support?
For one, I have a hard time imagining a sensible use case; for another, doing
anything in such deeply nested transactions seems to get really slow (the
chain of transactions gets walked in some places, for one thing; there seem to
be others).

If wanted, I can write a patch for that as well; it seems trivial enough.

Andres
On 19/07/10 21:32, Andres Freund wrote:
> On Monday 19 July 2010 20:19:35 Heikki Linnakangas wrote:
>> On 19/07/10 20:58, Andres Freund wrote:
>>> On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
>>>> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
>>>>> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
>>>>>> When issuing an update statement in a transaction with ~30800 levels
>>>>>> of savepoint nesting, (which is insane, but possible), postgresql
>>>>>> segfaults due to a stack overflow in the AssignTransactionId
>>>>>> function, which recursively assign transaction ids to parent
>>>>>> transactions.
>>>>>
>>>>> It seems easy enough to throw a check_stack_depth() in there - survives
>>>>> make check here.
>>>>
>>>> I wonder if it would work to deal with the problem non-recursively
>>>> instead.  We don't impose subxact depth restrictions elsewhere, why
>>>> start now?
>>>
>>> It looks trivial enough, but whats the point?
>>
>> To support more than<insert abitrary limit here>  subtransactions,
>> obviously.
> Well. I got that far. But why is that something worthy of support?

Because it's not really much harder than putting in the limit. Besides,
if you put in a limit of 3000, someone with a smaller stack might still
run out of stack space.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
On Monday 19 July 2010 21:03:25 Heikki Linnakangas wrote:
> On 19/07/10 21:32, Andres Freund wrote:
> > On Monday 19 July 2010 20:19:35 Heikki Linnakangas wrote:
> >> On 19/07/10 20:58, Andres Freund wrote:
> >>> On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> >>>> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> >>>>> On Monday 19 July 2010 17:26:25 Hans van Kranenburg wrote:
> >>>>>> When issuing an update statement in a transaction with ~30800 levels
> >>>>>> of savepoint nesting, (which is insane, but possible), postgresql
> >>>>>> segfaults due to a stack overflow in the AssignTransactionId
> >>>>>> function, which recursively assign transaction ids to parent
> >>>>>> transactions.
> >>>>>
> >>>>> It seems easy enough to throw a check_stack_depth() in there -
> >>>>> survives make check here.
> >>>>
> >>>> I wonder if it would work to deal with the problem non-recursively
> >>>> instead.  We don't impose subxact depth restrictions elsewhere, why
> >>>> start now?
> >>>
> >>> It looks trivial enough, but whats the point?
> >>
> >> To support more than<insert abitrary limit here>  subtransactions,
> >> obviously.
> >
> > Well. I got that far. But why is that something worthy of support?
>
> Because it's not really much harder than putting in the limit.
The difference is that you then get errors like:

WARNING:  53200: out of shared memory
LOCATION:  ShmemAlloc, shmem.c:190
ERROR:  53200: out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
LOCATION:  LockAcquireExtended, lock.c:680
STATEMENT:  INSERT INTO tstack VALUES(1)

After which pg takes longer to clean up the transaction than I am willing to
wait (OK, OK, that's at an obscene 100k nesting level).

At 50k, a single commit takes some minutes as well (no cassert, -O0).

All that seems pretty annoying to debug...


> Besides, if you put in a limit of 3000, someone with a smaller stack might
> still run out of stack space.
I had left that check in there.

Will send a patch - I have it locally, just need to verify it.

Andres
On Monday 19 July 2010 20:32:49 Andres Freund wrote:
> On Monday 19 July 2010 20:19:35 Heikki Linnakangas wrote:
> > On 19/07/10 20:58, Andres Freund wrote:
> > > On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> > >> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> > >>> It seems easy enough to throw a check_stack_depth() in there -
> > >>> survives make check here.
> > >>
> > >> I wonder if it would work to deal with the problem non-recursively
> > >> instead.  We don't impose subxact depth restrictions elsewhere, why
> > >> start now?
> > >
> > > It looks trivial enough, but whats the point?
> >
> > To support more than <insert abitrary limit here> subtransactions,
> > obviously.
>
> Well. I got that far. But why is that something worthy of support?
> For one I have a hard time imaging a sensible use case, for another doing
> anything in that deeply nested transactions seems to gets really slow (the
> chain of transactions gets walked at some places for one thing, there seem
> to be others).
>
> If want I can write a patch for that as well, seems to be trivial enough.
Updated patch attached.


Andres
On Mon, Jul 19, 2010 at 10:35:42PM +0200, Andres Freund wrote:
> On Monday 19 July 2010 20:32:49 Andres Freund wrote:
> > On Monday 19 July 2010 20:19:35 Heikki Linnakangas wrote:
> > > On 19/07/10 20:58, Andres Freund wrote:
> > > > On Monday 19 July 2010 19:57:13 Alvaro Herrera wrote:
> > > >> Excerpts from Andres Freund's message of lun jul 19 11:58:06 -0400 2010:
> > > >>> It seems easy enough to throw a check_stack_depth() in there -
> > > >>> survives make check here.
> > > >>
> > > >> I wonder if it would work to deal with the problem non-recursively
> > > >> instead.  We don't impose subxact depth restrictions elsewhere, why
> > > >> start now?
> > > >
> > > > It looks trivial enough, but whats the point?
> > >
> > > To support more than <insert abitrary limit here> subtransactions,
> > > obviously.
> >
> > Well. I got that far. But why is that something worthy of support?
> > For one I have a hard time imaging a sensible use case, for another doing
> > anything in that deeply nested transactions seems to gets really slow (the
> > chain of transactions gets walked at some places for one thing, there seem
> > to be others).
> >
> > If want I can write a patch for that as well, seems to be trivial enough.
> Updated patch attached.
Hm, I don't want to push - just to ask: is any committer looking at either
the patch or the bug?

Greetings,

Andres
On Wed, 2010-07-21 at 21:20 +0200, Andres Freund wrote:
> > >
> > > If want I can write a patch for that as well, seems to be trivial enough.
> > Updated patch attached.
> hm. I dont want to push - just to ask: Is any comitter looking either
> at the patch or the bug?

Patch looks OK, though IMHO I wouldn't be looking to back-patch this.

I'll look at this in more detail next week, unless others wish to.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jul 19, 2010 at 4:35 PM, Andres Freund <andres@anarazel.de> wrote:
>> Well. I got that far. But why is that something worthy of support?
>> For one I have a hard time imaging a sensible use case, for another doing
>> anything in that deeply nested transactions seems to gets really slow (the
>> chain of transactions gets walked at some places for one thing, there seem
>> to be others).
>>
>> If want I can write a patch for that as well, seems to be trivial enough.
> Updated patch attached.

Considering that this is a crasher, I think we'll need to back-patch
this.  The proposed patch applies only as far back as 8.3, due to the
lazy XID assignment changes in that version, but it looks like the bug
exists all the way back to 8.0.  It looks like only minor adjustments
are required for the older branches, though.  7.4 is not affected, as
it does not have subtransactions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Thu, Jul 22, 2010 at 5:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jul 19, 2010 at 4:35 PM, Andres Freund <andres@anarazel.de> wrote:
>>> Well. I got that far. But why is that something worthy of support?
>>> For one I have a hard time imaging a sensible use case, for another doing
>>> anything in that deeply nested transactions seems to gets really slow (the
>>> chain of transactions gets walked at some places for one thing, there seem
>>> to be others).
>>>
>>> If want I can write a patch for that as well, seems to be trivial enough.
>> Updated patch attached.
>
> Considering that this is a crasher, I think we'll need to back-patch
> this.  The proposed patch applies only as far back as 8.3, due to the
> lazy XID assignment changes in that version, but it looks like the bug
> exists all the way back to 8.0.  It looks like only minor adjustments
> are required for the older branches, though.  7.4 is not affected, as
> it does not have subtransactions.

Can someone provide a reproducible test case for this bug?  I wasn't
easily able to reproduce it.

Attached please find a cleaned-up version of the patch for CVS HEAD.
I am having a bit of trouble compiling the 8.2 patch I hacked up, and
I believe that's because the git repository is borked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment
On Thu, Jul 22, 2010 at 06:29:47PM -0400, Robert Haas wrote:
> On Thu, Jul 22, 2010 at 5:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Jul 19, 2010 at 4:35 PM, Andres Freund <andres@anarazel.de> wrote:
> >>> Well. I got that far. But why is that something worthy of support?
> >>> For one I have a hard time imaging a sensible use case, for another doing
> >>> anything in that deeply nested transactions seems to gets really slow (the
> >>> chain of transactions gets walked at some places for one thing, there seem
> >>> to be others).
> >>>
> >>> If want I can write a patch for that as well, seems to be trivial enough.
> >> Updated patch attached.
> >
> > Considering that this is a crasher, I think we'll need to back-patch
> > this.  The proposed patch applies only as far back as 8.3, due to the
> > lazy XID assignment changes in that version, but it looks like the bug
> > exists all the way back to 8.0.  It looks like only minor adjustments
> > are required for the older branches, though.  7.4 is not affected, as
> > it does not have subtransactions.
> Can someone provide a reproducible test case for this bug?  I wasn't
> easily able to reproduce it.
201007191950.13856.andres@anarazel.de contains a test script. You need
to actually do an action causing an xid to get assigned
(heap_(insert|update|delete), basically) after you're deeply nested.

> Attached please find a cleaned-up version of the patch for CVS HEAD.
> I am having a bit of trouble compiling the 8.2 patch I hacked up, and
> I believe that's because the git respository is borked.
Yes, the "old" git repo is borked. Either use Andrew's from github or
the one from magnus...

Andres
On Thu, Jul 22, 2010 at 6:49 PM, Andres Freund <andres@anarazel.de> wrote:
>> Can someone provide a reproducible test case for this bug?  I wasn't
>> easily able to reproduce it.
> 201007191950.13856.andres@anarazel.de contains a test script. You need
> to actually do a action causing an xid to get calculated
> (heap_(insert|update|delete) basically) after youre deeply stacked.

Thanks, perfect.  Committed and back-patched to 8.0.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company