Thread: Is there a memory leak in commit 8561e48?

Is there a memory leak in commit 8561e48?

From
"jian.long@i-soft.com.cn"
Date:
in commit 8561e48, _SPI_stack alloc from TopMemoryContext. But AtEOXact_SPI just set _SPI_stack = NULL. Is this a memory leak?



Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Thu, Apr 19, 2018 at 11:38:09AM +0800, jian.long@i-soft.com.cn wrote:
> in commit 8561e48, _SPI_stack alloc from TopMemoryContext. But
> AtEOXact_SPI just set _SPI_stack = NULL. Is this a memory leak?

You are right.  I can easily see the leak if I use for example a
background worker which connects to a database, and launches many
transactions in a row.  The laziest reproducer I have is to patch one of
my bgworkers to launch millions of transactions in a tight loop and the
leak is plain (this counts relations automatically, does not matter):
https://github.com/michaelpq/pg_plugins/tree/master/count_relations

TopMemoryContext is associated to a session, so the comment in
AtEOXact_SPI() is wrong.
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Thu, Apr 19, 2018 at 03:10:42PM +0900, Michael Paquier wrote:
> You are right.  I can easily see the leak if I use for example a
> background worker which connects to a database, and launches many
> transactions in a row.  The laziest reproducer I have is to patch one of
> my bgworkers to launch millions of transactions in a tight loop and the
> leak is plain (this counts relations automatically, does not matter):
> https://github.com/michaelpq/pg_plugins/tree/master/count_relations
>
> TopMemoryContext is associated to a session, so the comment in
> AtEOXact_SPI() is wrong.

I have been looking at this one this morning, and I can see at least two
problems:
1) When SPI_connect_ext is used in an atomic context, then the
allocation of _SPI_stack should happen in TopTransactionContext instead
of TopMemoryContext.  This way, any callers of SPI_connect would not be
impacted by any memory leaks.
2) Error stacks with non-atomic calls leak memorya anyway.  It is easy
to find leaks of _SPI_stack in the regression tests when an ERROR
happens in a PL call which lead to AtEOXact_SPI being called in
AbortTransaction.  See that as an example:
@@ -283,6 +285,12 @@ AtEOXact_SPI(bool isCommit)
                  errmsg("transaction left non-empty SPI stack"),
                  errhint("Check for missing \"SPI_finish\" calls.")));

+    if (_SPI_current != NULL && !_SPI_current->atomic && _SPI_stack != NULL)
+        ereport(WARNING,
+                (errcode(ERRCODE_WARNING),
+                 errmsg("non-atomic transaction left non-empty SPI stack"),
+                 errhint("Check after non-atomic \"SPI_connect_ext\" calls.")));

The cleanest approach I can think about is to have SPI use its own
memory context which gets cleaned up in AtEOXact_SPI, but I would like
to hear more from Peter Eisentraut and Andrew Dunstand first as
author/committer and reviewer as it is their stuff.

I am attaching a preliminary patch, which fixes partially the leak, but
that does not take care of the leaks caused by the error stacks.
--
Michael

Attachment

Re: Re: Is there a memory leak in commit 8561e48?

From
"jian.long@i-soft.com.cn"
Date:
what about just free _SPI_stack in AtEOXact_SPI? if the transaction end was initiated by SPI , AtEOXact_SPI will do nothing. 
for example:
@@ -283,6 +295,8 @@ AtEOXact_SPI(bool isCommit)
                                 errmsg("transaction left non-empty SPI stack"),
                                 errhint("Check for missing \"SPI_finish\" calls.")));

+       if (_SPI_stack)
+               pfree(_SPI_stack);
 
Date: 2018-04-20 08:49
Subject: Re: Is there a memory leak in commit 8561e48?
On Thu, Apr 19, 2018 at 03:10:42PM +0900, Michael Paquier wrote:
> You are right.  I can easily see the leak if I use for example a
> background worker which connects to a database, and launches many
> transactions in a row.  The laziest reproducer I have is to patch one of
> my bgworkers to launch millions of transactions in a tight loop and the
> leak is plain (this counts relations automatically, does not matter):
> https://github.com/michaelpq/pg_plugins/tree/master/count_relations
>
> TopMemoryContext is associated to a session, so the comment in
> AtEOXact_SPI() is wrong.
 
I have been looking at this one this morning, and I can see at least two
problems:
1) When SPI_connect_ext is used in an atomic context, then the
allocation of _SPI_stack should happen in TopTransactionContext instead
of TopMemoryContext.  This way, any callers of SPI_connect would not be
impacted by any memory leaks.
2) Error stacks with non-atomic calls leak memorya anyway.  It is easy
to find leaks of _SPI_stack in the regression tests when an ERROR
happens in a PL call which lead to AtEOXact_SPI being called in
AbortTransaction.  See that as an example:
@@ -283,6 +285,12 @@ AtEOXact_SPI(bool isCommit)
                  errmsg("transaction left non-empty SPI stack"),
                  errhint("Check for missing \"SPI_finish\" calls.")));
+    if (_SPI_current != NULL && !_SPI_current->atomic && _SPI_stack != NULL)
+        ereport(WARNING,
+                (errcode(ERRCODE_WARNING),
+                 errmsg("non-atomic transaction left non-empty SPI stack"),
+                 errhint("Check after non-atomic \"SPI_connect_ext\" calls.")));
 
The cleanest approach I can think about is to have SPI use its own
memory context which gets cleaned up in AtEOXact_SPI, but I would like
to hear more from Peter Eisentraut and Andrew Dunstand first as
author/committer and reviewer as it is their stuff.
 
I am attaching a preliminary patch, which fixes partially the leak, but
that does not take care of the leaks caused by the error stacks.
--
Michael

Re: Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Fri, Apr 20, 2018 at 10:00:38AM +0800, jian.long@i-soft.com.cn wrote:
> what about just free _SPI_stack in AtEOXact_SPI? if the transaction
> end was initiated by SPI , AtEOXact_SPI will do nothing.  For example:
> @@ -283,6 +295,8 @@ AtEOXact_SPI(bool isCommit)
>                                  errmsg("transaction left non-empty SPI stack"),
>                                  errhint("Check for missing \"SPI_finish\" calls.")));
>
> +       if (_SPI_stack)
> +               pfree(_SPI_stack);

Sure, but that is rather inconsistent with the handling which exists
using TopTransactionContext where the API stack is deleted at the same
time as the transaction context, which causes this approach to be
inconsistent for atomic and non-atomic calls of SPI_connect_ext.  Let's
see what is Peter's take here first.
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Peter Eisentraut
Date:
On 4/19/18 02:10, Michael Paquier wrote:
> On Thu, Apr 19, 2018 at 11:38:09AM +0800, jian.long@i-soft.com.cn wrote:
>> in commit 8561e48, _SPI_stack alloc from TopMemoryContext. But
>> AtEOXact_SPI just set _SPI_stack = NULL. Is this a memory leak? 
> 
> You are right.  I can easily see the leak if I use for example a
> background worker which connects to a database, and launches many
> transactions in a row.  The laziest reproducer I have is to patch one of
> my bgworkers to launch millions of transactions in a tight loop and the
> leak is plain (this counts relations automatically, does not matter):
> https://github.com/michaelpq/pg_plugins/tree/master/count_relations

This is not a memory leak in the sense that the memory is not reusable.
It allocates memory for a stack, and that stack will be reused later.
The stack could be bigger than you'll need, but that's not necessarily a
problem.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Is there a memory leak in commit 8561e48?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 4/19/18 02:10, Michael Paquier wrote:
>> You are right.  I can easily see the leak if I use for example a
>> background worker which connects to a database, and launches many
>> transactions in a row.  The laziest reproducer I have is to patch one of
>> my bgworkers to launch millions of transactions in a tight loop and the
>> leak is plain (this counts relations automatically, does not matter):
>> https://github.com/michaelpq/pg_plugins/tree/master/count_relations

> This is not a memory leak in the sense that the memory is not reusable.
> It allocates memory for a stack, and that stack will be reused later.
> The stack could be bigger than you'll need, but that's not necessarily a
> problem.

Uh, no, the memory *isn't* reused later.  The coding in AtEOXact_SPI
assumes that it can just null out _SPI_stack because whatever that
might've been pointing to is transaction-lifespan memory and will
go away without extra work here.  Which was true in every prior
release, because SPI_connect caused the stack to be allocated in
TopTransactionContext.  You've changed it to be in TopMemoryContext
and not adjusted the cleanup to match.

We could change the coding in AtEOXact_SPI so that it preserves
_SPI_stack and _SPI_stack_depth and only resets _SPI_connected,
but I'm not sure what's the point of keeping the stack storage
if _SPI_connected gets reset; that means we no longer think
there's any data there.  One way or the other this code is now
quite schizophrenic.

Possibly what you really want is for the stack storage to be permanent
and for _SPI_connected to get reset to something that's not necessarily
-1, so that AtEOXact_SPI is more like AtEOSubXact_SPI.  But then you need
a mechanism for figuring out how much of the stack ought to outlive the
current transaction.  I would bet money that that
"_SPI_current->internal_xact" thing is wrong/inadequate.

The comments in both SPI_connect and AtEOXact_SPI about memory management
seem to need work, too.

            regards, tom lane


Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Thu, Apr 26, 2018 at 06:36:01PM -0400, Tom Lane wrote:
> Uh, no, the memory *isn't* reused later.  The coding in AtEOXact_SPI
> assumes that it can just null out _SPI_stack because whatever that
> might've been pointing to is transaction-lifespan memory and will
> go away without extra work here.  Which was true in every prior
> release, because SPI_connect caused the stack to be allocated in
> TopTransactionContext.  You've changed it to be in TopMemoryContext
> and not adjusted the cleanup to match.

Yep, thanks for jumping in the ship.

> We could change the coding in AtEOXact_SPI so that it preserves
> _SPI_stack and _SPI_stack_depth and only resets _SPI_connected,
> but I'm not sure what's the point of keeping the stack storage
> if _SPI_connected gets reset; that means we no longer think
> there's any data there.  One way or the other this code is now
> quite schizophrenic.

Would we want to handle memory contexts differently between an atomic
and a non-atomic connection to the SPI?  For an atomic call, there is
little point in keeping the stack in the session context as this is
invalid data by the end of the transaction commit, and back-branches
rely on the transaction's context removed at when a transaction finishes
to cleanup the memory of the SPI stack.

> Possibly what you really want is for the stack storage to be permanent
> and for _SPI_connected to get reset to something that's not necessarily
> -1, so that AtEOXact_SPI is more like AtEOSubXact_SPI.  But then you need
> a mechanism for figuring out how much of the stack ought to outlive the
> current transaction.  I would bet money that that
> "_SPI_current->internal_xact" thing is wrong/inadequate.
>
> The comments in both SPI_connect and AtEOXact_SPI about memory management
> seem to need work, too.

Yeah, error handling also needs some extra lookup.  Do you still want
the memory to be around when a transaction fails with an internal ERROR?
Having consistency between the way atomic and non-atomic connections are
handled would be nice, for both the creation of the SPI stack and its
deletion.  The final result should not impact the user experience
particularly for background workers or people would get ramping silent
failures because of memory exhaustion.
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> I would bet money that that "_SPI_current->internal_xact" thing is
 Tom> wrong/inadequate.

In particular this looks wrong to me: after doing:

do $$
  begin
    execute 'declare foo cursor with hold for select 1/x as a from (values (1),(0)) v(x)';
    commit;  -- errors within the commit
  end;
$$;
ERROR:  division by zero
CONTEXT:  PL/pgSQL function inline_code_block line 1 at COMMIT

the SPI stack is not cleaned up at all, and _SPI_connected is >= 0 even
when back at the main backend command loop.

-- 
Andrew (irc:RhodiumToad)


Re: Is there a memory leak in commit 8561e48?

From
Peter Eisentraut
Date:
On 4/27/18 02:44, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
>  Tom> I would bet money that that "_SPI_current->internal_xact" thing is
>  Tom> wrong/inadequate.
> 
> In particular this looks wrong to me: after doing:
> 
> do $$
>   begin
>     execute 'declare foo cursor with hold for select 1/x as a from (values (1),(0)) v(x)';
>     commit;  -- errors within the commit
>   end;
> $$;
> ERROR:  division by zero
> CONTEXT:  PL/pgSQL function inline_code_block line 1 at COMMIT
> 
> the SPI stack is not cleaned up at all, and _SPI_connected is >= 0 even
> when back at the main backend command loop.

The memory leak can be fixed by adding a pfree().

The example you show can be fixed by doing SPI cleanup in both
transaction abort and exception return to main loop, similar to other
cases that now have to handle these separately.  Patch attached.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Mon, Apr 30, 2018 at 05:10:14PM -0400, Peter Eisentraut wrote:
> The memory leak can be fixed by adding a pfree().
>
> The example you show can be fixed by doing SPI cleanup in both
> transaction abort and exception return to main loop, similar to other
> cases that now have to handle these separately.  Patch attached.

I have looked at the patch for some time and tested it, and the approach
looks good to me.

> +/*
> + * Clean up SPI state.  Called on transaction end (of non-SPI-internal
> + * transactions) and when returning to the main loop on error.
> + */
> +void
> +SPICleanup(void)
> +{
> +    if (_SPI_stack)
> +        pfree(_SPI_stack);
> +    _SPI_stack = NULL;
> +    _SPI_stack_depth = 0;
> +    _SPI_current = NULL;
> +    _SPI_connected = -1;
> +    SPI_processed = 0;
> +    SPI_lastoid = InvalidOid;
> +    SPI_tuptable = NULL;
> +}
If _SPI_stack is NULL, a quick exit path would make sense?
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> The memory leak can be fixed by adding a pfree().

That seems like an odd way to approach this.  Why not just remove the
reset of _SPI_stack and _SPI_stack_depth, so as to subtract code rather
than adding it --- that is, make it actually work like you mistakenly
thought it did?  If we're going to keep the stack in TopMemoryContext,
there's no need to thrash it on every transaction.

            regards, tom lane


Re: Is there a memory leak in commit 8561e48?

From
Peter Eisentraut
Date:
On 5/1/18 11:42, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> The memory leak can be fixed by adding a pfree().
> 
> That seems like an odd way to approach this.  Why not just remove the
> reset of _SPI_stack and _SPI_stack_depth, so as to subtract code rather
> than adding it --- that is, make it actually work like you mistakenly
> thought it did?  If we're going to keep the stack in TopMemoryContext,
> there's no need to thrash it on every transaction.

Yes, that seems attractive.

Are we concerned about the case where someone runs a very deeply nested
SPI setup once in a session, creating a large stack allocation, which
then persists for the rest of the session?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Is there a memory leak in commit 8561e48?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 5/1/18 11:42, Tom Lane wrote:
>> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> That seems like an odd way to approach this.  Why not just remove the
>> reset of _SPI_stack and _SPI_stack_depth, so as to subtract code rather
>> than adding it --- that is, make it actually work like you mistakenly
>> thought it did?  If we're going to keep the stack in TopMemoryContext,
>> there's no need to thrash it on every transaction.

> Yes, that seems attractive.

> Are we concerned about the case where someone runs a very deeply nested
> SPI setup once in a session, creating a large stack allocation, which
> then persists for the rest of the session?

I'm not particularly.  It seems impossible that _SPI_stack could grow
faster than the machine's actual stack, which would mean (on typical
setups) that you couldn't get more than perhaps 10MB of _SPI_stack
before hitting a stack-overflow error.  That's a lot by some measures,
but I don't think it's enough to cripple anything if we don't free it.

Also, if someone is routinely using pretty deep SPI nesting, they'd
probably be happier that we do keep the stack rather than repeatedly
creating and enlarging it.

            regards, tom lane


Re: Is there a memory leak in commit 8561e48?

From
Peter Eisentraut
Date:
On 5/2/18 12:30, Tom Lane wrote:
> I'm not particularly.  It seems impossible that _SPI_stack could grow
> faster than the machine's actual stack, which would mean (on typical
> setups) that you couldn't get more than perhaps 10MB of _SPI_stack
> before hitting a stack-overflow error.  That's a lot by some measures,
> but I don't think it's enough to cripple anything if we don't free it.
> 
> Also, if someone is routinely using pretty deep SPI nesting, they'd
> probably be happier that we do keep the stack rather than repeatedly
> creating and enlarging it.

Yes, that was the idea.  Here is an adjusted patch.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Yes, that was the idea.  Here is an adjusted patch.

Looks OK to me as far as the leak issue goes.  I have no opinion on
whether this is adequate in respect to cleanup-after-error problems.

            regards, tom lane


Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Wed, May 02, 2018 at 05:20:37PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > Yes, that was the idea.  Here is an adjusted patch.
>
> Looks OK to me as far as the leak issue goes.  I have no opinion on
> whether this is adequate in respect to cleanup-after-error problems.

With connection poolers letting the connections to the server be around
for a long time, wouldn't it be an issue to let this much memory live
longer than the transaction context?  The deeper the stack, the more
memory consumed, hence the more OS cache that PostgreSQL cannot use.  So
this could impact performance for some loads.  I would vote for cleaning
up this memory instead of letting it live unused in TopMemoryContext.
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> With connection poolers letting the connections to the server be around
> for a long time, wouldn't it be an issue to let this much memory live
> longer than the transaction context?  The deeper the stack, the more
> memory consumed, hence the more OS cache that PostgreSQL cannot use.  So
> this could impact performance for some loads.  I would vote for cleaning
> up this memory instead of letting it live unused in TopMemoryContext.

It's only ~100 bytes per stack level.  I think under normal loads
nobody would notice.  If you're worried about cross-transaction
memory consumption, our various caches tend to be a lot worse.

            regards, tom lane


Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Wed, May 02, 2018 at 07:03:21PM -0400, Tom Lane wrote:
> It's only ~100 bytes per stack level.  I think under normal loads
> nobody would notice.  If you're worried about cross-transaction
> memory consumption, our various caches tend to be a lot worse.

Perhaps, that's one reason why people drop connections from time to time
to the server even with a custom pooler.  I am wondering if we are going
to have complains about "performance regressions" found after upgrading
to Postgres 11 for deployments which rely on complicated PL call stacks,
or complains about the OOM killer though.  Getting to review large
procedures stacks can be a pain for application developers.
--
Michael

Attachment

Re: Is there a memory leak in commit 8561e48?

From
Peter Eisentraut
Date:
On 5/2/18 20:11, Michael Paquier wrote:
> On Wed, May 02, 2018 at 07:03:21PM -0400, Tom Lane wrote:
>> It's only ~100 bytes per stack level.  I think under normal loads
>> nobody would notice.  If you're worried about cross-transaction
>> memory consumption, our various caches tend to be a lot worse.
> 
> Perhaps, that's one reason why people drop connections from time to time
> to the server even with a custom pooler.  I am wondering if we are going
> to have complains about "performance regressions" found after upgrading
> to Postgres 11 for deployments which rely on complicated PL call stacks,
> or complains about the OOM killer though.  Getting to review large
> procedures stacks can be a pain for application developers.

I went with the patch I had posted, since I needed to move ahead with
something.  If it turns out to be a problem, we can easily switch it around.

As Tom mentioned, in order to grow the SPI stack to where it has a
significant size, you might also overrun the OS stack and other things.
On the other hand, the current/new arrangement is a win for normal SPI
use, since you don't need to rebuild the stack on every call.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Is there a memory leak in commit 8561e48?

From
Michael Paquier
Date:
On Thu, May 03, 2018 at 08:45:13AM -0400, Peter Eisentraut wrote:
> I went with the patch I had posted, since I needed to move ahead with
> something.  If it turns out to be a problem, we can easily switch it
> around.

Okay.

> As Tom mentioned, in order to grow the SPI stack to where it has a
> significant size, you might also overrun the OS stack and other things.
> On the other hand, the current/new arrangement is a win for normal SPI
> use, since you don't need to rebuild the stack on every call.

It would be nice to mention that in the release notes..
--
Michael

Attachment