Thread: strange OOM errors with EXECUTE in PL/pgSQL

strange OOM errors with EXECUTE in PL/pgSQL

From
Tomas Vondra
Date:
Hi,

one of our local users reported he's getting OOM errors on 9.2, although
on 9.1 the code worked fine. Attached is a simple test-case that should
give you an OOM error almost immediately.

What it does:

1) creates a simple table called "test" with one text column.

2) creates a plpgsql function with one parameter, and all that function
   does is passing the parameter to EXECUTE

3) calls the function with a string containing many INSERTs into the
   test table

The way the EXECUTE is used is a bit awkward, but the failures seem a
bit strange to me. The whole script is ~500kB and most of that is about
11k of very simple INSERT statements:

   insert into test(value) values (''aaaaaaaaaa'');

all of them are exactly the same. Yet when it fails with OOM, the log
contains memory context stats like these:

TopMemoryContext: 5303376 total in 649 blocks; 2648 free ...
  PL/pgSQL function context: 8192 total in 1 blocks; 3160 free ...
  TopTransactionContext: 8192 total in 1 blocks; 6304 free ...
    ExecutorState: 8192 total in 1 blocks; 7616 free ...
      ExprContext: 8192 total in 1 blocks; 8160 free ...
    SPI Exec: 33554432 total in 14 blocks; 6005416 free ...
      CachedPlanSource: 3072 total in 2 blocks; 1856 free ...
      CachedPlanSource: 538688 total in 3 blocks; 1744 free ...
        CachedPlanQuery: 3072 total in 2 blocks; 1648 free ...
      CachedPlanSource: 538688 total in 3 blocks; 1744 free ...
        CachedPlanQuery: 3072 total in 2 blocks; 1648 free ...
      CachedPlanSource: 538688 total in 3 blocks; 1744 free ...
        CachedPlanQuery: 3072 total in 2 blocks; 1648 free ...
      CachedPlanSource: 538688 total in 3 blocks; 1744 free ...
        CachedPlanQuery: 3072 total in 2 blocks; 1648 free ...
      CachedPlanSource: 538688 total in 3 blocks; 1744 free ...
      ...

There is ~9500 of these CachedPlanSource + CachedPlanQuery row pairs
(see the attached log). That seems a bit strange to me, because all the
queries are exactly the same in this test case.

The number of queries needed to get OOM is inversely proportional to the
query length - by using a longer text (instead of 'aaaaaaaaaaa') you may
use much less queries.

I am no expert in this area, but it seems to me that the code does not
expect that many INSERTs in EXECUTE and does not release the memory for
some reason (e.g. because the plans are allocated in SPI Exec memory
context, etc.).

regards
Tomas

Attachment

Re: strange OOM errors with EXECUTE in PL/pgSQL

From
Tom Lane
Date:
Tomas Vondra <tv@fuzzy.cz> writes:
> What it does:

> 1) creates a simple table called "test" with one text column.

> 2) creates a plpgsql function with one parameter, and all that function
>    does is passing the parameter to EXECUTE

> 3) calls the function with a string containing many INSERTs into the
>    test table

> The way the EXECUTE is used is a bit awkward, but the failures seem a
> bit strange to me. The whole script is ~500kB and most of that is about
> 11k of very simple INSERT statements:

>    insert into test(value) values (''aaaaaaaaaa'');

The reason this fails is that you've got a half-megabyte source string,
and each of the 11000 plans that are due to be created from it saves
its own copy of the source string.  Hence, 5500 megabytes needed just
for source strings.

We could possibly fix this by inventing some sort of reference-sharing
arrangement (which'd be complicated and fragile) or by not storing the
source strings with the plans (which'd deal a serious blow to our
ability to provide helpful error messages).  Neither answer seems
appealing.

I think it would be a better idea to adopt a less brain-dead way of
processing the data.  Can't you convert this to a single INSERT with a
lot of VALUES rows?  Or split it into multiple EXECUTE chunks?
        regards, tom lane



Re: strange OOM errors with EXECUTE in PL/pgSQL

From
Tomas Vondra
Date:
On 20.12.2012 02:29, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> What it does:
> 
>> 1) creates a simple table called "test" with one text column.
> 
>> 2) creates a plpgsql function with one parameter, and all that function
>>    does is passing the parameter to EXECUTE
> 
>> 3) calls the function with a string containing many INSERTs into the
>>    test table
> 
>> The way the EXECUTE is used is a bit awkward, but the failures seem a
>> bit strange to me. The whole script is ~500kB and most of that is about
>> 11k of very simple INSERT statements:
> 
>>    insert into test(value) values (''aaaaaaaaaa'');
> 
> The reason this fails is that you've got a half-megabyte source string,
> and each of the 11000 plans that are due to be created from it saves
> its own copy of the source string.  Hence, 5500 megabytes needed just
> for source strings.
> 
> We could possibly fix this by inventing some sort of reference-sharing
> arrangement (which'd be complicated and fragile) or by not storing the
> source strings with the plans (which'd deal a serious blow to our
> ability to provide helpful error messages).  Neither answer seems
> appealing.

Thanks for the explanation, I didn't occur to me that each plan keeps a
copy of the whole source string.

> I think it would be a better idea to adopt a less brain-dead way of
> processing the data.  Can't you convert this to a single INSERT with a
> lot of VALUES rows?  Or split it into multiple EXECUTE chunks?

Well, it's not my app but I'll recommend it to them. Actually I already
did but I didn't have an explanation of why it behaves like this.

The really annoying bit is that in 9.1 this works fine (it's just as
crazy approach as on but it does not end with OOM error).

Tomas



Re: strange OOM errors with EXECUTE in PL/pgSQL

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> The reason this fails is that you've got a half-megabyte source string,
> and each of the 11000 plans that are due to be created from it saves
> its own copy of the source string.  Hence, 5500 megabytes needed just
> for source strings.
>
> We could possibly fix this by inventing some sort of reference-sharing
> arrangement (which'd be complicated and fragile) or by not storing the
> source strings with the plans (which'd deal a serious blow to our
> ability to provide helpful error messages).  Neither answer seems
> appealing.

I don't readily see how complicated and fragile it would be, it looks
like a hash table of symbols pointing to source strings and a reference
counting, and each plan would need to reference that symbol. Now maybe
that's what you call complicated and fragile, and even if not, I'm not
really sure it would pull its weight. The use case of sending over and
over again *in a given session* the exact same query string without
using PREPARE/EXECUTE looks like quite tiny.

-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: strange OOM errors with EXECUTE in PL/pgSQL

From
Jan Wieck
Date:
On 12/20/2012 4:47 PM, Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> The reason this fails is that you've got a half-megabyte source string,
>> and each of the 11000 plans that are due to be created from it saves
>> its own copy of the source string.  Hence, 5500 megabytes needed just
>> for source strings.
>>
>> We could possibly fix this by inventing some sort of reference-sharing
>> arrangement (which'd be complicated and fragile) or by not storing the
>> source strings with the plans (which'd deal a serious blow to our
>> ability to provide helpful error messages).  Neither answer seems
>> appealing.
>
> I don't readily see how complicated and fragile it would be, it looks
> like a hash table of symbols pointing to source strings and a reference
> counting, and each plan would need to reference that symbol. Now maybe
> that's what you call complicated and fragile, and even if not, I'm not
> really sure it would pull its weight. The use case of sending over and
> over again *in a given session* the exact same query string without
> using PREPARE/EXECUTE looks like quite tiny.
>

That sounds like a bit of overkill to me.

Don't all the plans result as a plan list from a multi-statement query 
string, which was parsed into a query tree "list" and each single query 
tree then planned? I don't think there is any way that a single one of 
those trees (parse or plan) will be free'd separately. If that is true, 
then proper usage of memory contexts would make reference counting 
obsolete, even though all plans refer to the same copy.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: strange OOM errors with EXECUTE in PL/pgSQL

From
Tom Lane
Date:
Jan Wieck <JanWieck@Yahoo.com> writes:
> On 12/20/2012 4:47 PM, Dimitri Fontaine wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> The reason this fails is that you've got a half-megabyte source string,
>>> and each of the 11000 plans that are due to be created from it saves
>>> its own copy of the source string.  Hence, 5500 megabytes needed just
>>> for source strings.
>>> 
>>> We could possibly fix this by inventing some sort of reference-sharing
>>> arrangement (which'd be complicated and fragile) or by not storing the
>>> source strings with the plans (which'd deal a serious blow to our
>>> ability to provide helpful error messages).  Neither answer seems
>>> appealing.

> Don't all the plans result as a plan list from a multi-statement query 
> string, which was parsed into a query tree "list" and each single query 
> tree then planned? I don't think there is any way that a single one of 
> those trees (parse or plan) will be free'd separately. If that is true, 
> then proper usage of memory contexts would make reference counting 
> obsolete, even though all plans refer to the same copy.

The issue is that a multi-statement string gives rise to multiple
CachedPlanSources, which could be freed independently so far as
plancache.c knows.  (spi.c wouldn't actually attempt to do so.)
So you'd really need reference counting, or else some explicit
connection between the CachedPlanSources, neither of which seems
exactly trivial to me.

As of HEAD this particular complaint is moot anyway, because SPI_execute
now goes through the "one-shot CachedPlan" facility, and so it makes no
(zero) copies of the source string.

It'd still be possible to hit the problem when trying to SPI_prepare a
very-many-statement string, but I think the use case for that is pretty
darn small.  So I'm not excited about adding complication to fix it.
        regards, tom lane