Thread: ERROR: invalid memory alloc request size 1073741824

ERROR: invalid memory alloc request size 1073741824

From: Stefan Blanke

Hello,

We've tripped over an error when doing a "COPY.. TO STDOUT WITH BINARY" 
query.

"ERROR:  invalid memory alloc request size 1073741824"
(exactly 1GB)

So a palloc() call is failing on the AllocSizeIsValid() check.
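
For reference, the limit being checked is MaxAllocSize, which memutils.h
defines as 1GB minus one byte, so any request of 1GB or more is rejected with
exactly this error. A minimal standalone sketch of that arithmetic (my own toy
program, not PostgreSQL source):

#include <stdio.h>
#include <stddef.h>

/* Values mirror src/include/utils/memutils.h; the surrounding program is
 * only an illustration, not PostgreSQL code. */
#define MaxAllocSize        ((size_t) 0x3fffffff)   /* 1 gigabyte - 1 */
#define AllocSizeIsValid(s) ((size_t) (s) <= MaxAllocSize)

int main(void)
{
    size_t request = 1073741824;    /* exactly 1GB, the size in the error */

    if (!AllocSizeIsValid(request))
        printf("invalid memory alloc request size %zu\n", request);
    return 0;
}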

Does anyone know if this is a safety catch we are tripping with a bad query
- or whether this check is something that should never be hit (i.e. a bug)?

There has been some discussion before about making a change before
9.5beta1:

https://www.postgresql.org/message-id/flat/9A28C8860F777E439AA12E8AEA7694F8010F6F3F%40BPXM15GP.gisp.nec.co.jp

https://www.postgresql.org/message-id/flat/CAKJS1f_sxPyBHF6%3DnEYwPJdKKDNjVDP91b_EJApfyGTei%3DViDw%40mail.gmail.com

postgresql 9.5.4 (2PGDG.rhel6)
work_mem = 256MB

Thanks,
Stefan


Re: ERROR: invalid memory alloc request size 1073741824

From: Tomas Vondra

On 01/30/2018 06:35 PM, Stefan Blanke wrote:
> Hello,
> 
> We've tripped over an error when doing a "COPY.. TO STDOUT WITH BINARY"
> query.
> 
> "ERROR:  invalid memory alloc request size 1073741824"
> (exactly 1GB)
> 
> So a palloc() call is failing on the AllocSizeIsValid() check.
> 
> Does anyone know if this a safety catch we are tripping with a bad
> query - or whether this check is something that should never be hit
> (i.e. a bug).
> 

That very much depends on where exactly the failure happens - it might
be either (or code relying on an assumption that has since changed). It's
not clear whether it fails during query execution or while formatting the
output (I seem to remember there were/are issues with rows containing
multiple large bytea values, for example).

Can you attach gdb to the backend, set a breakpoint on that elog and share
the backtrace? Then we can investigate further.

In case you don't have experience with doing that, here's a howto on the
community wiki:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

> This has been some discussion before about making a change before the
> 9.5beta1:
> 
> https://www.postgresql.org/message-id/flat/9A28C8860F777E439AA12E8AEA7694F8010F6F3F%40BPXM15GP.gisp.nec.co.jp
> 
> https://www.postgresql.org/message-id/flat/CAKJS1f_sxPyBHF6%3DnEYwPJdKKDNjVDP91b_EJApfyGTei%3DViDw%40mail.gmail.com
> 

Both of those threads are about a hashjoin issue, which should have been
fixed in 9.5. Moreover, you haven't shared the query and its query plan,
so it's unclear whether it's doing hash joins at all. That would be
useful for investigating this issue, BTW.

> 
> postgresql 9.5.4 (2PGDG.rhel6)
> work_mem = 256MB
> 

FWIW you're missing more than a year of bugfixes.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ERROR: invalid memory alloc request size 1073741824

From: Jan Wieck


On Tue, Jan 30, 2018 at 12:35 PM, Stefan Blanke <stefan.blanke@framestore.com> wrote:
Hello,

We've tripped over an error when doing a "COPY.. TO STDOUT WITH BINARY" query.

"ERROR:  invalid memory alloc request size 1073741824"
(exactly 1GB)

I have my money on a corrupted TOAST entry. Is this happening on trustworthy hardware or a beige box with no ECC or RAID?


Regards, Jan

--
Jan Wieck
Senior Postgres Architect

Re: ERROR: invalid memory alloc request size 1073741824

From: Tomas Vondra

On 01/30/2018 10:43 PM, Jan Wieck wrote:
> 
> 
> On Tue, Jan 30, 2018 at 12:35 PM, Stefan Blanke
> <stefan.blanke@framestore.com> wrote:
> 
>     Hello,
> 
>     We've tripped over an error when doing a "COPY.. TO STDOUT WITH
>     BINARY" query.
> 
>     "ERROR:  invalid memory alloc request size 1073741824"
>     (exactly 1GB)
> 
> 
> I have my money on a corrupted TOAST entry. Is this happening on
> trustworthy hardware or beige box with no ECC or RAID?
> 

I'll bet you it's not that. It's quite unlikely that would fail with
exactly 1GB request size. It seems much more like a buffer that we keep
to be a power of 2. The question is which one.
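
To illustrate: a buffer that starts at a power-of-2 size and grows by
doubling can only ever request power-of-2 sizes, so the first request that
trips the check is exactly 2^30 bytes. A toy sketch (made up, not actual
PostgreSQL code; the 600MB row is invented):

#include <stdio.h>
#include <stddef.h>

#define MaxAllocSize ((size_t) 0x3fffffff)   /* 1GB - 1, as in memutils.h */

int main(void)
{
    size_t len = 1024;                      /* typical initial buffer size */
    size_t needed = 600 * 1024 * 1024;      /* hypothetical: one 600MB row */

    while (len < needed)
        len *= 2;                           /* doubling keeps len a power of 2 */

    printf("would request %zu bytes (%s)\n", len,
           len > MaxAllocSize ? "fails AllocSizeIsValid" : "ok");
    return 0;
}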


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ERROR: invalid memory alloc request size 1073741824

From: Stefan Blanke

 > That very much depends on where exactly the failure happens - it might
 > be both (or relying on an assumption that changed over). It's not clear
 > if it fails during query execution or formatting the output (I seem to
 > remember there were/are issues with rows containing multiple large bytea
 > values, for example).
 >
 > Can you attach gdb to the backend, set breakpoint on that elog and share
 > the backtrace? Then we can investigate further.

I can't easily get a meaningful query plan (i.e. from the live system) 
as the query that failed is based on a temporary table built by some 
pl/pgsql (the database is more of a conveyor belt than a long term store).

Unfortunately it's difficult to reproduce the ERROR. This has only 
occurred a couple of times on a live system in the last year and these 
queries run frequently.

I think we will move to 9.5.10 and see if it ever happens again; then if
it does, get a backtrace on that newer version. Incidentally, we never saw
this error on 8.4, where this code was running from 2009 until 2017.


 > I'll bet you it's not that. It's quite unlikely that would fail with
 > exactly 1GB request size. It seems much more like a buffer that we keep
 > to be power of 2. The question is which one.

I had dismissed corruption before writing in. It's exactly 1GB every 
time this has happened - and we can dump the full dataset periodically 
without issue.

 >> I have my money on a corrupted TOAST entry. Is this happening on
 >> trustworthy hardware or beige box with no ECC or RAID?

It's good quality commercial hardware in our colo - not exactly sure what.


Thanks for taking the time to look at this!
Stefan




On 01/30/18 22:00, Tomas Vondra wrote:
>
>
> On 01/30/2018 10:43 PM, Jan Wieck wrote:
>>
>>
>> On Tue, Jan 30, 2018 at 12:35 PM, Stefan Blanke
>> <stefan.blanke@framestore.com> wrote:
>>
>>     Hello,
>>
>>     We've tripped over an error when doing a "COPY.. TO STDOUT WITH
>>     BINARY" query.
>>
>>     "ERROR:  invalid memory alloc request size 1073741824"
>>     (exactly 1GB)
>>
>>
>> I have my money on a corrupted TOAST entry. Is this happening on
>> trustworthy hardware or beige box with no ECC or RAID?
>>
>
> I'll bet you it's not that. It's quite unlikely that would fail with
> exactly 1GB request size. It seems much more like a buffer that we keep
> to be power of 2. The question is which one.
>
>
> regards
>


Re: ERROR: invalid memory alloc request size 1073741824

From: Jan Wieck


On Wed, Jan 31, 2018 at 12:32 PM, Stefan Blanke <stefan.blanke@framestore.com> wrote:
> >
> > I'll bet you it's not that. It's quite unlikely that would fail with
> > exactly 1GB request size. It seems much more like a buffer that we keep
> > to be power of 2. The question is which one.
>
> I had dismissed corruption before writing in. It's exactly 1GB every time this has happened - and we can dump the full dataset periodically without issue.
>
> >> I have my money on a corrupted TOAST entry. Is this happening on
> >> trustworthy hardware or beige box with no ECC or RAID?
>
> It's good quality commercial hardware in our colo - no exactly sure what.

If it is a sporadic issue and you can dump the full dataset, then I just lost my money (Tomas, you coming to PGConf in Jersey City?).

But then, if this is a plain COPY to stdout ... I am wondering what could possibly be in that path that wants to allocate 1GB. Or is this not so plain but rather a COPY ... SELECT ...?


Regards, Jan


--
Jan Wieck
Senior Postgres Architect

Re: ERROR: invalid memory alloc request size 1073741824

From: Tomas Vondra

On 01/31/2018 09:51 PM, Jan Wieck wrote:
> 
> 
> On Wed, Jan 31, 2018 at 12:32 PM, Stefan Blanke
> <stefan.blanke@framestore.com> wrote:
> 
>     >
>     > I'll bet you it's not that. It's quite unlikely that would fail with
>     > exactly 1GB request size. It seems much more like a buffer that we keep
>     > to be power of 2. The question is which one.
> 
>     I had dismissed corruption before writing in. It's exactly 1GB every
>     time this has happened - and we can dump the full dataset
>     periodically without issue.
> 
>     >> I have my money on a corrupted TOAST entry. Is this happening on
>     >> trustworthy hardware or beige box with no ECC or RAID?
> 
>     It's good quality commercial hardware in our colo - no exactly sure
>     what.
> 
> 
> If it is a sporadic issue and you can dump the full dataset, then I just
> lost my money (Tomas, you coming to PGConf in Jersey City?).
> 

Unfortunately no, but I'm sure there will be other opportunities to buy
me a beer ;-) Like pgcon, for example.

>
> But then, if this is a plain COPY to stdout ... I am wondering what
> could possibly be in that path that wants to allocate 1GB. Or is this
> not so plain but rather a COPY ... SELECT ...?
> 

That's what I've been guessing, and why I was asking for a query plan.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ERROR: invalid memory alloc request size 1073741824

From: Stefan Blanke

Hi,

We've upgraded to PostgreSQL 11.5 (postgresql.org rhel 6 rpm) and I have
had another occurrence of this invalid alloc of 1GB. Apologies for never
providing a query plan when discussing this two years ago; we decided to
move to a newer PostgreSQL to see if the issue went away but it took a
while to complete the move.

The invalid alloc still only occurs every few months, on a query that we
run every minute, so unfortunately we still don't have a contained
reproducible test case.

This is the SQL we are running, with a query plan - the query plan is
from a new empty database so the planner has no stats.

CREATE OR REPLACE FUNCTION
create_table()
RETURNS TABLE (a INTEGER, b INTEGER)
AS $$
     UPDATE y
     SET c = true,
         d = false
     WHERE e IS NOT true
         AND f IS NOT true
         AND g = 1
         AND h = 0
         AND i = 0
         AND (j IS NULL
             OR j > 0)
     RETURNING y.a, y.b;
$$ LANGUAGE SQL;

-- Prepared statement (PQprepare)
CREATE TEMPORARY TABLE temp_table AS
SELECT * FROM create_table()

-- Prepared statement (PQprepare)
SELECT y.a, y.b,
      x.k,
      x.l,
      y.m,
      y.n,
      y.o
  FROM temp_table
      INNER JOIN y ON temp_table.b = y.b
          AND temp_table.a = y.a
      INNER JOIN x ON x.a = y.a

-- The two prepared statements are executed one after another
-- in the order shown many times an hour.

The query plan for the second prepared statement is:

  Nested Loop  (cost=17.14..64.38 rows=16 width=112)
    ->  Hash Join  (cost=17.00..61.47 rows=16 width=80)
          Hash Cond: ((temp_table.b = y.b) AND (temp_table.a = y.a))
          ->  Seq Scan on temp_table  (cost=0.00..32.60 rows=2260 width=8)
          ->  Hash  (cost=12.80..12.80 rows=280 width=76)
                ->  Seq Scan on y  (cost=0.00..12.80 rows=280 width=76)
    ->  Index Scan using x_pkey on x  (cost=0.14..0.18 rows=1 width=40)
          Index Cond: (a = temp_table.a)

Thanks,
Stefan

On 31/01/2018 21:23, Tomas Vondra wrote:
> 
> 
> On 01/31/2018 09:51 PM, Jan Wieck wrote:
>>
>>
>> On Wed, Jan 31, 2018 at 12:32 PM, Stefan Blanke
>> <stefan.blanke@framestore.com> wrote:
>>
>>      >
>>      > I'll bet you it's not that. It's quite unlikely that would fail with
>>      > exactly 1GB request size. It seems much more like a buffer that we keep
>>      > to be power of 2. The question is which one.
>>
>>      I had dismissed corruption before writing in. It's exactly 1GB every
>>      time this has happened - and we can dump the full dataset
>>      periodically without issue.
>>
>>      >> I have my money on a corrupted TOAST entry. Is this happening on
>>      >> trustworthy hardware or beige box with no ECC or RAID?
>>
>>      It's good quality commercial hardware in our colo - no exactly sure
>>      what.
>>
>>
>> If it is a sporadic issue and you can dump the full dataset, then I just
>> lost my money (Tomas, you coming to PGConf in Jersey City?).
>>
> 
> Unfortunately no, but I'm sure there will be other opportunities to buy
> me a beer ;-) Like pgcon, for example.
> 
>>
>> But then, if this is a plain COPY to stdout ... I am wondering what
>> could possibly be in that path that wants to allocate 1GB. Or is this
>> not so plain but rather a COPY ... SELECT ...?
>>
> 
> That's what I've been guessing, and why I was asking for a query plan.
> 
> 
> regards
> 



Re: ERROR: invalid memory alloc request size 1073741824

From: Tom Lane

Stefan Blanke <stefan.blanke@framestore.com> writes:
> We've upgraded to PostgreSQL 11.5 (postgresql.org rhel 6 rpm) and I have 
> had another occurrence of this invalid alloc of 1GB. Apologies for never 
> providing a query plan when discussing this two years ago; we decided to 
> move to a newer PostgreSQL to see if the issue went away but took a 
> while to complete the move.

> The invalid alloc still only occurs occasionally every few months on a 
> query that we run every minute; so unfortunately we still don't have a 
> contained reproducible test case.

Given the lack of stats, I wonder whether the issue could be related
to the plan sometimes being horribly bad, eg due to the temp table
being much larger than expected.  (A possible mechanism would be
hash table bloat, perhaps, but that's getting way ahead of the
evidence.)
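
To sketch the kind of mechanism I mean (a toy program, not the real hash
join code; the row count is invented): the hash table's bucket array is kept
at a power-of-2 number of buckets, so a wildly misestimated input can push
that single allocation to exactly 1GB.

#include <stdio.h>
#include <stddef.h>

#define MaxAllocSize ((size_t) 0x3fffffff)

/* round n up to the next power of 2 */
static size_t
next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

int main(void)
{
    size_t ntuples = 100000000;                 /* badly misestimated rows */
    size_t nbuckets = next_pow2(ntuples);       /* 134217728 buckets */
    size_t request = nbuckets * sizeof(void *); /* bucket pointer array,
                                                 * assuming 64-bit pointers */

    printf("bucket array: %zu bytes (%s)\n", request,
           request > MaxAllocSize ? "invalid memory alloc request" : "ok");
    return 0;
}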

Could you adjust your process to log the actual temp table size
each time, ie "select count(*) from temp_table" in between the
two steps, and then note whether the failures are correlated
with unusual temp table sizes?

            regards, tom lane



Re: ERROR: invalid memory alloc request size 1073741824

From: Stefan Blanke

 > Could you adjust your process to log the actual temp table size
 > each time, ie "select count(*) from temp_table" in between the
 > two steps, and then note whether the failures are correlated
 > with unusual temp table sizes?

Thanks, I'll add a count there and come back with a number when we next 
hit the "invalid memory alloc".

Stefan

On 11/03/2020 15:37, Tom Lane wrote:
> Stefan Blanke <stefan.blanke@framestore.com> writes:
>> We've upgraded to PostgreSQL 11.5 (postgresql.org rhel 6 rpm) and I have
>> had another occurrence of this invalid alloc of 1GB. Apologies for never
>> providing a query plan when discussing this two years ago; we decided to
>> move to a newer PostgreSQL to see if the issue went away but took a
>> while to complete the move.
> 
>> The invalid alloc still only occurs occasionally every few months on a
>> query that we run every minute; so unfortunately we still don't have a
>> contained reproducible test case.
> 
> Given the lack of stats, I wonder whether the issue could be related
> to the plan sometimes being horribly bad, eg due to the temp table
> being much larger than expected.  (A possible mechanism would be
> hash table bloat, perhaps, but that's getting way ahead of the
> evidence.)
> 
> Could you adjust your process to log the actual temp table size
> each time, ie "select count(*) from temp_table" in between the
> two steps, and then note whether the failures are correlated
> with unusual temp table sizes?
> 
>             regards, tom lane
>