Thread: vacuum: out of memory error

vacuum: out of memory error

From: Jakub Ouhrabka
Hi all,

I have a few of these messages in the server log:

ERROR:  out of memory
DETAIL:  Failed on request of size 262143996.
STATEMENT:  VACUUM ANALYZE tablename

There are a few of them, always with the same request size(?), but in two
different databases (out of 100+) and a few different tables (pg_listener,
pg_statistic and a few Slony tables). The VACUUM ANALYZE is issued by
pg_autovacuum.

We're also seeing strange listen/notify behavior in the possibly
corrupted database: the listening process receives notify events it
never subscribed to. Can this be connected?

I've done a little research in the mailing list archives and found a
possible cause: table corruption caused by flaky hardware. Does that
sound about right? Are there any other possible causes?

What can be corrupted? How can I check it? How can I correct it? What
are possible consequences of this corruption? How can I investigate it
further?

It's PostgreSQL 8.0.6 on Linux.

Any help would be greatly appreciated.

Thanks,

Kuba

Re: vacuum: out of memory error

From: Andrew Sullivan
On Fri, Nov 24, 2006 at 11:59:16AM +0100, Jakub Ouhrabka wrote:
> I've done a little research in the mailing list archives and found a
> possible cause: table corruption caused by flaky hardware. Does that
> sound about right? Are there any other possible causes?

It sounds about right, yes; but the other possible cause is a
software bug.  In the absence of data proving you have no hardware
problems, though, I think you'll find that people are singularly
unwilling to investigate software bugs in this case.

> What can be corrupted?

Anything.

> How can I check it?

You can try stepping through the table in question and seeing if you
run into problems anywhere.  By binary search, you should be able to
narrow it pretty quickly.
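
The bisection idea can be sketched as follows (a toy illustration, not
PostgreSQL code; `scan_ok` is a hypothetical helper that in practice
would SELECT a slice of the table, e.g. via LIMIT/OFFSET or a ctid
range, and report whether the scan succeeded):

```python
def first_bad_row(scan_ok, lo, hi):
    """Bisect for the first row position at which a range scan fails.

    scan_ok(lo, hi) returns True when reading rows lo..hi (inclusive)
    succeeds, False when the scan hits the corruption error.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if scan_ok(lo, mid):
            lo = mid + 1      # lower half reads fine: problem is above mid
        else:
            hi = mid          # failure already occurs within lo..mid
    return lo

# Simulate a table with one unreadable row at position 4711:
BAD = 4711
def scan_ok(lo, hi):
    return not (lo <= BAD <= hi)

print(first_bad_row(scan_ok, 0, 9999))  # -> 4711, in about 14 probes
```

Each probe halves the candidate range, so even a large table needs only
on the order of log2(rows) scans to isolate the bad spot.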

> How can I correct it?

Well, the corrupt rows are lost.  The usual method is "restore from
backup".

> What
> are possible consequences of this corruption?

You can't read the data.  But you already knew that: it's why your
vacuum is blowing up.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
The plural of anecdote is not data.
        --Roger Brinner

Re: vacuum: out of memory error

From: Jim Nasby
On Nov 24, 2006, at 4:59 AM, Jakub Ouhrabka wrote:
> DETAIL:  Failed on request of size 262143996.
> STATEMENT:  VACUUM ANALYZE tablename
>
> There are a few of them, always with the same request size(?), but
> in two different databases (out of 100+) and a few different tables
> (pg_listener, pg_statistic and a few Slony tables). The VACUUM
> ANALYZE is issued by pg_autovacuum.
>
> We're also seeing strange listen/notify behavior in the possibly
> corrupted database: the listening process receives notify events it
> never subscribed to. Can this be connected?
>
> I've done a little research in the mailing list archives and found a
> possible cause: table corruption caused by flaky hardware. Does that
> sound about right? Are there any other possible causes?

That can also be caused by setting maintenance_work_mem too high for
what your hardware is capable of, though I agree that given the other
problems it's likely that there is some kind of corruption.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: vacuum: out of memory error

From: Jakub Ouhrabka
Thanks for the responses!

One thing I forgot to mention: it's not reproducible. I can issue the
vacuum command manually without any problems a few minutes/seconds after
seeing the "out of memory" error in the server log.

I also can't find any corrupted rows manually.

As for the listen/notify problem: it turned out to be a bug in our own software.

So I'm left with "vacuum: out of memory" in the server log from time to
time, and no other symptoms.

 > That can also be caused by setting maintenance_work_mem too high for
 > what your hardware is capable of, though I agree that given the other
 > problems it's likely that there is some kind of corruption.

maintenance_work_mem = 256000

There are 4 GB of RAM and 4 GB of swap.

There's always:

ERROR:  out of memory
DETAIL:  Failed on request of size 262143996

256000 (maintenance_work_mem, in kB) * 1024 = 262144000

What is the cause of the error? That a contiguous block of this size
couldn't be allocated?
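
For what it's worth, the numbers line up almost exactly; the failed
request is 4 bytes short of the full maintenance_work_mem budget
(presumably some small bookkeeping overhead, though that's a guess):

```python
maintenance_work_mem_kb = 256000   # value from postgresql.conf
failed_request = 262143996         # size from the DETAIL line

budget = maintenance_work_mem_kb * 1024
print(budget)                   # 262144000
print(budget - failed_request)  # 4
```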

So maybe there's no corruption - what do you think?

Regards,

Kuba


Re: vacuum: out of memory error

From: Vivek Khera
On Nov 28, 2006, at 8:40 AM, Jakub Ouhrabka wrote:

> There are 4G of RAM and 4G swap.

and what is the per-process resource limit imposed by your OS?

Just because your box has that much RAM doesn't mean your process is
allowed to use it.
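
You can check those limits with `ulimit -v` and `ulimit -d` in the
shell the postmaster is started from, or read them programmatically; a
sketch using Python's standard `resource` module (note that the limits
that matter are the ones in effect for the PostgreSQL server process,
not your interactive login shell):

```python
import resource

# Per-process address-space limit, the moral equivalent of `ulimit -v`.
# RLIM_INFINITY means "unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("address space: soft =", soft, "hard =", hard)

# Data-segment limit (`ulimit -d`) is the other usual suspect for
# large-malloc failures.
soft_d, hard_d = resource.getrlimit(resource.RLIMIT_DATA)
print("data segment:  soft =", soft_d, "hard =", hard_d)
```

A quarter-gigabyte single allocation will fail with "out of memory" if
either of these soft limits is below the request, regardless of how
much free RAM the box has.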

