Re: Backends dying due to memory exhaustion--I'm stonkered - Mailing list pgsql-general

From Doug McNaught
Subject Re: Backends dying due to memory exhaustion--I'm stonkered
Msg-id m3bsststrf.fsf@belphigor.mcnaught.org
In response to Backends dying due to memory exhaustion--I'm stonkered  (Doug McNaught <doug@wireboard.com>)
Responses Re: Backends dying due to memory exhaustion--I'm stonkered  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Doug McNaught <doug@wireboard.com> writes:
> > One funny thing is that the nightly VACUUM doesn't always fail--the
> > system will run smoothly for one to three days on average before a
> > crash.
>
> That does seem to contradict the corrupt-data theory.  Do you run a
> VACUUM ANALYZE or just a plain VACUUM?  If there were a persisting
> corrupted tuple, I'd expect VACUUM ANALYZE to crash always, VACUUM
> never (VACUUM doesn't inquire into the actual contents of tuples).

I'm running VACUUM, then VACUUM ANALYZE (the docs seem to suggest that
you need both).  Basically my script is:

$ vacuumdb -a
$ vacuumdb -z -a

The example I sent was a crash during VACUUM.

> > That's a thought, and I will try it.  I'm currently (as of yesterday's
> > crash) running with -d 2 and output sent to a logfile.  Is this
> > debuglevel high enough to tell me which table contains the bad tuple,
> > if that's indeed the problem?
>
> That would tell you what query is running.  It's not enough to tell you
> where VACUUM is unless you do VACUUM VERBOSE.

Which will no doubt generate reams and reams of data...
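
One way to keep that volume manageable might be to send each verbose run to its own log file and print only the tail when a run fails. This is a rough sketch, not something from the thread: `vacuum_logged`, `VACUUMDB`, and `LOGDIR` are hypothetical names, and it assumes the installed vacuumdb accepts `-v` for verbose output.

```shell
#!/bin/sh
# Sketch only: VACUUMDB and LOGDIR are illustrative, overridable settings.
VACUUMDB=${VACUUMDB:-vacuumdb}
LOGDIR=${LOGDIR:-/tmp}

# vacuum_logged LABEL [extra vacuumdb options...]
# Runs "vacuumdb -a -v" plus any extra options, capturing stdout and
# stderr in $LOGDIR/vacuum-LABEL.log. On failure, prints the last 20
# lines of that log and returns 1.
vacuum_logged() {
    label=$1
    shift
    log="$LOGDIR/vacuum-$label.log"
    if "$VACUUMDB" -a -v "$@" >"$log" 2>&1; then
        echo "$label: ok, log in $log"
    else
        echo "$label: FAILED, last lines of $log:"
        tail -n 20 "$log"
        return 1
    fi
}
```

The nightly job would then be `vacuum_logged plain && vacuum_logged analyze -z`, so the end of the log from the failing step shows how far VACUUM got before it died.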

> > If I can't nail it down that way, how hard would it be to write a C
> > program to scan all the tuples in a database looking for bogus size
> > fields?
>
> Fairly hard.  I'd suggest instead that you just do
>     psql -c "copy FOO to stdout" dbname >/dev/null
> and try that on each table in turn to see if you get any crashes...

OK, I'll keep that in reserve.
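
For the record, the per-table check could be wrapped in a small loop along these lines. It is a sketch under assumptions: `check_tables` and the `PSQL` override are hypothetical names, the table list has to be supplied by hand, and it relies on psql exiting non-zero when the COPY fails or the connection drops.

```shell
#!/bin/sh
# Sketch only: PSQL is an illustrative override so the loop can be
# tried without a live server; it is not a psql feature.
PSQL=${PSQL:-psql}

# check_tables DBNAME TABLE...
# Runs: psql -c "copy TABLE to stdout" DBNAME >/dev/null
# for each table in turn. Prints "ok TABLE" or "FAILED TABLE" and
# returns non-zero if any COPY failed. Error messages from psql are
# left on stderr so they stay visible.
check_tables() {
    db=$1
    shift
    status=0
    for t in "$@"; do
        if "$PSQL" -c "copy $t to stdout" "$db" >/dev/null; then
            echo "ok $t"
        else
            echo "FAILED $t"
            status=1
        fi
    done
    return $status
}
```

Called as `check_tables dbname foo bar`, with `foo` and `bar` standing in for the real table names; any table reported as FAILED would be the one to look at for a bad tuple.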

Another thing that springs to mind--once the crash happens, the
database doesn't respond (or gives fatal errors) to new connections
and to queries on existing connections.  Killing the postmaster does
nothing--I have to send SIGTERM to all backends and the postmaster in
order to get it to exit.  I don't know if this helps...

-Doug
