Re: Autovacuum daemon terminated by signal 11 - Mailing list pgsql-general

From Justin Pasher
Subject Re: Autovacuum daemon terminated by signal 11
Msg-id 496F81EC.1050408@newmediagateway.com
In response to Re: Autovacuum daemon terminated by signal 11  (Richard Huxton <dev@archonet.com>)
Responses Re: Autovacuum daemon terminated by signal 11  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Autovacuum daemon terminated by signal 11  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Richard Huxton wrote:
> Justin Pasher wrote:
>
>> Hello,
>>
>> I have a server running PostgreSQL 8.1.15-0etch1 (Debian etch) that was
>> recently put into production. Last week a developer started having a problem
>> with his psql connection being terminated every couple of minutes when he
>> was running a query. When I looked through the logs, I noticed this message.
>>
>> 2009-01-09 08:09:46 CST LOG:  autovacuum process (PID 15012) was terminated
>> by signal 11
>>
>
> Segmentation fault - probably a bug or bad RAM.
>

It's a relatively new machine, but bad RAM is obviously a possibility with
any hardware. I haven't seen any other programs having problems on the
box, but the Postgres daemon is by far the most heavily used program, so
any hardware fault would be more likely to show up there.

>> I looked through the logs some more and I noticed that this was occurring
>> every minute or so. The database is a pretty heavily utilized system
>> (judging by the age(datfrozenxid) from pg_database, the system had run
>> approximately 500 million transactions in less than a week). I noticed that right
>> before every autovacuum termination, it tried to autovacuum a database.
>>
>> 2009-01-09 08:09:46 CST LOG:  transaction ID wrap limit is 4563352, limited
>> by database "database_name"
>>
>> It was always showing the same database, so I decided to manually vacuum the
>> database. Once that was done (it was successful the first time without
>> errors), the problem seemed to go away. I went ahead and manually vacuumed
>> the remaining databases just to take care of the potential xid wraparound
>> issue.
>>
>
> I'd be suspicious of possible corruption in autovacuum's internal data.
> Can you trace these problems back to a power-outage or system crash? It
> doesn't look like "database_name" itself since you vacuumed that
> successfully. If autovacuum is running normally now, that might indicate
> it was something in the way autovacuum was keeping track of "database_name".
>

The server hasn't been rebooted since it was installed (about 9 months
ago, although it has only been in use for the past month), so there
haven't been any crashes or power outages. The only abnormal entries I
can find in the Postgres logs are the autovacuum segfaults. Looking in
the logs today, it's still happening (this time on a different
database). I manually vacuumed that one database and the problem went
away (for now).

Are there any internal Postgres tables I can look at that may shed some
light on this? Any particular maintenance commands that could be run for
repair?
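For context on the log line quoted above ("transaction ID wrap limit is 4563352, limited by database ..."): as I understand PostgreSQL's transaction-ID bookkeeping of this era, the wrap limit is computed from the oldest `datfrozenxid` in `pg_database` by going halfway around the 32-bit XID space, with XIDs 0-2 reserved. The Python below is a minimal sketch under that assumption, not PostgreSQL's actual code; the function names and the `datfrozenxid` values in the usage line are hypothetical, chosen so the result matches the logged limit.

```python
from typing import Dict, Tuple

XID_MASK = 0xFFFFFFFF   # xids are unsigned 32-bit counters
FIRST_NORMAL_XID = 3    # xids 0-2 are reserved special values


def xid_precedes(a: int, b: int) -> bool:
    """Circular comparison: is xid `a` logically older than xid `b`?"""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b  # special xids compare as plain unsigned values
    diff = (a - b) & XID_MASK
    return diff >= 0x80000000  # i.e. negative when read as signed 32-bit


def wrap_limit(frozen_xids: Dict[str, int]) -> Tuple[int, str]:
    """Return (wrap limit, limiting database) for {datname: datfrozenxid}."""
    if not frozen_xids:
        raise ValueError("need at least one database")
    oldest_db = next(iter(frozen_xids))
    for name, xid in frozen_xids.items():
        if xid_precedes(xid, frozen_xids[oldest_db]):
            oldest_db = name
    # Trouble starts halfway around the xid circle from the oldest xid.
    limit = (frozen_xids[oldest_db] + (XID_MASK >> 1)) & XID_MASK
    if limit < FIRST_NORMAL_XID:  # skip the reserved xids after overflow
        limit += FIRST_NORMAL_XID
    return limit, oldest_db


if __name__ == "__main__":
    # Hypothetical values, chosen so the limit matches the logged 4563352.
    print(wrap_limit({"database_name": 2152047001, "other_db": 2400000000}))
```

With these hypothetical values the call returns `(4563352, "database_name")`. The circular comparison in `xid_precedes` matters because, once the counter has wrapped, a numerically small XID can be newer than a large one, so a plain `min()` over the raw values could pick the wrong database.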

> It's also probably worth running some memory tests on the server
> (memtest86 or similar) to see if that shows anything. Was it *always*
> the autovacuum process getting sig11? If not then it might just be a
> pattern of usage that makes it more likely to use some bad RAM.

I might try the memtest if we can actually get the databases off the
server to allow some downtime. None of the logs indicate anything else
acting abnormally or being terminated abnormally, just the autovacuum
daemon. From what I can tell, the segfaults only occur when the databases
pass the halfway point (when age(datfrozenxid) exceeds around 1500000000).
According to the logs, they do not occur otherwise.
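The age-versus-threshold check described above can be sketched in Python. This is a minimal sketch assuming `age()` behaves as a signed 32-bit difference between the current XID and the given XID; the 1,500,000,000 threshold is the value observed in this thread, not a PostgreSQL setting, and the database names and XID values are hypothetical. For scale, 2^31 (about 2.15 billion) is the distance to the wrap limit.

```python
from typing import Dict, List

XID_SPACE = 1 << 32   # xids wrap around an unsigned 32-bit counter
HALF_SPACE = 1 << 31  # about 2.15 billion: distance to the wrap limit


def xid_age(next_xid: int, xid: int) -> int:
    """Transactions elapsed since `xid`, as a signed 32-bit difference."""
    diff = (next_xid - xid) % XID_SPACE
    return diff - XID_SPACE if diff >= HALF_SPACE else diff


def over_threshold(frozen_xids: Dict[str, int], next_xid: int,
                   threshold: int = 1_500_000_000) -> List[str]:
    """Databases whose datfrozenxid age exceeds `threshold`, sorted by name."""
    return sorted(name for name, xid in frozen_xids.items()
                  if xid_age(next_xid, xid) > threshold)


if __name__ == "__main__":
    # Hypothetical catalog snapshot: {datname: datfrozenxid}.
    dbs = {"db_a": 500_000_000, "db_b": 2_000_000_000}
    print(over_threshold(dbs, next_xid=2_100_000_000))
```

Here `db_a` has an age of 1,600,000,000 and is reported, while `db_b` (age 100,000,000) is not. The modular difference keeps the age correct even after the XID counter wraps past 2^32.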


Justin Pasher
