Re: How to repair Pg 9.1? - Mailing list pgsql-general

From Tomas Vondra
Subject Re: How to repair Pg 9.1?
Date
Msg-id 4F8ACEA0.6010405@fuzzy.cz
Whole thread Raw
In response to How to repair Pg 9.1?  ("Dmitry E. Oboukhov" <unera@debian.org>)
Responses Re: How to repair Pg 9.1?  ("Dmitry E. Oboukhov" <unera@debian.org>)
List pgsql-general
On 15.4.2012 14:52, Dmitry E. Oboukhov wrote:
> Recently I started a hard query (ALTER TABLE) and then Postgresql was
> killed by OOM killer.
>
> Since that clients lose their connection and Postgresql says in log:
>
>     2012-04-15 16:46:02 MSK LOG:  server process (PID 825) was terminated by signal 11: Segmentation fault
>     2012-04-15 16:46:02 MSK LOG:  terminating any other active server processes
>     2012-04-15 16:46:02 MSK WARNING:  terminating connection because of crash of another server process
>     2012-04-15 16:46:02 MSK DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 
>     2012-04-15 16:46:02 MSK HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
>     2012-04-15 16:46:02 MSK WARNING:  terminating connection because of crash of another server process
>     2012-04-15 16:46:02 MSK DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 
>     2012-04-15 16:46:02 MSK HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
>     2012-04-15 16:46:02 MSK WARNING:  terminating connection because of crash of another server process
>     2012-04-15 16:46:02 MSK DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 
>     2012-04-15 16:46:02 MSK HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
>     2012-04-15 16:46:02 MSK LOG:  all server processes terminated; reinitializing
>     2012-04-15 16:46:02 MSK LOG:  database system was interrupted; last known up at 2012-04-15 16:45:17 MSK
>     2012-04-15 16:46:02 MSK LOG:  database system was not properly shut down; automatic recovery in progress
>     2012-04-15 16:46:02 MSK LOG:  consistent recovery state reached at 58/21C44D18
>     2012-04-15 16:46:02 MSK LOG:  redo starts at 58/21C44D18
>     2012-04-15 16:46:02 MSK LOG:  record with zero length at 58/21CA7E20
>     2012-04-15 16:46:02 MSK LOG:  redo done at 58/21CA7DE0
>     2012-04-15 16:46:02 MSK LOG:  last completed transaction was at log time 2012-04-15 16:46:02.620188+04
>     2012-04-15 16:46:02 MSK LOG:  autovacuum launcher started
>     2012-04-15 16:46:02 MSK LOG:  database system is ready to accept connections
>
>
> What can I do in the situation?

Well, what happened is quite clear. One of the processes was terminated
by the OOM killer. The database noticed this and responded properly by
restarting the database. That caused a proper recovery - you don't need
to "repair" the database.

What you need to do is change the machine configuration so that the OOM
errors do not happen. I don't know what's running on the machine, but
this usually means tweaking the vm.* parameters at the kernel level and
shared_buffers/work_mem/maintenance_work_mem at the PostgreSQL level.

In this particular situation (ALTER TABLE) I'd use a low value for the
maintenance_work_mem parameter - it'll run longer but won't need that
much memory.

Tomas

pgsql-general by date:

Previous
From: "Dmitry E. Oboukhov"
Date:
Subject: How to repair Pg 9.1?
Next
From: Tomas Vondra
Date:
Subject: Re: Writing data to a text file based on a trigger event...