On Tue, Dec 3, 2013 at 4:32 AM, Shuwn Yuan Tee <shuwnyuan@binary.com> wrote:
> We recently experienced a crash on our Postgres production server. Here's our
> server environment:
>
> - Postgres 9.3
> - in OpenVZ container
> - total memory: 64GB
>
>
> Here's the error snippet from postgres log:
>
> ERROR: could not read block 356121 in file "base/33134/33598.2": Bad
> address
> LOG: server process (PID 21119) was terminated by signal 7: Bus error
>
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> LOG: all server processes terminated; reinitializing
> LOG: database system was interrupted; last known up at 2013-12-03 08:47:06
> UTC
> LOG: database system was not properly shut down; automatic recovery in
> progress
> UTC FATAL: the database system is in recovery mode
>
> LOG: checkpoint complete: wrote 10499 buffers (0.7%); 0 transaction log
> file(s) added, 0 removed, 4 recycled; write=0.215 s, sync=11.405 s,
> total=11.631
> FATAL: the database system is in recovery mode
> LOG: database system is ready to accept connections
>
>
> Can anyone suggest whether this is a critical error? Does it indicate any
> data corruption in Postgres?
>
> Although we think it is unlikely to be related, this is what we did a few
> hours before the crash:
>
> (1) Tried to improve query performance by tweaking these settings:
> a) shared_buffers: 8GB -> 16GB
> b) effective_cache_size: 16GB -> 32GB
> c) random_page_cost: 4 -> 2
> d) restart postgres
>
> (2) Since there was no obvious improvement in performance, changed the
> settings in (1) back to their previous values & restarted
>
>
> Thanks in advance for any insight
This seems to be a bug in OpenVZ. It appears not to like high shared
buffer settings.
merlin
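
One way to look for evidence of the container hitting a resource limit is
to check the failure counters that OpenVZ exposes in
/proc/user_beancounters. The following is only a rough sketch, not
something taken from this thread: it assumes the usual column layout
(resource, held, maxheld, barrier, limit, failcnt), and both the helper
name failed_beancounters and the SAMPLE text are made up for illustration.

```python
# Hypothetical sample in the usual /proc/user_beancounters layout.
# The numbers are invented for illustration; they are not from the thread.
SAMPLE = """\
Version: 2.5
       uid  resource        held   maxheld   barrier     limit   failcnt
      101:  kmemsize     2713190   2965504  14372700  14790164         0
            shmpages     2097152   2097152   2097152   2097152         3
            privvmpages   512000    600000   9999999   9999999         0
"""


def failed_beancounters(text):
    """Return {resource: failcnt} for every resource with a non-zero failcnt."""
    failed = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of each container carries a "uid:" prefix; drop it.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # Data rows have six fields: resource held maxheld barrier limit failcnt.
        # Anything else (version line, column header, blank line) is skipped.
        if len(fields) != 6 or not fields[-1].isdigit():
            continue
        failcnt = int(fields[-1])
        if failcnt > 0:
            failed[fields[0]] = failcnt
    return failed


if __name__ == "__main__":
    # On a real container, read /proc/user_beancounters instead (usually as root).
    for resource, count in failed_beancounters(SAMPLE).items():
        print(f"{resource}: {count} failed allocations")
```

A non-zero failcnt on a memory-related counter after raising
shared_buffers would be consistent with the container limit being the
cause, though it would not prove it.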