Re: Next steps in debugging database storage problems? - Mailing list pgsql-general

From Jacob Bunk Nielsen
Subject Re: Next steps in debugging database storage problems?
Date
Msg-id spamdrop+87sikyb4ic.fsf@atom.bunk.cc
In response to Next steps in debugging database storage problems?  (Jacob Bunk Nielsen <jacob@bunk.cc>)
Responses Re: Next steps in debugging database storage problems?  (Terry Schmitt <terry.schmitt@gmail.com>)
Re: Next steps in debugging database storage problems?  (Jacob Bunk Nielsen <jacob@bunk.cc>)
List pgsql-general
Hi

On the 1st of July 2014 Jacob Bunk Nielsen <jacob@bunk.cc> wrote:

> We have a PostgreSQL 9.3.4 running in an LXC container on Debian
> Wheezy on a Linux 3.10.43 kernel on a Dell R620 server. Data are
> stored on a XFS file system. We are seeing problems such as:
>
> unexpected data beyond EOF in block 2 of relation base/805208133/1238511128
>
> and
>
> could not read block 5 in file "base/805208348/1259338118": read only
> 0 of 8192 bytes
>
> This seems to occur every few days after the server has been up for
> 30-40 days. If we reboot the server it'll be another 30-40 days before
> we see any problems again.
>
> The server has been running fine on a Dell R710 for a long time, and was
> upgraded to a Dell R620 last year, when the problems started. We have
> tried switching to a different Dell R620, but that did not make a
> difference. We've seen this with kernels 3.2, 3.4 and 3.10.

This time it took 45 days before this happened:

LOG:  unexpected EOF on standby connection
ERROR:  unexpected data beyond EOF in block 140 of relation base/805208885/805209852
HINT:  This has been seen to occur with buggy kernels; consider updating your system.

It always happens with small tables with lots of inserts and deletes.
From previous experience we know that it's now going to happen again in
a few days, so we'll probably try to schedule a reboot to give us
another 30-40 days.
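
For tracking which relations are affected over time, here is a minimal sketch (not part of the original report) that pulls the database OID, relfilenode and block number out of the two error shapes quoted in this thread. The function name and the returned dictionary layout are made up for illustration, and the 8192-byte block size is taken from the "0 of 8192 bytes" message above. The relfilenode could then be compared against pg_class.relfilenode in the database whose pg_database.oid matches.

```python
import re

# Patterns for the two error shapes quoted in this thread; other
# PostgreSQL versions may word these messages differently (assumption).
BEYOND_EOF = re.compile(
    r"unexpected data beyond EOF in block (\d+) of relation base/(\d+)/(\d+)"
)
SHORT_READ = re.compile(
    r'could not read block (\d+) in file "base/(\d+)/(\d+)"'
)

BLOCK_SIZE = 8192  # bytes per block, as reported in the error message


def parse_storage_error(line):
    """Return the relation details found in a log line, or None."""
    for kind, pattern in (("beyond_eof", BEYOND_EOF), ("short_read", SHORT_READ)):
        match = pattern.search(line)
        if match:
            block, db_oid, relfilenode = (int(g) for g in match.groups())
            return {
                "kind": kind,
                "db_oid": db_oid,            # directory name under base/
                "relfilenode": relfilenode,  # file name inside that directory
                "block": block,
                "offset": block * BLOCK_SIZE,  # byte offset of the block in the file
            }
    return None


if __name__ == "__main__":
    sample = ("ERROR:  unexpected data beyond EOF in block 140 "
              "of relation base/805208885/805209852")
    print(parse_storage_error(sample))
```

Feeding the server log through this and counting hits per relfilenode would show whether the same small tables are hit each time.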

Is anyone else seeing problems with PostgreSQL on XFS filesystems?

Any hints on how to debug what goes wrong here would still be greatly
appreciated.

> We have multiple other PostgreSQL servers running in a similar setup
> without causing any problems, but this server is probably the busiest of
> our PostgreSQL servers.

This is still the case.

Best regards

Jacob


