Hi
Jacob Bunk Nielsen <jacob@bunk.cc> writes:
> We have a PostgreSQL 9.3.4 running in an LXC container on Debian
> Wheezy on a Linux 3.10.43 kernel on a Dell R620 server. Data are
> stored on a XFS file system. We are seeing problems such as:
>
> unexpected data beyond EOF in block 2 of relation base/805208133/1238511128
>
> and
>
> could not read block 5 in file "base/805208348/1259338118": read only
> 0 of 8192 bytes
We use streaming replication to a different server on different
hardware. That server had been up for 300+ days and has just had the
following incident:
LOG: consistent recovery state reached at 226/E7DE1680
WARNING: page 0 of relation base/805208133/1274861078 does not exist
CONTEXT: xlog redo insert: rel 1663/805208133/1274861078; tid 0/1
PANIC: WAL contains references to invalid pages
LOG: database system is ready to accept read only connections
CONTEXT: xlog redo insert: rel 1663/805208133/1274861078; tid 0/1
LOG: startup process (PID 2308) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
We've rebooted that server now and restarted the replication. We'll see
how it goes in a few hours.
I'm still very interested in hearing any hints you may have as to how
I should debug these problems.
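One check that might help narrow this down is comparing the on-disk
size of an affected relation file with the block number in the error
message. Below is a minimal Python sketch, assuming the default block
size of 8192 bytes (which matches the "0 of 8192 bytes" in the error)
and that the block lies in the first segment file of the relation. The
function names block_status and check_relation_file are made up for
illustration and are not part of PostgreSQL.

```python
import os

# Assumption: default PostgreSQL block size, as suggested by the
# "read only 0 of 8192 bytes" error message.
BLOCK_SIZE = 8192


def block_status(file_size, block_no, block_size=BLOCK_SIZE):
    """Classify a block number against a file size given in bytes.

    Returns "present" if the whole block fits inside the file,
    "partial" if the file ends somewhere inside the block, and
    "missing" if the file ends at or before the start of the block.
    """
    start = block_no * block_size
    if file_size >= start + block_size:
        return "present"
    if file_size > start:
        return "partial"
    return "missing"


def check_relation_file(path, block_no):
    """Return (size in bytes, number of whole blocks, status of block_no).

    Assumption: path points at the first segment file of the relation,
    e.g. a file under base/ in the data directory.
    """
    size = os.path.getsize(path)
    return size, size // BLOCK_SIZE, block_status(size, block_no)
```

For the second error above, this would mean calling
check_relation_file() on the file "base/805208348/1259338118" under the
data directory with block number 5. A result of "missing" would only
confirm that the file is shorter than PostgreSQL expects; it says
nothing about why.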
> I've tried writing a program to simulate a workload that resembles the
> workload on the problematic tables, but I can't get that to fail. So
> what should be my next step in debugging this?
That program has been running for 24+ hours now, and everything just
works as expected, so still no luck in reproducing this problem.
Best regards
Jacob
P.S. Sorry about the double post with a different subject - my initial
post was held up for several hours due to putting "Help" in the subject,
so I thought it had been discarded by a list admin.