Here is the full process list from the time replication stopped working (I have changed the actual username, database name and IP address for security). Would the "idle in transaction" process be the culprit?
postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
postgres 10403 0.0 0.2 3430372 25920 ? Ss Aug14 0:31 postgres: user db x.x.x.x(61656) idle in transaction
On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98@gmail.com> wrote:
> Hello,
>
> I'm having an issue where streaming replication just randomly stops working.
> I haven't been able to find anything in the logs which point to an issue,
> but the Postgres process shows a "waiting" status on the slave:
>
> postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres:
> startup process recovering 000000010000053D0000003F waiting
The startup process is waiting for a recovery conflict to go away. In other words, you have a long-running (or long-idle) transaction on the slave which is blocking recovery.
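
If it helps to spot this condition automatically, here is a rough sketch in Python that scans `ps aux`-style output for the two symptoms discussed in this thread: a startup process whose title ends in "waiting", and backends sitting "idle in transaction". The function name `find_recovery_blockers` and the assumption of an 11-column `ps aux` layout are mine, not part of PostgreSQL; the sample lines are copied from the process list in this thread.

```python
# Sample lines copied from the process list posted in this thread.
PS_OUTPUT = """\
postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
postgres 10403 0.0 0.2 3430372 25920 ? Ss Aug14 0:31 postgres: user db x.x.x.x(61656) idle in transaction
"""


def find_recovery_blockers(ps_text):
    """Hypothetical helper: return (startup_waiting, idle_pids).

    Assumes `ps aux` layout: 10 fixed columns followed by the command.
    """
    startup_waiting = False
    idle_pids = []
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # 11th field is the full process title
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        pid = int(fields[1])
        command = fields[10].rstrip()
        if "startup process" in command and command.endswith("waiting"):
            startup_waiting = True
        elif command.endswith("idle in transaction"):
            idle_pids.append(pid)
    return startup_waiting, idle_pids


if __name__ == "__main__":
    waiting, pids = find_recovery_blockers(PS_OUTPUT)
    if waiting and pids:
        print("Startup process is waiting; candidate blockers:", pids)
```

This only narrows down candidates from the process titles. It cannot prove that a given session is the one holding up recovery, so the `pg_stat_activity` view on the slave is still the place to check what each of those sessions is doing.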