James Sebastian <james.sebastian@gmail.com> writes:
> On 1 August 2015 at 19:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, if it took that long to recover, you might have raised the
>> checkpoint interval settings too high.
> I am using the following parameters:
> checkpoint_segments = 10 (from OS default 3)
> checkpoint_completion_target = 0.8 (from OS default 0.5)
> archive_mode = on
> archive_timeout = 600
[ scratches head... ] It should certainly not have taken very long to replay 10 WAL segments' worth of data. I surmise that the problems you were having before the shutdown were worse than you thought, i.e., checkpoints were failing to complete, probably due to a persistent I/O error, so that there was a whole lot more than normal to replay after the last successful checkpoint. Is there any evidence of such distress in the postmaster log?
[... sigh ... Thanks ...]
We had very slow application performance and many hanging threads, as shown in pgAdmin -> Server Status. The logs also contained the following, which (according to Google search results) probably indicates high I/O:
2015-07-30 10:10:21 IST WARNING: pgstat wait timeout
2015-07-30 10:12:21 IST WARNING: pgstat wait timeout
I had the hardware analysed, and according to them there were no disk problems.
The load on the application was normal... and that is what brought me to this mailing list.
Thanks for all the help so far. I am learning a lot and becoming a little more comfortable with Postgres administration, coming from a pure OS admin background.