On Thursday, June 07, 2012 03:58:24 PM Andres Freund wrote:
> Hi,
>
> On Thursday, June 07, 2012 12:44:08 PM Valentine Gogichashvili wrote:
> > I have hit the situation again: one of 3 slaves was slow to replay all the WAL
> > files and, being about 10GB behind, it crashed with the same error again.
> >
> > I collected DEBUG4 output this time:
> > https://docs.google.com/open?id=0B2NMMrfiBQcLZjNDbU0xQ3lvWms
>
> Ok, I stared at this for some time and I think I see what the problem is.
> Some log excerpts that led my reasoning:
> ...
> after that we start adding all currently running xids from the snapshot to
> the KnownAssigned machinery. They are already recorded, though, so we fail
> in KnownAssignedXidsAdd with the OP's error.
>
> The simplest fix for that seems to be to reset the KnownAssignedXids state
> in the above branch. Any arguments against that?
A patch implementing that is attached. Unfortunately it is not really tested
yet, because it's rather hard to hit that code path.
Valentine, can you test that patch?
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services