On Mon, Sep 22, 2014 at 6:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
Pavan,

* Pavan Deolasee (pavan.deolasee@gmail.com) wrote:
> On Thu, Sep 18, 2014 at 12:02 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
> wrote:
> > While running some tests on REL9_2_STABLE branch, I saw an assertion
> > failure in syncrep.c. The stack trace looks like this:
> >
> Any comments on this? I see it very regularly during my pgbench tests.

I agree that it looks like there may be a race condition there, but could you provide the test cases you're working with? What kind of behavior is it that's making it show up easily for you?
Nothing special really. Setting up 2-node sync replication on my Mac laptop and running pgbench with 10 clients triggers it. As I said, after looking at the code and realising that there is a race condition, I tried with gdb to reproduce the race I suspect.
Anyway, the attached patch should trigger the race condition for a simple query. I'm deliberately making the backend wait to give the walsender a chance to send outstanding WAL, and then making the walsender wait so that the assertion is triggered in the backend.
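
For readers without the patch at hand, the general technique of pausing both sides to force a particular interleaving can be sketched with a toy model. This is not PostgreSQL code and is not taken from syncrep.c or from the attached patch. The names `waiter`, `waker`, `queue`, and `state` are hypothetical, and the ordering shown (status flag set before the queue entry is removed) is only an assumed example of the kind of window such pauses can expose.

```python
import threading


def run_forced_interleaving():
    """Toy model of forcing a race with injected waits.

    Returns True if the waiter saw its status flag set while it was
    still linked in the shared queue. All names are hypothetical.
    """
    queue = []                            # stands in for a shared wait queue
    state = {"complete": False}           # stands in for a per-waiter status flag
    enqueued = threading.Event()          # waiter -> waker: I am on the queue
    flag_set = threading.Event()          # waker -> waiter: flag has been set
    waiter_checked = threading.Event()    # waiter -> waker: my check is done
    observed = {}

    def waiter():
        queue.append("waiter")
        enqueued.set()
        # Injected wait: give the waker time to get as far as setting the flag.
        flag_set.wait()
        if state["complete"]:
            # An assertion that the entry is already unlinked would fail here.
            observed["still_queued"] = "waiter" in queue
        waiter_checked.set()

    def waker():
        enqueued.wait()
        state["complete"] = True          # assumed order: flag first ...
        flag_set.set()
        # Injected wait: hold the window open until the waiter has looked.
        waiter_checked.wait()
        queue.remove("waiter")            # ... queue removal second

    threads = [threading.Thread(target=waiter), threading.Thread(target=waker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed["still_queued"]


if __name__ == "__main__":
    print(run_forced_interleaving())
```

Using events rather than sleeps makes the interleaving deterministic, which is the same reason for adding explicit waits on both sides instead of relying on pgbench load to hit the window by chance.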