Re: [HACKERS] snapbuild woes - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] snapbuild woes
Date
Msg-id 20161212223338.bmhswrdaipttvcve@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] snapbuild woes  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
> On 12/12/16 22:42, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> >> Hi,
> >> First one is outright bug, which has to do with how we track running
> >> transactions. What snapbuild basically does while doing initial snapshot
> >> is read the xl_running_xacts record, store the list of running txes and
> >> then wait until they all finish. The problem with this is that
> >> xl_running_xacts does not ensure that it only logs transactions that are
> >> actually still running (to avoid locking PGPROC) so there might be xids
> >> in xl_running_xacts that already committed before it was logged.
> > 
> > I don't think that's actually true?  Notice how LogStandbySnapshot()
> > only releases the lock *after* the LogCurrentRunningXacts() iff
> > wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> > observed must actually be a bit more complex :(
> > 
> 
> Hmm, interesting, I did see the transaction commit in the WAL before the
> xl_running_xacts that contained the xid as running. I only seen it on
> production system though, didn't really manage to easily reproduce it
> locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does.  So they're still running in the PGPROC
sense, just not the crash-recovery sense...

Andres



pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: [HACKERS] snapbuild woes
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] PATCH: two slab-like memory allocators