Re: Assertion failure when streaming logical changes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Assertion failure when streaming logical changes
Date
Msg-id 54DAA7A8.5000705@vmware.com
Whole thread Raw
In response to Re: Assertion failure when streaming logical changes  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Assertion failure when streaming logical changes  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 02/10/2015 02:46 PM, Andres Freund wrote:
> On 2015-02-10 21:20:37 +0900, Michael Paquier wrote:
>> Using test_decoding on HEAD (cc761b1) I am seeing the following assertion
>> failure:
>> TRAP: FailedAssertion("!(!((&RegisteredSnapshots)->ph_root ==
>> ((void*)0)))", File: "snapmgr.c", Line: 677)
>
> Ick. I guess a revert of 94028691609f8e148bd4ce72c46163f018832a5b fixes
> it?
> ...
>
> Yea, it really looks like the above commit is to "blame". The new xmin
> tracking infrastructure doesn't know about the historical snapshot...

The new xmin tracking code assumes that if a snapshots's regd_count > 0,
it has already been pushed to the RegisteredSnapshots heap. That
assumption doesn't hold because the logical decoding stuff creates its
own snapshots and sets regd_count=1 to prevent snapmgr.c from freeing
them, even though they haven't been registered with RegisterSnapshot.

The second paragraph in this comment in snapmgr.c needs fixing:

>  * Likewise, any snapshots that have been exported by pg_export_snapshot
>  * have regd_count = 1 and are counted in RegisteredSnapshots, but are not
>  * tracked by any resource owner.
>  *
>  * The same is true for historic snapshots used during logical decoding,
>  * their lifetime is managed separately (as they life longer as one xact.c
>  * transaction).

Besides the spelling, that's incorrect: historic snapshots are *not*
counted in RegisteredSnapshots. That was harmless before commit
94028691, but it was always wrong.


I think setting regd_count=1 outside snapmgr.c is a pretty ugly hack.
snapbuild.c also abuses active_count as a reference counter, which is
similarly ugly. I think regd_count and active_count should both be
treated as private to snapmgr.c, and initialized to zero elsewhere.

As a minimal fix, we could change the logical decoding code to not use
regd_count to prevent snapmgr.c from freeing its snapshots, but use
active_count for that too. Setting regd_count to 1 in
SnapBuildBuildSnapshot() seems unnecessary in the first place, as the
callers also call SnapBuildSnapIncRefcount(). Patch attached.

But could we get rid of those active_count manipulations too? Could you
replace the SnapBuildSnap[Inc|Dec]Refcount calls with
[Un]RegisterSnapshot()?

- Heikki


Attachment

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: reducing our reliance on MD5
Next
From: Arthur Silva
Date:
Subject: Re: reducing our reliance on MD5