Re: Logical decoding of sequence advances, part II - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Logical decoding of sequence advances, part II
Date
Msg-id CAMsr+YF76MHV5t_2YvW3z95YtR8nOVvnzKWmG8mTwWYWFJSAUA@mail.gmail.com
In response to Re: Logical decoding of sequence advances, part II  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: Logical decoding of sequence advances, part II
List pgsql-hackers
On 23 Aug 2016 05:43, "Kevin Grittner" <kgrittn@gmail.com> wrote:
>
> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> > it seems to me that
> > this is just one facet of a much more general problem: given two
> > transactions T1 and T2, the order of replay must match the order of
> > commit unless you can prove that there are no dependencies between
> > them.  I don't see why it matters whether the operations are sequence
> > operations or data operations; it's just a question of whether they're
> > modifying the same "stuff".

It matters because sequence operations aren't transactional in pg. Except when they are - operations on a newly CREATEd sequence or one where we did a TRUNCATE ... RESTART IDENTITY.

But we don't store the xid of the xact associated with a transactional sequence update along with the sequence update anywhere. We just rely on no other xact knowing to look at the sequence relfilenode we're changing. Doesn't work so well in logical rep.

We also don't store knowledge of whether or not the sequence advance is transactional. Again, that's important because for two xacts t1 and t2:

* Sequence last value is 50

* T1 calls nextval. Needs a new chunk because all cached values have been used. Writes sequence WAL advancing seq last_value to 100, returns 51.

* T2 calls nextval, gets cached value 52.

* T2 commits

* Master crashes and we fail over to replica.

This is fine for physical rep. We replay the sequence advance and all is well.

But for logical rep the sequence can't be treated as part of t1. If t1 rolls back or we fail over before replaying it, we might return value 52 from nextval even though we replayed and committed t2 that used value 52. Oops.

However, if some xact t3 creates a sequence we can't replay updates to it until the sequence relation is committed. And it's even more fun with TRUNCATE ... RESTART IDENTITY where we need rollback behaviour too.

Make sense? It's hard because sequences are sometimes but not always exempt from transactional behaviour and pg doesn't record when, since it can rely on physical WAL redo order and can apply sequence advances before the sequence relation is committed.

> The commit order is the simplest and safest *unless* there is a
> read-write anti-dependency a/k/a read-write dependency a/k/a
> rw-conflict: where a read from one transaction sees the "before"
> version of data modified by the other transaction.  In such a case
> it is necessary for correct serializable transaction behavior for
> the transaction that read the "before" image to be replayed before
> the write it didn't see, regardless of commit order.  If you're not
> trying to avoid serialization anomalies, it is less clear to me
> what is best.

Could you provide an example of a case where xacts replayed in commit order will produce incorrect results?

Remember that we aren't doing statement based replication in pg logical decoding/replication. We don't care how a row got changed, only that we make consistent transitions from before state to after state for each transaction, such that the data committed and visible on the master is visible on the standby and no uncommitted or not yet visible data on the master is committed/visible on the replica. The replica should have visible committed data matching the master as it was when it originally executed the xact we most recently replayed.

No locking is decoded or replayed. It is not expected that a normal non-replication client executing some other concurrent xact will have the same effect if run on standby as on master.

It's replication, not tightly coupled clustering. If/when we have things like parallel decoding and replay of concurrent xacts then issues like the dependencies you mention will start to become a concern. We are a long way from there.

For sequences the requirement IMO is that the sequence advances on the replica to or past the position it was at on the master when the first xact that saw those sequence values committed. We should never see the sequence 'behind' such that calling nextval on the replica can produce a value already seen and stored by some committed xact on the replica. Being a bit ahead is ok, much like pg discards sequence values on crash.

That's not that hard. The problems arise when the sequence itself isn't committed yet, per above.
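
The nextval scenario described in this message can be sketched as a toy model. This is not PostgreSQL code: the `ToySequence` class, the list of tuples standing in for WAL, the chunk size of 50 and the `replica_position` helper are all illustrative assumptions. It only shows why tying the sequence advance to t1 can leave the replica behind a value that committed t2 already used.

```python
class ToySequence:
    """Hypothetical model of a sequence that logs advances in chunks."""

    def __init__(self, last_value, chunk):
        self.last_value = last_value    # last value handed out
        self.logged_upto = last_value   # highest value covered by a logged advance
        self.chunk = chunk              # assumed chunk size, not a real pg setting

    def nextval(self, wal, xact):
        self.last_value += 1
        if self.last_value > self.logged_upto:
            # All pre-logged values are used up: log a new advance.
            self.logged_upto = self.last_value - 1 + self.chunk
            wal.append((xact, self.logged_upto))
        return self.last_value


def replica_position(start, wal, committed, transactional):
    """Sequence position on the replica after replaying the logged advances.

    transactional=True applies an advance only if its xact committed;
    transactional=False applies every advance as soon as it is seen.
    """
    pos = start
    for xact, upto in wal:
        if not transactional or xact in committed:
            pos = max(pos, upto)
    return pos


wal = []
seq = ToySequence(last_value=50, chunk=50)
v1 = seq.nextval(wal, "t1")   # t1 logs the advance to 100
v2 = seq.nextval(wal, "t2")   # t2 uses a pre-logged value, logs nothing
committed = {"t2"}            # t2 commits, t1 never does

behind = replica_position(50, wal, committed, transactional=True)
ahead = replica_position(50, wal, committed, transactional=False)

# Requirement from the message: the replica must be at or past every
# value used by a committed xact.
print("advance tied to t1:", behind, "safe" if behind >= v2 else "can reissue a used value")
print("advance applied immediately:", ahead, "safe" if ahead >= v2 else "can reissue a used value")
```

Applying the advance immediately leaves the replica ahead of the master's committed values, which matches the "being a bit ahead is ok" requirement. The model does not cover the harder case of a sequence created or reset inside an uncommitted xact.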
