Logical decoding of sequence advances, part II - Mailing list pgsql-hackers

From Craig Ringer
Subject Logical decoding of sequence advances, part II
Date
Msg-id CAMsr+YFtoGcLfs4z3sjMX1yLxWE37MJcJ+NkRqPwaJjr_LE-Xg@mail.gmail.com
Whole thread Raw
Responses Re: Logical decoding of sequence advances, part II  (Craig Ringer <craig@2ndquadrant.com>)
Re: Logical decoding of sequence advances, part II  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi all

My earlier efforts at logical decoding of sequence advances were too simplistic[1], falling afoul of issues with sequences being both transactional and not transactional depending on whether the sequence is created in the current xact or not.

TL;DR of solution:

* extend SeqTableData and xl_seq_rec with xid
* set xid in SeqTableData during DefineSequence()
* if SeqTableData set, write it in xl_seq_rec when writing xlog
* assign sequence update to specified xid's reorder buffer in decoding if xid set, otherwise decode immediately


The problem:

If the sequence is created in the current xact (i.e. uncommitted) we have to add the sequence updates to that xact to be replayed only if it commits. The sequence is visible only to the toplevel xact that created the sequence so advances of it can only come from that xact and its children. The actual CREATE SEQUENCE is presumed to be handled separately by an event trigger or similar.

If the new sequence is committed we must replay sequence advances immediately and non-transactionally to ensure they're not lost due to xact rollback or replayed in the wrong order due to xact commit order.

If the sequence is ALTERed in a way that changes pg_class that's event triggers' job and sequence decoding doesn't care. If it's ALTERed in a way that changes Form_pg_sequence we replay the change immediately, using the last committed snapshot to get the sequence details, so the change will take immediate effect and is retained whether or not any pg_class changes are committed. This reflects how it happens on the upstream.


Planned solution:

Extend xl_seq_rec with a created_in_xid TransactionId field. If created_in_xid != InvalidTransactionId, logical decoding associates the sequence advance with the given toplevel xact and adds it to the reorder buffer instead of immediately invoking the sequence decoding callback. The decoding callback then gets invoked during ReorderBufferCommit processing at the appropriate time, like any other transactional change.

To determine whether to log an xid for the sequence advance we need some backend local state to determine whether the sequence is new in this xact. Handily we already have one, the seqhashtab of SeqTableData in sequence.c, just where it's needed. So all that's needed is to add a TransactionId field that we set if we created that sequence in this session. If it's set we test it for TransactionIsInProgress() when xlog'ing a sequence advance; if it is, log that xid. If not in progress, clear the xid in SeqTableData entry so we don't check again.



Another approach would be, during decoding, to look up the relfilenode of the sequence to get the sequence oid and do a pg_class lookup. Check to see whether xmin is part of an in-progress xact. If so, add the sequence advance to that xact's reorder buffer, otherwise decode it immediately. The problem is that

(a) I think we lack relfilenode-to-oid mapping information at decoding-time. RelidByRelfilenode() needs a snapshot and is invoked during ReorderBufferCommit(). We have make the transactional vs nontransactional decision in LogicalDecodingProcessRecord() when I'm pretty sure we don't have a snapshot.

(b) It also has issues with ALTER TRANSACTION. We must replay decoded xact updates immediately even if some in-flight xact has modified the pg_class entry for the sequence. So we can't just check whether the xmin is one of our xact's (sub)xids, we must also check whether some older tuple for the same sequence oid has a corresponding xmax and keep walking backwards until we determine whether we originally CREATEd the sequence in this xact or only ALTERed it.


So yeah. I think extending SeqTableData and xl_seq_rec with xid is the way to go. Objections?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: replication slots replicated to standbys?
Next
From: Craig Ringer
Date:
Subject: Re: replication slots replicated to standbys?