Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences, take 2
Date
Msg-id 652f6e81-f71b-b760-b1bb-ab078b8dbe53@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences, take 2
List pgsql-hackers

On 11/27/23 11:13, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> While going over 0001, I realized there might be an optimization for
>>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
>>> searches through all top-level transactions, and if there's many of them
>>> that might be expensive, even if very few of them have any relfilenodes
>>> in the hash table. It's still linear search, and it needs to happen for
>>> each sequence change.
>>>
>>> But can the relfilenode even be in some other top-level transaction? How
>>> could it be - our transaction would not see it, and wouldn't be able to
>>> generate the sequence change. So we should be able to simply check *our*
>>> transaction (or if it's a subxact, the top-level transaction). Either
>>> it's there (and it's transactional change), or not (and then it's
>>> non-transactional change).
>>>
>>
>> I also think the relfilenode should be part of either the current
>> top-level xact or one of its subxact, so looking at all the top-level
>> transactions for each change doesn't seem advisable.
>>
>>> The 0004 does this.
>>>
>>> This of course hinges on when exactly the transactions get created, and
>>> assignments processed. For example if this would fire before the txn
>>> gets assigned to the top-level one, this would break. I don't think this
>>> can happen thanks to the immediate logging of assignments, but I'm too
>>> tired to think about it now.
>>>
>>
>> This needs some thought because I think we can't guarantee the
>> association till we reach the point where we can actually decode the
>> xact. See comments in AssertTXNLsnOrder() [1].
>>

I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
which says:

  /*
   * Skip the verification if we don't reach the LSN at which we start
   * decoding the contents of transactions yet because until we reach
   * the LSN, we could have transactions that don't have the association
   * between the top-level transaction and subtransaction yet and
   * consequently have the same LSN.  We don't guarantee this
   * association until we try to decode the actual contents of
   * transaction. The ordering of the records prior to the
   * start_decoding_at LSN should have been checked before the restart.
   */

But doesn't this say that after we actually start decoding / stop
skipping, we should have seen the assignment? We're already decoding
transaction contents (because sequence change *is* part of xact, even if
we decide to replay it in the non-transactional way).

> 
> I am wondering that instead of building the infrastructure to know
> whether a particular change is transactional on the decoding side,
> can't we have some flag in the WAL record to note whether the change
> is transactional or not? I have discussed this point with my colleague
> Kuroda-San and we thought that it may be worth exploring whether we
> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
> determine if the sequence is created/changed in the current
> subtransaction and then record that in WAL record. By this, we need to
> have additional information in the WAL record like XLOG_SEQ_LOG but we
> can probably do it only with wal_level as logical.
> 

I may not understand the proposal exactly, but it's not enough to know
if it was created in the same subxact. It might have been created in
some earlier subxact in the same top-level xact.

FWIW I think one of the earlier patch versions did something like this,
by adding a "created" flag in the xlog record. And we concluded doing
this on the decoding side is a better solution.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: remaining sql/json patches
Next
From: Ajin Cherian
Date:
Subject: Re: PATCH: Add REINDEX tag to event triggers