Home > mailing lists

Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: logical decoding and replication of sequences, take 2
Date	November 27, 2023 13:41:40
Msg-id	8d0d4b34-3e12-2637-e8b8-f4c3b2b0a1f3@enterprisedb.com Whole thread Raw
In response to	Re: logical decoding and replication of sequences, take 2 (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Tree view


On 11/27/23 12:11, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 4:17 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/27/23 11:13, Amit Kapila wrote:
>>> On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>>
>>>>> While going over 0001, I realized there might be an optimization for
>>>>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
>>>>> searches through all top-level transactions, and if there's many of them
>>>>> that might be expensive, even if very few of them have any relfilenodes
>>>>> in the hash table. It's still linear search, and it needs to happen for
>>>>> each sequence change.
>>>>>
>>>>> But can the relfilenode even be in some other top-level transaction? How
>>>>> could it be - our transaction would not see it, and wouldn't be able to
>>>>> generate the sequence change. So we should be able to simply check *our*
>>>>> transaction (or if it's a subxact, the top-level transaction). Either
>>>>> it's there (and it's transactional change), or not (and then it's
>>>>> non-transactional change).
>>>>>
>>>>
>>>> I also think the relfilenode should be part of either the current
>>>> top-level xact or one of its subxact, so looking at all the top-level
>>>> transactions for each change doesn't seem advisable.
>>>>
>>>>> The 0004 does this.
>>>>>
>>>>> This of course hinges on when exactly the transactions get created, and
>>>>> assignments processed. For example if this would fire before the txn
>>>>> gets assigned to the top-level one, this would break. I don't think this
>>>>> can happen thanks to the immediate logging of assignments, but I'm too
>>>>> tired to think about it now.
>>>>>
>>>>
>>>> This needs some thought because I think we can't guarantee the
>>>> association till we reach the point where we can actually decode the
>>>> xact. See comments in AssertTXNLsnOrder() [1].
>>>>
>>
>> I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
>> which says:
>>
>>   /*
>>    * Skip the verification if we don't reach the LSN at which we start
>>    * decoding the contents of transactions yet because until we reach
>>    * the LSN, we could have transactions that don't have the association
>>    * between the top-level transaction and subtransaction yet and
>>    * consequently have the same LSN.  We don't guarantee this
>>    * association until we try to decode the actual contents of
>>    * transaction. The ordering of the records prior to the
>>    * start_decoding_at LSN should have been checked before the restart.
>>    */
>>
>> But doesn't this say that after we actually start decoding / stop
>> skipping, we should have seen the assignment? We're already decoding
>> transaction contents (because sequence change *is* part of xact, even if
>> we decide to replay it in the non-transactional way).
>>
> 
> It means to say that the assignment is decided after start_decoding_at
> point. We haven't decided that we are past start_decoding_at by the
> time the patch is computing the transactional flag.
> 

Ah, I see. We're deciding if the change is transactional before calling
SnapBuildXactNeedsSkip. That's a bit unfortunate.

>>>
>>> I am wondering that instead of building the infrastructure to know
>>> whether a particular change is transactional on the decoding side,
>>> can't we have some flag in the WAL record to note whether the change
>>> is transactional or not? I have discussed this point with my colleague
>>> Kuroda-San and we thought that it may be worth exploring whether we
>>> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
>>> determine if the sequence is created/changed in the current
>>> subtransaction and then record that in WAL record. By this, we need to
>>> have additional information in the WAL record like XLOG_SEQ_LOG but we
>>> can probably do it only with wal_level as logical.
>>>
>>
>> I may not understand the proposal exactly, but it's not enough to know
>> if it was created in the same subxact. It might have been created in
>> some earlier subxact in the same top-level xact.
>>
> 
> We should be able to detect even some earlier subxact or top-level
> xact based on rd_createSubid/rd_newRelfilelocatorSubid.
> 

Interesting. I admit I haven't considered using these fields before, so
I need to familiarize with it a bit, and try if it'd work.

>> FWIW I think one of the earlier patch versions did something like this,
>> by adding a "created" flag in the xlog record. And we concluded doing
>> this on the decoding side is a better solution.
>>
> 
> oh, I thought it would be much simpler than what we are doing on the
> decoding-side. Can you please point me to the email discussion where
> this is concluded or share the reason?
> 

I think the discussion started around [1], and then in a bunch of
following messages (search for "relfilenode").

regards


[1]
https://www.postgresql.org/message-id/CAExHW5v_vVqkhF4ehST9EzpX1L3bemD1S%2BkTk_-ZVu_ir-nKDw%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Daniel Gustafsson
Date: 27 November 2023, 13:40:09
Subject: Re: Testing autovacuum wraparound (including failsafe)

From: Tomas Vondra
Date: 27 November 2023, 13:54:56
Subject: Re: logical decoding and replication of sequences, take 2

Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

Previous

Next