Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences
Date
Msg-id b775bee1-b6c8-098e-9c78-5a5f7ec9abdd@enterprisedb.com
In response to Re: logical decoding and replication of sequences  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: logical decoding and replication of sequences  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 3/7/22 22:11, Tomas Vondra wrote:
> 
> 
> On 3/7/22 17:39, Tomas Vondra wrote:
>>
>>
>> On 3/1/22 12:53, Amit Kapila wrote:
>>> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>>
>>>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>>>> I've polished & pushed the first part adding sequence decoding
>>>>>> infrastructure etc. Attached are the two remaining parts.
>>>>>>
>>>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>>>> last part (for built-in replication) will need more work and maybe
>>>>>> rethinking the grammar etc.
>>>>>>
>>>>>
>>>>> I've pushed the second part, adding sequences to test_decoding.
>>>>>
>>>>
>>>> The test_decoding tests have been failing randomly in the last few days.
>>>> I am not completely sure, but the failures might be related to this work.
>>>> These two appear to be due to the same cause:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>>>
>>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>>> 0   postgres                            0x00593de0 ExceptionalCondition + 160
>>>>
>>>
>>> While reviewing the code for this, I noticed that in
>>> sequence_decode(), we don't call ReorderBufferProcessXid to register
>>> the first known lsn in WAL for the current xid. Similar functions such as
>>> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
>>> even if they decide not to queue or send the change. Is there a reason
>>> for not doing the same here? However, I am not able to deduce any
>>> scenario where the lack of this call would lead to such an assertion failure.
>>> Any thoughts?
>>>
>>
>> Thanks, that seems like an omission. Will fix.
>>
> 
> I've pushed this simple fix. Not sure it'll fix the assert failures on
> skink/locust, though. Given the lack of information it'll be difficult
> to verify. So let's wait a bit.
> 

I've done about 5000 runs of 'make check' in test_decoding, on two rpi
machines (one armv7, one aarch64). Not a single assert failure :-(

How come skink/locust hit that in just a couple runs?
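
For reference, here is a minimal toy model in C of the ordering that the
failing assertion (prev_first_lsn < cur_txn->first_lsn) checks. This is only
a sketch, not the reorderbuffer.c code: the Toy* types and toy_* functions
are made up for illustration, and the model assumes that a transaction is
appended to the list the first time its xid is seen, with the LSN of that
record stored as first_lsn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TransactionId. */
typedef uint64_t ToyLsn;
typedef uint32_t ToyXid;

#define TOY_MAX_TXNS 16

typedef struct ToyTxn
{
    ToyXid xid;
    ToyLsn first_lsn;   /* LSN of the first record seen for this xid */
} ToyTxn;

/* Toy "top-level transactions in registration order" list. */
typedef struct ToyBuffer
{
    ToyTxn txns[TOY_MAX_TXNS];
    size_t ntxns;
} ToyBuffer;

/*
 * Register xid at lsn unless it is already known.
 * xid 0 stands for "no xid assigned" and is ignored.
 */
static void
toy_process_xid(ToyBuffer *rb, ToyXid xid, ToyLsn lsn)
{
    if (xid == 0)
        return;

    for (size_t i = 0; i < rb->ntxns; i++)
    {
        if (rb->txns[i].xid == xid)
            return;             /* already registered, keep first_lsn */
    }

    assert(rb->ntxns < TOY_MAX_TXNS);
    rb->txns[rb->ntxns].xid = xid;
    rb->txns[rb->ntxns].first_lsn = lsn;
    rb->ntxns++;
}

/*
 * Return true if first_lsn is strictly increasing along the list,
 * i.e. the condition the assertion expects to hold for every entry.
 */
static bool
toy_lsn_order_ok(const ToyBuffer *rb)
{
    ToyLsn prev_first_lsn = 0;  /* 0 plays the role of "invalid LSN" */

    for (size_t i = 0; i < rb->ntxns; i++)
    {
        if (!(prev_first_lsn < rb->txns[i].first_lsn))
            return false;
        prev_first_lsn = rb->txns[i].first_lsn;
    }
    return true;
}
```

In this model, registering xid 10 at LSN 100, xid 11 at LSN 110, and then
seeing xid 10 again at LSN 120 leaves two entries in increasing order, so
the check passes. It fails only if an entry ends up with a first_lsn that
is not greater than its predecessor's.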


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


