Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences
Date
Msg-id 93af17c3-a78d-4918-5f6b-76dfeb2d48bd@enterprisedb.com
In response to Re: logical decoding and replication of sequences  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: logical decoding and replication of sequences
List pgsql-hackers

On 3/7/22 17:53, Tomas Vondra wrote:
> On 2/28/22 12:46, Amit Kapila wrote:
>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>> I've polished & pushed the first part adding sequence decoding
>>>> infrastructure etc. Attached are the two remaining parts.
>>>>
>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>> last part (for built-in replication) will need more work and maybe
>>>> rethinking the grammar etc.
>>>>
>>>
>>> I've pushed the second part, adding sequences to test_decoding.
>>>
>>
>> The test_decoding tests have been failing randomly in the last few days.
>> I am not completely sure, but the failures might be related to this work.
>> Two of them appear to have the same cause:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>
>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>> "reorderbuffer.c", Line: 1173, PID: 35013)
>> 0   postgres                            0x00593de0 ExceptionalCondition + 160
>>
> 
> This might be related to the issue reported by Amit, i.e. that
> sequence_decode does not call ReorderBufferProcessXid(). If this keeps
> failing, we'll have to add some extra debug info (logging LSN etc.), at
> least temporarily. It'd be valuable to inspect the WAL too.
> 
>> Another:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
>>
>> --- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
>> 2022-02-14 20:19:14.000000000 +0000
>> +++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
>> 2022-02-16 07:42:18.000000000 +0000
>> @@ -126,6 +126,7 @@
>>    table public.replication_example: INSERT: id[integer]:4
>> somedata[integer]:3 text[character varying]:null
>> testcolumn1[integer]:null
>>    table public.replication_example: INSERT: id[integer]:5
>> somedata[integer]:4 text[character varying]:null
>> testcolumn1[integer]:2 testcolumn2[integer]:1
>>    COMMIT
>> +  sequence public.replication_example_id_seq: transactional:0
>> last_value: 38 log_cnt: 0 is_called:1
>>    BEGIN
>>    table public.replication_example: INSERT: id[integer]:6
>> somedata[integer]:5 text[character varying]:null
>> testcolumn1[integer]:3 testcolumn2[integer]:null
>>    COMMIT
>> @@ -133,7 +134,7 @@
>>    table public.replication_example: INSERT: id[integer]:7
>> somedata[integer]:6 text[character varying]:null
>> testcolumn1[integer]:4 testcolumn2[integer]:null
>>    table public.replication_example: INSERT: id[integer]:8
>> somedata[integer]:7 text[character varying]:null
>> testcolumn1[integer]:5 testcolumn2[integer]:null
>> testcolumn3[integer]:1
>>    COMMIT
>> - (15 rows)
>> + (16 rows)
>>
> 
> Interesting. I can think of one reason that might cause this - we log
> the first sequence increment after a checkpoint. So if a checkpoint
> happens in an unfortunate place, there'll be an extra WAL record. On
> slow / busy machines that's quite possible, I guess.
> 

I've tweaked the checkpoint_interval to make checkpoints more aggressive
(set it to 1s), and it seems my hunch was correct - it produces failures
exactly like this one. The best fix probably is to just disable decoding
of sequences in those tests that are not aimed at testing sequence decoding.
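
To make the mechanism concrete, here is a toy model of the logging rule
described above. It is only a sketch, not the actual sequence.c code: the
classes ToyCluster and ToySequence and the run() helper are made up for
illustration, LSNs are plain integers, and the only detail taken from the
real code is that a sequence WAL record pre-logs 32 values (SEQ_LOG_VALS).

```python
SEQ_LOG_VALS = 32  # values pre-logged per sequence WAL record


class ToySequence:
    """Hypothetical stand-in for the on-disk sequence state."""

    def __init__(self):
        self.last_value = 0
        self.log_cnt = 0    # values still covered by the last WAL record
        self.page_lsn = 0   # LSN of the last WAL record for this sequence


class ToyCluster:
    """Hypothetical stand-in for WAL position and checkpoint state."""

    def __init__(self):
        self.lsn = 0        # current WAL insert position
        self.redo_ptr = 0   # redo pointer of the latest checkpoint
        self.wal = []       # last_value stored in each sequence WAL record

    def checkpoint(self):
        self.lsn += 1
        self.redo_ptr = self.lsn

    def nextval(self, seq):
        # Log when the pre-logged values are used up ...
        logit = seq.log_cnt == 0
        # ... or on the first increment after a checkpoint.
        if not logit and seq.page_lsn <= self.redo_ptr:
            logit = True

        seq.last_value += 1
        if logit:
            self.lsn += 1
            seq.page_lsn = self.lsn
            seq.log_cnt = SEQ_LOG_VALS
            self.wal.append(seq.last_value + SEQ_LOG_VALS)
        else:
            seq.log_cnt -= 1
        return seq.last_value


def run(checkpoint_after=None, calls=8):
    """Call nextval() `calls` times, optionally checkpointing midway."""
    cluster, seq = ToyCluster(), ToySequence()
    for i in range(1, calls + 1):
        cluster.nextval(seq)
        if i == checkpoint_after:
            cluster.checkpoint()
    return cluster.wal


if __name__ == "__main__":
    print(run())                    # no checkpoint: [33]
    print(run(checkpoint_after=5))  # checkpoint after 5th call: [33, 38]
```

Without a checkpoint the eight calls produce a single WAL record. With a
checkpoint after the fifth call, the sixth call writes a second record, and
that is the kind of extra "sequence" line the decoding output picks up when
a checkpoint lands in the middle of a test.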


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


