Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences
Date
Msg-id 5bd18c76-c38b-86e1-78d3-73c44ac59a7f@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 2/28/22 12:46, Amit Kapila wrote:
> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/10/22 19:17, Tomas Vondra wrote:
>>> I've polished & pushed the first part adding sequence decoding
>>> infrastructure etc. Attached are the two remaining parts.
>>>
>>> I plan to wait a day or two and then push the test_decoding part. The
>>> last part (for built-in replication) will need more work and maybe
>>> rethinking the grammar etc.
>>>
>>
>> I've pushed the second part, adding sequences to test_decoding.
>>
> 
> The test_decoding is failing randomly in the last few days. I am not
> completely sure but they might be related to this work. The two of
> these appears to be due to the same reason:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
> 
> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> "reorderbuffer.c", Line: 1173, PID: 35013)
> 0   postgres                            0x00593de0 ExceptionalCondition + 160\\0
> 

This might be related to the issue reported by Amit, i.e. that
sequence_decode does not call ReorderBufferProcessXid(). If this keeps
failing, we'll have to add some extra debug info (logging LSN etc.), at
least temporarily. It'd be valuable to inspect the WAL too.

> Another:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
> 
> --- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
> 2022-02-14 20:19:14.000000000 +0000
> +++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
> 2022-02-16 07:42:18.000000000 +0000
> @@ -126,6 +126,7 @@
>    table public.replication_example: INSERT: id[integer]:4
> somedata[integer]:3 text[character varying]:null
> testcolumn1[integer]:null
>    table public.replication_example: INSERT: id[integer]:5
> somedata[integer]:4 text[character varying]:null
> testcolumn1[integer]:2 testcolumn2[integer]:1
>    COMMIT
> +  sequence public.replication_example_id_seq: transactional:0
> last_value: 38 log_cnt: 0 is_called:1
>    BEGIN
>    table public.replication_example: INSERT: id[integer]:6
> somedata[integer]:5 text[character varying]:null
> testcolumn1[integer]:3 testcolumn2[integer]:null
>    COMMIT
> @@ -133,7 +134,7 @@
>    table public.replication_example: INSERT: id[integer]:7
> somedata[integer]:6 text[character varying]:null
> testcolumn1[integer]:4 testcolumn2[integer]:null
>    table public.replication_example: INSERT: id[integer]:8
> somedata[integer]:7 text[character varying]:null
> testcolumn1[integer]:5 testcolumn2[integer]:null
> testcolumn3[integer]:1
>    COMMIT
> - (15 rows)
> + (16 rows)
> 

Interesting. I can think of one reason that might cause this - we log
the first sequence increment after a checkpoint. So if a checkpoint
happens in an unfortunate place, there'll be an extra WAL record. On
slow / busy machines that's quite possible, I guess.

I wonder if these two issues might be related ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences
Next
From: "David G. Johnston"
Date:
Subject: Re: role self-revocation