Re: logical decoding and replication of sequences - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences
Date
Msg-id de8cc7c3-4b61-94d5-87dd-ed317b245655@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: logical decoding and replication of sequences  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 3/28/22 07:29, Amit Kapila wrote:
> ...
>
> While thinking about this, I think I see a problem with the
> non-transactional handling of sequences. It seems that we will skip
> sending non-transactional sequence change if it occurs before the
> decoding has reached a consistent point but the surrounding commit
> occurs after a consistent point is reached. In such cases, the
> corresponding DMLs like inserts will be sent but sequence changes
> won't be sent. For example (this scenario is based on
> twophase_snapshot.spec),
> 
> Initial setup:
> ==============
> Create table t1_seq(c1 int);
> Create Sequence seq1;
> 
> Test Execution via multiple sessions (this test allows insert in
> session-2 to happen before we reach a consistent point and commit
> happens after a consistent point):
> =======================================================================================================
> 
> Session-2:
> Begin;
> SELECT pg_current_xact_id();
> 
> Session-1:
> SELECT 'init' FROM pg_create_logical_replication_slot('test_slot',
> 'test_decoding', false, true);
> 
> Session-3:
> Begin;
> SELECT pg_current_xact_id();
> 
> Session-2:
> Commit;
> Begin;
> INSERT INTO t1_seq SELECT nextval('seq1') FROM generate_series(1,100);
> 
> Session-3:
> Commit;
> 
> Session-2:
> Commit 'foo'
> 
> Session-1:
> SELECT data  FROM pg_logical_slot_get_changes('test_slot', NULL, NULL,
> 'include-xids', 'false', 'skip-empty-xacts', '1');
> 
>                      data
> ----------------------------------------------
>  BEGIN
>  table public.t1_seq: INSERT: c1[integer]:1
>  table public.t1_seq: INSERT: c1[integer]:2
>  table public.t1_seq: INSERT: c1[integer]:3
>  table public.t1_seq: INSERT: c1[integer]:4
>  table public.t1_seq: INSERT: c1[integer]:5
>  table public.t1_seq: INSERT: c1[integer]:6
> 
> 
> Now, if we normally try to decode such an insert, the result would be
> something like:
>                                      data
> ------------------------------------------------------------------------------
>  sequence public.seq1: transactional:0 last_value: 33 log_cnt: 0 is_called:1
>  sequence public.seq1: transactional:0 last_value: 66 log_cnt: 0 is_called:1
>  sequence public.seq1: transactional:0 last_value: 99 log_cnt: 0 is_called:1
>  sequence public.seq1: transactional:0 last_value: 132 log_cnt: 0 is_called:1
>  BEGIN
>  table public.t1_seq: INSERT: c1[integer]:1
>  table public.t1_seq: INSERT: c1[integer]:2
>  table public.t1_seq: INSERT: c1[integer]:3
>  table public.t1_seq: INSERT: c1[integer]:4
>  table public.t1_seq: INSERT: c1[integer]:5
>  table public.t1_seq: INSERT: c1[integer]:6
> 
> This will create an inconsistent replica as sequence changes won't be
> replicated.

Hmm, that's interesting. I wonder if it can actually happen, though.
Have you been able to reproduce that, somehow?

> I thought about changing snapshot dealing of
> non-transactional sequence changes similar to transactional ones but
> that also won't work because it is only at commit we decide whether we
> can send the changes.
> 
I wonder if there's some earlier LSN (similar to the consistent point)
which might be useful for this.

Or maybe we should queue even the non-transactional changes, not
per-transaction but in a global list, and then at each commit either
discard inspect them (at that point we know the lowest LSN for all
transactions and the consistent point). Seems complex, though.

> For the transactional case, as we are considering the create sequence
> operation as transactional, we would unnecessarily queue them even
> though that is not required. Basically, they don't need to be
> considered transactional and we can simply ignore such messages like
> other DDLs. But for that probably we need to distinguish Alter/Create
> case which may or may not be straightforward. Now, queuing them is
> probably harmless unless it causes the transaction to spill/stream.
> 

I'm not sure I follow. Why would we queue them unnecessarily?

Also, there's the bug with decoding changes in transactions that create
the sequence and add it to a publication. I think the agreement was that
this behavior was incorrect, we should not decode changes until the
subscription is refreshed. Doesn't that mean can't be any CREATE case,
just ALTER?

> I still couldn't think completely about cases where a mix of
> transactional and non-transactional changes occur in the same
> transaction as I think it somewhat depends on what we want to do about
> the above cases.
> 

Understood. I need to think about this too.

>>> At all places, it is mentioned
>>> as creating a sequence for transactional cases which at the very least
>>> need some tweak.
>>>
>>
>> Which places?
>>
> 
> In comments like:
> a. When decoding sequences, we differentiate between sequences created
> in a (running) transaction and sequences created in other (already
> committed) transactions.
> b. ... But for new sequences, we need to handle them in a transactional way, ..
> c. ... Change needs to be handled as transactional, because the
> sequence was created in a transaction that is still running ...
> 
> It seems all these places indicate a scenario of creating a sequence
> whereas we want to do transactional stuff mainly for Alter.
> 

Right, I'll think about how to clarify the comments.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: [Proposal] vacuumdb --schema only
Next
From: gkokolatos@pm.me
Date:
Subject: Re: Add LZ4 compression in pg_dump