Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences, take 2
Date
Msg-id cce24592-8f20-dc91-3aec-f6257d4091ae@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences, take 2  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Responses Re: logical decoding and replication of sequences, take 2
List pgsql-hackers
On 2/22/23 03:28, Jonathan S. Katz wrote:
> Hi,
> 
> On 2/16/23 10:50 AM, Tomas Vondra wrote:
>> Hi,
>>
>> Here's a rebased patch, without the last bit which is now unnecessary
>> thanks to c981d9145dea.
> 
> Thanks for continuing to work on this patch! I tested the latest version
> and have some feedback/clarifications.
> 

Thanks!

> I did some testing using a demo-app-based-on-a-real-world app I had
> conjured up[1]. This uses integer sequences as surrogate keys.
> 
> In general things seemed to work, but I had a couple of
> observations/questions.
> 
> 1. Sequence IDs after a "failover". I believe this is a design decision,
> but I noticed that after simulating a failover, the IDs were replicating
> from a higher value, e.g.
> 
> INSERT INTO room (name) VALUES ('room 1');
> INSERT INTO room (name) VALUES ('room 2');
> INSERT INTO room (name) VALUES ('room 3');
> INSERT INTO room (name) VALUES ('room 4');
> 
> The values of room_id_seq on each instance:
> 
> instance 1:
> 
>  last_value | log_cnt | is_called
> ------------+---------+-----------
>           4 |      29 | t
> 
>  instance 2:
> 
>   last_value | log_cnt | is_called
> ------------+---------+-----------
>          33 |       0 | t
> 
> After the switchover on instance 2:
> 
> INSERT INTO room (name) VALUES ('room 5') RETURNING id;
> 
>  id
> ----
>  34
> 
> I don't see this as an issue for most applications, but we should at
> least document the behavior somewhere.
> 

Yes, this is due to how we WAL-log sequences. We don't log individual
increments, but every 32nd increment and we log the "future" sequence
state so that after a crash/recovery we don't generate duplicates.

So you do nextval() and it returns 1. But into WAL we record 32. And
there will be no WAL records until nextval reaches 32 and needs to
generate another batch.

And because logical replication relies on these WAL records, it inherits
this batching behavior with a "jump" on recovery/failover. IMHO it's OK,
it works for the "logical failover" use case and if you need gapless
sequences then regular sequences are not an issue anyway.

It's possible to reduce the jump a bit by reducing the batch size (from
32 to 0) so that every increment is logged. But it doesn't eliminate it
because of rollbacks.

> 2. Using with origin=none with nonconflicting sequences.
> 
> I modified the example in [1] to set up two schemas with non-conflicting
> sequences[2], e.g. on instance 1:
> 
> CREATE TABLE public.room (
>     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
> PRIMARY KEY,
>     name text NOT NULL
> );
> 
> and instance 2:
> 
> CREATE TABLE public.room (
>     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
> PRIMARY KEY,
>     name text NOT NULL
> );
> 

Well, yeah. We don't support active-active logical replication (at least
not with the built-in). You can easily get into similar issues without
sequences.

Replicating a sequence overwrites the state of the sequence on the other
side, which may result in it generating duplicate values with the other
node, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: Refactor calculations to use instr_time