Re: sequences vs. synchronous replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: sequences vs. synchronous replication
Date
Msg-id 7869761b-fdc7-6f7d-dc95-9ba8315a8ecf@enterprisedb.com
Whole thread Raw
In response to Re: sequences vs. synchronous replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: sequences vs. synchronous replication
List pgsql-hackers

On 12/18/21 07:00, Tomas Vondra wrote:
> 
> 
> On 12/18/21 05:52, Tom Lane wrote:
>> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>>> The problem is exactly the same as in [1] - the aborted transaction
>>> generated WAL, but RecordTransactionAbort() ignores that and does not
>>> update LogwrtResult.Write, with the reasoning that aborted transactions
>>> do not matter. But sequences violate that, because we only write WAL
>>> once every 32 increments, so the following nextval() gets "committed"
>>> without waiting for the replica (because it did not produce WAL).
>>
>> Ugh.
>>
>>> I'm not sure this is a clear data corruption bug, but it surely walks
>>> and quacks like one. My proposal is to fix this by tracking the lsn of
>>> the last LSN for a sequence increment, and then check that LSN in
>>> RecordTransactionCommit() before calling XLogFlush().
>>
>> (1) Does that work if the aborted increment was in a different
>> session?  I think it is okay but I'm tired enough to not be sure.
>>
> 
> Good point - it doesn't :-( At least not by simply storing LSN in a 
> global variable or something like that.
> 
> The second backend needs to know the LSN of the last WAL-logged sequence 
> increment, but only the first backend knows that. So we'd need to share 
> that between backends somehow. I doubt we want to track LSN for every 
> individual sequence (because for clusters with many dbs / sequences that 
> may be a lot).
> 
> Perhaps we could track just a fixed number o LSN values in shared memory 
> (say, 1024), and update/read just the element determined by hash(oid). 
> That is, the backend WAL-logging sequence with given oid would set the 
> current LSN to array[hash(oid) % 1024], and backend doing nextval() 
> would simply remember the LSN in that slot. Yes, if there are conflicts 
> that'll flush more than needed.
> 

Here's a PoC demonstrating this idea. I'm not convinced it's the right 
way to deal with this - it surely seems more like a duct tape fix than a 
clean solution. But it does the trick.

I wonder if storing this in shmem is good enough - we lose the LSN info 
on restart, but the checkpoint should trigger FPI which makes it OK.

A bigger question is whether sequences are the only thing affected by 
this. If you look at RecordTransactionCommit() then we skip flush/wait 
in two cases:

1) !wrote_xlog - if the xact did not produce WAL

2) !markXidCommitted - if the xact does not have a valid XID

Both apply to sequences, and the PoC patch tweaks them. But maybe there 
are other places where we don't generate WAL and/or assign XID in some 
cases, to save time?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: Chapman Flack
Date:
Subject: Is my home $HOME or is it getpwent()->pw_dir ?