Re: Taking into account syncrep position in flush_lsn reported by apply worker - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Taking into account syncrep position in flush_lsn reported by apply worker
Date
Msg-id ff0c1433-6294-4643-9f4d-b445913dd9e4@iki.fi
Whole thread Raw
In response to Re: Taking into account syncrep position in flush_lsn reported by apply worker  (Arseny Sher <ars@neon.tech>)
Responses Re: Taking into account syncrep position in flush_lsn reported by apply worker
List pgsql-hackers
On 14/08/2024 16:54, Arseny Sher wrote:
> On 8/13/24 06:35, Amit Kapila wrote:
>> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher <ars@neon.tech> wrote:
>>>
>>> Sorry for the poor formatting of the message above, this should be 
>>> better:
>>>
>>> Hey. Currently synchronous_commit is disabled for logical apply worker
>>> on the ground that reported flush_lsn includes only locally flushed data
>>> so slot (publisher) preserves everything higher than this, and so in
>>> case of subscriber restart no data is lost. However, imagine that
>>> subscriber is made highly available by standby to which synchronous
>>> replication is enabled. Then reported flush_lsn is ignorant of this
>>> synchronous replication progress, and in case of failover data loss may
>>> occur if subscriber managed to ack flush_lsn ahead of syncrep.
>>
>> Won't the same can be achieved by enabling the synchronous_commit
>> parameter for a subscription?
> 
> Nope, because it would force WAL flush and wait for replication to the
> standby in the apply worker, slowing down it. The logic missing
> currently is not to wait for the synchronous commit, but still mind its
> progress in the flush_lsn reporting.

I think this patch makes sense. I'm not sure we've actually made any 
promises on it, but it feels wrong that the slot's LSN might be advanced 
past the LSN that's been has been acknowledged by the replica, if 
synchronous replication is configured. I see little downside in making 
that promise.

> +    /*
> +     * If synchronous replication is configured, take into account its position.
> +     */
> +    if (SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
> +    {
> +        LWLockAcquire(SyncRepLock, LW_SHARED);
> +        local_flush = Min(local_flush, WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH]);
> +        LWLockRelease(SyncRepLock);
> +    }
> +

Should probably use the SyncStandbysDefined() macro here. Or check 
WalSndCtl->sync_standbys_defined like SyncRepWaitForLSN() does; not sure 
which would be more appropriate here.

Should the synchronous_commit setting also affect this?

Please also check if the docs need to be updated, or if a paragraph 
should be added somewhere on this behavior.

A TAP test case would be nice. Not sure how complicated it will be, but 
if not too complicated, it'd be nice to include it in check-world.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Next
From: Jacob Champion
Date:
Subject: Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs