RE: Newly created replication slot may be invalidated by checkpoint - Mailing list pgsql-hackers

From Vitaly Davydov
Subject RE: Newly created replication slot may be invalidated by checkpoint
Date
Msg-id 324b22-691ed080-3-51e9c380@77308141
Whole thread Raw
In response to RE: Newly created replication slot may be invalidated by checkpoint  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Newly created replication slot may be invalidated by checkpoint
List pgsql-hackers
Hi Zhijie Hou

I'm not sure, my previous email was successfully sent. I do not see it in the
pgsql-hackers mailing list. Re-sending it again, sorry for that.

Thank you for preparing the patch!

The new lock schema (Allocation, Control) seems to be working, without any
deadlocks. It is guaranteed by the lock orders - (1) Allocation lock is followed
by the Control lock, or (2) the Control lock without the Allocation lock. I've
attached the doc [1] where I tried to describe the lock schema to analyze
possible deadlocks. I haven't found any evident problems.

Concerning reserve_wal_for_local_slot. It seems it is used in synchronization
of failover logical slots. For me, it is tricky to change restart_lsn of a
synced logical slot to RedoRecPtr, because it may lead to problems with
logical replication using such slot after the replica promotion. But it seems
it is the architectural problem and it is not related to the problems, solved by
the patch.

The change of lock mode to EXCLUSIVE in ReplicationSlotsComputeRequiredLSN may
affect the performance when a lot of slots are advanced during some small period
of time. It may affect the walsender performance. It advances the logical or
physical slots when receive a confirmation from the replica. I guess, the slot
advancement may be pretty frequent operation.

There is an idea to keep the SHARED lock but put XLogSetReplicationSlotMinimumLSN
under this lock and implement some guarantees that we do not advance slots to
the past, taking into account that the slot can be advanced in past if it
doesn't cross wal segment boundaries (it happens). In this case, if a concurrent
process advances an existing slot, its old restart_lsn will protect the wal. In
case of wal reservation we may use EXCLUSIVE lock in
XLogSetReplicationSlotMinimumLSN.

Furthremore, I believe, ReplicationSlotsComputeRequiredLSN is required for
checkpointer to calculate the oldest wal segment but it is not required to be
called every time when a slot is advanced. It may affect the reports -
GetWALAvailability function, but I think it is not a big problem to deal with it.

Some typos in the patch commit message:

1) A typo: yet updated estart_lsn, while the

2) If a backend advances a slot's restart_lsn reaches
XLogSetReplicationSlotMinimumLSN... - may be to put 'and' before reaches?


[1] lockschema.md

With best regards,
Vitaly

Attachment

pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: quoteOneName() inconsistency with quote_all_identifiers — replacement API proposed
Next
From: Chao Li
Date:
Subject: Re: [PATCH] pg_get_domain_ddl: DDL reconstruction function for CREATE DOMAIN statement