Logical replication & oldest XID. - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Logical replication & oldest XID.
Date
Msg-id 574DA53E.1010806@postgrespro.ru
List pgsql-hackers
Hi,

We are using logical replication in multimaster and have run into an
interesting problem with a "frozen" procArray->replication_slot_xmin.
This variable is adjusted by ProcArraySetReplicationSlotXmin, which is
invoked by ReplicationSlotsComputeRequiredXmin, which is in turn called
by LogicalConfirmReceivedLocation. If transactions are executed at all
nodes of multimaster, then everything works fine:
replication_slot_xmin is advanced. But if we send transactions to only
one multimaster node and broadcast these changes to the other nodes,
then no data is sent through the replication slots at those nodes. No
data sent means no confirmations, so LogicalConfirmReceivedLocation is
not called and procArray->replication_slot_xmin keeps its original
value, 599.

As a result, the GetOldestXmin function always returns 599, so
autovacuum is effectively blocked and our multimaster is not able to
clean up the XID->CSN map, which causes shared memory overflow. This
situation happens only when write transactions are sent to only one
node or when there are no write transactions at all.
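
The behaviour described above can be illustrated with a minimal toy
model. This is only a sketch and not PostgreSQL code: the class ToyNode
and its methods are hypothetical names introduced for illustration. It
assumes, as a simplification, that every transaction commits
immediately and that a confirmation is the only event that recomputes
the slot xmin.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    """Toy stand-in for one multimaster node (hypothetical, not PostgreSQL code)."""

    next_xid: int = 600
    # Plays the role of procArray->replication_slot_xmin.
    slot_xmin: int = 599

    def apply_transactions(self, count: int) -> None:
        """Consume `count` XIDs; each transaction commits immediately."""
        self.next_xid += count

    def confirm_received(self) -> None:
        """Models LogicalConfirmReceivedLocation: in this toy model it is
        the only place where the slot xmin is recomputed."""
        self.slot_xmin = self.next_xid

    def oldest_xmin(self) -> int:
        """Models GetOldestXmin: the horizon is held back by the slot xmin."""
        return min(self.next_xid, self.slot_xmin)


# A node that only applies changes and never gets a confirmation
# through its own slot: the horizon stays at the original value.
node = ToyNode()
node.apply_transactions(1000)
stuck_horizon = node.oldest_xmin()      # 599

# Once a confirmation arrives, the horizon catches up.
node.confirm_received()
advanced_horizon = node.oldest_xmin()   # 1600
```

In this model the horizon moves only when confirm_received() runs, which
is the dependency that keeps the value at 599 in the scenario above.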

Before implementing some workaround (for example, forcing a call of
ReplicationSlotsComputeRequiredXmin), I want to understand whether this
is a real problem of logical replication or whether we are doing
something wrong. BDR should face the same problem if all updates are
performed from one node...

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



