Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)? - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Msg-id CAB7nPqQ05G15JooRMEONgPkW0osot77yaFAUF9_6Q8G+v+2+xg@mail.gmail.com
In response to Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I'm afraid that many WAL segments would start with a continuation record
> under a workload of short transactions (e.g., from pgbench), which
> would make restart_lsn fall far behind. No?

I don't quite understand this argument. Even with many small
transactions, restart_lsn would just stay one segment behind, all the
time, never more than that.

> The discussion on this thread just makes me think that restart_lsn should
> indicate the replay location instead of flush location. This seems safer.

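That would penalize WAL retention on the primary, for example with
standbys using recovery_min_apply_delay together with a slot: replay
deliberately lags behind flush there, so the primary would have to keep
all the WAL generated during the delay...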
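To put a rough number on that retention cost, here is a back-of-the-envelope sketch. It assumes a constant WAL generation rate over the whole delay (the 5 MB/s figure is purely hypothetical) and the default 16 MB WAL segment size; the helper name is illustrative only.

```python
SEG_SIZE = 16 * 1024 * 1024  # default 16 MB WAL segment size


def retained_segments(wal_bytes_per_sec: int, apply_delay_sec: int,
                      seg_size: int = SEG_SIZE) -> int:
    """Hypothetical helper: rough count of WAL segments the primary would
    keep if restart_lsn followed the delayed replay position instead of
    the flush position. Assumes a constant WAL rate over the delay."""
    retained_bytes = wal_bytes_per_sec * apply_delay_sec
    return -(-retained_bytes // seg_size)  # ceiling division


# Hypothetical workload: 5 MB/s of WAL, recovery_min_apply_delay = '1h'
print(retained_segments(5 * 1024 * 1024, 3600))  # 1125 segments (18000 MB)
```

With a flush-based restart_lsn, that same slot would only pin WAL back to the standby's flush position.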

We can attempt to address this problem in two ways. The proposed patch
(ugly btw, and there are two typos!) does it in the WAL sender, by not
letting restart_lsn jump to the next segment if that segment begins with
a continuation record. Alternatively, the standby could request streaming
from the next segment instead when the record it wants to replay from
crosses a segment boundary and it already has the beginning of that
record locally; it does have it, because it already confirmed to the
primary that it flushed up to the next segment. Not sure which fix is
better, though.
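The two options can be contrasted with a small sketch. Everything in it is hypothetical: the helper names, the simplified boolean inputs (whether a segment begins with a continuation record, whether the standby has a record's beginning locally), and the assumption that a continued record started in the immediately preceding segment. It is neither the proposed patch nor PostgreSQL's code. Positions are plain byte offsets with the default 16 MB segment size, and rec_end is an exclusive end position.

```python
SEG_SIZE = 16 * 1024 * 1024  # default 16 MB WAL segment size


def seg_no(lsn: int, seg_size: int = SEG_SIZE) -> int:
    """Number of the WAL segment containing byte position lsn."""
    return lsn // seg_size


def seg_start(lsn: int, seg_size: int = SEG_SIZE) -> int:
    """Byte position at which the segment containing lsn begins."""
    return lsn - lsn % seg_size


# Option 1 (WAL sender side): hypothetical helper deciding the slot's new
# restart_lsn from the flush position confirmed by the standby.
def new_restart_lsn(flush_lsn: int, seg_begins_with_contrecord: bool,
                    seg_size: int = SEG_SIZE) -> int:
    start = seg_start(flush_lsn, seg_size)
    if seg_begins_with_contrecord and start > 0:
        # The record continuing into this segment began earlier, so keep
        # the previous segment (simplification: assumes the record started
        # in the immediately preceding segment).
        return start - seg_size
    return flush_lsn


# Option 2 (standby side): hypothetical helper choosing where to ask the
# primary to start streaming; requests are made from a segment start.
def stream_start_lsn(rec_start: int, rec_end: int, have_start_locally: bool,
                     seg_size: int = SEG_SIZE) -> int:
    crosses_boundary = seg_no(rec_start, seg_size) != seg_no(rec_end - 1, seg_size)
    if crosses_boundary and have_start_locally:
        # Skip the segment holding the record's beginning; we already have it.
        return (seg_no(rec_start, seg_size) + 1) * seg_size
    return seg_start(rec_start, seg_size)
```

In this sketch, option 1 keeps restart_lsn at most one segment behind the flush position (seg_no(flush) - seg_no(restart_lsn) <= 1), however many segments begin with a continuation record: new_restart_lsn(2 * SEG_SIZE, True) returns SEG_SIZE, so segment 1 stays retained. Option 2 leaves restart_lsn alone and instead has stream_start_lsn(SEG_SIZE - 50, SEG_SIZE + 30, True) ask for segment 1 rather than segment 0.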
-- 
Michael


