Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)? - Mailing list pgsql-bugs

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date
Msg-id 20170202.112829.188781915.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
Thank you for the comment.

At Thu, 2 Feb 2017 01:26:03 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwEET=QBA_jND=xhrXn+9ZreP4_qMBAqsBZg56beqxbveg@mail.gmail.com>
> > The attached patch does that. Usually it reads page headers only
> > on segment boundaries, but once continuation record found (or
> > failed to read the next page header, that is, the first record on
> > the first page in the next segment has not been replicated), it
> > becomes to happen on every page boundary until non-continuation
> > page comes.
> 
> I'm afraid that many WAL segments would start with a continuation record
> when there are the workload of short transactions (e.g., by pgbench), and
> which would make restart_lsn go behind very much. No?

I agree. So it tries to release the lock at every page boundary,
but restart_lsn falls far behind if many contiguous pages are
CONTRECORD. Still, I think the chance that this situation
persists for one or more whole segments is negligibly low. That
said, there *is* a possibility of false continuation anyway.
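
To make the concern concrete, here is a rough toy model (not the
patch itself) of how long restart_lsn could be held back. It
assumes page headers are inspected only on segment boundaries
until a continuation page is seen, and then on every page until a
non-continuation page arrives. The function name, the boolean
list standing in for the page header flag, and the tiny segment
size are all made up for illustration.

```python
def max_hold_pages(first_is_contrecord, pages_per_segment):
    """Longest run of pages during which restart_lsn is held (toy model).

    first_is_contrecord[i] is True when page i starts with a
    continuation record. This is a hypothetical stand-in for the
    real page header flag, not actual PostgreSQL code.
    """
    holding = False
    run = 0
    longest = 0
    for page_no, is_cont in enumerate(first_is_contrecord):
        at_segment_boundary = page_no % pages_per_segment == 0
        if not (holding or at_segment_boundary):
            continue  # header is not inspected in this model
        if is_cont:
            # restart_lsn cannot advance past this page yet
            holding = True
            run += 1
            longest = max(longest, run)
        else:
            # a non-continuation page ends the hold
            holding = False
            run = 0
    return longest


# 3 segments of 4 pages each; page 1 is ignored because it is
# neither on a segment boundary nor inside a hold.
pages = [False, True, False, False,
         True, True, False, False,
         True, False, False, False]
print(max_hold_pages(pages, pages_per_segment=4))
```

In this model the hold only grows large when every page in a long
run starts with a continuation record, which is the case I
consider unlikely to last for a whole segment.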

> The discussion on this thread just makes me think that restart_lsn should
> indicate the replay location instead of flush location. This seems safer.

The standby restarts from minRecoveryPoint, which is a copy of
XLogCtl->replayEndRecPtr and is updated by
UpdateMinRecoveryPoint(). Meanwhile, applyPtr in reply messages is
a copy of XLogCtl->lastReplayedEndRecPtr, which is updated after
the update of the on-disk minRecoveryPoint. So it seems safe from
that viewpoint.

On the other hand, apply can be paused. Records are still copied
and flushed on the standby, so the segments on the master that
have already been sent can safely be removed even in that case.
In spite of that, older segments on the master would be kept from
being removed during the pause if applyPtr were used as
restart_lsn. This would be another problem, and one that is sure
to happen.
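
As a rough illustration of the retention difference, the sketch
below counts how many segments the master would have to keep
while apply is paused, depending on which pointer feeds
restart_lsn. It assumes the default 16MB segment size and ignores
every other retention setting. The helper names are hypothetical,
and LSNs are treated as plain byte offsets.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def retained_segments(restart_lsn, master_lsn,
                      segment_size=WAL_SEGMENT_SIZE):
    """Segments from the one holding restart_lsn up to the current one."""
    return master_lsn // segment_size - restart_lsn // segment_size + 1


def simulate_pause(segments_generated_during_pause,
                   segment_size=WAL_SEGMENT_SIZE):
    """Return (kept if flush-based, kept if apply-based).

    Toy assumptions: apply is paused at LSN 0, while the standby
    keeps receiving and flushing everything the master writes.
    """
    apply_ptr = 0
    master_lsn = segments_generated_during_pause * segment_size
    flush_ptr = master_lsn
    return (retained_segments(flush_ptr, master_lsn, segment_size),
            retained_segments(apply_ptr, master_lsn, segment_size))


print(simulate_pause(100))
```

With the flush location the master keeps only the current
segment, while with the apply location the retained count grows
with everything written during the pause.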

I'm not sure how likely it is that several contiguous segments
are full of continuation pages. But I think it is worse for an
apply pause to cause needless pg_wal flooding.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




