Re: avoid multiple hard links to same WAL file after a crash - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: avoid multiple hard links to same WAL file after a crash
Msg-id 20220412.154631.417529439388886590.horikyota.ntt@gmail.com
In response to Re: avoid multiple hard links to same WAL file after a crash  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: avoid multiple hard links to same WAL file after a crash  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in 
> On Mon, Apr 11, 2022 at 12:28:47PM -0400, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
> >> <horikyota.ntt@gmail.com> wrote:
> >>> If this diagnosis is correct, the comment is proved to be paranoid.
> > 
> >> It's sometimes difficult to understand what problems really old code
> >> comments are worrying about. For example, could they have been
> >> worrying about bugs in the code? Could they have been worrying about
> >> manual interference with the pg_wal directory? It's hard to know.
> > 
> > "git blame" can be helpful here, if you trace back to when the comment
> > was written and then try to find the associated mailing-list discussion.
> > (That leap can be difficult for commits pre-dating our current
> > convention of including links in the commit message, but it's usually
> > not *that* hard to locate contemporaneous discussion.)
> 
> I traced this back a while ago.  I believe the link() was first added in
> November 2000 as part of f0e37a8.  This even predates WAL recycling, which
> was added in July 2001 as part of 7d4d5c0.

f0e37a8 has no associated discussion; it introduced the CHECKPOINT
command without any thread on the ML that I could find. The patch
changed XLogFileInit to support using existing files, so that
XLogWrite can use a new segment provided by a checkpoint while still
being able to create a new segment by itself.

Just before that commit, calls to XLogFileInit were protected
(serialized) by logwr_lck.  After the commit, calls to the same
function were still serialized, now by ControlFileLockId.

I *guess* that Vadim ran into (or noticed) a race condition when he
added checkpoints and so introduced the link+remove protocol, but the
protocol ultimately became unnecessary once the call to XLogFileInit
was moved inside the ControlFileLockId section.  But, of course, this
whole story is merely a guess.
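For reference, a minimal C sketch of the link+remove idea as I
understand it (the helper name install_segment_via_link is
hypothetical, not PostgreSQL's actual code): link() refuses to
overwrite an existing name, so a concurrent installer of the same
segment shows up as EEXIST instead of being silently clobbered.  A
crash between link() and unlink() leaves two hard links to one file,
which is the situation this thread is about.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper sketching a link()+unlink() install protocol.
 * Returns 0 if tmppath was installed as path, 1 if path already
 * existed (tmppath is left in place for the caller to reuse or
 * remove), and -1 on any other error (errno is set).
 */
static int
install_segment_via_link(const char *tmppath, const char *path)
{
    assert(tmppath != NULL && path != NULL);

    /* Unlike rename(), link() never replaces an existing target. */
    if (link(tmppath, path) < 0)
    {
        if (errno == EEXIST)
            return 1;           /* segment was already installed */
        return -1;
    }

    /*
     * A crash at this point leaves tmppath and path as two hard links
     * to the same file.
     */
    if (unlink(tmppath) < 0)
        return -1;
    return 0;
}
```

With rename() instead, the second installer would silently replace the
first one's file, which is presumably what the protocol guarded against.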

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


