Home > mailing lists

Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From	Paul Guo
Subject	Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date	April 23, 2019 05:31:58
Msg-id	CAEET0ZEcwz57z2yfWRds43b3TfQPPDSWmbjGmD43xRxLT41NDg@mail.gmail.com Whole thread
In response to	Re: standby recovery fails (tablespace related) (tentative patchand discussion) (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses	Re: standby recovery fails (tablespace related) (tentative patchand discussion)
List	pgsql-hackers

Tree view

Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this database create redo error, but I suspect some other kind of redo, which depends on the files under the directory (they are not copied since the directory is not created) and also cannot be covered by the invalid page mechanism, could fail. Thanks.

On Mon, Apr 22, 2019 at 3:40 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Oops! The comment in the previous patch is wrong.

At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20190422.161513.258021727.horiguchi.kyotaro@lab.ntt.co.jp>
> At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <pguo@pivotal.io> wrote in <CAEET0ZGpUrMGUzfyzVF9FuSq+zb=QovYa2cvyRnDOTvZ5vXxTw@mail.gmail.com>
> > Please see my replies inline. Thanks.
> >
> > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <apraveen@pivotal.io> wrote:
> >
> > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <pguo@pivotal.io> wrote:
> > > >
> > > > create db with tablespace
> > > > drop database
> > > > drop tablespace.
> > >
> > > Essentially, that sequence of operations causes crash recovery to fail
> > > if the "drop tablespace" transaction was committed before crashing.
> > > This is a bug in crash recovery in general and should be reproducible
> > > without configuring a standby. Is that right?
> > >
> >
> > No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on
> > master.
> > That makes the file/directory update-to-date if I understand the related
> > code correctly.
> > For standby, checkpoint redo does not ensure that.
>
> That's right partly. As you must have seen, fast shutdown forces
> restartpoint for the last checkpoint and it prevents the problem
> from happening. Anyway it seems to be a problem.
>
> > > Your patch creates missing directories in the destination. Don't we
> > > need to create the tablespace symlink under pg_tblspc/? I would
> > >
> >
> > 'create db with tablespace' redo log does not include the tablespace real
> > directory information.
> > Yes, we could add in it into the xlog, but that seems to be an overdesign.
>
> But I don't think creating directory that is to be removed just
> after is a wanted solution. The directory most likely to be be
> removed just after.
>
> > > prefer extending the invalid page mechanism to deal with this, as
> > > suggested by Ashwin off-list. It will allow us to avoid creating
> >
> > directories and files only to remove them shortly afterwards when the
> > > drop database and drop tablespace records are replayed.
> > >
> > >
> > 'invalid page' mechanism seems to be more proper for missing pages of a
> > file. For
> > missing directories, we could, of course, hack to use that (e.g. reading
> > any page of
> > a relfile in that database) to make sure the tablespace create code
> > (without symlink)
> > safer (It assumes those directories will be deleted soon).
> >
> > More feedback about all of the previous discussed solutions is welcome.
>
> It doesn't seem to me that the invalid page mechanism is
> applicable in straightforward way, because it doesn't consider
> simple file copy.
>
> Drop failure is ignored any time. I suppose we can ignore the
> error to continue recovering as far as recovery have not reached
> consistency. The attached would work *at least* your case, but I
> haven't checked this covers all places where need the same
> treatment.

The comment for the new function XLogMakePGDirectory is wrong:

+ * There is a possibility that WAL replay causes a creation of the same
+ * directory left by the previous crash. Issuing ERROR prevents the caller
+ * from continuing recovery.

The correct one is:

+ * There is a possibility that WAL replay causes an error by creation of a
+ * directory under a directory removed before the previous crash. Issuing
+ * ERROR prevents the caller from continuing recovery.

It is fixed in the attached.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

pgsql-hackers by date:

From: Andrew Gierth
Date: 23 April 2019, 05:23:13
Subject: Re: Symbol referencing errors

From: David Rowley
Date: 23 April 2019, 05:45:57
Subject: Re: pg_dump is broken for partition tablespaces

Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

Previous

Next