Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Msg-id 20220404.172948.678193664696814690.horikyota.ntt@gmail.com
In response to Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: standby recovery fails (tablespace related) (tentative patch and discussion)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Fri, 1 Apr 2022 14:51:58 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Fri, Apr 1, 2022 at 12:22 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > By the way, may I ask how do we fix this?  The existing recovery code
> > already generates just-to-be-delete files in a real directory in
> > pg_tblspc sometimes, and elsewise skip applying WAL records on
> > nonexistent heap pages.  It is the "mixed" way.
> 
> Can you be more specific about where we have each behavior now?

Both are done in XLogReadBufferExtended.

The second behavior happens here, in xlogutils.c (the line marked "+"
is an assertion I added for this experiment):
>        /* hm, page doesn't exist in file */
>        if (mode == RBM_NORMAL)
>        {
>            log_invalid_page(rnode, forknum, blkno, false);
+            Assert(0);
>            return InvalidBuffer;

With the assertion added, 015_promotion_pages.pl crashes, so that test
does reach this path. Returning InvalidBuffer here prevents the page
from being created and skips the subsequent redo action on it.

The first behavior is described in the following comment:

>     * Create the target file if it doesn't already exist.  This lets us cope
>     * if the replay sequence contains writes to a relation that is later
>     * deleted.  (The original coding of this routine would instead suppress
>     * the writes, but that seems like it risks losing valuable data if the
>     * filesystem loses an inode during a crash.  Better to write the data
>     * until we are actually told to delete the file.)
>     */
>    smgrcreate(smgr, forknum, true);

Without the smgrcreate call, make check-world fails because of missing
files for the FSM, visibility map, and init forks, though I doubt that
those cases fall into the category of "creating nonexistent objects by
redo access". In a few places, XLOG_FPI records are used to create the
first page of a file, including main and init forks, but I don't see a
case involving the main fork during make check-world.
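
To make the "mixed" behavior concrete, here is a toy model of the two
paths described above. It is only a sketch: every name in it
(ToyRelFile, toy_read_buffer_for_redo, the TOY_* constants) is
hypothetical, and it is not the code in xlogutils.c. It only mimics
the order of events, assuming the file is created first and the
missing-page check follows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are PostgreSQL definitions. */
typedef enum { TOY_RBM_NORMAL, TOY_RBM_ZERO } ToyReadMode;

typedef struct ToyRelFile
{
    bool exists;        /* does the relation file exist? */
    int  nblocks;       /* current length in blocks */
    int  invalid_refs;  /* remembered references to missing pages */
} ToyRelFile;

#define TOY_INVALID_BUFFER (-1)

/* Returns blkno as a fake buffer id, or TOY_INVALID_BUFFER. */
static int
toy_read_buffer_for_redo(ToyRelFile *rel, int blkno, ToyReadMode mode)
{
    assert(blkno >= 0);

    /* First behavior: create the target file if it is missing. */
    if (!rel->exists)
    {
        rel->exists = true;
        rel->nblocks = 0;
    }

    if (blkno >= rel->nblocks)
    {
        if (mode == TOY_RBM_NORMAL)
        {
            /*
             * Second behavior: the page does not exist, so remember
             * the reference and skip the redo action for this page.
             */
            rel->invalid_refs++;
            return TOY_INVALID_BUFFER;
        }

        /* Other modes extend the file up to the requested block. */
        rel->nblocks = blkno + 1;
    }

    return blkno;
}
```

In this model a replayed write to a missing relation leaves an empty
file behind (first behavior) while the write itself is skipped and
remembered (second behavior), which is the mix in question.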

# Most of the failures show up as a standby freeze. I was a bit
# annoyed that make check-world doesn't tell which module is
# currently being tested.  Usually I could deduce it from the
# sequence of preceding script names, but when the first TAP script
# of a module froze, I had to use ps to find the module.


> > 1. stop XLogReadBufferForRedo creating a file in nonexistent
> >   directories then remember the failure (I'm not sure how big the
> >   impact is.)
> >
> > 2. unconditionally create all objects required for recovery to proceed..
> >   2.1 and igore the failures.
> >   2.2 and remember the failures.
> >
> > 3. Any other?
> >
> > 2 needs to create a real directory in pg_tblspc. So 1?
> 
> I think we could either do 1 or 2. My intuition is that getting 2
> working would be less scary and more likely to be something we would
> feel comfortable back-patching, but 1 is probably a better design in
> the long term. However, I might be wrong -- that's just a guess.

Thanks.  I forgot to mention this in the previous mail (though I
mentioned it somewhere upthread), but if we take 2, there's no way
other than creating a real directory in pg_tblspc during recovery.  I
don't think that is neat.
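
For comparison, option 1 ("remember the failure") could look roughly
like the following. This is a minimal sketch under my own assumptions,
not a proposed patch: the toy_* names and the fixed-size table are
hypothetical, and a real implementation would presumably use a hash
table in the way the existing invalid-page tracking does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-size table of directories found missing in redo. */
#define TOY_MAX_MISSING 16
#define TOY_PATH_LEN    64

static char toy_missing[TOY_MAX_MISSING][TOY_PATH_LEN];
static int  toy_n_missing = 0;

/* Remember a missing directory instead of creating it. */
static void
toy_remember_missing_dir(const char *path)
{
    for (int i = 0; i < toy_n_missing; i++)
        if (strcmp(toy_missing[i], path) == 0)
            return;             /* already remembered */

    assert(toy_n_missing < TOY_MAX_MISSING);
    assert(strlen(path) < TOY_PATH_LEN);
    strcpy(toy_missing[toy_n_missing++], path);
}

/* Forget an entry once a later record drops the object. */
static void
toy_forget_missing_dir(const char *path)
{
    for (int i = 0; i < toy_n_missing; i++)
    {
        if (strcmp(toy_missing[i], path) == 0)
        {
            /* Fill the hole with the last entry, then shrink. */
            toy_n_missing--;
            if (i != toy_n_missing)
                strcpy(toy_missing[i], toy_missing[toy_n_missing]);
            return;
        }
    }
}

/* Entries still remembered at the consistency point are errors. */
static bool
toy_recovery_is_consistent(void)
{
    return toy_n_missing == 0;
}
```

The point of the sketch is that nothing is created on disk: a missing
directory is tolerated only until the consistency check, by which time
a later drop record must have cleared it.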

I haven't found how the patch caused creation of a relation file that
is to be removed soon.  However, I find that the v19 patch fails,
maybe due to some change in Cluster.pm.  It will take a bit more time
to check that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


