Re: standby recovery fails (tablespace related) (tentative patch and discussion) - Mailing list pgsql-hackers
From | Paul Guo |
---|---|
Subject | Re: standby recovery fails (tablespace related) (tentative patch and discussion) |
Date | |
Msg-id | CAEET0ZF-0hGtkZGufHaqDzg-Ez89n=1T6raWv-03SP1FtB7wRw@mail.gmail.com Whole thread Raw |
In response to | Re: standby recovery fails (tablespace related) (tentative patch and discussion) (Paul Guo <pguo@pivotal.io>) |
Responses |
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
|
List | pgsql-hackers |
On Mon, May 27, 2019 at 9:39 PM Paul Guo <pguo@pivotal.io> wrote:
On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:Hello.
At Mon, 13 May 2019 17:37:50 +0800, Paul Guo <pguo@pivotal.io> wrote in <CAEET0ZF9yN4DaXyuFLzOcAYyxuFF1Ms_OQWeA+Rwv3GhA5Q-SA@mail.gmail.com>
> Thanks for the reply.
>
> On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>
> >
> > + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
> > + {
> >
> > This patch is allowing missing source and destination directory
> > even in consistent state. I don't think it is safe.
> >
>
> I do not understand this. Can you elaborate?
Suppose we were recoverying based on a backup at LSN1 targeting
to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
is called as "consistency point", before where the database is
not consistent. It's because we are applying WAL recored older
than those that were already applied in the second trial. The
same can be said for crash recovery, where LSN1 is the latest
checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
Creation of an existing directory or dropping of a non-existent
directory are apparently inconsistent or "broken" so we should
stop recovery when seeing such WAL records while database is in
consistent state.This seems to be hard to detect. I thought using invalid_page mechanism long ago,but it seems to be hard to fully detect a dropped tablespace.> > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
> > >
> > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn:
> > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
> > > pg_tblspc/16384/PG_12_201904281/16386
> > >
> > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn:
> > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
> > > pg_tblspc/16384/PG_12_201904281/16386
> >
> > WAL records don't convey such information. The previous
> > description seems right to me.
> >
>
> 2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for
> Database/CREATE: copy dir 1663/1 to 65546/65547
> The directories are definitely wrong and misleading.
The original description is right in the light of how the server
recognizes. The record exactly says that "copy dir 1663/1 to
65546/65547" and the latter path is converted in filesystem layer
via a symlink.In either $PG_DATA/pg_tblspc or symlinked real tablespace directory,there is an additional directory like PG_12_201905221 betweentablespace oid and database oid. See the directory layout as below,so the directory info in xlog dump output was not correct.$ ls -lh data/pg_tblspc/
total 0
lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2
$ ls -lh /tmp/2
total 0
drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221
> > > > Also I'd suggest we should use pg_mkdir_p() in
> > TablespaceCreateDbspace()
> > > > to replace
> > > > the code block includes a lot of
> > > > get_parent_directory(), MakePGDirectory(), etc even it
> > > > is not fixing a bug since pg_mkdir_p() code change seems to be more
> > > > graceful and simpler.
> >
> > But I don't agree to this. pg_mkdir_p goes above two-parents up,
> > which would be unwanted here.
> >
> > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
> This change just makes the code concise. Though in theory the change is not
> needed.
We don't want to create tablespace direcotory after concurrent
DROPing, as the comment just above is saying:
| * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
| * or TablespaceCreateDbspace is running concurrently.
If the concurrent DROP TABLESPACE destroyed the grand parent
directory, we mustn't create it again.Yes, this is a good reason to keep the original code. Thanks.By the way, based on your previous test patch I added another test which could easily detectthe missing src directory case.
I updated the patch to v3. In this version, we skip the error if copydir fails due to missing src/dst directory,
but to make sure the ignoring is legal, I add a simple log/forget mechanism (Using List) similar to the xlog invalid page
checking mechanism. Two tap tests are included. One is actually from a previous patch by Kyotaro in this
email thread and another is added by me. In addition, dbase_desc() is fixed to make the message accurate.
Thanks.
Attachment
pgsql-hackers by date: