Re: standby recovery fails (tablespace related) (tentative patchand discussion) - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: standby recovery fails (tablespace related) (tentative patchand discussion) |
Date | |
Msg-id | 20190507.154711.254360772.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: standby recovery fails (tablespace related) (tentative patch and discussion) (Paul Guo <pguo@pivotal.io>) |
Responses |
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
|
List | pgsql-hackers |
Hi. At Tue, 30 Apr 2019 14:33:47 +0800, Paul Guo <pguo@pivotal.io> wrote in <CAEET0ZGhmDKrq7JJu2rLLqcJBR8pA4OYrKsirZ5Ft8-deG1e8A@mail.gmail.com> > I updated the original patch to It's reasonable not to touch copydir. > 1) skip copydir() if either src path or dst parent path is missing in > dbase_redo(). Both missing cases seem to be possible. For the src path > missing case, mkdir_p() is meaningless. It seems that moving the directory > existence check step to dbase_redo() has less impact on other code. Nice catch. + if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode))) + { This patch is allowing missing source and destination directory even in consistent state. I don't think it is safe. + ereport(WARNING, + (errmsg("directory \"%s\" for copydir() does not exists." + "It is possibly expected. Skip copydir().", + parent_path))); This message seems unfriendly to users, or it seems like an elog message. How about something like this. The same can be said for the source directory. | WARNING: skipped creating database directory: "%s" | DETAIL: The tabelspace %u may have been removed just before crash. # I'm not confident in this at all:( > 2) Fixed dbase_desc(). Now the xlog output looks correct. > > rmgr: Database len (rec/tot): 42/ 42, tx: 486, lsn: > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to > pg_tblspc/16384/PG_12_201904281/16386 > > rmgr: Database len (rec/tot): 34/ 34, tx: 487, lsn: > 0/01638EB8, prev 0/01638E40, desc: DROP dir > pg_tblspc/16384/PG_12_201904281/16386 WAL records don't convey such information. The previous description seems right to me. > I'm not familiar with the TAP test details previously. I learned a lot > about how to test such case from Kyotaro's patch series.👍 Yeah, good to hear. > On Sun, Apr 28, 2019 at 3:33 PM Paul Guo <pguo@pivotal.io> wrote: > > If we want to fix the issue by ignoring the dst path create failure, I do > > not think we should do > > that in copydir() since copydir() seems to be a common function. I'm not > > sure whether it is > > used by some extensions or not. If no maybe we should move the dst patch > > create logic > > out of copydir(). Agreed to this. > > Also I'd suggest we should use pg_mkdir_p() in TablespaceCreateDbspace() > > to replace > > the code block includes a lot of > > get_parent_directory(), MakePGDirectory(), etc even it > > is not fixing a bug since pg_mkdir_p() code change seems to be more > > graceful and simpler. But I don't agree to this. pg_mkdir_p goes above two-parents up, which would be unwanted here. > > Whatever ignore mkdir failure or mkdir_p, I found that these steps seem to > > be error-prone > > along with postgre evolving since they are hard to test and also we are > > not easy to think out > > various potential bad cases. Is it possible that we should do real > > checkpoint (flush & update > > redo lsn) when seeing checkpoint xlogs for these operations? This will > > slow down standby > > but master also does this and also these operations are not usual, > > espeically it seems that it > > does not slow down wal receiving usually? That dramatically slows recovery (not replication) if databases are created and deleted frequently. That wouldn't be acceptable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: