Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | 20210528.150537.1949222691062992784.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Race condition in recovery? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Race condition in recovery?
|
List | pgsql-hackers |
Thanks! At Thu, 27 May 2021 15:05:44 -0400, Robert Haas <robertmhaas@gmail.com> wrote in > On Wed, May 26, 2021 at 8:49 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > So in the mail [1] and [2] I tried to describe what's going on around > > the two issues. Although I haven't have a response to [2], can I > > think that we clarified the intention of ee994272ca? And may I think > > that we decided that we don't add a test for the commit? > > Regarding the first question, I feel that the intention of ee994272ca > is fairly clear at this point. Someone else might feel differently so > I won't presume to speak for anyone but me. I completely misunderstood your intention here. > Regarding the second question, I am not opposed to adding a test for > that commit, but I think it is a lot more important to fix the bug we > have now than to add a test for a bug that was fixed a long time ago. Yes. I agree to that. Glad to see that. > > Then it seems to me that Robert refound how to reproduce Dilip's case > > using basebackup instead of using two standbys. It is using a broken > > archive_command with pg_basebackup -Xnone and I showed that the same > > resulting state is available by pg_basebackup -Xstream/fetch clearing > > pg_wal directory of the resulting backup including an explanation of > > why. > > Yes, it makes sense that we could get to the same state either by not > fetching the WAL in the first place, or alternatively by fetching it > and then removing it. Sure. That is an opinion and I can agree to that. > > *I* think that it is better to avoid to have the archive_command since > > it seems to me that just unlinking some files seems simpler tha having > > the broken archive_command. However, since Robert ignored it, I guess > > that Robert thinks that the broken archive_command is better than > > that. > > Well ... I don't see those things as quite related. As far as I can > see, unlinking files from pg_wal is an alternative to using -Xnone. On > the other hand, the broken archive_command is there to make sure the > new primary doesn't archive its WAL segment too soon. I agree to use the archive_command just to create the desired state. > Regarding the first point, I think using -Xnone is better than using > -Xfetch/stream and then removing the WAL, because (1) it doesn't seem > efficient to fetch WAL only to turn around and remove it and (2) > someone might question whether removing the WAL afterward is a > supported procedure, whereas using an option built into the tool must > surely be supported. Mmmm. That looks like meaning that we don't intend to support the Dilip's case, and means that we support the use of archive-command-copies-only-other-than-wal-segments? > Regarding the second point, I think using the broken archive command > is superior because we can be sure of the behavior. If we just rely on > not having crossed a segment boundary, then anything that causes more > WAL to be generated than we are expecting could break the test. I > don't think it's particularly likely in a case like this that > autovacuum or any other thing would kick in and generate extra WAL, > but the broken archive command ensures that even if it does happen, > the test will still work as intended. That, to me, seems like a good > enough reason to do it that way. Yeah. It is what the most convincing reason. > > FWIW, regarding to the name of the test script, putting aside what it > > actually does, I proposed to place it as a part or > > 004_timeline_switch.pl because this issue is related to timeline > > switching. > > I think it is better to keep it separate. Long test scripts that test > multiple things with completely separate tests are hard to read. Agreed. I often annoyed by a long-lasting TAP script when I wanted to do one of the test items in it. However, I was not sure which is our policy here, consolidating all related tests into one script or having separate scripts containing tests up to a "certain" number or a set of tests that would take a certain time, or limiting by number the of lines. I thought that we are on the first way as I have told several times to put new tests into an existing script. > Kyotaro-san, I hope I have not given any offense. I am doing my best, > and certainly did not mean to be rude. No. Thanks for the words, Robert. I might be a bit too naive, but I had an anxious feeling that I might have been totally pointless or my words might have been too cryptic/broken (my fingers are quite fat), or I might have done something wrong or anything other. Anyway I thought I might have done something wrong here. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: