Re: Race condition in recovery? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Race condition in recovery? |
Date | |
Msg-id | 20210604.182456.2154274967017506645.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Race condition in recovery? (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Race condition in recovery?
|
List | pgsql-hackers |
At Fri, 4 Jun 2021 13:21:08 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Fri, Jun 4, 2021 at 2:03 AM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, May 27, 2021 at 2:26 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > Changed as suggested. > > > > I don't think the code as written here is going to work on Windows, > > because your code doesn't duplicate enable_restoring's call to > > perl2host or its backslash-escaping logic. It would really be better > > if we could use enable_restoring directly. Also, I discovered that the > > 'return' in cp_history_files should really say 'exit', because > > otherwise it generates a complaint every time it's run. It should also > > have 'use strict' and 'use warnings' at the top. > > Ok > > > Here's a version of your test case patch with the 1-line code fix > > added, the above issues addressed, and a bunch of cosmetic tweaks. > > Unfortunately, it doesn't pass for me consistently. I'm not sure if > > that's because I broke something with my changes, or because the test > > contains an underlying race condition which we need to address. > > Attached also are the log files from a failed run if you want to look > > at them. The key lines seem to be: > > I could not reproduce this but I think I got the issue, I think I used > the wrong target LSN in wait_for_catchup, instead of checking the last > "insert LSN" of the standby I was waiting for last "replay LSN" of > standby which was wrong. Changed as below in the attached patch. I think that's right. And the test script detects the issue for me both on Linux but doesn't work for Windows. '"C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files"' is not recognized as an internal command or externalcommand .. Because Windows' cmd.exe doesn't have the shbang feature. On Windows, maybe archive_command should be like '".../perl" "$FindBin../cp_history_files" "%p"... If I did this I got another error. "couldn't copy pg_wal\00000002.history to C:/../Documents/work/postgresql/src est^Mecovery/tmp_check/t_000_a_primary_data/archives/00000002.history: at C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_filesline 10.^M" ("^M" are the replacement for carrage return) So.. I'm not sure what is happening but the error messages, or.. Anyway I don't have a time to investigate it. + # clean up + $node_primary->teardown_node; + $node_standby->teardown_node; + $node_cascade->teardown_node; I don't think explicit teardown is useless as the final cleanup. By the way the attached patch is named as "Fix-corner-case..." but doesn't contain the fix. Is it intentional? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: