Re: Random pg_upgrade test failure on drongo - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Random pg_upgrade test failure on drongo |
Date | |
Msg-id | 093b8ab5-e634-9ec6-4f59-b5c659ebb8f7@gmail.com |
In response to | Re: Random pg_upgrade test failure on drongo (Andrew Dunstan <andrew@dunslane.net>) |
List | pgsql-hackers |
Hello Andrew and Kuroda-san,

27.11.2023 16:58, Andrew Dunstan wrote:
>>> It's also interesting, what is full version/build of OS on drongo and
>>> fairywren.
>>>
>>
>> It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122
>>
>
> I've updated it to 17763.5122 now.
>

Thank you for the information! It pushed me to upgrade my Server 2019 1809 17763.592 to 17763.4252. And then I discovered that I have difficulties with reproducing the issue on all my VMs after reboot (even on old versions/builds). It took me a while to understand what's going on and what affects reproduction of the issue. I was puzzled by the fact that I can't reproduce the issue with my unlink-open test program under seemingly the same conditions as before, until I realized that the issue reproduced only when the target directory was opened in Windows Explorer. Now I'm sorry for bringing more mystery into the topic and for the misleading information.

So, the issue reproduces only when something scans the working directory for files/opens them. I added the same logic into my test program (see unlink-open-scandir attached) and now I see the failure on Windows Server 2019 (Version 10.0.17763.4252).

A script like this:
start cmd /c "unlink-open-scandir test1 10 5000 >log1 2>&1"
...
start cmd /c "unlink-open-scandir test10 10 5000 >log10 2>&1"

results in:
C:\temp>find "failed" log*

---------- LOG1

---------- LOG10
fopen() after unlink() failed (13)

---------- LOG2
fopen() after unlink() failed (13)

---------- LOG3
fopen() after unlink() failed (13)

---------- LOG4
fopen() after unlink() failed (13)

---------- LOG5
fopen() after unlink() failed (13)

---------- LOG6
fopen() after unlink() failed (13)

---------- LOG7
fopen() after unlink() failed (13)

---------- LOG8
fopen() after unlink() failed (13)

---------- LOG9
fopen() after unlink() failed (13)

C:\temp>type log10
...
iteration 108
fopen() after unlink() failed (13)

The same is observed on:
Windows 10 Version 1809 (OS Build 17763.1)

But no failures on:
Windows 10 Version 22H2 (OS Build 19045.3693)
Windows 11 Version 21H2 (OS Build 22000.613)

So the behavior change really took place, but my previous estimations were incorrect (my apologies).

BTW, "rename" mode of the test program can produce rarer errors on rename:
---------- LOG3
MoveFileEx() failed (0)

but not on open.

30.11.2023 13:00, Hayato Kuroda (Fujitsu) wrote:
> Thanks for your interest for the issue. I have been tracking the failure but been not occurred.
> Your analysis seems to solve BF failures, by updating OSes.

Yes, but I don't think that leaving Server 2019 behind (I suppose Windows Server 2019 build 20348 would have the same behaviour as Windows 10 19045) is affordable. (Though looking at Cirrus CI logs, I see that what is entitled "Windows Server 2019" in fact is Windows Server 2022 there.)

>> I think that's because unlink() is performed asynchronously on those old
>> Windows versions, but rename() is always synchronous.
>
> OK. Actually I could not find descriptions about them, but your experiment showed facts.

I don't know what this peculiarity is called, but it looks like when some other process captures the file handle, unlink() exits as if the file was deleted completely, but the subsequent open() fails.

Best regards,
Alexander
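
For illustration, the unlink-then-open check with a concurrent directory scanner can be sketched roughly as below. This is not the attached unlink-open-scandir program (its source and the meaning of its arguments are not shown in this message), only a minimal Python approximation. The names `scan_directory`, `unlink_then_open` and `testfile` are hypothetical, and the scanner runs as a thread in the same process rather than as Explorer or another process. Note also that Python's built-in open() does not request delete sharing on Windows, so with this scanner the unlink step itself may fail with a sharing violation; the sketch therefore records failures for both steps separately.

```python
import os
import tempfile
import threading


def scan_directory(path, stop_event):
    """Roughly mimic a directory watcher: list the directory and briefly open each file."""
    while not stop_event.is_set():
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        with open(entry.path, "rb"):
                            pass
                    except OSError:
                        pass  # the file vanished or is in the middle of being deleted
        except OSError:
            pass


def unlink_then_open(path, iterations, scan=True):
    """Create, unlink and re-create a file; return (iteration, step, errno) per failure."""
    failures = []
    target = os.path.join(path, "testfile")  # hypothetical file name
    stop_event = threading.Event()
    scanner = None
    if scan:
        scanner = threading.Thread(target=scan_directory, args=(path, stop_event))
        scanner.start()
    try:
        for i in range(iterations):
            try:
                with open(target, "w") as f:
                    f.write("x")
                os.unlink(target)
            except OSError as exc:
                failures.append((i, "unlink", exc.errno))
                continue
            try:
                # The step reported as "fopen() after unlink() failed (13)" above.
                with open(target, "w"):
                    pass
            except OSError as exc:
                failures.append((i, "open", exc.errno))
    finally:
        stop_event.set()
        if scanner is not None:
            scanner.join()
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        for iteration, step, err in unlink_then_open(workdir, 1000):
            print(f"iteration {iteration}: {step} failed ({err})")
```

On POSIX systems this loop is not expected to report anything, since a file can be unlinked and re-created while other handles to it are still open; whether it reports errno 13 (EACCES) on the affected Windows builds has not been verified here.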