Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) - Mailing list pgsql-bugs
From | Alexander Lakhin |
---|---|
Subject | Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) |
Date | |
Msg-id | c4c0e925-7b00-a555-9c56-80573af5ff4a@gmail.com Whole thread Raw |
In response to | Re: BUG #16161: pg_ctl stop fails sometimes (on Windows) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #16161: pg_ctl stop fails sometimes (on Windows)
|
List | pgsql-bugs |
Hello Tom, 16.12.2019 0:26, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >>> The regression tests on Windows sometimes fail with 'Permission denied' >>> errors. For example: >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2019-12-11%2007%3A45%3A33 >> I see two ways to fix the issue: >> 1) Unlink postmaster.pid using rename operation (adopt the solution from >> https://stackoverflow.com/questions/3764072/). >> 2) Ignore such 'Permission denied' error and just try to open the file >> once again (attached fix_open_for_unlink.patch implements this). >> I'm inclined to the second approach as pgwin32_open() already handles >> two transient states (Windows-only), and it could be useful not only for >> postmaster.pid, but for some other files. > Agreed that (2) seems like the way to go. However, I'm not too pleased > with the patch as given, because it is gratuitously different in almost > every possible way from the adjacent code that's doing just about the same > thing for those other transient failures. Why is the timeout duration > different? Why is the looping logic not identical? Why did you make a > different decision about whether logging might be a good idea? Actually, > why didn't you just extend the existing if-block to also cover this case? > Maybe there's good reasons to be different, but you didn't explain them. Thanks for your questions. I'll try to add more details. I didn't want to use the existing code for three reasons. First, there are too many differences in the parameters: other error code, other error reason, no need for logging. Second, I wanted to decrease a delay, as whole unlink operation (comprising of three kernel calls: CreateFile, SetDispositionInformationFile, CloseFile) performed for me in 0.0002 seconds (as ProcessMonitor showed for several runs). Assuming even hundredfold duration seems sufficient. On the other side, ERROR_ACCESS_DENIED can be caused by a real access restriction and waiting for 30 seconds before erroring out is too long. Third, the logic of the existing loop confused me — I haven't seen a reason to sleep for a last time and then return an error without rechecking a condition. And I think that the logging is irrelevant as we want to handle here just the DELETE_PENDING state. If the "access denied" error state is disappeared within a second, then we can log only "the file probably was in a deletion state and now it's not", and if this state is persisting, we can log "access is really denied", but I believe that the existing calling code does the same anyway. > I'm also not that excited about memorializing a stackoverflow discussion > as the reason to do something. Can we point to something in Microsoft's > official docs? I'm not excited about this too, but I couldn't find something more appropriate. For example. there is a similar question on their forums: https://social.technet.microsoft.com/Forums/office/en-US/58e5c670-f024-44ff-9919-36c44ab11d9c/file-delete-pending-problem?forum=w7itprogeneral but it left without a reference to their docs. WebArchive contains their knowledge base article where STATUS_DELETE_PENDING is mapped to ERROR_ACCESS_DENIED, but today this information cannot be found on microsoft.com: https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 Maybe they consider this as too low-level implementation details... I think, they have some internal documentation where this is described in details, but we can't reference it anyway. On the other hand, finding a couple of references to stackoverflow in src/ made me feel that it is not widely accepted, but still accepted (as an exception). Best regards. Alexander
pgsql-bugs by date: