Thread: BUG #16161: pg_ctl stop fails sometimes (on Windows)
The following bug has been logged on the website: Bug reference: 16161 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 12.1 Operating system: Windows Description: The regression tests on Windows sometimes fail with 'Permission denied' errors. For example: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2019-12-11%2007%3A45%3A33 ============== shutting down postmaster ============== pg_ctl: could not open PID file "c:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied This error occurs when pg_ctl is trying to open postmaster.pid while this file is in "delete pending" state (https://stackoverflow.com/questions/3764072/).
11.12.2019 23:00, PG Bug reporting form wrote:
To reproduce the issue reliably I propose a simple modification to synchronize unlink() with open() and a simple test (sync_pid_ops+test.patch).The following bug has been logged on the website: Bug reference: 16161 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 12.1 Operating system: Windows Description: The regression tests on Windows sometimes fail with 'Permission denied' errors. For example: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2019-12-11%2007%3A45%3A33 ============== shutting down postmaster ============== pg_ctl: could not open PID file "c:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied This error occurs when pg_ctl is trying to open postmaster.pid while this file is in "delete pending" state (https://stackoverflow.com/questions/3764072/).
With the patch applied, `vcregress taptest src/test/restart` fails for me on iteration 47, 6, 7, 42, 51, 26, ...
I see two ways to fix the issue:
1) Unlink postmaster.pid using rename operation (adopt the solution from https://stackoverflow.com/questions/3764072/).
2) Ignore such 'Permission denied' error and just try to open the file once again (attached fix_open_for_unlink.patch implements this).
I'm inclined to the second approach as pgwin32_open() already handles two transient states (Windows-only), and it could be useful not only for postmaster.pid, but for some other files.
Best regards,
Alexander
Attachment
Alexander Lakhin <exclusion@gmail.com> writes: >> The regression tests on Windows sometimes fail with 'Permission denied' >> errors. For example: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2019-12-11%2007%3A45%3A33 > I see two ways to fix the issue: > 1) Unlink postmaster.pid using rename operation (adopt the solution from > https://stackoverflow.com/questions/3764072/). > 2) Ignore such 'Permission denied' error and just try to open the file > once again (attached fix_open_for_unlink.patch implements this). > I'm inclined to the second approach as pgwin32_open() already handles > two transient states (Windows-only), and it could be useful not only for > postmaster.pid, but for some other files. Agreed that (2) seems like the way to go. However, I'm not too pleased with the patch as given, because it is gratuitously different in almost every possible way from the adjacent code that's doing just about the same thing for those other transient failures. Why is the timeout duration different? Why is the looping logic not identical? Why did you make a different decision about whether logging might be a good idea? Actually, why didn't you just extend the existing if-block to also cover this case? Maybe there's good reasons to be different, but you didn't explain them. I'm also not that excited about memorializing a stackoverflow discussion as the reason to do something. Can we point to something in Microsoft's official docs? regards, tom lane
Hello Tom, 16.12.2019 0:26, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >>> The regression tests on Windows sometimes fail with 'Permission denied' >>> errors. For example: >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory&dt=2019-12-11%2007%3A45%3A33 >> I see two ways to fix the issue: >> 1) Unlink postmaster.pid using rename operation (adopt the solution from >> https://stackoverflow.com/questions/3764072/). >> 2) Ignore such 'Permission denied' error and just try to open the file >> once again (attached fix_open_for_unlink.patch implements this). >> I'm inclined to the second approach as pgwin32_open() already handles >> two transient states (Windows-only), and it could be useful not only for >> postmaster.pid, but for some other files. > Agreed that (2) seems like the way to go. However, I'm not too pleased > with the patch as given, because it is gratuitously different in almost > every possible way from the adjacent code that's doing just about the same > thing for those other transient failures. Why is the timeout duration > different? Why is the looping logic not identical? Why did you make a > different decision about whether logging might be a good idea? Actually, > why didn't you just extend the existing if-block to also cover this case? > Maybe there's good reasons to be different, but you didn't explain them. Thanks for your questions. I'll try to add more details. I didn't want to use the existing code for three reasons. First, there are too many differences in the parameters: other error code, other error reason, no need for logging. Second, I wanted to decrease a delay, as whole unlink operation (comprising of three kernel calls: CreateFile, SetDispositionInformationFile, CloseFile) performed for me in 0.0002 seconds (as ProcessMonitor showed for several runs). Assuming even hundredfold duration seems sufficient. On the other side, ERROR_ACCESS_DENIED can be caused by a real access restriction and waiting for 30 seconds before erroring out is too long. Third, the logic of the existing loop confused me — I haven't seen a reason to sleep for a last time and then return an error without rechecking a condition. And I think that the logging is irrelevant as we want to handle here just the DELETE_PENDING state. If the "access denied" error state is disappeared within a second, then we can log only "the file probably was in a deletion state and now it's not", and if this state is persisting, we can log "access is really denied", but I believe that the existing calling code does the same anyway. > I'm also not that excited about memorializing a stackoverflow discussion > as the reason to do something. Can we point to something in Microsoft's > official docs? I'm not excited about this too, but I couldn't find something more appropriate. For example. there is a similar question on their forums: https://social.technet.microsoft.com/Forums/office/en-US/58e5c670-f024-44ff-9919-36c44ab11d9c/file-delete-pending-problem?forum=w7itprogeneral but it left without a reference to their docs. WebArchive contains their knowledge base article where STATUS_DELETE_PENDING is mapped to ERROR_ACCESS_DENIED, but today this information cannot be found on microsoft.com: https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 Maybe they consider this as too low-level implementation details... I think, they have some internal documentation where this is described in details, but we can't reference it anyway. On the other hand, finding a couple of references to stackoverflow in src/ made me feel that it is not widely accepted, but still accepted (as an exception). Best regards. Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > 16.12.2019 0:26, Tom Lane wrote: >> Agreed that (2) seems like the way to go. However, I'm not too pleased >> with the patch as given, because it is gratuitously different in almost >> every possible way from the adjacent code that's doing just about the same >> thing for those other transient failures. Why is the timeout duration >> different? Why is the looping logic not identical? Why did you make a >> different decision about whether logging might be a good idea? Actually, >> why didn't you just extend the existing if-block to also cover this case? >> Maybe there's good reasons to be different, but you didn't explain them. > Second, I wanted to decrease a delay, as whole unlink operation > (comprising of three kernel calls: CreateFile, > SetDispositionInformationFile, CloseFile) performed for me in 0.0002 > seconds (as ProcessMonitor showed for several runs). Assuming even > hundredfold duration seems sufficient. Fair enough. I wonder btw whether the 100ms wait quantum isn't ridiculously long for modern machines. We'd have to change it for both conditions though to avoid odd behavior (or use two separate wait counters?), so I didn't mess with that for now. > Third, the logic of the existing loop confused me — I haven't seen a > reason to sleep for a last time and then return an error without > rechecking a condition. That is pretty bogus now that you mention it, but the answer is to fix it not just avert your eyes. > And I think that the logging is irrelevant as we want to handle here > just the DELETE_PENDING state. True, if we're not going to wait very long then there's little need to log. I've pushed this with adjustment of the other loop and some fooling with the comment --- I still don't see a need to cite stackoverflow as an authority. regards, tom lane
16.12.2019 23:17, Tom Lane wrote: > > I've pushed this with adjustment of the other loop and some fooling > with the comment --- I still don't see a need to cite stackoverflow > as an authority. Thank you! I hope now Windows machines will concede the first place of the race for failures. Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > 16.12.2019 23:17, Tom Lane wrote: >> I've pushed this with adjustment of the other loop and some fooling >> with the comment --- I still don't see a need to cite stackoverflow >> as an authority. > Thank you! I hope now Windows machines will concede the first place of > the race for failures. Seems like we're not there yet :-(. Buildfarm members jacana and fairywren have been failing the recovery tests, in v10 and up, since this went in. It looks like we're getting a timeout on this step in 010_logical_decoding_timelines.pl: # Should be able to read from slot created before base backup ($ret, $stdout, $stderr) = $node_replica->psql( 'postgres', "SELECT data FROM pg_logical_slot_peek_changes('before_basebackup', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts','1');", timeout => 30); Now, it's mighty suspicious that this has a timeout of 30 sec which is exactly how long we made pgwin32_open retry for --- but how is it that this test is causing an ERROR_ACCESS_DENIED failure? And if it was, how did we get past that before? It does look suspiciously like this wait is triggering on these machines and somehow getting masked in most other test cases. If you compare the per-step runtimes in jacana's last successful run, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-16%2013%3A01%3A26 with those after this patch went in, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-17%2004%3A16%3A40 there's clearly something very wrong. fairywren likewise. regards, tom lane
At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Alexander Lakhin <exclusion@gmail.com> writes: > > 16.12.2019 23:17, Tom Lane wrote: > >> I've pushed this with adjustment of the other loop and some fooling > >> with the comment --- I still don't see a need to cite stackoverflow > >> as an authority. > > > Thank you! I hope now Windows machines will concede the first place of > > the race for failures. > > Seems like we're not there yet :-(. Buildfarm members jacana and > fairywren have been failing the recovery tests, in v10 and up, > since this went in. It looks like we're getting a timeout on > this step in 010_logical_decoding_timelines.pl: > > # Should be able to read from slot created before base backup > ($ret, $stdout, $stderr) = $node_replica->psql( > 'postgres', > "SELECT data FROM pg_logical_slot_peek_changes('before_basebackup', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts','1');", > timeout => 30); > > Now, it's mighty suspicious that this has a timeout of 30 sec > which is exactly how long we made pgwin32_open retry for --- > but how is it that this test is causing an ERROR_ACCESS_DENIED > failure? And if it was, how did we get past that before? > > It does look suspiciously like this wait is triggering on these > machines and somehow getting masked in most other test cases. > If you compare the per-step runtimes in jacana's last successful > run, > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-16%2013%3A01%3A26 > > with those after this patch went in, > > https://buildfarm.postgresql.org/cgi-bin/shocw_log.pl?nm=jacana&dt=2019-12-17%2004%3A16%3A40 > > there's clearly something very wrong. fairywren likewise. The immediate cause of the failure seems to be a slowdown of startup or recovery. It took only 8 minutes in the successful case but 25 minutes even though stopped on the way in the failure case. In the failure case, it took 30 seconds to move only 26.1MB from 15E8C48 to 30013A0. just around 0.3 seconds for a move of the same size (15E9C48->3000DE8 (26.1MB)) in the successful case (*1). In the sucessful case, the time took from archiving 00000002.history to removing old segments is 0.15 s, but about 3.5 s in the failure case (*2). Seemingly the immediate cause of the timeout looks like just an extreme slowdown of file read, but it took 2 seconds to move within a page from 3000C18 to 3000C50. So I suspect that 6d7547c219 might slow down readTimeLineHistory() significantly in the case. regards. *1: jacana - WAL reading Success 2019-12-16 09:01:01.805 EST [5df78e1d.b74:4] 010_logical_decoding_timelines.pl LOG: starting logical decoding for slot "before_basebackup" 2019-12-16 09:01:01.805 EST [5df78e1d.b74:5] 010_logical_decoding_timelines.pl DETAIL: Streaming transactions committingafter 0/15CC960, reading WAL from 0/15CC928. ... 2019-12-16 09:01:01.836 EST [5df78e1d.b74:34] 010_logical_decoding_timelines.pl DEBUG: failed to increase restart lsn: proposed0/3000DE8, after 0/3000DE8, current candidate 0/15E9C48, current after 0/15E9C48, flushed up to 0/15CC960 Fail 2019-12-17 01:35:02.235 EST [5df87716.2ddc:4] 010_logical_decoding_timelines.pl LOG: starting logical decoding for slot"before_basebackup" 2019-12-17 01:35:02.235 EST [5df87716.2ddc:5] 010_logical_decoding_timelines.pl DETAIL: Streaming transactions committingafter 0/15CC960, reading WAL from 0/15CC928. ... 2019-12-17 01:35:24.195 EST [5df87716.2ddc:33] 010_logical_decoding_timelines.pl DEBUG: failed to increase restart lsn:proposed 0/3000C18, after 0/3000C18, current candidate 0/15E9C48, current after 0/15E9C48, flushed up to 0/15CC960 2019-12-17 01:35:24.195 EST [5df87716.2ddc:34] 010_logical_decoding_timelines.pl DEBUG: serializing snapshot to pg_logical/snapshots/0-3000C50.snap 2019-12-17 01:35:26.379 EST [5df87716.2ddc:35] 010_logical_decoding_timelines.pl DEBUG: failed to increase restart lsn:proposed 0/3000C50, after 0/3000C50, current candidate 0/15E9C48, current after 0/15E9C48, flushed up to 0/15CC960 2019-12-17 01:35:26.379 EST [5df87716.2ddc:36] 010_logical_decoding_timelines.pl DEBUG: serializing snapshot to pg_logical/snapshots/0-3000D68.snap 2019-12-17 01:35:28.575 EST [5df87716.2ddc:37] 010_logical_decoding_timelines.pl DEBUG: failed to increase restart lsn:proposed 0/3000D68, after 0/3000D68, current candidate 0/15E9C48, current after 0/15E9C48, flushed up to 0/15CC960 2019-12-17 01:35:28.575 EST [5df87716.2ddc:38] 010_logical_decoding_timelines.pl DEBUG: serializing snapshot to pg_logical/snapshots/0-30012F0.snap 2019-12-17 01:35:30.775 EST [5df87716.2ddc:39] 010_logical_decoding_timelines.pl DEBUG: failed to increase restart lsn:proposed 0/30012F0, after 0/30012F0, current candidate 0/15E9C48, current after 0/15E9C48, flushed up to 0/15CC960 2019-12-17 01:35:30.775 EST [5df87716.2ddc:40] 010_logical_decoding_timelines.pl DEBUG: serializing snapshot to pg_logical/snapshots/0-30013A0.snap ==== *2: jacana - checkpoint Success 2019-12-16 09:01:01.399 EST [5df78e1d.170c:1] DEBUG: archived write-ahead log file "00000002.history" 2019-12-16 09:01:01.430 EST [5df78e1d.170c:2] DEBUG: archived write-ahead log file "000000010000000000000003.partial" 2019-12-16 09:01:01.445 EST [5df78e1b.2f60:4] DEBUG: attempting to remove WAL segments older than log file 000000000000000000000000 2019 Fail 2019-12-17 01:35:01.950 EST [5df87715.2e0c:1] DEBUG: archived write-ahead log file "00000002.history" 2019-12-17 01:35:03.080 EST [5df87715.2e0c:2] DEBUG: archived write-ahead log file "000000010000000000000003.partial" 2019-12-17 01:35:05.389 EST [5df876ed.3370:4] DEBUG: attempting to remove WAL segments older than log file 000000000000000000000000 -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, 18.12.2019 6:25, Kyotaro Horiguchi wrote: > At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> >> Now, it's mighty suspicious that this has a timeout of 30 sec >> which is exactly how long we made pgwin32_open retry for --- >> but how is it that this test is causing an ERROR_ACCESS_DENIED >> failure? And if it was, how did we get past that before? Not exactly, we set timeout for 1 sec. It seems we have this timeout occurred many times. >> It does look suspiciously like this wait is triggering on these >> machines and somehow getting masked in most other test cases. >> If you compare the per-step runtimes in jacana's last successful >> run, >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-12-16%2013%3A01%3A26 >> >> with those after this patch went in, >> >> https://buildfarm.postgresql.org/cgi-bin/shocw_log.pl?nm=jacana&dt=2019-12-17%2004%3A16%3A40 >> >> there's clearly something very wrong. fairywren likewise. > The immediate cause of the failure seems to be a slowdown of startup > or recovery. It took only 8 minutes in the successful case but 25 > minutes even though stopped on the way in the failure case. > > In the failure case, it took 30 seconds to move only 26.1MB from > 15E8C48 to 30013A0. just around 0.3 seconds for a move of the same > size (15E9C48->3000DE8 (26.1MB)) in the successful case (*1). In the > sucessful case, the time took from archiving 00000002.history to > removing old segments is 0.15 s, but about 3.5 s in the failure case > (*2). > > Seemingly the immediate cause of the timeout looks like just an > extreme slowdown of file read, but it took 2 seconds to move within a > page from 3000C18 to 3000C50. So I suspect that 6d7547c219 might slow > down readTimeLineHistory() significantly in the case. > > regards. > > Yes, some tests are really slowed down (namely, test-decoding-check, pg_basebackup-check, ...). ProcessMonitor shows: "8:11:57.1428800 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS",... "8:11:57.1429851 AM","postgres.exe","2640","SetRenameInformationFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state.tmp","SUCCESS",... "8:11:57.1432041 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","SUCCESS","" "8:11:57.1432964 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS","" "8:11:57.1435137 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS",... "8:11:57.1435709 AM","postgres.exe","2640","CloseFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot\state","SUCCESS","" "8:11:57.1437251 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.2451100 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.3571077 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.4643527 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.5778052 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.6825679 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.7921704 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:57.9018507 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.0109597 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.1204329 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.2300829 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY",... "8:11:58.2302124 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.3393345 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.4484310 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.5584079 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.6683586 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.7773396 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.8858091 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:58.9955264 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.1050665 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.2137752 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.3234641 AM","postgres.exe","2640","CreateFile","...\t_006_logical_decoding_master_data\pgdata\pg_replslot","IS DIRECTORY",... "8:11:59.3261361 AM","postgres.exe","2640","QueryOpen","C:\Windows\System32\kernel.appcore.dll","FAST IO DISALLOWED","" https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too. It seems that the cause of the issue is in fsync_fname_ext: fd = OpenTransientFile(fname, flags); /* * Some OSs don't allow us to open directories at all (Windows returns * EACCES), just ignore the error in that case. If desired also silently * ignoring errors about unreadable files. Log others. */ if (fd < 0 && isdir && (errno == EISDIR || errno == EACCES)) return 0; Maybe we need to take isdir into account before calling OpenTransientFile on Windows... Please look at the simple patch that makes the recoverycheck test pass. Best regards, Alexander
Attachment
Alexander Lakhin <exclusion@gmail.com> writes: > 18.12.2019 6:25, Kyotaro Horiguchi wrote: >> At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >>> It does look suspiciously like this wait is triggering on these >>> machines and somehow getting masked in most other test cases. > https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 > tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too. Ugh. I wondered if we could really get away with slowing down all ERROR_ACCESS_DENIED cases. > It seems that the cause of the issue is in fsync_fname_ext: I think you're blaming the messenger. We cannot suddenly impose a rule that people mustn't try to open() files that might be directories. Even if fsync_fname_ext() happens to be the only place that does that today (which I doubt) it's going to bite us on the rear regularly in the future, because there's no way to enforce it, and most developers aren't going to notice the problem in testing. It's possible that we could deal with this specific case by having pgwin32_open() try to stat the file and see if it's a directory. But I think that's messy. Also, the list of kernel-level conditions that that webpage says are mapped to ERROR_ACCESS_DENIED is scarily long: STATUS_THREAD_IS_TERMINATING ERROR_ACCESS_DENIED STATUS_PROCESS_IS_TERMINATING ERROR_ACCESS_DENIED STATUS_INVALID_LOCK_SEQUENCE ERROR_ACCESS_DENIED STATUS_INVALID_VIEW_SIZE ERROR_ACCESS_DENIED STATUS_ALREADY_COMMITTED ERROR_ACCESS_DENIED STATUS_ACCESS_DENIED ERROR_ACCESS_DENIED STATUS_FILE_IS_A_DIRECTORY ERROR_ACCESS_DENIED STATUS_CANNOT_DELETE ERROR_ACCESS_DENIED STATUS_FILE_DELETED ERROR_ACCESS_DENIED STATUS_FILE_RENAMED ERROR_ACCESS_DENIED STATUS_DELETE_PENDING ERROR_ACCESS_DENIED STATUS_PORT_CONNECTION_REFUSED ERROR_ACCESS_DENIED ... STATUS_ENCRYPTION_FAILED ERROR_ACCESS_DENIED STATUS_DECRYPTION_FAILED ERROR_ACCESS_DENIED STATUS_NO_RECOVERY_POLICY ERROR_ACCESS_DENIED STATUS_NO_EFS ERROR_ACCESS_DENIED STATUS_WRONG_EFS ERROR_ACCESS_DENIED STATUS_NO_USER_KEYS ERROR_ACCESS_DENIED ... SEC_E_MESSAGE_ALTERED ERROR_ACCESS_DENIED SEC_E_OUT_OF_SEQUENCE ERROR_ACCESS_DENIED Maybe most of these can't occur during fopen, but I'd rather not assume that. Is there a way to get the original kernel error code? It'd be a lot better if we could be sure that the condition is STATUS_DELETE_PENDING before looping. regards, tom lane
18.12.2019 19:03, Tom Lane wrote:
Regarding fsync_fname_ext(), I thought that it's OK to avoid performing a call that is known to fail. And even if someone will try to open() a directory, he will need to add the same check for EACCES, otherwise the code will fail on Windows. In fact, I see the same pattern in pre_sync_fname, fsync_fname in file_utils.c.Alexander Lakhin <exclusion@gmail.com> writes:18.12.2019 6:25, Kyotaro Horiguchi wrote:At Tue, 17 Dec 2019 18:50:26 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote inIt does look suspiciously like this wait is triggering on these machines and somehow getting masked in most other test cases.https://web.archive.org/web/20150317121919/http://support.microsoft.com/en-us/kb/113996 tells that STATUS_FILE_IS_A_DIRECTORY is mapped to ERROR_ACCESS_DENIED too.Ugh. I wondered if we could really get away with slowing down all ERROR_ACCESS_DENIED cases.It seems that the cause of the issue is in fsync_fname_ext:I think you're blaming the messenger. We cannot suddenly impose a rule that people mustn't try to open() files that might be directories. Even if fsync_fname_ext() happens to be the only place that does that today (which I doubt) it's going to bite us on the rear regularly in the future, because there's no way to enforce it, and most developers aren't going to notice the problem in testing.
I see your point about future uncertainty, but I don't think that receiving e.g. STATUS_FILE_DELETED or STATUS_NO_EFS error every few seconds and ignoring it is a normal practice.It's possible that we could deal with this specific case by having pgwin32_open() try to stat the file and see if it's a directory. But I think that's messy. Also, the list of kernel-level conditions that that webpage says are mapped to ERROR_ACCESS_DENIED is scarily long: STATUS_THREAD_IS_TERMINATING ERROR_ACCESS_DENIED STATUS_PROCESS_IS_TERMINATING ERROR_ACCESS_DENIED STATUS_INVALID_LOCK_SEQUENCE ERROR_ACCESS_DENIED STATUS_INVALID_VIEW_SIZE ERROR_ACCESS_DENIED STATUS_ALREADY_COMMITTED ERROR_ACCESS_DENIED STATUS_ACCESS_DENIED ERROR_ACCESS_DENIED STATUS_FILE_IS_A_DIRECTORY ERROR_ACCESS_DENIED STATUS_CANNOT_DELETE ERROR_ACCESS_DENIED STATUS_FILE_DELETED ERROR_ACCESS_DENIED STATUS_FILE_RENAMED ERROR_ACCESS_DENIED STATUS_DELETE_PENDING ERROR_ACCESS_DENIED STATUS_PORT_CONNECTION_REFUSED ERROR_ACCESS_DENIED ... STATUS_ENCRYPTION_FAILED ERROR_ACCESS_DENIED STATUS_DECRYPTION_FAILED ERROR_ACCESS_DENIED STATUS_NO_RECOVERY_POLICY ERROR_ACCESS_DENIED STATUS_NO_EFS ERROR_ACCESS_DENIED STATUS_WRONG_EFS ERROR_ACCESS_DENIED STATUS_NO_USER_KEYS ERROR_ACCESS_DENIED ... SEC_E_MESSAGE_ALTERED ERROR_ACCESS_DENIED SEC_E_OUT_OF_SEQUENCE ERROR_ACCESS_DENIED Maybe most of these can't occur during fopen, but I'd rather not assume that
Unfortunately no, there is no known way to get the kernel error code. As stated in https://stackoverflow.com/questions/3764072, there is the GetFileInformationByHandleEx function, that can return DeletePending state, but to use it we should open the file first (to get a file handle).Is there a way to get the original kernel error code? It'd be a lot better if we could be sure that the condition is STATUS_DELETE_PENDING before looping.
So I see three ways now:
1) Revert the pgwin32_open change and choose the previously rejected approach: Unlink postmaster.pid using rename operation (adopt the solution from https://stackoverflow.com/questions/3764072/).
2) Add a check for directory in pgwin32_open, but I think then we're getting close to checking there just for postmaster.pid.
3) Find out how often we get ERROR_ACCESS_DENIED during all regression tests and if such error count is near zero, then add the check for isdir in fsync_fname_ext and friends, and get done.
I'm still thinking that we shouldn't perform doomed calls, but I'm ready to change the course.
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > 18.12.2019 19:03, Tom Lane wrote: >> Even if fsync_fname_ext() happens to be the only place that does that >> today (which I doubt) it's going to bite us on the rear regularly in >> the future, because there's no way to enforce it, and most developers >> aren't going to notice the problem in testing. > Regarding fsync_fname_ext(), I thought that it's OK to avoid performing > a call that is known to fail. And even if someone will try to open() a > directory, he will need to add the same check for EACCES, otherwise the > code will fail on Windows. In fact, I see the same pattern in > pre_sync_fname, fsync_fname in file_utils.c. My point is that it's totally unreasonable to expect callers to always know, in advance of accessing the filesystem, whether the pathname they are going to access is a directory or not. The point of calling a filesystem access function is to find out that kind of information. Yes, there are many cases where we can expect that PG can predict this, but an open() replacement has no business making such a sweeping assumption about its use-cases. >> Is there a way to get the original kernel error code? >> It'd be a lot better if we could be sure that the condition >> is STATUS_DELETE_PENDING before looping. > Unfortunately no, there is no known way to get the kernel error code. Bleah. I did a little bit of googling and couldn't find anything to contradict that, but it's sure annoying. > So I see three ways now: > 1) Revert the pgwin32_open change and choose the previously rejected > approach: Unlink postmaster.pid using rename operation (adopt the > solution from https://stackoverflow.com/questions/3764072/). > 2) Add a check for directory in pgwin32_open, but I think then we're > getting close to checking there just for postmaster.pid. > 3) Find out how often we get ERROR_ACCESS_DENIED during all regression > tests and if such error count is near zero, then add the check for isdir > in fsync_fname_ext and friends, and get done. I'm not terribly thrilled with (1) because I do not think that postmaster.pid accesses are the only place where we have this issue. And (3) is not only not a forward-looking solution, but it imagines that the speed of the regression tests is the only thing we need to worry about here. We generally don't assume that the regression tests are an accurate model of the performance users will see. So that leaves us with (2), it seems. regards, tom lane
At Wed, 18 Dec 2019 17:09:15 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Alexander Lakhin <exclusion@gmail.com> writes: > > 18.12.2019 19:03, Tom Lane wrote: > >> Even if fsync_fname_ext() happens to be the only place that does that > >> today (which I doubt) it's going to bite us on the rear regularly in > >> the future, because there's no way to enforce it, and most developers > >> aren't going to notice the problem in testing. > > > Regarding fsync_fname_ext(), I thought that it's OK to avoid performing > > a call that is known to fail. And even if someone will try to open() a > > directory, he will need to add the same check for EACCES, otherwise the > > code will fail on Windows. In fact, I see the same pattern in > > pre_sync_fname, fsync_fname in file_utils.c. > > My point is that it's totally unreasonable to expect callers to always > know, in advance of accessing the filesystem, whether the pathname they > are going to access is a directory or not. The point of calling a > filesystem access function is to find out that kind of information. > Yes, there are many cases where we can expect that PG can predict > this, but an open() replacement has no business making such a sweeping > assumption about its use-cases. > > >> Is there a way to get the original kernel error code? > >> It'd be a lot better if we could be sure that the condition > >> is STATUS_DELETE_PENDING before looping. > > > Unfortunately no, there is no known way to get the kernel error code. > > Bleah. I did a little bit of googling and couldn't find anything > to contradict that, but it's sure annoying. The codes seems to be indentifiable only on device driver layer. > > So I see three ways now: > > 1) Revert the pgwin32_open change and choose the previously rejected > > approach: Unlink postmaster.pid using rename operation (adopt the > > solution from https://stackoverflow.com/questions/3764072/). > > 2) Add a check for directory in pgwin32_open, but I think then we're > > getting close to checking there just for postmaster.pid. > > 3) Find out how often we get ERROR_ACCESS_DENIED during all regression > > tests and if such error count is near zero, then add the check for isdir > > in fsync_fname_ext and friends, and get done. > > I'm not terribly thrilled with (1) because I do not think that > postmaster.pid accesses are the only place where we have this issue. > And (3) is not only not a forward-looking solution, but it imagines > that the speed of the regression tests is the only thing we need to > worry about here. We generally don't assume that the regression > tests are an accurate model of the performance users will see. > > So that leaves us with (2), it seems. Agreed for (2). I'm not sure about the point of the discussion that stat()'ing is messy. If it's about performance, does stat()ing only when CreateFile returned ERROR_ACCESS_DENIED helps? Anyway we wait for a moment in that case. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
19.12.2019 4:39, Kyotaro Horiguchi wrote: > At Wed, 18 Dec 2019 17:09:15 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> Alexander Lakhin <exclusion@gmail.com> writes: >>> So I see three ways now: >>> 1) Revert the pgwin32_open change and choose the previously rejected >>> approach: Unlink postmaster.pid using rename operation (adopt the >>> solution from https://stackoverflow.com/questions/3764072/). >>> 2) Add a check for directory in pgwin32_open, but I think then we're >>> getting close to checking there just for postmaster.pid. >>> 3) Find out how often we get ERROR_ACCESS_DENIED during all regression >>> tests and if such error count is near zero, then add the check for isdir >>> in fsync_fname_ext and friends, and get done. >> I'm not terribly thrilled with (1) because I do not think that >> postmaster.pid accesses are the only place where we have this issue. >> And (3) is not only not a forward-looking solution, but it imagines >> that the speed of the regression tests is the only thing we need to >> worry about here. We generally don't assume that the regression >> tests are an accurate model of the performance users will see. >> >> So that leaves us with (2), it seems. > Agreed for (2). > > I'm not sure about the point of the discussion that stat()'ing is > messy. If it's about performance, does stat()ing only when CreateFile > returned ERROR_ACCESS_DENIED helps? Anyway we wait for a moment in > that case. Please look at the patch that implements (2). It makes `vcregress recoverycheck` pass (and my demo restart test still passes too). As open(dir) is getting a little more expensive than before, maybe it's still worthwhile to patch fsync*(..., isdir,...). I can prepare such a patch if so. Best regards, Alexander
Attachment
At Thu, 19 Dec 2019 07:00:00 +0300, Alexander Lakhin <exclusion@gmail.com> wrote in > 19.12.2019 4:39, Kyotaro Horiguchi wrote: > > At Wed, 18 Dec 2019 17:09:15 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > >> Alexander Lakhin <exclusion@gmail.com> writes: > >>> So I see three ways now: > >>> 1) Revert the pgwin32_open change and choose the previously rejected > >>> approach: Unlink postmaster.pid using rename operation (adopt the > >>> solution from https://stackoverflow.com/questions/3764072/). > >>> 2) Add a check for directory in pgwin32_open, but I think then we're > >>> getting close to checking there just for postmaster.pid. > >>> 3) Find out how often we get ERROR_ACCESS_DENIED during all regression > >>> tests and if such error count is near zero, then add the check for isdir > >>> in fsync_fname_ext and friends, and get done. > >> I'm not terribly thrilled with (1) because I do not think that > >> postmaster.pid accesses are the only place where we have this issue. > >> And (3) is not only not a forward-looking solution, but it imagines > >> that the speed of the regression tests is the only thing we need to > >> worry about here. We generally don't assume that the regression > >> tests are an accurate model of the performance users will see. > >> > >> So that leaves us with (2), it seems. > > Agreed for (2). > > > > I'm not sure about the point of the discussion that stat()'ing is > > messy. If it's about performance, does stat()ing only when CreateFile > > returned ERROR_ACCESS_DENIED helps? Anyway we wait for a moment in > > that case. > Please look at the patch that implements (2). It makes `vcregress > recoverycheck` pass (and my demo restart test still passes too). > As open(dir) is getting a little more expensive than before, maybe it's > still worthwhile to patch fsync*(..., isdir,...). I can prepare such a > patch if so. Sorry, I hadn't looked it. +#ifdef WIN32 + /* Windows doesn't allow us to open directories at all. + */ + if (isdir) + return 0; +#endif Doesn't it need to handle ENOENT or anything that can happen if OpenTransientFile() were called? And.. how do we regard about the crash-safetiness on Windows? After some googling it seems that not only CreateFile but also fsync on directories cannot be performed on Windows at all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Alexander Lakhin <exclusion@gmail.com> writes: > Please look at the patch that implements (2). It makes `vcregress > recoverycheck` pass (and my demo restart test still passes too). I think we could be a little smarter here. If the problem is STATUS_DELETE_PENDING, doesn't that affect stat() as well? That is, if stat() succeeds, we needn't wait, independently of whether it says S_ISDIR or not? This seems like a noticeable improvement if true, because it would mean that ordinary file-permission failures also need not wait. We'd still get confused if we get a permission failure on a containing directory rather than the file itself, but that I'm prepared to dismiss as an uncommon case. Entirely-untested patch along this line attached. BTW ... it's likely that stat() here is actually going to invoke pgwin32_safestat(), which has its own opinions about this, and indeed seems to think we'll get ERROR_DELETE_PENDING. I think that's harmless here, but it makes me wonder if we should use a native Windows API instead of going through stat(). > As open(dir) is getting a little more expensive than before, maybe it's > still worthwhile to patch fsync*(..., isdir,...). I can prepare such a > patch if so. I think we should leave that for later, so that the buildfarm can actually test whatever logic we put in here. regards, tom lane diff --git a/src/port/open.c b/src/port/open.c index 5165276..24af6f4 100644 --- a/src/port/open.c +++ b/src/port/open.c @@ -21,6 +21,7 @@ #include <fcntl.h> #include <assert.h> +#include <sys/stat.h> static int @@ -137,18 +138,33 @@ pgwin32_open(const char *fileName, int fileFlags,...) } /* - * ERROR_ACCESS_DENIED can be returned if the file is deleted but not - * yet gone (Windows NT status code is STATUS_DELETE_PENDING). Wait a - * bit and try again, giving up after 1 second (since this condition - * should never persist very long). + * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet + * gone (Windows NT status code is STATUS_DELETE_PENDING). In that + * case we want to wait a bit and try again, giving up after 1 second + * (since this condition should never persist very long). However, + * there are other commonly-hit cases that return ERROR_ACCESS_DENIED, + * so care is needed. In particular that happens if we try to open a + * directory, or of course if there's an actual file-permissions + * problem. To distinguish these cases, try a stat(). In the + * delete-pending case, it will either also get STATUS_DELETE_PENDING, + * or it will see the file as gone and fail with ENOENT. In other + * cases it will usually succeed. The only somewhat-likely case where + * this coding will uselessly wait is if there's a permissions problem + * with a containing directory, which we hope will never happen in any + * performance-critical code paths. */ if (err == ERROR_ACCESS_DENIED) { if (loops < 10) { - pg_usleep(100000); - loops++; - continue; + struct stat st; + + if (stat(fileName, &st) != 0) + { + pg_usleep(100000); + loops++; + continue; + } } }
19.12.2019 18:44, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> Please look at the patch that implements (2). It makes `vcregress >> recoverycheck` pass (and my demo restart test still passes too). > I think we could be a little smarter here. If the problem is > STATUS_DELETE_PENDING, doesn't that affect stat() as well? That is, > if stat() succeeds, we needn't wait, independently of whether it > says S_ISDIR or not? This seems like a noticeable improvement if > true, because it would mean that ordinary file-permission failures > also need not wait. We'd still get confused if we get a permission > failure on a containing directory rather than the file itself, but > that I'm prepared to dismiss as an uncommon case. > > Entirely-untested patch along this line attached. I have tested it with my demo restart test and it works. Yes, stat() is resulted in the CreateFile() call too, so it should fail the same way. I haven't managed to catch both CreateFile() and stat() failing with STATUS_DELETE_PENDING (CreateFile() failed for me 37 times per 1000 loops), but I believe that sleep() will work correctly in this case. `vcregress recoverytest` is passed too. The DELETE_PENDING conflict now looks like this: 1 "8:50:44.9928200 PM","postgres.exe","2792","CreateFile","...\pgdata\postmaster.pid","SUCCESS","Desired Access: Read Attributes, Delete, ..." 2 "8:50:44.9928951 PM","postgres.exe","2792","QueryAttributeTagFile","...\pgdata\postmaster.pid","SUCCESS",... 3 "8:50:44.9929114 PM","pg_ctl.exe","2996","CreateFile","...\pgdata\postmaster.pid","DELETE PENDING","Desired Access: Generic Read, Disposition: Open, ..." 4 "8:50:44.9929162 PM","postgres.exe","2792","SetDispositionInformationFile","...\pgdata\postmaster.pid","SUCCESS","Delete: True" 5 "8:50:44.9929377 PM","postgres.exe","2792","CloseFile","...\pgdata\postmaster.pid","SUCCESS","" 6 "8:50:44.9932617 PM","pg_ctl.exe","2996","CreateFile","...\pgdata\postmaster.pid","NAME NOT FOUND","Desired Access: Read Attributes, ..." 7 "8:50:45.1042027 PM","pg_ctl.exe","2996","CreateFile","...\pgdata\postmaster.pid","NAME NOT FOUND","Desired Access: Generic Read, ..." 8 "8:50:45.9926445 PM","pg_ctl.exe","2396","CreateFile","...\pgdata\postmaster.pid","NAME NOT FOUND","Desired Access: Generic Read ..." Here 1, 2, 4, 5 are calls behind unlink() in postmaster, 3 is CreateFile() in pgwin32_open(), 6 is stat() in pgwin32_safestat(), then we have sleep for 0.1 sec, 7 is CreateFile() in pgwin32_open() again, and 8 is CreateFile() performed by a new pg_ctl instance. Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary sleep with a loop iteration... With this change the same conflict looks like this: 1 "10:23:12.9940793 PM","postgres.exe","1908","CreateFile","...\pgdata\postmaster.pid","SUCCESS","Desired Access: Read Attributes, ..." 2 "10:23:12.9941500 PM","postgres.exe","1908","QueryAttributeTagFile","...\pgdata\postmaster.pid","SUCCESS", ... 3 "10:23:12.9941713 PM","postgres.exe","1908","SetDispositionInformationFile","...\pgdata\postmaster.pid","SUCCESS","Delete: True" 4 "10:23:12.9941766 PM","pg_ctl.exe","2420","CreateFile","...\pgdata\postmaster.pid","DELETE PENDING","Desired Access: Generic Read ..." 5 "10:23:12.9941936 PM","postgres.exe","1908","CloseFile","...\pgdata\postmaster.pid","SUCCESS","" 6 "10:23:12.9945254 PM","pg_ctl.exe","2420","CreateFile","...\pgdata\postmaster.pid","NAME NOT FOUND","Desired Access: Read Attributes, ..." 7 "10:23:13.9940261 PM","pg_ctl.exe","1676","CreateFile","...\pgdata\postmaster.pid","NAME NOT FOUND","Desired Access: Generic Read, ..." Here 1, 2, 3, 5 are calls behind unlink() in postmaster, 4 is CreateFile() in pgwin32_open(), 6 is stat() in pgwin32_safestat(), 7 is CreateFile() performed by a new pg_ctl instance. `vcregress recoverytest` is passed. The log of access to the logical decoding slot now looks like this: 1 "10:37:22.9496662 PM","postgres.exe","2956","CreateFile","...\pgdata\pg_replslot\otherdb_slot","IS DIRECTORY","Desired Access: Generic Read, ..." 2 "10:37:22.9498061 PM","postgres.exe","2956","CreateFile","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","Desired Access: Read Attributes, ..." 3 "10:37:22.9498478 PM","postgres.exe","2956","QueryInformationVolume","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","VolumeCreationTime: ... " 4 "10:37:22.9498634 PM","postgres.exe","2956","QueryAllInformationFile","...\pgdata\pg_replslot\otherdb_slot","BUFFER OVERFLOW","CreationTime: ..." 5 "10:37:22.9498860 PM","postgres.exe","2956","CloseFile","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","" 6 "10:37:22.9499936 PM","postgres.exe","2956","QueryOpen","...\pgdata\pg_replslot\otherdb_slot","FAST IO DISALLOWED","" 7 "10:37:22.9500733 PM","postgres.exe","2956","CreateFile","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","Desired Access: ..." 8 "10:37:22.9500973 PM","postgres.exe","2956","QueryNetworkOpenInformationFile","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","CreationTime:..." 9 "10:37:22.9501108 PM","postgres.exe","2956","CloseFile","...\pgdata\pg_replslot\otherdb_slot","SUCCESS","" > BTW ... it's likely that stat() here is actually going to invoke > pgwin32_safestat(), which has its own opinions about this, and > indeed seems to think we'll get ERROR_DELETE_PENDING. I think > that's harmless here, but it makes me wonder if we should use > a native Windows API instead of going through stat(). Yes, I would call native stat(), as we don't need to perform an excessive call to get file size (for directories). It seems that the check for ERROR_DELETE_PENDING was added to pgwin32_safestat() blindly, the issue wasn't reproduced at that time: https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com >> As open(dir) is getting a little more expensive than before, maybe it's >> still worthwhile to patch fsync*(..., isdir,...). I can prepare such a >> patch if so. > I think we should leave that for later, so that the buildfarm can > actually test whatever logic we put in here. Agreed. It doesn't directly affect this bug and should be done separately. Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && > (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary > sleep with a loop iteration... Well, we have to loop back on file-not-found too ... > It seems that the check for ERROR_DELETE_PENDING was added to > pgwin32_safestat() blindly, the issue wasn't reproduced at that time: > https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com Hmm, makes one wonder whether that's actually live code. regards, tom lane
On Thu, Dec 19, 2019 at 9:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It seems that the check for ERROR_DELETE_PENDING was added to
> pgwin32_safestat() blindly, the issue wasn't reproduced at that time:
> https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com
Hmm, makes one wonder whether that's actually live code.
Part of what the patch "Windows could not stat file - over 4GB" [1] does is proposing an alternative using native functions.
Regards,
Juan José Santamaría Flecha
19.12.2019 23:09, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && >> (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary >> sleep with a loop iteration... > Well, we have to loop back on file-not-found too ... I think, if the file is not found on stat() we can safely return the updated err (set errno with _dosmapper(err), to be exact). If we'll loop back, we'll get the same err = GetLastError() after next CreateFile(). >> It seems that the check for ERROR_DELETE_PENDING was added to >> pgwin32_safestat() blindly, the issue wasn't reproduced at that time: >> https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com > Hmm, makes one wonder whether that's actually live code. I'll try to research this matter meantime. Best regards, Alexander
At Thu, 19 Dec 2019 15:09:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Alexander Lakhin <exclusion@gmail.com> writes: > > Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && > > (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary > > sleep with a loop iteration... > > Well, we have to loop back on file-not-found too ... Agreed. But no need for a sleep in the case and even no need to loop back when we are opening an existing file, if CreateFile would return ERROR_ACCESS_DENIED on opening an existing file. > > It seems that the check for ERROR_DELETE_PENDING was added to > > pgwin32_safestat() blindly, the issue wasn't reproduced at that time: > > https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com > > Hmm, makes one wonder whether that's actually live code. Even if it is actually dead code, it seems reasonable as it stands since it is intending to read status of an existing file and the caller is assumed not to be knowing of ERROR_ACCESS_DENIED. In our case, pgwin32_open should be conscious of the state so I think calling pgwin32_safestat does not fit. (Or pgwin32_open and pgwin32_safestat are on the same level of API.) Maybe we could just loop back without extra stat'ing. With the following change, fopen(pidfile, 'r') immediately returns if the file is being deleted. Postmaster waits for the file gone (it would be already gone at the first try in most cases). The sleep on ERROR_ACCESS_DENEID moves from pg_ctl to postmaster in that case, but I think it doesn't matter. while (CreateFile()) { ... if (err == ERROR_ACCESS_DENIED) { if ((fileFlags & O_CREAT) == 0) <retun with ENOENT> pg_usleep(...); loops++; continue; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 20 Dec 2019 12:16:32 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Thu, 19 Dec 2019 15:09:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > > Alexander Lakhin <exclusion@gmail.com> writes: > > > It seems that the check for ERROR_DELETE_PENDING was added to > > > pgwin32_safestat() blindly, the issue wasn't reproduced at that time: > > > https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com > > > > Hmm, makes one wonder whether that's actually live code. > > Even if it is actually dead code, it seems reasonable as it stands > since it is intending to read status of an existing file and the > caller is assumed not to be knowing of ERROR_ACCESS_DENIED. In our > case, pgwin32_open should be conscious of the state so I think calling > pgwin32_safestat does not fit. (Or pgwin32_open and pgwin32_safestat > are on the same level of API.) > > Maybe we could just loop back without extra stat'ing. With the > following change, fopen(pidfile, 'r') immediately returns if the file > is being deleted. Postmaster waits for the file gone (it would be > already gone at the first try in most cases). The sleep on > ERROR_ACCESS_DENEID moves from pg_ctl to postmaster in that case, > but I think it doesn't matter. It is wrong. It hides real "access denied".. Sorry for the noise. > while (CreateFile()) > { > ... > if (err == ERROR_ACCESS_DENIED) > { > if ((fileFlags & O_CREAT) == 0) > <retun with ENOENT> > > pg_usleep(...); > loops++; > continue; regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 20 Dec 2019 05:40:00 +0300, Alexander Lakhin <exclusion@gmail.com> wrote in > 19.12.2019 23:09, Tom Lane wrote: > > Alexander Lakhin <exclusion@gmail.com> writes: > >> Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && > >> (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary > >> sleep with a loop iteration... > > Well, we have to loop back on file-not-found too ... > I think, if the file is not found on stat() we can safely return the > updated err (set errno with _dosmapper(err), to be exact). If we'll loop > back, we'll get the same err = GetLastError() after next CreateFile(). I think Tom is mentinoing for O_CREAT case. If CreateFile returned ERROR_ACCESS_DENIED, then stat() returns ERROR_FILE_NOT_FOUND, it means the next CreateFile can success. > >> It seems that the check for ERROR_DELETE_PENDING was added to > >> pgwin32_safestat() blindly, the issue wasn't reproduced at that time: > >> https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com > > Hmm, makes one wonder whether that's actually live code. > I'll try to research this matter meantime. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > At Thu, 19 Dec 2019 15:09:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> Alexander Lakhin <exclusion@gmail.com> writes: >>> It seems that the check for ERROR_DELETE_PENDING was added to >>> pgwin32_safestat() blindly, the issue wasn't reproduced at that time: >>> https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com >> Hmm, makes one wonder whether that's actually live code. > Even if it is actually dead code, it seems reasonable as it stands > since it is intending to read status of an existing file and the > caller is assumed not to be knowing of ERROR_ACCESS_DENIED. What I was wondering about was how come, if stat() can see the specific error code ERROR_DELETE_PENDING, we don't get to see that from CreateFile. The whole reason we have a problem here is that CreateFile won't return that code :-( ... so it seems possible that the code in pgwin32_safestat is just wrong because the case never happens. regards, tom lane
20.12.2019 6:32, Kyotaro Horiguchi wrote: > At Fri, 20 Dec 2019 05:40:00 +0300, Alexander Lakhin <exclusion@gmail.com> wrote in >> 19.12.2019 23:09, Tom Lane wrote: >>> Alexander Lakhin <exclusion@gmail.com> writes: >>>> Maybe we should change the condition to 'if (stat(fileName, &st) != 0 && >>>> (err = GetLastError()) == ERROR_ACCESS_DENIED)' to avoid unnecessary >>>> sleep with a loop iteration... >>> Well, we have to loop back on file-not-found too ... >> I think, if the file is not found on stat() we can safely return the >> updated err (set errno with _dosmapper(err), to be exact). If we'll loop >> back, we'll get the same err = GetLastError() after next CreateFile(). > I think Tom is mentinoing for O_CREAT case. If CreateFile returned > ERROR_ACCESS_DENIED, then stat() returns ERROR_FILE_NOT_FOUND, it > means the next CreateFile can success. It seems this is a new corner case and we just failed in this case previously (yes, with ERROR_ACCESS_DENIED instead of ERROR_FILE_NOT_FOUND, but nonetheless). If we restrict ourselves to solving the initial problem with 'pt_ctl stop' (and pg_ctl opens pidfile without O_CREAT), I wouldn't develop more powerful open() now. But if we want it, the following code passes my tests (restart & recoverycheck) too: if (stat(fileName, &st) != 0 && ((err = GetLastError()) == ERROR_ACCESS_DENIED) || ((err == ERROR_FILE_NOT_FOUND) && (fileFlags & O_CREAT))) { pg_usleep(100000); loops++; continue; } Maybe we still need a practical case where it's useful, to test it for sure. Best regards, Alexander
Whaaaaat!! At Thu, 19 Dec 2019 23:22:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > At Thu, 19 Dec 2019 15:09:45 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in > >> Alexander Lakhin <exclusion@gmail.com> writes: > >>> It seems that the check for ERROR_DELETE_PENDING was added to > >>> pgwin32_safestat() blindly, the issue wasn't reproduced at that time: > >>> https://www.postgresql.org/message-id/CAB7nPqRJV6trFta-Qzgi6j2feuYR2ZC%2BKHvWdHnbpDG2scTrxw%40mail.gmail.com > > >> Hmm, makes one wonder whether that's actually live code. > > > Even if it is actually dead code, it seems reasonable as it stands > > since it is intending to read status of an existing file and the > > caller is assumed not to be knowing of ERROR_ACCESS_DENIED. > > What I was wondering about was how come, if stat() can see the specific > error code ERROR_DELETE_PENDING, we don't get to see that from CreateFile. > The whole reason we have a problem here is that CreateFile won't return > that code :-( ... so it seems possible that the code in pgwin32_safestat > is just wrong because the case never happens. Ugggh! My eyes automatically converted it to ERROR_ACCESS_DENIED.. Yes, the condition never be true. Even if we use ERROR_ACCESS_DENIED instaed, pgwin32_safestat cannot tell STATUS_ACCESS_DENIED from STATUS_DELETE_PENDING. A possible compromise would be the same looping with pgwin32_open, but I'm not sure if it doesn't harm callers. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Dec 19, 2019 at 09:49:54PM +0100, Juan José Santamaría Flecha wrote: > Part of what the patch "Windows could not stat file - over 4GB" [1] does is > proposing an alternative using native functions. > > [1] https://commitfest.postgresql.org/26/2189/ Yeah, we are likely going to need something like that at the end.. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Thu, Dec 19, 2019 at 09:49:54PM +0100, Juan José Santamaría Flecha wrote: >> Part of what the patch "Windows could not stat file - over 4GB" [1] does is >> proposing an alternative using native functions. >> [1] https://commitfest.postgresql.org/26/2189/ > Yeah, we are likely going to need something like that at the end.. Yeah, I was wondering if we shouldn't investigate that issue alongside this one. I've not had the time to look closely though (and besides, I'm the wrong guy to be taking point on a Windows issue). regards, tom lane
Alexander Lakhin <exclusion@gmail.com> writes: > 19.12.2019 18:44, Tom Lane wrote: >> Entirely-untested patch along this line attached. > I have tested it with my demo restart test and it works. Yes, stat() is > resulted in the CreateFile() call too, so it should fail the same way. Since this patch seems to fix the problem, I went ahead and pushed it. We can still consider whether it can be improved, but I was getting antsy about leaving the buildfarm broken. regards, tom lane