Thread: Re: pgsql: Add regression test for recovery pause.
Fujii Masao <fujii@postgresql.org> writes: > Add regression test for recovery pause. Buildfarm member jacana doesn't like this patch: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44 the symptom being Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query: Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn() Jun 02 09:05:17 # expecting this output: Jun 02 09:05:17 # t Jun 02 09:05:17 # last actual query output: Jun 02 09:05:17 # Jun 02 09:05:17 # with stderr: Jun 02 09:05:17 # ERROR: syntax error at or near "pg_lsn" Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn() Jun 02 09:05:17 # ^ Checking the postmaster log confirms that what the backend is getting is 2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR: syntax error at or near "pg_lsn" at character 20 2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn() It sort of looks like something has decided that the pg_lsn constant is a search path and made a lame attempt to convert it to Windows style. I doubt our own code is doing that, so I'm inclined to blame IPC::Run thinking it can mangle the command string it's given. I wonder whether jacana has got a freshly-installed version of IPC::Run. Another interesting question is how come we managed to get this far in the tests. There is a nearly, but not quite, identical delay query in 002_archiving.pl, which already ran successfully: # Wait until necessary replay has been done on standby my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()"; $node_standby->poll_query_until('postgres', $caughtup_query) or die "Timed out while waiting for standby to catch up"; I wonder whether the fact that 002 uses '<=' not '<' could be at all related. (I also wonder which one is correct as a means of waiting for replay; they are not both correct.) In any case, letting IPC::Run munge SQL commands seems completely unacceptable. We can't plan on working around that every time. regards, tom lane
On 6/2/21 5:26 PM, Tom Lane wrote: > Fujii Masao <fujii@postgresql.org> writes: >> Add regression test for recovery pause. > Buildfarm member jacana doesn't like this patch: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44 > > the symptom being > > Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query: > Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn() > Jun 02 09:05:17 # expecting this output: > Jun 02 09:05:17 # t > Jun 02 09:05:17 # last actual query output: > Jun 02 09:05:17 # > Jun 02 09:05:17 # with stderr: > Jun 02 09:05:17 # ERROR: syntax error at or near "pg_lsn" > Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn() > Jun 02 09:05:17 # ^ > > Checking the postmaster log confirms that what the backend is getting is > > 2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR: syntax error at or near "pg_lsn" at character20 > 2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn() > > It sort of looks like something has decided that the pg_lsn constant > is a search path and made a lame attempt to convert it to Windows > style. I doubt our own code is doing that, so I'm inclined to blame > IPC::Run thinking it can mangle the command string it's given. > I wonder whether jacana has got a freshly-installed version of IPC::Run. > > Another interesting question is how come we managed to get this far > in the tests. There is a nearly, but not quite, identical delay > query in 002_archiving.pl, which already ran successfully: > > # Wait until necessary replay has been done on standby > my $caughtup_query = > "SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()"; > $node_standby->poll_query_until('postgres', $caughtup_query) > or die "Timed out while waiting for standby to catch up"; > > I wonder whether the fact that 002 uses '<=' not '<' could be > at all related. (I also wonder which one is correct as a means > of waiting for replay; they are not both correct.) > > In any case, letting IPC::Run munge SQL commands seems completely > unacceptable. We can't plan on working around that every time. > > Looks to me like we're getting munged by the msys shell, and unlike on msys2 there isn't a way to disable it: https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line c.f. commit 73ff3a0abbb Maybe a robust solution would be to have the query piped to psql on its stdin rather than on the command line. poll_query_until looks on a quick check like the only place in PostgresNode where we use "psql -c" I'll experiment a bit tomorrow. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 6/2/21 6:25 PM, Andrew Dunstan wrote: > > > Looks to me like we're getting munged by the msys shell, and unlike on > msys2 there isn't a way to disable it: > https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line > > > c.f. commit 73ff3a0abbb > > > Maybe a robust solution would be to have the query piped to psql on its > stdin rather than on the command line. poll_query_until looks on a quick > check like the only place in PostgresNode where we use "psql -c" > > > I'll experiment a bit tomorrow. > > > My suspicion was correct. Fix pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > My suspicion was correct. Fix pushed. Great, thanks. Do we need to worry about back-patching that? It seems only accidental if no existing back-branch test cases hit this. regards, tom lane
On 6/3/21 4:45 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> My suspicion was correct. Fix pushed. > Great, thanks. > > Do we need to worry about back-patching that? It seems only > accidental if no existing back-branch test cases hit this. Well, we haven't had breakage, but its also useful to keep things in sync as much as possible. Ill do it shortly. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com