Thread: Re: pgsql: Add regression test for recovery pause.

Re: pgsql: Add regression test for recovery pause.

From
Tom Lane
Date:
Fujii Masao <fujii@postgresql.org> writes:
> Add regression test for recovery pause.

Buildfarm member jacana doesn't like this patch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44

the symptom being

Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query:
Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # expecting this output:
Jun 02 09:05:17 # t
Jun 02 09:05:17 # last actual query output:
Jun 02 09:05:17 #
Jun 02 09:05:17 # with stderr:
Jun 02 09:05:17 # ERROR:  syntax error at or near "pg_lsn"
Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 #                            ^

Checking the postmaster log confirms that what the backend is getting is

2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR:  syntax error at or near "pg_lsn" at character
20
2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT:  SELECT '0\\3002A20';pg_lsn <
pg_last_wal_receive_lsn()

It sort of looks like something has decided that the pg_lsn constant
is a search path and made a lame attempt to convert it to Windows
style.  I doubt our own code is doing that, so I'm inclined to blame
IPC::Run thinking it can mangle the command string it's given.
I wonder whether jacana has got a freshly-installed version of IPC::Run.

Another interesting question is how come we managed to get this far
in the tests.  There is a nearly, but not quite, identical delay
query in 002_archiving.pl, which already ran successfully:

# Wait until necessary replay has been done on standby
my $caughtup_query =
  "SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
$node_standby->poll_query_until('postgres', $caughtup_query)
  or die "Timed out while waiting for standby to catch up";

I wonder whether the fact that 002 uses '<=' not '<' could be
at all related.  (I also wonder which one is correct as a means
of waiting for replay; they are not both correct.)

In any case, letting IPC::Run munge SQL commands seems completely
unacceptable.  We can't plan on working around that every time.

            regards, tom lane



Re: pgsql: Add regression test for recovery pause.

From
Andrew Dunstan
Date:
On 6/2/21 5:26 PM, Tom Lane wrote:
> Fujii Masao <fujii@postgresql.org> writes:
>> Add regression test for recovery pause.
> Buildfarm member jacana doesn't like this patch:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44
>
> the symptom being
>
> Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query:
> Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()
> Jun 02 09:05:17 # expecting this output:
> Jun 02 09:05:17 # t
> Jun 02 09:05:17 # last actual query output:
> Jun 02 09:05:17 # 
> Jun 02 09:05:17 # with stderr:
> Jun 02 09:05:17 # ERROR:  syntax error at or near "pg_lsn"
> Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
> Jun 02 09:05:17 #                            ^
>
> Checking the postmaster log confirms that what the backend is getting is
>
> 2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR:  syntax error at or near "pg_lsn" at
character20
 
> 2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT:  SELECT '0\\3002A20';pg_lsn <
pg_last_wal_receive_lsn()
>
> It sort of looks like something has decided that the pg_lsn constant
> is a search path and made a lame attempt to convert it to Windows
> style.  I doubt our own code is doing that, so I'm inclined to blame
> IPC::Run thinking it can mangle the command string it's given.
> I wonder whether jacana has got a freshly-installed version of IPC::Run.
>
> Another interesting question is how come we managed to get this far
> in the tests.  There is a nearly, but not quite, identical delay
> query in 002_archiving.pl, which already ran successfully:
>
> # Wait until necessary replay has been done on standby
> my $caughtup_query =
>   "SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
> $node_standby->poll_query_until('postgres', $caughtup_query)
>   or die "Timed out while waiting for standby to catch up";
>
> I wonder whether the fact that 002 uses '<=' not '<' could be
> at all related.  (I also wonder which one is correct as a means
> of waiting for replay; they are not both correct.)
>
> In any case, letting IPC::Run munge SQL commands seems completely
> unacceptable.  We can't plan on working around that every time.
>
>             



Looks to me like we're getting munged by the msys shell, and unlike on
msys2 there isn't a way to disable it:

https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line


c.f. commit 73ff3a0abbb


Maybe a robust solution would be to have the query piped to psql on its
stdin rather than on the command line. poll_query_until looks on a quick
check like the only place in PostgresNode where we use "psql -c"


I'll experiment a bit tomorrow.



cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Add regression test for recovery pause.

From
Andrew Dunstan
Date:
On 6/2/21 6:25 PM, Andrew Dunstan wrote:
>
>
> Looks to me like we're getting munged by the msys shell, and unlike on
> msys2 there isn't a way to disable it:
>
https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line
>
>
> c.f. commit 73ff3a0abbb
>
>
> Maybe a robust solution would be to have the query piped to psql on its
> stdin rather than on the command line. poll_query_until looks on a quick
> check like the only place in PostgresNode where we use "psql -c"
>
>
> I'll experiment a bit tomorrow.
>
>
>


My suspicion was correct. Fix pushed.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Add regression test for recovery pause.

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> My suspicion was correct. Fix pushed.

Great, thanks.

Do we need to worry about back-patching that?  It seems only
accidental if no existing back-branch test cases hit this.

            regards, tom lane



Re: pgsql: Add regression test for recovery pause.

From
Andrew Dunstan
Date:
On 6/3/21 4:45 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> My suspicion was correct. Fix pushed.
> Great, thanks.
>
> Do we need to worry about back-patching that?  It seems only
> accidental if no existing back-branch test cases hit this.


Well, we haven't had breakage, but its also useful to keep things in
sync as much as possible. Ill do it shortly.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com