Re: BackgroundPsql swallowing errors on windows - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: BackgroundPsql swallowing errors on windows
Date
Msg-id e5d8d598-bf23-4966-91fb-b4043e451d2c@dunslane.net
Whole thread Raw
Responses Re: BackgroundPsql swallowing errors on windows
List pgsql-hackers
On 2025-02-13 Th 12:39 PM, Andres Freund wrote:
> Hi,
>
> One of the remaining tasks for AIO was to convert the tests to be proper tap
> tests.  I did that and was thanked by the tests fairly randomly failing on
> windows. Which test fails changes from run to run.
>
> The symptom is that BackgroundPsql->query() sometimes would simply swallow
> errors that were clearly generated by the backend. Which then would cause
> tests to fail, because testing for errors was the whole point of $test.
>
>
> At first I thought the issue was that psql didn't actually flush stderr after
> displaying errors. And while that may be an issue, it doesn't seem to be the
> one causing problems for me.
>
> Lots of hair-pulling later, I have a somewhat confirmed theory for what's
> happening:
>
> BackgroundPsql::query() tries to detect if the passed in query has executed by
> adding a "banner" after the query and using pump_until() to wait until that
> banner has been "reached".  That seems to work reasonably well on !windows.
>
> On windows however, it looks like there's no guarantee that if stdout has been
> received by IPC::Run, stderr also has been received, even if the stderr
> content has been generated first. I tried to add an extra ->pump_nb() call to
> query(), thinking that maybe IPC::Run just didn't get input that had actually
> arrived, due to waiting on just one pipe. But no success.
>
> My understanding is that IPC::Run uses a proxy process on windows to execute
> subprocesses and then communicates with that over TCP (or something along
> those lines).  I suspect what's happening is that the communication with the
> external process allows for reordering between stdout/stderr.
>
> And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> and stderr and waiting on both seems to fix the issue.
>
>
> One complication is that I found that just waiting for the banner, without
> also its newline, sometimes lead to unexpected newlines causing later queries
> to fail. I think that happens if the trailing newline is read separately from
> the rest of the string.
>
> However, matching the newlines caused tests to fail on some machines. After a
> lot of cursing I figured out that for interactive psql we output \r\n, causing
> my regex match to fail. I.e. tests failed whenever IO::PTY was availble...
>
> It's also not correct, as we did before, to just look for the banner, it has
> to be anchored to either the start of the output or a newline, otherwise the
> \echo (or \warn) command itself will be matched by pump_until() (but then the
> replacing the command would fail). Not sure that could cause active problems
> without the addition of \warn (which is also echoed on stdout), but it
> certainly could after.
>
>
> The banner being the same between queries made it hard to understand if a
> banner that appeared in the output was from the current query or a past
> query. Therefore I added a counter to it.
>
>
> For debugging I added a "note" that shows stdout/stderr after executing the
> query, I think it may be worth keeping that, but I'm not sure.
>
>
> This was a rather painful exercise.
>
>


It's been discussed before, but I'd really really like to get rid of 
BackgroundPsql. It's ugly, non-intuitive and fragile.

Last year I did some work on this. I was going to present it at Athens 
but illness prevented me, and then other life events managed to get in 
my way. But the basic work is around. See 
<https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> 
This introduces a libpq session object (PostgreSQL::Test::Session) which 
can be backed either by FFI or a small XS wrapper - the commit has 
recipes for both. Missing is a meson.build file for the XS wrapper. 
There are significant performance gains to be had too (poll_query_until 
is much nicer, for example, as are most uses of safe_psql). If there is  
interest I will bring the work up to date, and maybe we can introduce 
this early in the v19 cycle. It would significantly reduce our 
dependency on IPC::Run, especially the pump() stuff.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: BackgroundPsql swallowing errors on windows
Next
From: Julien Rouhaud
Date:
Subject: Re: pg_stat_statements and "IN" conditions