Thread: Re: BackgroundPsql swallowing errors on windows

Re: BackgroundPsql swallowing errors on windows

From
Andrew Dunstan
Date:
On 2025-02-13 Th 12:39 PM, Andres Freund wrote:
> Hi,
>
> One of the remaining tasks for AIO was to convert the tests to be proper tap
> tests.  I did that and was thanked by the tests fairly randomly failing on
> windows. Which test fails changes from run to run.
>
> The symptom is that BackgroundPsql->query() sometimes would simply swallow
> errors that were clearly generated by the backend. Which then would cause
> tests to fail, because testing for errors was the whole point of $test.
>
>
> At first I thought the issue was that psql didn't actually flush stderr after
> displaying errors. And while that may be an issue, it doesn't seem to be the
> one causing problems for me.
>
> Lots of hair-pulling later, I have a somewhat confirmed theory for what's
> happening:
>
> BackgroundPsql::query() tries to detect if the passed in query has executed by
> adding a "banner" after the query and using pump_until() to wait until that
> banner has been "reached".  That seems to work reasonably well on !windows.
>
> On windows however, it looks like there's no guarantee that if stdout has been
> received by IPC::Run, stderr also has been received, even if the stderr
> content has been generated first. I tried to add an extra ->pump_nb() call to
> query(), thinking that maybe IPC::Run just didn't get input that had actually
> arrived, due to waiting on just one pipe. But no success.
>
> My understanding is that IPC::Run uses a proxy process on windows to execute
> subprocesses and then communicates with that over TCP (or something along
> those lines).  I suspect what's happening is that the communication with the
> external process allows for reordering between stdout/stderr.
>
> And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> and stderr and waiting on both seems to fix the issue.
>
>
> One complication is that I found that just waiting for the banner, without
> also its newline, sometimes lead to unexpected newlines causing later queries
> to fail. I think that happens if the trailing newline is read separately from
> the rest of the string.
>
> However, matching the newlines caused tests to fail on some machines. After a
> lot of cursing I figured out that for interactive psql we output \r\n, causing
> my regex match to fail. I.e. tests failed whenever IO::PTY was availble...
>
> It's also not correct, as we did before, to just look for the banner, it has
> to be anchored to either the start of the output or a newline, otherwise the
> \echo (or \warn) command itself will be matched by pump_until() (but then the
> replacing the command would fail). Not sure that could cause active problems
> without the addition of \warn (which is also echoed on stdout), but it
> certainly could after.
>
>
> The banner being the same between queries made it hard to understand if a
> banner that appeared in the output was from the current query or a past
> query. Therefore I added a counter to it.
>
>
> For debugging I added a "note" that shows stdout/stderr after executing the
> query, I think it may be worth keeping that, but I'm not sure.
>
>
> This was a rather painful exercise.
>
>


It's been discussed before, but I'd really really like to get rid of 
BackgroundPsql. It's ugly, non-intuitive and fragile.

Last year I did some work on this. I was going to present it at Athens 
but illness prevented me, and then other life events managed to get in 
my way. But the basic work is around. See 
<https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> 
This introduces a libpq session object (PostgreSQL::Test::Session) which 
can be backed either by FFI or a small XS wrapper - the commit has 
recipes for both. Missing is a meson.build file for the XS wrapper. 
There are significant performance gains to be had too (poll_query_until 
is much nicer, for example, as are most uses of safe_psql). If there is  
interest I will bring the work up to date, and maybe we can introduce 
this early in the v19 cycle. It would significantly reduce our 
dependency on IPC::Run, especially the pump() stuff.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: BackgroundPsql swallowing errors on windows

From
Andres Freund
Date:
Hi,

On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
> It's been discussed before, but I'd really really like to get rid of
> BackgroundPsql. It's ugly, non-intuitive and fragile.

I agree, unfortunately we're stuck with this until we have a better
alternative in tree :(


> Last year I did some work on this. I was going to present it at Athens but
> illness prevented me, and then other life events managed to get in my way.
> But the basic work is around. See
<https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512>
> This introduces a libpq session object (PostgreSQL::Test::Session) which can
> be backed either by FFI or a small XS wrapper - the commit has recipes for
> both. Missing is a meson.build file for the XS wrapper. There are
> significant performance gains to be had too (poll_query_until is much nicer,
> for example, as are most uses of safe_psql). If there is  interest I will
> bring the work up to date, and maybe we can introduce this early in the v19
> cycle. It would significantly reduce our dependency on IPC::Run, especially
> the pump() stuff.

I definitely am interested.

Greetings,

Andres Freund



Re: BackgroundPsql swallowing errors on windows

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
>> ... If there is interest I will
>> bring the work up to date, and maybe we can introduce this early in the v19
>> cycle. It would significantly reduce our dependency on IPC::Run, especially
>> the pump() stuff.

> I definitely am interested.

+1.  Would there be a chance of removing our use of IPC::Run entirely?
There'd be a lot to like about that.

            regards, tom lane



Re: BackgroundPsql swallowing errors on windows

From
Andres Freund
Date:
Hi,

On 2025-02-14 12:14:55 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2025-02-14 08:14:45 -0500, Andrew Dunstan wrote:
> >> ... If there is interest I will
> >> bring the work up to date, and maybe we can introduce this early in the v19
> >> cycle. It would significantly reduce our dependency on IPC::Run, especially
> >> the pump() stuff.
> 
> > I definitely am interested.
> 
> +1.  Would there be a chance of removing our use of IPC::Run entirely?
> There'd be a lot to like about that.

Unfortunately I doubt it'd get us that close to that.

We do have several tests that intentionally involve psql,
e.g. 010_tab_completion.pl, 001_password.pl. Presumably those would continue
to use something like BackgroundPsql.pm, even if we had a sane way to have
longrunning connections in tap tests.

There also are a bunch of things we use IPC::Run to run synchronously, we
probably could replace most of those without too much pain. The hardest
probably would be timeout handling.

Greetings,

Andres Freund