Re: BackgroundPsql swallowing errors on windows - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: BackgroundPsql swallowing errors on windows |
Date | |
Msg-id | e5d8d598-bf23-4966-91fb-b4043e451d2c@dunslane.net Whole thread Raw |
Responses |
Re: BackgroundPsql swallowing errors on windows
|
List | pgsql-hackers |
On 2025-02-13 Th 12:39 PM, Andres Freund wrote: > Hi, > > One of the remaining tasks for AIO was to convert the tests to be proper tap > tests. I did that and was thanked by the tests fairly randomly failing on > windows. Which test fails changes from run to run. > > The symptom is that BackgroundPsql->query() sometimes would simply swallow > errors that were clearly generated by the backend. Which then would cause > tests to fail, because testing for errors was the whole point of $test. > > > At first I thought the issue was that psql didn't actually flush stderr after > displaying errors. And while that may be an issue, it doesn't seem to be the > one causing problems for me. > > Lots of hair-pulling later, I have a somewhat confirmed theory for what's > happening: > > BackgroundPsql::query() tries to detect if the passed in query has executed by > adding a "banner" after the query and using pump_until() to wait until that > banner has been "reached". That seems to work reasonably well on !windows. > > On windows however, it looks like there's no guarantee that if stdout has been > received by IPC::Run, stderr also has been received, even if the stderr > content has been generated first. I tried to add an extra ->pump_nb() call to > query(), thinking that maybe IPC::Run just didn't get input that had actually > arrived, due to waiting on just one pipe. But no success. > > My understanding is that IPC::Run uses a proxy process on windows to execute > subprocesses and then communicates with that over TCP (or something along > those lines). I suspect what's happening is that the communication with the > external process allows for reordering between stdout/stderr. > > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout > and stderr and waiting on both seems to fix the issue. > > > One complication is that I found that just waiting for the banner, without > also its newline, sometimes lead to unexpected newlines causing later queries > to fail. I think that happens if the trailing newline is read separately from > the rest of the string. > > However, matching the newlines caused tests to fail on some machines. After a > lot of cursing I figured out that for interactive psql we output \r\n, causing > my regex match to fail. I.e. tests failed whenever IO::PTY was availble... > > It's also not correct, as we did before, to just look for the banner, it has > to be anchored to either the start of the output or a newline, otherwise the > \echo (or \warn) command itself will be matched by pump_until() (but then the > replacing the command would fail). Not sure that could cause active problems > without the addition of \warn (which is also echoed on stdout), but it > certainly could after. > > > The banner being the same between queries made it hard to understand if a > banner that appeared in the output was from the current query or a past > query. Therefore I added a counter to it. > > > For debugging I added a "note" that shows stdout/stderr after executing the > query, I think it may be worth keeping that, but I'm not sure. > > > This was a rather painful exercise. > > It's been discussed before, but I'd really really like to get rid of BackgroundPsql. It's ugly, non-intuitive and fragile. Last year I did some work on this. I was going to present it at Athens but illness prevented me, and then other life events managed to get in my way. But the basic work is around. See <https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512> This introduces a libpq session object (PostgreSQL::Test::Session) which can be backed either by FFI or a small XS wrapper - the commit has recipes for both. Missing is a meson.build file for the XS wrapper. There are significant performance gains to be had too (poll_query_until is much nicer, for example, as are most uses of safe_psql). If there is interest I will bring the work up to date, and maybe we can introduce this early in the v19 cycle. It would significantly reduce our dependency on IPC::Run, especially the pump() stuff. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
pgsql-hackers by date: