Thread: Re: BackgroundPsql swallowing errors on windows
On Thu, Feb 13, 2025 at 12:39:04PM -0500, Andres Freund wrote:
> My understanding is that IPC::Run uses a proxy process on windows to execute
> subprocesses and then communicates with that over TCP (or something along
> those lines).

Right.

> I suspect what's happening is that the communication with the
> external process allows for reordering between stdout/stderr.
>
> And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> and stderr and waiting on both seems to fix the issue.

That makes sense.  I wondered how one might fix IPC::Run to preserve the
relative timing of stdout and stderr, not perturbing the timing the way that
disrupted your test run.  I can think of two strategies:

- Remove the proxy.

- Make pipe data visible to Perl variables only when at least one of the
  proxy<-program_under_test pipes had no data ready to read.  In other words,
  if both pipes have data ready, make all that data visible to Perl code
  simultaneously.  (When both the stdout pipe and the stderr pipe have data
  ready, one can't determine data arrival order.)

Is there a possibly-less-invasive change that might work?

> The banner being the same between queries made it hard to understand if a
> banner that appeared in the output was from the current query or a past
> query.  Therefore I added a counter to it.

Sounds good.

> For debugging I added a "note" that shows stdout/stderr after executing the
> query, I think it may be worth keeping that, but I'm not sure.

It should be okay to keep.  We're not likely to funnel huge amounts of data
through BackgroundPsql.  If we ever do that, we could just skip the "note"
for payloads larger than some threshold.

v2 of the patch looks fine.
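[The banner fix described above can be sketched outside Perl.  This is a
minimal illustration in Python, not the actual BackgroundPsql code: a uniquely
numbered banner is expected on BOTH streams, and collection stops only once it
has been seen on both, so neither stream's earlier output can be left behind.
The function name and fd plumbing are hypothetical.]

```python
import os
import select

def collect_until_banner(stdout_fd, stderr_fd, banner):
    # Read from both pipes until the banner has appeared on each stream.
    # Waiting on only one stream is what allowed errors to be swallowed:
    # the other stream's data could still be in flight.
    out = b""
    err = b""
    while banner not in out or banner not in err:
        ready, _, _ = select.select([stdout_fd, stderr_fd], [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            if fd == stdout_fd:
                out += chunk
            else:
                err += chunk
    return out, err
```

A per-query counter in the banner (as in the patch) keeps a stale banner from
a previous query from satisfying the wait for the current one.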
Hi,

On 2025-02-16 09:39:43 -0800, Noah Misch wrote:
> On Thu, Feb 13, 2025 at 12:39:04PM -0500, Andres Freund wrote:
> > I suspect what's happening is that the communication with the
> > external process allows for reordering between stdout/stderr.
> >
> > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> > and stderr and waiting on both seems to fix the issue.
>
> That makes sense.  I wondered how one might fix IPC::Run to preserve the
> relative timing of stdout and stderr, not perturbing the timing the way that
> disrupted your test run.  I can think of two strategies:
>
> - Remove the proxy.
>
> - Make pipe data visible to Perl variables only when at least one of the
>   proxy<-program_under_test pipes had no data ready to read.  In other words,
>   if both pipes have data ready, make all that data visible to Perl code
>   simultaneously.  (When both the stdout pipe and the stderr pipe have data
>   ready, one can't determine data arrival order.)
>
> Is there a possibly-less-invasive change that might work?

I don't really know enough about IPC::Run's internals to answer.  My
interpretation of how it might work, purely from observation, is that it opens
one tcp connection for each "pipe" and that that's what's introducing the
potential of reordering, as the different sockets can have different delivery
timeframes.  If that's it, it seems proxying all the pipes through one
connection might be an option.

I did briefly look at IPC::Run's code, but couldn't figure out how it all fits
together quickly enough...

> v2 of the patch looks fine.

Thanks for reviewing!

Greetings,

Andres Freund
On Sun, Feb 16, 2025 at 01:02:01PM -0500, Andres Freund wrote:
> On 2025-02-16 09:39:43 -0800, Noah Misch wrote:
> > On Thu, Feb 13, 2025 at 12:39:04PM -0500, Andres Freund wrote:
> > > I suspect what's happening is that the communication with the
> > > external process allows for reordering between stdout/stderr.
> > >
> > > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> > > and stderr and waiting on both seems to fix the issue.
> >
> > That makes sense.  I wondered how one might fix IPC::Run to preserve the
> > relative timing of stdout and stderr, not perturbing the timing the way that
> > disrupted your test run.  I can think of two strategies:
> >
> > - Remove the proxy.
> >
> > - Make pipe data visible to Perl variables only when at least one of the
> >   proxy<-program_under_test pipes had no data ready to read.  In other words,
> >   if both pipes have data ready, make all that data visible to Perl code
> >   simultaneously.  (When both the stdout pipe and the stderr pipe have data
> >   ready, one can't determine data arrival order.)
> >
> > Is there a possibly-less-invasive change that might work?
>
> I don't really know enough about IPC::Run's internals to answer.  My
> interpretation of how it might work, purely from observation, is that it opens
> one tcp connection for each "pipe" and that that's what's introducing the
> potential of reordering, as the different sockets can have different delivery
> timeframes.

Right.

> If that's it, it seems proxying all the pipes through one
> connection might be an option.

It would.  Thanks.  However, I think that would entail modifying the program
under test to cooperate with the arrangement.  When running an ordinary
program that does write(1, ...) and write(2, ...), the read end needs some way
to deal with the uncertainty about which write happened first.  dup2(1, 2)
solves the order ambiguity, but it loses other signal.
> I did briefly look at IPC::Run's code, but couldn't figure out how it all fits
> together quickly enough...

We can ignore what IPC::Run does today.  The reordering is a general problem
of proxying >1 pipe.  (Proxying on non-Windows would have the same problem.)
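[Noah's dup2(1, 2) point is easy to demonstrate.  The following is a small
illustrative sketch, not anything from the thread's patches: redirecting the
child's stderr into stdout (which is what dup2(1, 2) in the child amounts to,
and what `subprocess.STDOUT` does here) preserves the relative order of the
writes, but the reader can no longer tell which bytes came from which stream.]

```python
import subprocess

def run_merged(argv):
    # stderr=subprocess.STDOUT makes the child's fd 2 a duplicate of fd 1,
    # so both streams share one pipe: order is preserved, stream identity
    # (the "other signal") is lost.
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout
```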
Hi,

On 2025-02-16 10:47:40 -0800, Noah Misch wrote:
> On Sun, Feb 16, 2025 at 01:02:01PM -0500, Andres Freund wrote:
> > On 2025-02-16 09:39:43 -0800, Noah Misch wrote:
> > > On Thu, Feb 13, 2025 at 12:39:04PM -0500, Andres Freund wrote:
> > > > I suspect what's happening is that the communication with the
> > > > external process allows for reordering between stdout/stderr.
> > > >
> > > > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> > > > and stderr and waiting on both seems to fix the issue.
> > >
> > > That makes sense.  I wondered how one might fix IPC::Run to preserve the
> > > relative timing of stdout and stderr, not perturbing the timing the way that
> > > disrupted your test run.  I can think of two strategies:
> > >
> > > - Remove the proxy.
> > >
> > > - Make pipe data visible to Perl variables only when at least one of the
> > >   proxy<-program_under_test pipes had no data ready to read.  In other words,
> > >   if both pipes have data ready, make all that data visible to Perl code
> > >   simultaneously.  (When both the stdout pipe and the stderr pipe have data
> > >   ready, one can't determine data arrival order.)
> > >
> > > Is there a possibly-less-invasive change that might work?
> >
> > I don't really know enough about IPC::Run's internals to answer.  My
> > interpretation of how it might work, purely from observation, is that it opens
> > one tcp connection for each "pipe" and that that's what's introducing the
> > potential of reordering, as the different sockets can have different delivery
> > timeframes.
>
> Right.
>
> > If that's it, it seems proxying all the pipes through one
> > connection might be an option.
>
> It would.  Thanks.  However, I think that would entail modifying the program
> under test to cooperate with the arrangement.  When running an ordinary
> program that does write(1, ...) and write(2, ...), the read end needs some way
> to deal with the uncertainty about which write happened first.
> dup2(1, 2)
> solves the order ambiguity, but it loses other signal.

I think what's happening in this case must go beyond just that.  Afaict just
doing ->pump_nb() would otherwise solve it.  My uninformed theory is that two
tcp connections are used.  With two pipes

P1: write(1)
P1: write(2)
P2: read(1)
P2: read(2)

wouldn't ever result in P2 not seeing data on either of reads.  But with two
TCP sockets there can be time between the send() completing and recv() on the
other side reading the data, even on a local system (e.g. due to the tcp stack
waiting a while for more data before sending data).

To avoid that the proxy program could read from N pipes and then proxy them
through *one* socket by prefixing the data with information about which pipe
the data is from.  Then IPC::Run could split the data again, using the added
prefix.

I don't think that would require modifying the program under test?

Greetings,

Andres Freund
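[The single-connection framing Andres proposes can be sketched as a tiny
protocol layer.  This is an illustration only; `frame`/`deframe` are
hypothetical names and IPC::Run implements nothing like this today.  The proxy
would prefix each chunk with the originating pipe and a length, and the
receiving side would split the one byte stream back into per-pipe data.]

```python
import struct

# One frame = 1-byte pipe id + 4-byte big-endian payload length + payload.
def frame(pipe_id, payload):
    return struct.pack(">BI", pipe_id, len(payload)) + payload

# Parse a byte stream back into (pipe_id, payload) chunks.  Returns the
# complete frames plus any trailing partial frame, to be retried once more
# bytes arrive on the socket.
def deframe(buf):
    frames = []
    while len(buf) >= 5:
        pipe_id, length = struct.unpack(">BI", buf[:5])
        if len(buf) < 5 + length:
            break
        frames.append((pipe_id, buf[5:5 + length]))
        buf = buf[5 + length:]
    return frames, buf
```

Because every chunk travels over one ordered connection, the receiver sees the
chunks in the order the proxy read them, which is what two independent sockets
cannot guarantee.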
On Sun, Feb 16, 2025 at 03:55:10PM -0500, Andres Freund wrote:
> On 2025-02-16 10:47:40 -0800, Noah Misch wrote:
> > On Sun, Feb 16, 2025 at 01:02:01PM -0500, Andres Freund wrote:
> > > On 2025-02-16 09:39:43 -0800, Noah Misch wrote:
> > > > On Thu, Feb 13, 2025 at 12:39:04PM -0500, Andres Freund wrote:
> > > > > I suspect what's happening is that the communication with the
> > > > > external process allows for reordering between stdout/stderr.
> > > > >
> > > > > And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> > > > > and stderr and waiting on both seems to fix the issue.
> > > >
> > > > That makes sense.  I wondered how one might fix IPC::Run to preserve the
> > > > relative timing of stdout and stderr, not perturbing the timing the way that
> > > > disrupted your test run.  I can think of two strategies:
> > > >
> > > > - Remove the proxy.
> > > >
> > > > - Make pipe data visible to Perl variables only when at least one of the
> > > >   proxy<-program_under_test pipes had no data ready to read.  In other words,
> > > >   if both pipes have data ready, make all that data visible to Perl code
> > > >   simultaneously.  (When both the stdout pipe and the stderr pipe have data
> > > >   ready, one can't determine data arrival order.)
> > > >
> > > > Is there a possibly-less-invasive change that might work?
> > > seems proxying all the pipes through one
> > > connection might be an option.
> >
> > It would.  Thanks.  However, I think that would entail modifying the program
> > under test to cooperate with the arrangement.  When running an ordinary
> > program that does write(1, ...) and write(2, ...), the read end needs some way
> > to deal with the uncertainty about which write happened first.  dup2(1, 2)
> > solves the order ambiguity, but it loses other signal.
>
> I think what's happening in this case must go beyond just that.  Afaict just
> doing ->pump_nb() would otherwise solve it.  My uninformed theory is that two
> tcp connections are used.
> With two pipes
>
> P1: write(1)
> P1: write(2)
> P2: read(1)
> P2: read(2)
>
> wouldn't ever result in P2 not seeing data on either of reads.

True.

> But with two
> TCP sockets there can be time between the send() completing and recv() on the
> other side reading the data, even on a local system (e.g. due to the tcp stack
> waiting a while for more data before sending data).
>
> To avoid that the proxy program could read from N pipes and then proxy them
> through *one* socket by prefixing the data with information about which pipe
> the data is from.  Then IPC::Run could split the data again, using the added
> prefix.
>
> I don't think that would require modifying the program under test?

I think that gets an order anomaly when the proxy is slow and the program
under test does this:

write(2, "ERROR: 1")
write(1, "BANNER 1")
write(2, "ERROR: 2")
write(1, "BANNER 2")

If the proxy is sufficiently fast, it will wake up four times and perform two
read() calls on each of the two pipes.  On the flip side, if the proxy is
sufficiently slow, it will wake up once and perform one read() on each of the
two pipes.  In the slow case, the reads get "ERROR: 1ERROR: 2" and
"BANNER 1BANNER 2".  The proxy sends the data onward as though the program
under test had done:

write(1, "BANNER 1BANNER 2")
write(2, "ERROR: 1ERROR: 2")

From the slow proxy's perspective, it can't rule out the program under test
having done those two write() calls.  The proxy doesn't have enough
information to reconstruct the original four write() calls.  What prevents
that anomaly?

If the proxy were to convey this uncertainty so the consumer side knew to
handle these writes as an atomic unit, I think that would work:

write(n, "proxy-begin")
write(n, "fd-1[BANNER 1BANNER 2]")
write(n, "fd-2[ERROR: 1ERROR: 2]")
write(n, "proxy-commit")
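[The proxy-begin/proxy-commit idea above can be sketched concretely.  This is
an illustration with hypothetical message shapes, not a real protocol: all
chunks the proxy read during a single wakeup are bracketed, so the consumer
treats them as one atomic unit whose internal write order is unknowable.]

```python
BEGIN = ("proxy-begin",)
COMMIT = ("proxy-commit",)

# Encode one proxy wakeup: chunks is a list of (fd, data) pairs read during
# a single wakeup, whose true interleaving the proxy cannot reconstruct.
def encode_wakeup(chunks):
    return [BEGIN] + [("data", fd, data) for fd, data in chunks] + [COMMIT]

# Consumer side: group messages back into atomic units.  Each unit's chunks
# must be handled together, since their relative order carries no information.
def atomic_units(messages):
    units = []
    current = None
    for msg in messages:
        if msg == BEGIN:
            current = []
        elif msg == COMMIT:
            units.append(current)
            current = None
        else:
            current.append(msg)
    return units
```

A consumer waiting for a banner would then only act once the commit marker
arrives, never on a partially delivered unit.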
Noah Misch <noah@leadboat.com> writes:
> From the slow proxy's perspective, it can't rule out the program under test
> having done those two write() calls.  The proxy doesn't have enough
> information to reconstruct the original four write() calls.  What prevents
> that anomaly?

Yeah, I think it's hopeless to expect that we can disambiguate the
order of writes to two different pipes.  For the problem at hand,
though, it seems like we don't really need to do that.  Rather, the
question is "when we detect that the program-under-test has exited,
can we be sure we have collected all of its output?".

I think that IPC::Run may be screwing up here, because I have seen
non-Windows CI failures that look like it didn't read all the stderr
output.  For example, this pgbench test failure on macOS from [1]:

# Running: pgbench -n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -Dmaxint=9223372036854775807 -Dminint=-9223372036854775808 -M prepared -f /Users/admin/pgsql/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_error_shell_bad_command
[17:27:47.408](0.061s) ok 273 - pgbench script error: shell bad command status (got 2 vs expected 2)
[17:27:47.409](0.000s) ok 274 - pgbench script error: shell bad command stdout /(?^:processed: 0/1)/
[17:27:47.409](0.000s) not ok 275 - pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/
[17:27:47.409](0.000s)
# Failed test 'pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/'
# at /Users/admin/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 1466.
# ''
# doesn't match '(?^:\(shell\) .* meta-command failed)'

The program's exited with a failure code as expected, and we saw (some
of?) the expected stdout output, but stderr output is reported to be
empty.

regards, tom lane

[1] https://cirrus-ci.com/task/6221238034497536
Hi,

On 2025-02-16 18:18:44 -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > From the slow proxy's perspective, it can't rule out the program under test
> > having done those two write() calls.  The proxy doesn't have enough
> > information to reconstruct the original four write() calls.  What prevents
> > that anomaly?
>
> Yeah, I think it's hopeless to expect that we can disambiguate the
> order of writes to two different pipes.  For the problem at hand,
> though, it seems like we don't really need to do that.  Rather, the
> question is "when we detect that the program-under-test has exited,
> can we be sure we have collected all of its output?".

That's what my patch upthread tries to achieve by having a query separator
both on stdout and stderr and waiting for both.

> I think that IPC::Run may be screwing up here, because I have seen
> non-Windows CI failures that look like it didn't read all the stderr output.
> For example, this pgbench test failure on macOS from [1]:
>
> # Running: pgbench -n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -Dmaxint=9223372036854775807 -Dminint=-9223372036854775808 -M prepared -f /Users/admin/pgsql/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_error_shell_bad_command
> [17:27:47.408](0.061s) ok 273 - pgbench script error: shell bad command status (got 2 vs expected 2)
> [17:27:47.409](0.000s) ok 274 - pgbench script error: shell bad command stdout /(?^:processed: 0/1)/
> [17:27:47.409](0.000s) not ok 275 - pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/
> [17:27:47.409](0.000s) # Failed test 'pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/'
> # at /Users/admin/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 1466.
> # ''
> # doesn't match '(?^:\(shell\) .* meta-command failed)'
>
> The program's exited with a failure code as expected, and we saw (some
> of?) the expected stdout output, but stderr output is reported to be
> empty.

It's possible this is caused by the same issue as on Windows.  Or by one of
the other things fixed in the patch:

a) there's afaict no guarantee that we'd read from pipe A if we were waiting
   for A|B and B got ready

b) we weren't actually waiting for quite all the output to be generated
   (missing the newline)

Or it could be because psql doesn't actually flush stderr in all paths, from
what I can tell...

I hope it'll be easier to debug with the patch in place, if it doesn't turn
out to already be fixed.

Greetings,

Andres Freund
On Sun, Feb 16, 2025 at 06:18:44PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > From the slow proxy's perspective, it can't rule out the program under test
> > having done those two write() calls.  The proxy doesn't have enough
> > information to reconstruct the original four write() calls.  What prevents
> > that anomaly?
>
> Yeah, I think it's hopeless to expect that we can disambiguate the
> order of writes to two different pipes.  For the problem at hand,
> though, it seems like we don't really need to do that.  Rather, the
> question is "when we detect that the program-under-test has exited,
> can we be sure we have collected all of its output?".

In the BackgroundPsql case, we have output collection moments for every query
while the psql-under-test is running, not just after exit.  If I understand
the original post right, the specifics are as follows.  If $stdout witnesses
the result of '\echo BANNER', $stderr should contain anything from psql
commands before the \echo.  That holds on non-Windows, but the IPC::Run proxy
makes it not hold on Windows.

> I think that
> IPC::Run may be screwing up here, because I have seen non-Windows
> CI failures that look like it didn't read all the stderr output.
> For example, this pgbench test failure on macOS from [1]:
>
> # Running: pgbench -n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -Dmaxint=9223372036854775807 -Dminint=-9223372036854775808 -M prepared -f /Users/admin/pgsql/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_error_shell_bad_command
> [17:27:47.408](0.061s) ok 273 - pgbench script error: shell bad command status (got 2 vs expected 2)
> [17:27:47.409](0.000s) ok 274 - pgbench script error: shell bad command stdout /(?^:processed: 0/1)/
> [17:27:47.409](0.000s) not ok 275 - pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/
> [17:27:47.409](0.000s) # Failed test 'pgbench script error: shell bad command stderr /(?^:\(shell\) .* meta-command failed)/'
> # at /Users/admin/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 1466.
> # ''
> # doesn't match '(?^:\(shell\) .* meta-command failed)'
>
> The program's exited with a failure code as expected, and we saw (some
> of?) the expected stdout output, but stderr output is reported to be
> empty.

https://github.com/cpan-authors/IPC-Run/commit/2128df3bbcac7e733ac46302c4b1371ffb88fe14
fixed that one.

> [1] https://cirrus-ci.com/task/6221238034497536
Noah Misch <noah@leadboat.com> writes:
> On Sun, Feb 16, 2025 at 06:18:44PM -0500, Tom Lane wrote:
>> I think that
>> IPC::Run may be screwing up here, because I have seen non-Windows
>> CI failures that look like it didn't read all the stderr output.
>> For example, this pgbench test failure on macOS from [1]:

> https://github.com/cpan-authors/IPC-Run/commit/2128df3bbcac7e733ac46302c4b1371ffb88fe14
> fixed that one.

Ah.  Do we know whether that fix has made it into our CI images?
(Or anywhere else, for that matter?)

regards, tom lane
Hi,

On February 16, 2025 7:50:18 PM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Noah Misch <noah@leadboat.com> writes:
>> On Sun, Feb 16, 2025 at 06:18:44PM -0500, Tom Lane wrote:
>>> I think that
>>> IPC::Run may be screwing up here, because I have seen non-Windows
>>> CI failures that look like it didn't read all the stderr output.
>>> For example, this pgbench test failure on macOS from [1]:
>
>> https://github.com/cpan-authors/IPC-Run/commit/2128df3bbcac7e733ac46302c4b1371ffb88fe14
>> fixed that one.
>
>Ah.  Do we know whether that fix has made it into our CI images?
>(Or anywhere else, for that matter?)

The CI images are regenerated three times a week, but for most OSs they will
only install Perl modules via the applicable packaging method, so it'll depend
on when they pick up that version.

On Windows cpan is used, so it should pick up that new version fairly quickly
if a release has been made.

On macOS we can't currently use images, so we just cache all the installed
MacPorts packages.  The cache is keyed by OS version and the list of packages
to be installed, with no other forced invalidation right now.  So it's hard to
predict when a new version of a package will be picked up, and it will differ
between git repositories.  I've been wondering whether the cached MacPorts
install should just be regularly regenerated instead, alongside the other CI
images.

Greetings,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sun, Feb 16, 2025 at 08:42:50PM -0500, Andres Freund wrote:
> On February 16, 2025 7:50:18 PM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >Noah Misch <noah@leadboat.com> writes:
> >> On Sun, Feb 16, 2025 at 06:18:44PM -0500, Tom Lane wrote:
> >>> I think that
> >>> IPC::Run may be screwing up here, because I have seen non-Windows
> >>> CI failures that look like it didn't read all the stderr output.
> >>> For example, this pgbench test failure on macOS from [1]:
> >
> >> https://github.com/cpan-authors/IPC-Run/commit/2128df3bbcac7e733ac46302c4b1371ffb88fe14
> >> fixed that one.
> >
> >Ah.  Do we know whether that fix has made it into our CI images?
> >(Or anywhere else, for that matter?)
>
> The CI images are regenerated three times a week, but for most OSs they will
> only install Perl modules via the applicable packaging method, so it'll depend
> on when they pick up that version.
>
> On Windows cpan is used, so it should pick up that new version fairly quickly
> if a release has been made.
>
> On macOS we can't currently use images, so we just cache all the installed
> MacPorts packages.  The cache is keyed by OS version and the list of packages
> to be installed, with no other forced invalidation right now.  So it's hard to
> predict when a new version of a package will be picked up, and it will differ
> between git repositories.  I've been wondering whether the cached MacPorts
> install should just be regularly regenerated instead, alongside the other CI
> images.

The change is not in a release yet.  We could have macOS install IPC::Run from
GitHub, or I could get a release cut so it can make its way to MacPorts.
https://ports.macports.org/port/p5.34-ipc-run/builds/ suggests it ingested the
last release within a couple of days of release, so MacPorts itself may add
negligible latency.
Hi,

On February 16, 2025 8:42:50 PM EST, Andres Freund <andres@anarazel.de> wrote:
>On Windows cpan is used, so it should pick up that new version fairly quickly
>if a release has been made.

Looks like no release has happened since late 2023:
https://github.com/cpan-authors/IPC-Run/blob/master/Changelog

So it won't be picked up for now.  I guess we could make CI pick up the git
version, but that doesn't really seem like a scalable approach.

Greetings,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Noah Misch <noah@leadboat.com> writes:
> On Sun, Feb 16, 2025 at 08:42:50PM -0500, Andres Freund wrote:
>> On macOS we can't currently use images, so we just cache all the installed
>> MacPorts packages.  The cache is keyed by OS version and the list of packages
>> to be installed, with no other forced invalidation right now.  So it's hard to
>> predict when a new version of a package will be picked up, and it will differ
>> between git repositories.  I've been wondering whether the cached MacPorts
>> install should just be regularly regenerated instead, alongside the other CI
>> images.

> The change is not in a release yet.  We could have macOS install IPC::Run from
> GitHub, or I could get a release cut so it can make its way to MacPorts.
> https://ports.macports.org/port/p5.34-ipc-run/builds/ suggests it ingested the
> last release within a couple of days of release, so MacPorts itself may add
> negligible latency.

Yeah, my experience is that MacPorts is pretty quick about picking up new
releases.  If you can persuade upstream to make a release happen, that'd
be great.

regards, tom lane
On Mon, Feb 17, 2025 at 7:02 AM Andres Freund <andres@anarazel.de> wrote:
> I don't really know enough about IPC::Run's internals to answer.  My
> interpretation of how it might work, purely from observation, is that it opens
> one tcp connection for each "pipe" and that that's what's introducing the
> potential of reordering, as the different sockets can have different delivery
> timeframes.  If that's it, it seems proxying all the pipes through one
> connection might be an option.

I had a couple of ideas about how to get rid of the intermediate subprocess.
Obviously it can't convert "two pipes are ready" into two separate socket
send() calls that preserve the original order, as it doesn't know them (unless
perhaps it switches to completion-based I/O).  But really, the whole design is
ugly and slow.  If we have some capacity to improve IPC::Run, I think we
should try to get rid of the pipe/socket bridge and plug either a pipe or a
socket directly into the target subprocess.  But which one?

1.  Pipes only: IPC::Run could use IOCP or WaitForMultipleObjects() instead of
select()/poll().

2.  Sockets only: Apparently you can give sockets directly to subprocesses as
stdin/stdout/stderr:

https://stackoverflow.com/questions/4993119/redirect-io-of-process-to-windows-socket

The IPC::Run comments explain that the extra process was needed to be able to
forward all data even if the target subprocess exits without closing the
socket (the linger stuff we have met before in PostgreSQL itself).  I suspect
that if we went that way, maybe asynchronous I/O would fix that too (see my
other thread with guesses and demos on that topic), but it might not be
race-free.  I don't know.  I'd like to know for PostgreSQL's own sake, but for
IPC::Run I think I'd prefer option 1 anyway: if you have to write new native
Windows API interactions either way, you might as well go with the normal
native way for Windows processes to connect standard I/O streams.