On Mon, Feb 17, 2025 at 7:02 AM Andres Freund <andres@anarazel.de> wrote:
> I don't really know enough about IPC::Run's internals to answer. My
> interpretation of how it might work, purely from observation, is that it opens
> one tcp connection for each "pipe" and that that's what's introducing the
> potential of reordering, as the different sockets can have different delivery
> timeframes. If that's it, it seems proxying all the pipes through one
> connection might be an option.

I had a couple of ideas about how to get rid of the intermediate
subprocess. Obviously it can't turn "two pipes are ready" into two
separate socket send() calls that preserve the original write order,
because it doesn't know that order (unless perhaps it switches to
completion-based I/O). But really, the whole design is ugly and slow.
If we have some capacity to improve IPC::Run, I think we should try to
get rid of the pipe/socket bridge and plug either a pipe or a socket
directly into the target subprocess. But which one?
1. Pipes only: IPC::Run could use IOCP or WaitForMultipleObjects()
instead of select()/poll().
2. Sockets only: Apparently you can give sockets directly to
subprocesses as stdin/stdout/stderr:
https://stackoverflow.com/questions/4993119/redirect-io-of-process-to-windows-socket
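
For comparison, on POSIX a socket is just a file descriptor, so giving
one to a child as its stdout needs no special API at all. Below is a
minimal Python sketch of that POSIX analogue, assuming a socketpair
stands in for the TCP connection. It is neither the Windows technique
from the link above nor IPC::Run code, and run_with_socket_stdout is a
hypothetical helper name:

```python
import socket
import subprocess
import sys

def run_with_socket_stdout(code):
    """Run `python -c code` with one end of a socketpair as its stdout."""
    parent_end, child_end = socket.socketpair()
    with parent_end:
        # Popen dup2()s this descriptor onto the child's fd 1: no bridge process.
        proc = subprocess.Popen([sys.executable, "-c", code],
                                stdout=child_end.fileno())
        # Drop the parent's copy so recv() sees EOF once the child exits.
        child_end.close()
        chunks = []
        while data := parent_end.recv(4096):
            chunks.append(data)
        proc.wait()
    return b"".join(chunks)
```

For example, run_with_socket_stdout("print('hello')") should return
b"hello\n". The Windows side is the hard part, and this sketch says
nothing about the linger problem below.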

The IPC::Run comments explain that the extra process was needed so
that all data can be forwarded even if the target subprocess exits
without closing the socket (the same lingering-close problem we have
run into before in PostgreSQL itself). I suspect that if we went that
way, asynchronous I/O might fix that too (see my other thread with
guesses and demos on that topic), but it might not be race-free. I
don't know. I'd like to know for PostgreSQL's own sake, but for
IPC::Run I think I'd prefer option 1 anyway: if you have to write new
native Windows API code either way, you might as well use the normal
native way for Windows processes to connect standard I/O streams.
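
To illustrate why a readiness-based bridge can't recover the original
order across streams: in this POSIX-only Python sketch (select() on
pipes doesn't work on Windows, and this is not IPC::Run code) the
child writes to stdout, then to stderr, then exits. By the time the
parent polls, select() just reports both pipes readable, and nothing
in its result says which write happened first. The helper name
readiness_after_exit is hypothetical:

```python
import select
import subprocess
import sys

# The child writes "first" to stdout, then "second" to stderr, then exits.
CHILD = ("import sys; print('first', flush=True); "
         "print('second', file=sys.stderr, flush=True)")

def readiness_after_exit():
    child = subprocess.Popen([sys.executable, "-c", CHILD],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Output is tiny, so waiting before reading cannot fill a pipe buffer.
    child.wait()
    # Both writes have already happened; ask which pipes are readable.
    ready, _, _ = select.select([child.stdout, child.stderr], [], [], 1.0)
    names = ["stdout" if f is child.stdout else "stderr" for f in ready]
    out, err = child.stdout.read(), child.stderr.read()
    child.stdout.close()
    child.stderr.close()
    # `names` only says "both are ready"; the write order is not recoverable.
    return names, out, err
```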