Thread: IPC::Run accepts bug reports

IPC::Run accepts bug reports

From
Noah Misch
Date:
Separating this from the pytest thread:

On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> The one
> thing I know about that *I* think is a pretty big problem about Perl
> is that IPC::Run is not really maintained.

I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
NetBSD-10-specific behavior coping.



Re: IPC::Run accepts bug reports

From
Robert Haas
Date:
On Sat, Jun 15, 2024 at 7:48 PM Noah Misch <noah@leadboat.com> wrote:
> Separating this from the pytest thread:
>
> On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > The one
> > thing I know about that *I* think is a pretty big problem about Perl
> > is that IPC::Run is not really maintained.
>
> I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> NetBSD-10-specific behavior coping.

I'm not concerned about any specific open issue; my concern is about
the health of that project. https://metacpan.org/pod/IPC::Run says
that this module is seeking new maintainers, and it looks like the
people listed as current maintainers are mostly inactive. Instead,
you're fixing stuff. That's great, but we ideally want PostgreSQL's
dependencies to be things that are used widely enough that we don't
end up maintaining them ourselves.

I apologize if my comment came across as disparaging your efforts;
that was not my intent.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: IPC::Run accepts bug reports

From
Andres Freund
Date:
Hi,

On 2024-06-15 16:48:24 -0700, Noah Misch wrote:
> Separating this from the pytest thread:
>
> On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > The one
> > thing I know about that *I* think is a pretty big problem about Perl
> > is that IPC::Run is not really maintained.
>
> I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> NetBSD-10-specific behavior coping.

1) Sometimes hangs hard on windows if started processes have not been shut
   down before script exits.  I've mostly encountered this via the buildfarm /
   CI, so I never had a good way of narrowing this down. It's very painful
   because things seem to often just get stuck once that happens.

2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack
   Broken pipe:" (in _do_filters()). There's plenty reports of this on the
   list, and I've hit this several times personally. It seems to be timing
   dependent, I've encountered it after seemingly irrelevant ordering changes.

   I suspect I could create a reproducer with a bit of time.

3) It's very slow on windows (in addition to the windows process
   slowness). That got to be fixable to some degree.

Greetings,

Andres Freund



Re: IPC::Run accepts bug reports

From
Noah Misch
Date:
On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:
> On 2024-06-15 16:48:24 -0700, Noah Misch wrote:
> > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > > The one
> > > thing I know about that *I* think is a pretty big problem about Perl
> > > is that IPC::Run is not really maintained.
> >
> > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> > affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> > NetBSD-10-specific behavior coping.
> 
> 1) Sometimes hangs hard on windows if started processes have not been shut
>    down before script exits.  I've mostly encountered this via the buildfarm /
>    CI, so I never had a good way of narrowing this down. It's very painful
>    because things seem to often just get stuck once that happens.

That's bad.  Do you have a link to a log, a thread discussing it, or even just
one of the test names experiencing it?

> 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack
>    Broken pipe:" (in _do_filters()). There's plenty reports of this on the
>    list, and I've hit this several times personally. It seems to be timing
>    dependent, I've encountered it after seemingly irrelevant ordering changes.
> 
>    I suspect I could create a reproducer with a bit of time.

I've seen that one.  If the harness has data to write to a child, the child
exiting before the write is one way to reach that.  Perhaps before exec(),
IPC::Run should do a non-blocking write from each pending IO.  That way, small
writes never experience the timing-dependent behavior.

> 3) It's very slow on windows (in addition to the windows process
>    slowness). That got to be fixable to some degree.

Agreed.  For the next release, today's git has some optimizations.  There are
other known-possible Windows optimizations not implemented.



Re: IPC::Run accepts bug reports

From
Andres Freund
Date:
Hi,

On 2024-06-18 10:10:17 -0700, Noah Misch wrote:
> On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:
> > On 2024-06-15 16:48:24 -0700, Noah Misch wrote:
> > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > > > The one
> > > > thing I know about that *I* think is a pretty big problem about Perl
> > > > is that IPC::Run is not really maintained.
> > >
> > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> > > affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> > > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> > > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> > > NetBSD-10-specific behavior coping.
> > 
> > 1) Sometimes hangs hard on windows if started processes have not been shut
> >    down before script exits.  I've mostly encountered this via the buildfarm /
> >    CI, so I never had a good way of narrowing this down. It's very painful
> >    because things seem to often just get stuck once that happens.
> 
> That's bad.  Do you have a link to a log, a thread discussing it, or even just
> one of the test names experiencing it?

I'm unfortunately blanking on the right keyword right now.

I think it basically required not shutting down a process started in the
background with IPC::Run.

I'll try to repro it by removing some ->finish or ->quit calls.

There's also a bunch of tests that have blocks like

    # some Windows Perls at least don't like IPC::Run's start/kill_kill regime.
    skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';

Some of them may have been related to this.


> > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack
> >    Broken pipe:" (in _do_filters()). There's plenty reports of this on the
> >    list, and I've hit this several times personally. It seems to be timing
> >    dependent, I've encountered it after seemingly irrelevant ordering changes.
> > 
> >    I suspect I could create a reproducer with a bit of time.
> 
> I've seen that one.  If the harness has data to write to a child, the child
> exiting before the write is one way to reach that.  Perhaps before exec(),
> IPC::Run should do a non-blocking write from each pending IO.  That way, small
> writes never experience the timing-dependent behavior.

I think the question is rather, why is ipc run choosing to die in this
situation and can that be fixed?


> > 3) It's very slow on windows (in addition to the windows process
> >    slowness). That got to be fixable to some degree.
> 
> Agreed.  For the next release, today's git has some optimizations.  There are
> other known-possible Windows optimizations not implemented.

Yay!

Greetings,

Andres Freund



Re: IPC::Run accepts bug reports

From
Andrew Dunstan
Date:


On 2024-06-18 Tu 3:00 PM, Andres Freund wrote:
Hi,

On 2024-06-18 10:10:17 -0700, Noah Misch wrote:
On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:
On 2024-06-15 16:48:24 -0700, Noah Misch wrote:
On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
The one
thing I know about that *I* think is a pretty big problem about Perl
is that IPC::Run is not really maintained.
I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
NetBSD-10-specific behavior coping.
1) Sometimes hangs hard on windows if started processes have not been shut   down before script exits.  I've mostly encountered this via the buildfarm /   CI, so I never had a good way of narrowing this down. It's very painful   because things seem to often just get stuck once that happens.
That's bad.  Do you have a link to a log, a thread discussing it, or even just
one of the test names experiencing it?
I'm unfortunately blanking on the right keyword right now.

I think it basically required not shutting down a process started in the
background with IPC::Run.

I'll try to repro it by removing some ->finish or ->quit calls.

There's also a bunch of tests that have blocks like
	# some Windows Perls at least don't like IPC::Run's start/kill_kill regime.	skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32';

Some of them may have been related to this.


I only found one of those, in src/test/recovery/t/006_logical_decoding.pl, which seems to be the only place we use kill_kill at all. That comment dates back to 2017, so maybe a more modern perl and/or IPC::Run will improve matters.

It's not clear to me why that code isn't calling finish() before trying kill_kill(). That's what the IPC::Run docs seem to suggest you should do.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: IPC::Run accepts bug reports

From
Andres Freund
Date:
Hi,

On 2024-06-18 12:00:13 -0700, Andres Freund wrote:
> On 2024-06-18 10:10:17 -0700, Noah Misch wrote:
> > > 1) Sometimes hangs hard on windows if started processes have not been shut
> > >    down before script exits.  I've mostly encountered this via the buildfarm /
> > >    CI, so I never had a good way of narrowing this down. It's very painful
> > >    because things seem to often just get stuck once that happens.
> >
> > That's bad.  Do you have a link to a log, a thread discussing it, or even just
> > one of the test names experiencing it?
>
> I'm unfortunately blanking on the right keyword right now.
>
> I think it basically required not shutting down a process started in the
> background with IPC::Run.
>
> I'll try to repro it by removing some ->finish or ->quit calls.

Yep, that did it.  It reliably reproduces if I comment out
the lines below
  # explicitly shut down psql instances gracefully - to avoid hangs
  # or worse on windows
in 021_row_visibility.pl

The logfile ends in
Warning: unable to close filehandle GEN25 properly: Bad file descriptor during global destruction.
Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction.


Even if I cancel the test, I can't rerun it because due to a leftover psql
a) a new temp install can't be made (could be solved by rm -rf)
b) the test's logfile can't be removed (couldn't even rename the directory)

The psql instance needs to be found and terminated first.


Greetings,

Andres Freund



Re: IPC::Run accepts bug reports

From
Noah Misch
Date:
On Tue, Jun 18, 2024 at 08:07:27PM -0700, Andres Freund wrote:
> > > > 1) Sometimes hangs hard on windows if started processes have not been shut
> > > >    down before script exits.

> It reliably reproduces if I comment out
> the lines below
>   # explicitly shut down psql instances gracefully - to avoid hangs
>   # or worse on windows
> in 021_row_visibility.pl
> 
> The logfile ends in
> Warning: unable to close filehandle GEN25 properly: Bad file descriptor during global destruction.
> Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction.
> 
> 
> Even if I cancel the test, I can't rerun it because due to a leftover psql
> a) a new temp install can't be made (could be solved by rm -rf)
> b) the test's logfile can't be removed (couldn't even rename the directory)
> 
> The psql instance needs to be found and terminated first.

Thanks for that recipe.  I've put that in my queue to fix.

On Tue, Jun 18, 2024 at 12:00:13PM -0700, Andres Freund wrote:
> On 2024-06-18 10:10:17 -0700, Noah Misch wrote:
> > On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:
> > > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack
> > >    Broken pipe:" (in _do_filters()). There's plenty reports of this on the
> > >    list, and I've hit this several times personally. It seems to be timing
> > >    dependent, I've encountered it after seemingly irrelevant ordering changes.
> > > 
> > >    I suspect I could create a reproducer with a bit of time.
> > 
> > I've seen that one.  If the harness has data to write to a child, the child
> > exiting before the write is one way to reach that.  Perhaps before exec(),
> > IPC::Run should do a non-blocking write from each pending IO.  That way, small
> > writes never experience the timing-dependent behavior.
> 
> I think the question is rather, why is ipc run choosing to die in this
> situation and can that be fixed?

With default signal handling, the process would die to SIGPIPE.  Since
PostgreSQL::Test ignores SIGPIPE, this happens instead.  The IPC::Run source
tree has no discussion of ignoring SIGPIPE, so I bet it didn't get a conscious
decision.  Perhaps it can do better.



Re: IPC::Run accepts bug reports

From
Noah Misch
Date:
On Mon, Jun 17, 2024 at 01:56:46PM -0400, Robert Haas wrote:
> On Sat, Jun 15, 2024 at 7:48 PM Noah Misch <noah@leadboat.com> wrote:
> > Separating this from the pytest thread:
> >
> > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:
> > > The one
> > > thing I know about that *I* think is a pretty big problem about Perl
> > > is that IPC::Run is not really maintained.
> >
> > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything
> > affecting PostgreSQL.  If you know of IPC::Run defects, please report them.
> > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on
> > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175
> > NetBSD-10-specific behavior coping.
> 
> I'm not concerned about any specific open issue; my concern is about
> the health of that project. https://metacpan.org/pod/IPC::Run says
> that this module is seeking new maintainers, and it looks like the
> people listed as current maintainers are mostly inactive. Instead,
> you're fixing stuff. That's great, but we ideally want PostgreSQL's
> dependencies to be things that are used widely enough that we don't
> end up maintaining them ourselves.

That's reasonable to want.

> I apologize if my comment came across as disparaging your efforts;
> that was not my intent.

It did not come across as disparaging.