Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT - Mailing list pgsql-bugs

From Kyotaro HORIGUCHI
Subject Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
Date
Msg-id 20181109.143931.243136889.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Responses Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-bugs
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE4318F.4040002@lab.ntt.co.jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes.  I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process.   The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
> 
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
> > close to the intended purpose of that signal AFAICS.
> 
> Great!
>
> >>> In the sense of "We don't care the reason", negligible reasons
> >>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> >>> theoretically ignored safely. "theoretically" here means it is
> >>> another issue whether we rely on the output from a program which
> >>> causes SEGV (or any reason other than SIGPIPE, which we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that
> >> the program have generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users.  I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
> 
> I think so too.

Ok, I can live with that with no problem.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =#  select * from ft limit 1;
> >>>      a
> >>> -------
> >>>    test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work
> >> for limit 2 as well.  So, I think it would be better to error out that
> >> command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error.  For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe.  If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
> 
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though.  Too restrictive?

I think Thomas just saying that reading more lines can develop
problems. According to the current discussion, we should error
out if we had SEGV when limit 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe.  It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all.  Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code.  To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close.  This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
> 
> Interesting!  I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shoudn't output a line until all possible
output is validated. Once the data source emited a line, the
receiver can't do other than believe that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
> 
> Good to know!

Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.

| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> > Exit code: 1, like Python
> 
> Sad.  Anyway, thanks a lot for these experiments in addition to the
> previous ones!

ruby reported broken pipe but exit status was 0..

create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
 a 
---
 1
...
 5
(5 rows)
(no error)

> > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp>  wrote:
> >> (2018/11/06 19:50), Thomas Munro wrote:
> >>> On my FreeBSD system, I compared the output of procstat -i (= show
> >>> signal disposition) for two "sleep 60" processes, one invoked from the
> >>> shell and the other from COPY ... FROM PROGRAM.  The differences were:
> >>> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> >>> default action would be to terminate the process, but the COPY PROGRAM
> >>> child ignored them; for TTIN and TTOU, the default action would be to
> >>> stop the process, but again they are ignored.  Why do bgwriter.c,
> >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> >>> regular backends?
> >>
> >> So, we should revert SIGUSR2 as well to default processing?
> >
> > I don't think it matters in practice, but it might be nice to restore
> > that just for consistency.
> 
> Agreed.
> 
> > I'm not sure what to think about the TTIN,
> > TTOU stuff; I don't understand job control well right now but I don't
> > think it really applies to programs run by a PostgreSQL backend, so if
> > we restore those it'd probably again be only for consistency.  Then
> > again, there may be a reason someone decided to ignore those in the
> > postmaster + regular backends but not the various auxiliary processes.
> > Anyone?
> 
> I don't have any idea about that.

In my understanding processes not connected to a
terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
it artifically).  Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at the start time and I suppose they have no
means to connect to one since they are no longer on the same
session with postmaster.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: BUG #15492: pg_cancel_backend(pg_backend_pid()) returns true sporadically
Next
From: Thomas Munro
Date:
Subject: Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT