Thread: BUG #15449: file_fdw using program cause exit code error when using LIMIT

BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15449
Logged by:          Eric Cyr
Email address:      eric.cyr@gmail.com
PostgreSQL version: 10.5
Operating system:   Ubuntu 16.04.2 LTS (Xenial Xerus)
Description:

CREATE FOREIGN TABLE test_file_fdw_program_limit
    (
      message text
    )
    SERVER pg_log_fdw OPTIONS (program 'echo "test"', format 'csv')
;

--

SELECT * FROM test_file_fdw_program_limit;
/*
OK
*/

SELECT * FROM test_file_fdw_program_limit LIMIT 2;
/*
OK
*/

SELECT * FROM test_file_fdw_program_limit LIMIT 0;
/*
[38000] ERROR: program "echo "test"" failed Detail: child process exited
with exit code 1
*/

--

LIMIT 0 is the easiest way we found to reproduce the error, but we
encountered the same issue with LIMIT values smaller than the number of
lines in the file, on a program doing an if-exists check plus file
output stream manipulation.
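The failure can be reproduced outside PostgreSQL. As a minimal sketch (not part of the original report; assumes a Unix-like system with `/bin/sh`), a parent that ignores SIGPIPE, lets the child inherit that disposition the way popen() does, and then stops reading early sees the child exit nonzero rather than die of SIGPIPE:

```python
import signal
import subprocess

# Sketch: the parent ignores SIGPIPE, as the backend does, and lets the
# child inherit that disposition (restore_signals=False mimics popen()
# passing SIG_IGN through).  When the parent stops reading early, the
# child's write fails with EPIPE, so the shell reports a nonzero exit
# code instead of dying of SIGPIPE.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

child = subprocess.Popen(
    ["sh", "-c", "sleep 0.2; echo test"],
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,   # hide "echo: write error: Broken pipe"
    restore_signals=False,       # child keeps SIG_IGN for SIGPIPE
)
child.stdout.close()             # stop reading early, like LIMIT 0
status = child.wait()            # nonzero exit, not a signal death
```

This mirrors what `SELECT ... LIMIT 0` does to the file_fdw program: the reader goes away first, and the inherited SIG_IGN turns the broken pipe into an ordinary-looking program failure.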


PG Bug reporting form <noreply@postgresql.org> writes:
> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> /*
> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> with exit code 1
> */

> LIMIT 0 is the easiest way we found to reproduce the error, but we
> encountered same issue with LIMIT (< file lines) on a program doing an if
> exists + file output stream manipulation.

Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
shows something pretty unsurprising:

sh: line 1: echo: write error: Broken pipe

I get similar behavior from "cat"; didn't try anything else.

So the called program is behaving in a somewhat reasonable way: it's
detecting EPIPE on its stdout (after we close the pipe), reporting that,
and doing exit(1).

Unfortunately, it's not clear what we could do about that, short of
always reading the whole program output, which is likely to be a
cure worse than the disease.  If the program were failing thanks to
SIGPIPE, we could recognize that as a case we can ignore ... but with
behavior like this, I don't see a reliable way to tell it apart from
a generic program failure, which surely we'd better report.

The only idea that comes to mind is to introduce a file_fdw option
specifying "read program input to end, even if query doesn't want
it all".  Not quite sure which choice should be the default; you
could make an argument either way I think.

Curiously, if you do stuff like "cat longfile | head -10", you don't
get any visible error, though surely cat must be seeing the same
thing underneath ... I wonder how the shell is arranging that?
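One plausible answer, sketched here as an assumption rather than anything established in the thread: an interactive shell starts pipeline members with SIGPIPE at SIG_DFL, so cat is simply killed by the signal rather than seeing a write error, and shells conventionally do not report SIGPIPE deaths. A small demonstration ("yes" stands in for cat, and the parent plays head's role by closing the pipe early):

```python
import signal
import subprocess

# Sketch: with SIGPIPE left at SIG_DFL (the default an interactive
# shell gives its children), a writer whose reader disappears is
# silently killed by the signal rather than reporting a write error.
child = subprocess.Popen(
    ["sh", "-c", "sleep 0.2; exec yes"],
    stdout=subprocess.PIPE,
    # restore_signals=True is the default: the child gets SIG_DFL
)
child.stdout.close()          # reader goes away, like head exiting
status = child.wait()         # negative => terminated by that signal
```

The wait status still records "terminated by SIGPIPE", which is exactly the distinction between an expected early shutdown and a genuine failure.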

            regards, tom lane


I wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>> /*
>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>> with exit code 1
>> */

> Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> shows something pretty unsurprising:
> sh: line 1: echo: write error: Broken pipe
> So the called program is behaving in a somewhat reasonable way: it's
> detecting EPIPE on its stdout (after we close the pipe), reporting that,
> and doing exit(1).
> Unfortunately, it's not clear what we could do about that, short of
> always reading the whole program output, which is likely to be a
> cure worse than the disease.  If the program were failing thanks to
> SIGPIPE, we could recognize that as a case we can ignore ... but with
> behavior like this, I don't see a reliable way to tell it apart from
> a generic program failure, which surely we'd better report.

After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handling set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure.  The attached POC patch does that and successfully makes
the file_fdw problem go away for me.
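The special case the POC adds when closing the pipe can be sketched outside the backend (a Python sketch of the same wait-status test; the os.WIF* macros mirror the C ones, and "yes" is just a stand-in program):

```python
import os
import signal
import subprocess

# Sketch of the POC's logic: launch a program with default SIGPIPE
# handling, stop reading its output early, then inspect the raw wait()
# status the way the patched ClosePipeToProgram inspects pclose()'s
# result: death by SIGPIPE is expected; anything else is a real failure.
proc = subprocess.Popen(["sh", "-c", "sleep 0.2; exec yes"],
                        stdout=subprocess.PIPE)
proc.stdout.close()                      # we chose to stop reading early
_, raw = os.waitpid(proc.pid, 0)         # raw status, as pclose() returns

if os.WIFSIGNALED(raw) and os.WTERMSIG(raw) == signal.SIGPIPE:
    outcome = "ok"                       # expected: we closed the pipe
elif os.WIFEXITED(raw) and os.WEXITSTATUS(raw) == 0:
    outcome = "ok"                       # program finished normally
else:
    outcome = "error"                    # genuine failure: report it
```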

It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)

2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)

3. Maybe this should be implemented at some higher level?

4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?

            regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b58a74f4e3db9d8e8f30d35e08ca84814742faa3..c4c6c875ec8102df2756d8ea1ba79791d63fed25 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** ClosePipeToProgram(CopyState cstate)
*** 1727,1737 ****
--- 1727,1746 ----
                  (errcode_for_file_access(),
                   errmsg("could not close pipe to external command: %m")));
      else if (pclose_rc != 0)
+     {
+         /*
+          * If the called program terminated on SIGPIPE, assume it's OK; we
+          * must have chosen to stop reading its output early.
+          */
+         if (WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
+             return;
+
          ereport(ERROR,
                  (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
                   errmsg("program \"%s\" failed",
                          cstate->filename),
                   errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+     }
  }

  /*
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f176749c5d1b2adc9993b5a9a3571ee566c..6cec85e2c1dc7dea23c8b941e5080714ea65e57a 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
*************** OpenTransientFilePerm(const char *fileNa
*** 2430,2440 ****
--- 2430,2445 ----
   * Routines that want to initiate a pipe stream should use OpenPipeStream
   * rather than plain popen().  This lets fd.c deal with freeing FDs if
   * necessary.  When done, call ClosePipeStream rather than pclose.
+  *
+  * This function also ensures that the popen'd program is run with default
+  * SIGPIPE processing, rather than the SIG_IGN setting the backend normally
+  * uses.  This ensures desirable response to, eg, closing a read pipe early.
   */
  FILE *
  OpenPipeStream(const char *command, const char *mode)
  {
      FILE       *file;
+     int            save_errno;

      DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)",
                 numAllocatedDescs, command));
*************** OpenPipeStream(const char *command, cons
*** 2452,2459 ****
  TryAgain:
      fflush(stdout);
      fflush(stderr);
      errno = 0;
!     if ((file = popen(command, mode)) != NULL)
      {
          AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];

--- 2457,2469 ----
  TryAgain:
      fflush(stdout);
      fflush(stderr);
+     pqsignal(SIGPIPE, SIG_DFL);
      errno = 0;
!     file = popen(command, mode);
!     save_errno = errno;
!     pqsignal(SIGPIPE, SIG_IGN);
!     errno = save_errno;
!     if (file != NULL)
      {
          AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];

*************** TryAgain:
*** 2466,2473 ****

      if (errno == EMFILE || errno == ENFILE)
      {
-         int            save_errno = errno;
-
          ereport(LOG,
                  (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                   errmsg("out of file descriptors: %m; release and retry")));
--- 2476,2481 ----


Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT

From
Kyotaro HORIGUCHI
Date:
Hello.

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane <tgl@sss.pgh.pa.us> wrote in <18397.1540297291@sss.pgh.pa.us>
> I wrote:
> > PG Bug reporting form <noreply@postgresql.org> writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
> 
> > Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease.  If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
> 
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
> 
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure.  The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
> 
> It's just a POC because there are some things that need more thought
> than I've given them:
> 
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream?  If not, what infrastructure do we
> need to add to control that?  In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)

Forcing the program to ignore SIGPIPE makes it handle EPIPE instead,
which might not be handled properly since it would be unexpected by
the program.

If we don't read from the pipe (that is, we are writing to it),
we shouldn't change the behavior anyway, since SIGPIPE can come
from another pipe.

> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

Since the SIGPIPE may come from another pipe, I think we shouldn't
in general. But if we explicitly stop reading from the pipe before
detecting an error, it can be ignored, since we aren't interested
in further failures. Thus ClosePipeToProgram might need an extra
parameter, or CopyState may need an additional flag named, say,
"explicit_close" (or "ignore_sigpipe_on_pclose"...) which would be
set in EndCopy.

> 3. Maybe this should be implemented at some higher level?

I'm not sure, but it seems to me not needed.

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but that
might be too much...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane<tgl@sss.pgh.pa.us>  wrote in<18397.1540297291@sss.pgh.pa.us>
>> After a bit of thought, the problem here is blindingly obvious:
>> we generally run the backend with SIGPIPE handling set to SIG_IGN,
>> and evidently popen() allows the called program to inherit that,
>> at least on these platforms.
>>
>> So what we need to do is make sure the called program inherits SIG_DFL
>> handling for SIGPIPE, and then special-case that result as not being
>> a failure.  The attached POC patch does that and successfully makes
>> the file_fdw problem go away for me.

Thanks for working on this!

>> It's just a POC because there are some things that need more thought
>> than I've given them:
>>
>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>> launched through OpenPipeStream?  If not, what infrastructure do we
>> need to add to control that?  In particular, is it sane to revert
>> SIGPIPE for a pipe program that we will write to not read from?
>> (I thought probably yes, because that is the normal Unix behavior,
>> but it could be debated.)
>
> Forcing to ignore SIGPIPE lets the program to handle EPIPE, which
> might not be handled poperly since it would be unexpected by the
> program.
>
> If we don't read from the pipe (that is, we are writing to it),
> anyway we shouldn't change the behavior since SIGPIPE can come
> from another pipe.

I'm a bit confused here.  Horiguchi-san, are you saying that the called 
program that we will read from should inherit SIG_DFL for SIGPIPE, as 
proposed in the POC patch, but the called program that we will write to 
should inherit SIG_IGN as it currently does?

ISTM that it would be OK to inherit SIG_DFL in both cases, because I 
think it would be the responsibility of the called program to handle 
SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing 
something, though.

>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>> we don't intend to terminate that early.)
>
> Since the SIGPIPE may come from another pipe, I think we
> shouldn't generally.

Agreed; if ClosePipeToProgram ignores that failure, we would fail to get 
a better error message in CopySendEndOfRow if the called program 
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:

                if (cstate->is_program)
                {
                    if (errno == EPIPE)
                    {
                        /*
                         * The pipe will be closed automatically on error at
                         * the end of transaction, but we might get a better
                         * error message from the subprocess' exit code than
                         * just "Broken Pipe"
                         */
                        ClosePipeToProgram(cstate);

                        /*
                         * If ClosePipeToProgram() didn't throw an error, the
                         * program terminated normally, but closed the pipe
                         * first. Restore errno, and throw an error.
                         */
                        errno = EPIPE;
                    }
                    ereport(ERROR,
                            (errcode_for_file_access(),
                             errmsg("could not write to COPY program: %m")));
                }
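As an illustration (a sketch, not PostgreSQL code), the write-direction failure that the quoted block handles can be reproduced in isolation: feed a program that stops reading early, with SIGPIPE ignored in the writer as in the backend, and the write fails with EPIPE ("head -c 4" is just a stand-in for the COPY TO PROGRAM target):

```python
import errno
import signal
import subprocess

# Sketch of the write-direction case: we feed a program that stops
# reading after 4 bytes.  With SIGPIPE ignored in the writer, as in the
# backend, the write fails with EPIPE, which is the condition the
# quoted code turns into an ereport().
signal.signal(signal.SIGPIPE, signal.SIG_IGN)   # Python's default too

proc = subprocess.Popen(["head", "-c", "4"], stdin=subprocess.PIPE,
                        stdout=subprocess.DEVNULL)
got_epipe = False
try:
    for _ in range(1000):                # far more than the pipe buffer
        proc.stdin.write(b"x" * 1024)
        proc.stdin.flush()
except OSError as e:
    got_epipe = (e.errno == errno.EPIPE)
try:
    proc.stdin.close()
except OSError:
    pass
proc.wait()
```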

> But if we explicitly stop reading from
> the pipe before detecting an error, it can be ignored since we
> aren't interested in further failures.

You mean that we should ignore other failures of the called program when 
we stop reading from the pipe early?

> Thus ClosePipeToProgram
> might need an extra parameter or CopyState may need an additional
> flag named, say, "explicit_close" (or
> "ignore_sigpipe_on_pclose"..) in CopyState which will be set in
> EndCopy.
>
>> 3. Maybe this should be implemented at some higher level?
>
> I'm not sure, but it seems to me not needed.

ISTM that it's a good idea to control the ClosePipeToProgram behavior
via a new member of CopyState.

>> 4. Are there any other signals we ought to be reverting to default
>> behavior before launching a COPY TO/FROM PROGRAM?
>
> All handlers other than SIG_DFL and SIG_IGN are reset on exec().
> Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
> harm anything. Perhaps it would be safer against future changes
> if we explicitly reset all changed actions to SIG_DFL, but that
> might be too much...

Not sure, but reverting SIGPIPE to default would be enough as a fix for 
the original issue, if we go with the POC patch.

Best regards,
Etsuro Fujita



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Kyotaro HORIGUCHI
Date:
At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BDC4BA0.7050106@lab.ntt.co.jp>
> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> > At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane<tgl@sss.pgh.pa.us> wrote
> > in<18397.1540297291@sss.pgh.pa.us>
> >> After a bit of thought, the problem here is blindingly obvious:
> >> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> >> and evidently popen() allows the called program to inherit that,
> >> at least on these platforms.
> >>
> >> So what we need to do is make sure the called program inherits SIG_DFL
> >> handling for SIGPIPE, and then special-case that result as not being
> >> a failure.  The attached POC patch does that and successfully makes
> >> the file_fdw problem go away for me.
> 
> Thanks for working on this!
> 
> >> It's just a POC because there are some things that need more thought
> >> than I've given them:
> >>
> >> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> >> launched through OpenPipeStream?  If not, what infrastructure do we
> >> need to add to control that?  In particular, is it sane to revert
> >> SIGPIPE for a pipe program that we will write to not read from?
> >> (I thought probably yes, because that is the normal Unix behavior,
> >> but it could be debated.)
> >
> > Forcing to ignore SIGPIPE lets the program to handle EPIPE, which
> > might not be handled poperly since it would be unexpected by the
> > program.
> >
> > If we don't read from the pipe (that is, we are writing to it),
> > anyway we shouldn't change the behavior since SIGPIPE can come
> > from another pipe.
> 
> I'm a bit confused here.  Horiguchi-san, are you saying that the
> called program that we will read from should inherit SIG_DFL for
> SIGPIPE, as proposed in the POC patch, but the called program that we
> will write to should inherit SIG_IGN as it currently does?

Maybe yes. Your understanding of my first paragraph looks right. But
in my second paragraph, I said that we shouldn't set SIGPIPE to
anything other than SIG_DFL at exec() time even if we are to read
from the pipe, because the program can be writing to another pipe and
the change can affect the program's behavior.

> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
> think it would be the responsibility of the called program to handle
> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
> something, though.

So, I think we *should* (not just "it is OK to") restore SIGPIPE to
SIG_DFL in any case here, to prevent an undetermined situation for
the program.

> >> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> >> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> >> we don't intend to terminate that early.)
> >
> > Since the SIGPIPE may come from another pipe, I think we
> > shouldn't generally.
> 
> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
> get a better error message in CopySendEndOfRow if the called program
> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
> 
>                 if (cstate->is_program)
>                 {
>                     if (errno == EPIPE)
>                     {
>                         /*
>                          * The pipe will be closed automatically on error at
>                          * the end of transaction, but we might get a better
>                          * error message from the subprocess' exit code than
>                          * just "Broken Pipe"
>                          */
>                         ClosePipeToProgram(cstate);
> 
>                         /*
>                          * If ClosePipeToProgram() didn't throw an error, the
>                          * program terminated normally, but closed the pipe
>                          * first. Restore errno, and throw an error.
>                          */
>                         errno = EPIPE;
>                     }
>                     ereport(ERROR,
>                             (errcode_for_file_access(),
>                              errmsg("could not write to COPY program: %m")));
>                 }

Mmm, that's the EPIPE from fwrite, not a killed-by-SIGPIPE exit
status of the called program. So it is irrelevant to the called
program's SIGPIPE setup. It requires SIGPIPE to be kept at SIG_IGN on
the backend side.

> > But if we explicitly stop reading from
> > the pipe before detecting an error, it can be ignored since we
> > aren't interested in further failure.
> 
> You mean that we should ignore other failures of the called program
> when we stop reading from the pipe early?

Yes, we have received sufficient data from the pipe and then closed
it successfully. The program may want to write more, but we don't
want it. We rather should ignore a SIGPIPE exit code of the program,
since closing our end of the pipe is likely to cause it, and even if
it comes from another pipe, we can assume that the SIGPIPE
immediately stopped the program before it returned any garbage to us.

> > Thus ClosePipeToProgram
> > might need an extra parameter or CopyState may need an additional
> > flag named, say, "explicit_close" (or
> > "ignore_sigpipe_on_pclose"..) in CopyState which will be set in
> > EndCopy.
> >
> >> 3. Maybe this should be implemented at some higher level?
> >
> > I'm not sure, but it seems to me not needed.
> 
> ISTM that it's a good idea to control the ClosePipeToProgram behavior
> by a new member for CopyState.
> 
> >> 4. Are there any other signals we ought to be reverting to default
> >> behavior before launching a COPY TO/FROM PROGRAM?
> >
> > All handlers other than SIG_DFL and SIG_IGN are reset on exec().
> > Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
> > harm anything. Perhaps it would be safer against future changes
> > if we explicitly reset all changed actions to SIG_DFL, but it
> > might be too much..
> 
> Not sure, but reverting SIGPIPE to default would be enough as a fix
> for the original issue, if we go with the POC patch.

Agreed. I wonder why there's no API that resets all handlers to
SIG_DFL at once, or some flag telling exec() that it should start
with the default signal handler set.

I tried that but found that fread() doesn't detect termination of
the called program by a signal as an error. So just setting a flag in
EndCopyFrom() ends up concealing the error.

Finally, in the attached PoC, I set cstate->eof_reached on failure
of NextCopyFromRawFields; then, if eof_reached is set, we don't
ignore SIGPIPE. For a program like
"puts("test1"); fflush(stdout); kill(getpid(), SIGPIPE);", I had
the following behavior. (I got an error without the fflush(), but
that is inevitable.)

=# select * from ft2 limit 1;
   a   
-------
 test1

=# select * from ft2 limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process was terminated by signal 13: Broken pipe

For the original case:

=# select * from ft1 limit 1;
  a   
------
 test
=# select * from ft1 limit 2;
  a   
------
 test
(1 row)


I haven't confirmed whether it is complete.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b58a74f4e3..1cff0a4221 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,12 @@ typedef struct CopyStateData
     char       *raw_buf;
     int            raw_buf_index;    /* next byte to process */
     int            raw_buf_len;    /* total # of bytes stored */
+
+    /*
+     * We ignore SIGPIPE of the called program in ClosePipeToProgram() when we
+     * don't see an EOF yet from the reading end of a pipe to a program.
+     */
+    bool        eof_reached;
 } CopyStateData;
 
 /* DestReceiver for COPY (query) TO */
@@ -1727,11 +1733,21 @@ ClosePipeToProgram(CopyState cstate)
                 (errcode_for_file_access(),
                  errmsg("could not close pipe to external command: %m")));
     else if (pclose_rc != 0)
+    {
+        /*
+         * If the called program terminated on SIGPIPE, assume it's OK; we
+         * must have chosen to stop reading its output early.
+         */
+        if (!cstate->eof_reached &&
+            WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
+            return;
+
         ereport(ERROR,
                 (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
                  errmsg("program \"%s\" failed",
                         cstate->filename),
                  errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+    }
 }
 
 /*
@@ -3525,7 +3541,10 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
         /* read raw fields in the next line */
         if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+        {
+            cstate->eof_reached = true;
             return false;
+        }
 
         /* check for overflowing fields */
         if (nfields > 0 && fldct > nfields)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1767..6cec85e2c1 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2430,11 +2430,16 @@ OpenTransientFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
  * Routines that want to initiate a pipe stream should use OpenPipeStream
  * rather than plain popen().  This lets fd.c deal with freeing FDs if
  * necessary.  When done, call ClosePipeStream rather than pclose.
+ *
+ * This function also ensures that the popen'd program is run with default
+ * SIGPIPE processing, rather than the SIG_IGN setting the backend normally
+ * uses.  This ensures desirable response to, eg, closing a read pipe early.
  */
 FILE *
 OpenPipeStream(const char *command, const char *mode)
 {
     FILE       *file;
+    int            save_errno;
 
     DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)",
                numAllocatedDescs, command));
@@ -2452,8 +2457,13 @@ OpenPipeStream(const char *command, const char *mode)
 TryAgain:
     fflush(stdout);
     fflush(stderr);
+    pqsignal(SIGPIPE, SIG_DFL);
     errno = 0;
-    if ((file = popen(command, mode)) != NULL)
+    file = popen(command, mode);
+    save_errno = errno;
+    pqsignal(SIGPIPE, SIG_IGN);
+    errno = save_errno;
+    if (file != NULL)
     {
         AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
 
@@ -2466,8 +2476,6 @@ TryAgain:
 
     if (errno == EMFILE || errno == ENFILE)
     {
-        int            save_errno = errno;
-
         ereport(LOG,
                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                  errmsg("out of file descriptors: %m; release and retry")));

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Thomas Munro
Date:
On Wed, Oct 24, 2018 at 1:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > =?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> >> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
> >> /*
> >> [38000] ERROR: program "echo "test"" failed Detail: child process exited
> >> with exit code 1
> >> */
>
> > Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
> > shows something pretty unsurprising:
> > sh: line 1: echo: write error: Broken pipe
> > So the called program is behaving in a somewhat reasonable way: it's
> > detecting EPIPE on its stdout (after we close the pipe), reporting that,
> > and doing exit(1).
> > Unfortunately, it's not clear what we could do about that, short of
> > always reading the whole program output, which is likely to be a
> > cure worse than the disease.  If the program were failing thanks to
> > SIGPIPE, we could recognize that as a case we can ignore ... but with
> > behavior like this, I don't see a reliable way to tell it apart from
> > a generic program failure, which surely we'd better report.
>
> After a bit of thought, the problem here is blindingly obvious:
> we generally run the backend with SIGPIPE handling set to SIG_IGN,
> and evidently popen() allows the called program to inherit that,
> at least on these platforms.
>
> So what we need to do is make sure the called program inherits SIG_DFL
> handling for SIGPIPE, and then special-case that result as not being
> a failure.  The attached POC patch does that and successfully makes
> the file_fdw problem go away for me.
>
> It's just a POC because there are some things that need more thought
> than I've given them:
>
> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> launched through OpenPipeStream?  If not, what infrastructure do we
> need to add to control that?  In particular, is it sane to revert
> SIGPIPE for a pipe program that we will write to not read from?
> (I thought probably yes, because that is the normal Unix behavior,
> but it could be debated.)
>
> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> we don't intend to terminate that early.)

I'm not sure about that.  It might in theory be telling you about some
other pipe.  If you're OK with that false positive, why not ignore all
errors after you've read enough successful input and decided to close
the pipe?

> 3. Maybe this should be implemented at some higher level?

It won't work for some programs that ignore or catch the signal, so in
theory you might want to give users the power/responsibility to say
"ignore errors that occur after I decide to hang up".  Here are three
different behaviours I found in popular software, showing termination
by signal, custom error handling that we can't distinguish, and a
bonehead strategy:

$ seq 1 1000000 | head -5
1
2
3
4
5
... exit code indicates killed by signal

$ python -c "for i in range(1000000): print i" | head -5
0
1
2
3
4
Traceback (most recent call last):
  File "<string>", line 1, in <module>
IOError: [Errno 32] Broken pipe
... exit code 1

$ cat Test.java
public class Test {
        public static void main(String[] args) {
                for (int i = 0; i < 1000000; ++i) {
                        System.out.println(Integer.toString(i));
                }
        }
}
$ javac Test.java
$ java Test | head -5
0
1
2
3
4
... wait a really long time with no output, exit code 0

(Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
exception, except for PrintStreams like System.out which just eat data
after an error...)

> 4. Are there any other signals we ought to be reverting to default
> behavior before launching a COPY TO/FROM PROGRAM?

On my FreeBSD system, I compared the output of procstat -i (= show
signal disposition) for two "sleep 60" processes, one invoked from the
shell and the other from COPY ... FROM PROGRAM.  The differences were:
PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
default action would be to terminate the process, but the COPY PROGRAM
child ignored them; for TTIN and TTOU, the default action would be to
stop the process, but again they are ignored.  Why do bgwriter.c,
startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
regular backends?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5BDC4BA0.7050106@lab.ntt.co.jp>
>> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
>>> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane<tgl@sss.pgh.pa.us>  wrote
>>> in<18397.1540297291@sss.pgh.pa.us>
>>>> It's just a POC because there are some things that need more thought
>>>> than I've given them:
>>>>
>>>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>>>> launched through OpenPipeStream?  If not, what infrastructure do we
>>>> need to add to control that?  In particular, is it sane to revert
>>>> SIGPIPE for a pipe program that we will write to not read from?
>>>> (I thought probably yes, because that is the normal Unix behavior,
>>>> but it could be debated.)

>> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
>> think it would be the responsibility of the called program to handle
>> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
>> something, though.
>
> So, I think we *should* (not just OK to) restore SIGPIPE to
> SIG_DFL in any case here to prevent undetermined situation for
> the program.

OK

>>>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>>>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>>>> we don't intend to terminate that early.)
>>>
>>> Since the SIGPIPE may come from another pipe, I think we
>>> shouldn't generally.
>>
>> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
>> get a better error message in CopySendEndOfRow if the called program
>> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
>>
>>                  if (cstate->is_program)
>>                  {
>>                      if (errno == EPIPE)
>>                      {
>>                          /*
>>                           * The pipe will be closed automatically on error at
>>                           * the end of transaction, but we might get a better
>>                           * error message from the subprocess' exit code than
>>                           * just "Broken Pipe"
>>                           */
>>                          ClosePipeToProgram(cstate);
>>
>>                          /*
>>                           * If ClosePipeToProgram() didn't throw an error, the
>>                           * program terminated normally, but closed the pipe
>>                           * first. Restore errno, and throw an error.
>>                           */
>>                          errno = EPIPE;
>>                      }
>>                      ereport(ERROR,
>>                              (errcode_for_file_access(),
>>                               errmsg("could not write to COPY program: %m")));
>>                  }
>
> Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
> program's exit status. So it is irrelevant to called program's
> SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
> backend side.

My explanation might not have been enough, so let me elaborate.  If the 
called program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE 
for some reason, ClosePipeToProgram *as-is* would create an error 
message from that program's exit code.  But if we modify 
ClosePipeToProgram as in the original POC patch, that function would no 
longer create that message for such a termination.  To avoid that, I 
think it would be better for ClosePipeToProgram to ignore the SIGPIPE 
failure only in the case where the caller is a COPY FROM PROGRAM that is 
allowed to terminate early. 
(Maybe we could specify that as a new argument for BeginCopyFrom.)

>>> But iff we are explicitly stop reading from
>>> the pipe before detecting an error, it can be ignored since we
>>> aren't interested in further failure.
>>
>> You mean that we should ignore other failures of the called program
>> when we stop reading from the pipe early?
>
> Yes, we have received sufficient data from the pipe then closed
> it successfully. The program may want write more but we don't
> want it. We ragher should ignore SIGPIPE exit code of the program
> since closing our writing end of a pipe is likely to cause it and

I think so too.

> even if it comes from another pipe, we can assume that the
> SIGPIPE immediately stopped the program before returning any
> garbage to us.

Sorry, I don't understand this.

>>>> 4. Are there any other signals we ought to be reverting to default
>>>> behavior before launching a COPY TO/FROM PROGRAM?
>>>
>>> All handlers other than SIG_DEF and IGN are reset on exec().
>>> Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
>>> harm anything. Perhaps it would be safer against future changes
>>> if we explicitly reset all changed actions to SIG_DEF, but it
>>> might be too-much..
>>
>> Not sure, but reverting SIGPIPE to default would be enough as a fix
>> for the original issue, if we go with the POC patch.
>
> Agreed. I wonder why there's no API that resets all handlers to
> SIG_DFL at once or some flag telling to exec() that it should
> start with default signal handler set.

Such API would be an improvement, but IMO I think that would go beyond 
the scope of this fix.

> Finally in the attached PoC, I set cstate->eof_reached on failure
> of NextCopyFromRawFields then if eof_reached we don't ingore
> SIGPIPE. For a program like
> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> the following behavior. (I got an error without the fflush() but
> it is inevitable).
>
> =# select * from ft2 limit 1;
>     a
> -------
>   test1
>
> =# select * from ft2 limit 2;
> ERROR:  program "/home/horiguti/work/exprog" failed
> DETAIL:  child process was terminated by signal 13: Broken pipe
>
> For the original case:
>
> =# select * from ft1 limit 1;
>    a
> ------
>   test
> =# select * from ft1 limit 2;
>    a
> ------
>   test
> (1 row)
>
>
> I didn't confirmed whether it is complete.

Sorry, I don't understand this fully, but the reason to add the 
eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal 
COPY FROM PROGRAM cases?

Best regards,
Etsuro Fujita


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5BDC4BA0.7050106@lab.ntt.co.jp>
>> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
>>> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane<tgl@sss.pgh.pa.us>  wrote
>>> in<18397.1540297291@sss.pgh.pa.us>
>>>> It's just a POC because there are some things that need more thought
>>>> than I've given them:
>>>>
>>>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>>>> launched through OpenPipeStream?  If not, what infrastructure do we
>>>> need to add to control that?  In particular, is it sane to revert
>>>> SIGPIPE for a pipe program that we will write to not read from?
>>>> (I thought probably yes, because that is the normal Unix behavior,
>>>> but it could be debated.)

>> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
>> think it would be the responsibility of the called program to handle
>> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
>> something, though.
>
> So, I think we *should* (not just OK to) restore SIGPIPE to
> SIG_DFL in any case here to prevent undetermined situation for
> the program.

OK

>>>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>>>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>>>> we don't intend to terminate that early.)
>>>
>>> Since the SIGPIPE may come from another pipe, I think we
>>> shouldn't generally.
>>
>> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
>> get a better error message in CopySendEndOfRow if the called program
>> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
>>
>>                  if (cstate->is_program)
>>                  {
>>                      if (errno == EPIPE)
>>                      {
>>                          /*
>>                           * The pipe will be closed automatically on error at
>>                           * the end of transaction, but we might get a better
>>                           * error message from the subprocess' exit code than
>>                           * just "Broken Pipe"
>>                           */
>>                          ClosePipeToProgram(cstate);
>>
>>                          /*
>>                           * If ClosePipeToProgram() didn't throw an error, the
>>                           * program terminated normally, but closed the pipe
>>                           * first. Restore errno, and throw an error.
>>                           */
>>                          errno = EPIPE;
>>                      }
>>                      ereport(ERROR,
>>                              (errcode_for_file_access(),
>>                               errmsg("could not write to COPY program: %m")));
>>                  }
>
> Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
> program's exit status. So it is irrelevant to called program's
> SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
> backend side.

My explanation might not be enough.  Let me explain.  If the called 
program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for 
some reason, ClosePipeToProgram *as-is* would create an error message 
from that program's exit code.  But if we modify ClosePipeToProgram like 
the original POC patch, that function would not create that message for 
that termination.  To avoid that, I think it would be better for 
ClosePipeToProgram to ignore the SIGPIPE failure only in the case where 
the caller is a COPY FROM PROGRAM that is allowed to terminate early. 
(Maybe we could specify that as a new argument for BeginCopyFrom.)
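
A minimal sketch of that idea (all names here are hypothetical, not PostgreSQL's actual API): the close path would treat death-by-SIGPIPE as benign only when the caller has declared that early termination is acceptable, e.g. via the suggested BeginCopyFrom argument.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Hypothetical sketch: decide whether a nonzero pclose() status is benign.
 * "early_close_ok" stands in for the flag proposed above (passed down,
 * e.g., through BeginCopyFrom); neither name exists in PostgreSQL.
 */
bool
pclose_failure_is_benign(int pclose_rc, bool early_close_ok)
{
    /* A normal exit with status 0 is always fine. */
    if (WIFEXITED(pclose_rc) && WEXITSTATUS(pclose_rc) == 0)
        return true;

    /*
     * Death by SIGPIPE is expected when we deliberately stopped reading
     * the program's output early; ignore it only in that case.
     */
    if (early_close_ok &&
        WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE)
        return true;

    return false;
}
```

Under such a scheme, ordinary COPY FROM PROGRAM would pass false and keep the strict error reporting, while file_fdw's early-termination path would pass true.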

>>> But if we explicitly stop reading from
>>> the pipe before detecting an error, it can be ignored since we
>>> aren't interested in further failures.
>>
>> You mean that we should ignore other failures of the called program
>> when we stop reading from the pipe early?
>
> Yes, we have received sufficient data from the pipe then closed
> it successfully. The program may want to write more, but we don't
> want it. We rather should ignore the SIGPIPE exit code of the program
> since closing our end of the pipe is likely to cause it and

I think so too.

> even if it comes from another pipe, we can assume that the
> SIGPIPE immediately stopped the program before returning any
> garbage to us.

Sorry, I don't understand this.

>>>> 4. Are there any other signals we ought to be reverting to default
>>>> behavior before launching a COPY TO/FROM PROGRAM?
>>>
>>> All handlers other than SIG_DFL and SIG_IGN are reset on exec().
>>> Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
>>> harm anything. Perhaps it would be safer against future changes
>>> if we explicitly reset all changed actions to SIG_DFL, but it
>>> might be too much..
>>
>> Not sure, but reverting SIGPIPE to default would be enough as a fix
>> for the original issue, if we go with the POC patch.
>
> Agreed. I wonder why there's no API that resets all handlers to
> SIG_DFL at once or some flag telling to exec() that it should
> start with default signal handler set.

Such an API would be an improvement, but IMO that would go beyond 
the scope of this fix.

> Finally in the attached PoC, I set cstate->eof_reached on failure
> of NextCopyFromRawFields then if eof_reached we don't ignore
> SIGPIPE. For a program like
> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> the following behavior. (I got an error without the fflush() but
> it is inevitable).
>
> =# select * from ft2 limit 1;
>     a
> -------
>   test1
>
> =# select * from ft2 limit 2;
> ERROR:  program "/home/horiguti/work/exprog" failed
> DETAIL:  child process was terminated by signal 13: Broken pipe
>
> For the original case:
>
> =# select * from ft1 limit 1;
>    a
> ------
>   test
> =# select * from ft1 limit 2;
>    a
> ------
>   test
> (1 row)
>
>
> I didn't confirm whether it is complete.

Sorry, I don't fully understand this, but is the reason for adding the 
eof_reached stuff to avoid ignoring the SIGPIPE failure in normal 
COPY FROM PROGRAM cases?

Best regards,
Etsuro Fujita


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Kyotaro HORIGUCHI
Date:
At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE18409.2070004@lab.ntt.co.jp>
> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> > At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro
> > Fujita<fujita.etsuro@lab.ntt.co.jp> wrote
> > in<5BDC4BA0.7050106@lab.ntt.co.jp>
> >> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
> >>> At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane<tgl@sss.pgh.pa.us>  wrote
> >>> in<18397.1540297291@sss.pgh.pa.us>
> >>>> It's just a POC because there are some things that need more thought
> >>>> than I've given them:
> >>>>
> >>>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
> >>>> launched through OpenPipeStream?  If not, what infrastructure do we
> >>>> need to add to control that?  In particular, is it sane to revert
> >>>> SIGPIPE for a pipe program that we will write to not read from?
> >>>> (I thought probably yes, because that is the normal Unix behavior,
> >>>> but it could be debated.)
> 
> >> ISTM that it would be OK to inherit SIG_DFL in both cases, because I
> >> think it would be the responsibility of the called program to handle
> >> SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing
> >> something, though.
> >
> > So, I think we *should* (not just OK to) restore SIGPIPE to
> > SIG_DFL in any case here to prevent undetermined situation for
> > the program.
> 
> OK
> 
> >>>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
> >>>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
> >>>> we don't intend to terminate that early.)
> >>>
> >>> Since the SIGPIPE may come from another pipe, I think we
> >>> shouldn't generally.
> >>
> >> Agreed; if ClosePipeToProgram ignores that failure, we would fail to
> >> get a better error message in CopySendEndOfRow if the called program
> >> (inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:
> >>
> >>                  if (cstate->is_program)
> >>                  {
> >>                      if (errno == EPIPE)
> >>                      {
> >>                          /*
> >>                           * The pipe will be closed automatically on error
> >>                           * at
> >>                           * the end of transaction, but we might get a
> >>                           * better
> >>                           * error message from the subprocess' exit code
> >>                           * than
> >>                           * just "Broken Pipe"
> >>                           */
> >>                          ClosePipeToProgram(cstate);
> >>
> >>                          /*
> >>                           * If ClosePipeToProgram() didn't throw an error,
> >>                           * the
> >>                           * program terminated normally, but closed the
> >>                           * pipe
> >>                           * first. Restore errno, and throw an error.
> >>                           */
> >>                          errno = EPIPE;
> >>                      }
> >>                      ereport(ERROR,
> >>                              (errcode_for_file_access(),
> >>                               errmsg("could not write to COPY program:
> >>                               %m")));
> >>                  }
> >
> > Mmm, that's EPIPE of fwrite, not SIGNALLED&SIGPIPE of called
> > program's exit status. So it is irrelevant to called program's
> > SIGPIPE setup. It requires SIGPIPE to be kept to SIG_IGN on the
> > backend side.
> 
> My explanation might not be enough.  Let me explain.  If the called
> program that inherited SIG_DFL for SIGPIPE terminated on SIGPIPE for
> some reason, ClosePipeToProgram *as-is* would create an error message
> from that program's exit code.  But if we modify ClosePipeToProgram
> like the original POC patch, that function would not create that
> message for that termination.  To avoid that, I think it would be
> better for ClosePipeToProgram to ignore the SIGPIPE failure only in
> the case where the caller is a COPY FROM PROGRAM that is allowed to
> terminate early. (Maybe we could specify that as a new argument for
> BeginCopyFrom.)

Understood, thanks for the explanation. I agree with that.

> >>> But if we explicitly stop reading from
> >>> the pipe before detecting an error, it can be ignored since we
> >>> aren't interested in further failures.
> >>
> >> You mean that we should ignore other failures of the called program
> >> when we stop reading from the pipe early?
> >
> > Yes, we have received sufficient data from the pipe then closed
> > it successfully. The program may want to write more, but we don't
> > want it. We rather should ignore the SIGPIPE exit code of the program
> > since closing our end of the pipe is likely to cause it and
> 
> I think so too.
> 
> > even if it comes from another pipe, we can assume that the
> > SIGPIPE immediately stopped the program before returning any
> > garbage to us.
> 
> Sorry, I don't understand this.

Mmm, that was confusing, sorry. In other words:

We don't care about the reason for the program's ending if we received
enough data from the program and we actively (= before receiving
EOF) stopped reading. On the other hand, if SIGPIPE (on another pipe)
or something fatal happened to the program before we finished reading,
we receive EOF just after that, and we are interested in the cause
of the program's death.

# Does the above make sense?

In the sense of "We don't care about the reason", negligible reasons
aren't necessarily restricted to SIGPIPE; even SIGSEGV could be
theoretically ignored safely. "Theoretically" here means it is
another issue whether we rely on the output of a program that
dies of SIGSEGV (or of any cause other than the SIGPIPE we caused).


> >>>> 4. Are there any other signals we ought to be reverting to default
> >>>> behavior before launching a COPY TO/FROM PROGRAM?
> >>>
> >>> All handlers other than SIG_DFL and SIG_IGN are reset on exec().
> >>> Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
> >>> harm anything. Perhaps it would be safer against future changes
> >>> if we explicitly reset all changed actions to SIG_DFL, but it
> >>> might be too much..
> >>
> >> Not sure, but reverting SIGPIPE to default would be enough as a fix
> >> for the original issue, if we go with the POC patch.
> >
> > Agreed. I wonder why there's no API that resets all handlers to
> > SIG_DFL at once or some flag telling to exec() that it should
> > start with default signal handler set.
> 
> Such an API would be an improvement, but IMO that would go beyond
> the scope of this fix.

Ah, I meant that I wanted the OS or the standard library to provide it,
not us. Anyway, it is not relevant to the topic here.

> > Finally in the attached PoC, I set cstate->eof_reached on failure
> > of NextCopyFromRawFields then if eof_reached we don't ignore
> > SIGPIPE. For a program like
> > "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> > the following behavior. (I got an error without the fflush() but
> > it is inevitable).
> >
> > =# select * from ft2 limit 1;
> >     a
> > -------
> >   test1
> >
> > =# select * from ft2 limit 2;
> > ERROR:  program "/home/horiguti/work/exprog" failed
> > DETAIL:  child process was terminated by signal 13: Broken pipe
> >
> > For the original case:
> >
> > =# select * from ft1 limit 1;
> >    a
> > ------
> >   test
> > =# select * from ft1 limit 2;
> >    a
> > ------
> >   test
> > (1 row)
> >
> >
> > I didn't confirm whether it is complete.
> 
> Sorry, I don't fully understand this, but is the reason for adding the
> eof_reached stuff to avoid ignoring the SIGPIPE failure in normal
> COPY FROM PROGRAM cases?

Yes, if we have received EOF, the program ended before we
finished reading from the pipe, so any error should be
reported. Otherwise we don't care, at least about SIGPIPE. In the
attached patch, all kinds of errors are ignored when the pipe is
closed from the reading end on the backend side.

As a result, it doesn't report an error for SELECT * FROM ft2
LIMIT 1 on "main(void){puts("test1"); return 1;}".

=#  select * from ft limit 1;
   a   
-------
 test1
(1 row)

limit 2 reports the error.

=# select * from ft limit 2;
ERROR:  program "/home/horiguti/work/exprog" failed
DETAIL:  child process exited with exit code 1

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b58a74f4e3..c7728f6d6e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,12 @@ typedef struct CopyStateData
     char       *raw_buf;
     int            raw_buf_index;    /* next byte to process */
     int            raw_buf_len;    /* total # of bytes stored */
+
+    /*
+     * We ignore errors signaled from the called program in ClosePipeToProgram()
+     * when we close our side of the pipe before we receive EOF.
+     */
+    bool        eof_reached;
 } CopyStateData;
 
 /* DestReceiver for COPY (query) TO */
@@ -1726,12 +1732,19 @@ ClosePipeToProgram(CopyState cstate)
         ereport(ERROR,
                 (errcode_for_file_access(),
                  errmsg("could not close pipe to external command: %m")));
-    else if (pclose_rc != 0)
+    else if (cstate->eof_reached && pclose_rc != 0)
+    {
+        /*
+         * Report the failure only when we have read the program's output
+         * to completion; if we closed the pipe early on our reading end,
+         * a nonzero exit (typically from SIGPIPE) is expected and ignored.
+         */
         ereport(ERROR,
                 (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
                  errmsg("program \"%s\" failed",
                         cstate->filename),
                  errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+    }
 }
 
 /*
@@ -3525,7 +3538,10 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
 
         /* read raw fields in the next line */
         if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+        {
+            cstate->eof_reached = true;
             return false;
+        }
 
         /* check for overflowing fields */
         if (nfields > 0 && fldct > nfields)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1767..6cec85e2c1 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -2430,11 +2430,16 @@ OpenTransientFilePerm(const char *fileName, int fileFlags, mode_t fileMode)
  * Routines that want to initiate a pipe stream should use OpenPipeStream
  * rather than plain popen().  This lets fd.c deal with freeing FDs if
  * necessary.  When done, call ClosePipeStream rather than pclose.
+ *
+ * This function also ensures that the popen'd program is run with default
+ * SIGPIPE processing, rather than the SIG_IGN setting the backend normally
+ * uses.  This ensures desirable response to, eg, closing a read pipe early.
  */
 FILE *
 OpenPipeStream(const char *command, const char *mode)
 {
     FILE       *file;
+    int            save_errno;
 
     DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)",
                numAllocatedDescs, command));
@@ -2452,8 +2457,13 @@ OpenPipeStream(const char *command, const char *mode)
 TryAgain:
     fflush(stdout);
     fflush(stderr);
+    pqsignal(SIGPIPE, SIG_DFL);
     errno = 0;
-    if ((file = popen(command, mode)) != NULL)
+    file = popen(command, mode);
+    save_errno = errno;
+    pqsignal(SIGPIPE, SIG_IGN);
+    errno = save_errno;
+    if (file != NULL)
     {
         AllocateDesc *desc = &allocatedDescs[numAllocatedDescs];
 
@@ -2466,8 +2476,6 @@ TryAgain:
 
     if (errno == EMFILE || errno == ENFILE)
     {
-        int            save_errno = errno;
-
         ereport(LOG,
                 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                  errmsg("out of file descriptors: %m; release and retry")));

Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/06 19:50), Thomas Munro wrote:
> On Wed, Oct 24, 2018 at 1:21 AM Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> I wrote:
>>> =?utf-8?q?PG_Bug_reporting_form?=<noreply@postgresql.org>  writes:
>>>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>>>> /*
>>>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>>>> with exit code 1
>>>> */
>>
>>> Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
>>> shows something pretty unsurprising:
>>> sh: line 1: echo: write error: Broken pipe
>>> So the called program is behaving in a somewhat reasonable way: it's
>>> detecting EPIPE on its stdout (after we close the pipe), reporting that,
>>> and doing exit(1).
>>> Unfortunately, it's not clear what we could do about that, short of
>>> always reading the whole program output, which is likely to be a
>>> cure worse than the disease.  If the program were failing thanks to
>>> SIGPIPE, we could recognize that as a case we can ignore ... but with
>>> behavior like this, I don't see a reliable way to tell it apart from
>>> a generic program failure, which surely we'd better report.
>>
>> After a bit of thought, the problem here is blindingly obvious:
>> we generally run the backend with SIGPIPE handling set to SIG_IGN,
>> and evidently popen() allows the called program to inherit that,
>> at least on these platforms.
>>
>> So what we need to do is make sure the called program inherits SIG_DFL
>> handling for SIGPIPE, and then special-case that result as not being
>> a failure.  The attached POC patch does that and successfully makes
>> the file_fdw problem go away for me.
>>
>> It's just a POC because there are some things that need more thought
>> than I've given them:
>>
>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>> launched through OpenPipeStream?  If not, what infrastructure do we
>> need to add to control that?  In particular, is it sane to revert
>> SIGPIPE for a pipe program that we will write to not read from?
>> (I thought probably yes, because that is the normal Unix behavior,
>> but it could be debated.)
>>
>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>> we don't intend to terminate that early.)
>
> I'm not sure about that.  It might in theory be telling you about some
> other pipe.  If you're OK with that false positive, why not ignore all
> errors after you've read enough successful input and decided to close
> the pipe?

It's unfortunate to have that false positive, but I think we had better
error out if there is something wrong with the called program, because in
that case the data that we read from the pipe might not be reliable.  IMO
it would be the responsibility of the called program to handle/ignore
SIGPIPE properly if necessary.

>> 3. Maybe this should be implemented at some higher level?
>
> It won't work for some programs that ignore or catch the signal, so in
> theory you might want to give users the power/responsibility to say
> "ignore errors that occur after I decide to hang up".  Here are three
> different behaviours I found in popular software, showing termination
> by signal, custom error handling that we can't distinguish, and a
> bonehead strategy:

Interesting!

> $ seq 1 1000000 | head -5
> 1
> 2
> 3
> 4
> 5
> ... exit code indicates killed by signal
>
> $ python -c "for i in range(1000000): print i" | head -5
> 0
> 1
> 2
> 3
> 4
> Traceback (most recent call last):
>    File "<string>", line 1, in<module>
> IOError: [Errno 32] Broken pipe
> ... exit code 1

That's sad.

> $ cat Test.java
> public class Test {
>          public static void main(String[] args) {
>                  for (int i = 0; i<  1000000; ++i) {
>                          System.out.println(Integer.toString(i));
>                  }
>          }
> }
> $ javac Test.java
> $ java Test | head -5
> 0
> 1
> 2
> 3
> 4
> ... wait a really long time with no output, exit code 0
>
> (Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
> exception, except for PrintStreams like System.out which just eat data
> after an error...)

I agree that that is a bonehead strategy, but it doesn't seem that bad to me.

>> 4. Are there any other signals we ought to be reverting to default
>> behavior before launching a COPY TO/FROM PROGRAM?
>
> On my FreeBSD system, I compared the output of procstat -i (= show
> signal disposition) for two "sleep 60" processes, one invoked from the
> shell and the other from COPY ... FROM PROGRAM.  The differences were:
> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> default action would be to terminate the process, but the COPY PROGRAM
> child ignored them; for TTIN and TTOU, the default action would be to
> stop the process, but again they are ignored.  Why do bgwriter.c,
> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> regular backends?

So, should we revert SIGUSR2 to default processing as well?

Thanks!

Best regards,
Etsuro Fujita


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
> At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5BE18409.2070004@lab.ntt.co.jp>
>> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
>>> At Fri, 02 Nov 2018 22:05:36 +0900, Etsuro
>>> Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
>>> in<5BDC4BA0.7050106@lab.ntt.co.jp>
>>>> (2018/10/29 15:58), Kyotaro HORIGUCHI wrote:
>>>>> But iff we are explicitly stop reading from
>>>>> the pipe before detecting an error, it can be ignored since we
>>>>> aren't interested in further failure.
>>>>
>>>> You mean that we should ignore other failures of the called program
>>>> when we stop reading from the pipe early?
>>>
>>> Yes, we have received sufficient data from the pipe then closed
>>> it successfully. The program may want write more but we don't
>>> want it. We rather should ignore SIGPIPE exit code of the program
>>> since closing our writing end of a pipe is likely to cause it and
>>
>> I think so too.
>>
>>> even if it comes from another pipe, we can assume that the
>>> SIGPIPE immediately stopped the program before returning any
>>> garbage to us.
>>
>> Sorry, I don't understand this.
>
> Mmm. It looks confusing, sorry. In other words:
>
> We don't care the reason for program's ending if we received
> enough data from the program and we actively (= before receiving
> EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
> or something fatal happened on the program before we end reading,
> we receive EOF just after that and we are interested in the cause
> of the program's death.
>
> # Does the above make sense?

Yeah, thanks for the explanation!

> In the sense of "We don't care the reason", negligible reasons
> are not necessarily restricted to SIGPIPE; even SIGSEGV could be
> theoretically ignored safely. "theoretically" here means it is
> another issue whether we rely on the output from a program which
> causes SEGV (or any reason other than SIGPIPE, which we caused).

For the SIGSEGV case, I think it would be better not to rely on the
output data, because the program might have generated that data
incorrectly/unexpectedly.

>>> Finally in the attached PoC, I set cstate->eof_reached on failure
>>> of NextCopyFromRawFields then if eof_reached we don't ignore
>>> SIGPIPE. For a program like
>>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
>>> the following behavior. (I got an error without the fflush() but
>>> it is inevitable).
>>>
>>> =# select * from ft2 limit 1;
>>>      a
>>> -------
>>>    test1
>>>
>>> =# select * from ft2 limit 2;
>>> ERROR:  program "/home/horiguti/work/exprog" failed
>>> DETAIL:  child process was terminated by signal 13: Broken pipe
>>>
>>> For the original case:
>>>
>>> =# select * from ft1 limit 1;
>>>     a
>>> ------
>>>    test
>>> =# select * from ft1 limit 2;
>>>     a
>>> ------
>>>    test
>>> (1 row)
>>>
>>>
>>> I didn't confirm whether it is complete.
>>
>> Sorry, I don't understand this fully, but the reason to add the
>> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
>> COPY FROM PROGRAM cases?
>
> Yes, if we have received EOF, the program ended before we
> finished reading from the pipe thus any error should be
> reported.  Elsewise we don't care at least SIGPIPE. In the
> attached patch all kind of errors are ignored when pipe is closed
> from reading-end on backends.

OK, I understand your idea.  Thanks for the patch!

> As the result it doesn't report an error for SELECT * FROM ft2
> LIMIT 1 on "main(void){puts("test1"); return 1;}".
>
> =#  select * from ft limit 1;
>     a
> -------
>   test1
> (1 row)
>
> limit 2 reports the error.
>
> =# select * from ft limit 2;
> ERROR:  program "/home/horiguti/work/exprog" failed
> DETAIL:  child process exited with exit code 1

I think this would be contrary to users' expectations: if the SELECT
command works for limit 1, they would expect it to work for limit 2 as
well.  So, I think it would be better to error out for limit 1 as well,
as-is.

Best regards,
Etsuro Fujita


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Thomas Munro
Date:
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
> > At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5BE18409.2070004@lab.ntt.co.jp>
> >> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> >>> even if it comes from another pipe, we can assume that the
> >>> SIGPIPE immediately stopped the program before returning any
> >>> garbage to us.
> >>
> >> Sorry, I don't understand this.
> >
> > Mmm. It looks confusing, sorry. In other words:
> >
> > We don't care the reason for program's ending if we received
> > enough data from the program and we actively (= before receiving
> > EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
> > or something fatal happened on the program before we end reading,
> > we receive EOF just after that and we are interested in the cause
> > of the program's death.
> >
> > # Does the above make sense?
>
> Yeah, thanks for the explanation!

+1

I take back what I said earlier about false positives from other
pipes.  I think it's only traditional Unix programs designed for use in
pipelines, and naive programs, that let SIGPIPE terminate the process.
The accepted answer here gives a good way to think about it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist

A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
close to the intended purpose of that signal AFAICS.

> > In the sense of "We don't care the reason", negligible reasons
> > are not necessarily restricted to SIGPIPE; even SIGSEGV could be
> > theoretically ignored safely. "theoretically" here means it is
> > another issue whether we rely on the output from a program which
> > causes SEGV (or any reason other than SIGPIPE, which we caused).
>
> For the SIGSEGV case, I think it would be better that we don't rely on
> the output data, IMO, because I think there might be a possibility that
> the program has generated that data incorrectly/unexpectedly.

+1

I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users.  I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!

> >>> Finally in the attached PoC, I set cstate->eof_reached on failure
> >>> of NextCopyFromRawFields then if eof_reached we don't ignore
> >>> SIGPIPE. For a program like
> >>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> >>> the following behavior. (I got an error without the fflush() but
> >>> it is inevitable).
> >>>
> >>> =# select * from ft2 limit 1;
> >>>      a
> >>> -------
> >>>    test1
> >>>
> >>> =# select * from ft2 limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process was terminated by signal 13: Broken pipe
> >>>
> >>> For the original case:
> >>>
> >>> =# select * from ft1 limit 1;
> >>>     a
> >>> ------
> >>>    test
> >>> =# select * from ft1 limit 2;
> >>>     a
> >>> ------
> >>>    test
> >>> (1 row)
> >>>
> >>>
> >>> I didn't confirm whether it is complete.
> >>
> >> Sorry, I don't understand this fully, but the reason to add the
> >> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
> >> COPY FROM PROGRAM cases?
> >
> > Yes, if we have received EOF, the program ended before we
> > finished reading from the pipe thus any error should be
> > reported.  Elsewise we don't care at least SIGPIPE. In the
> > attached patch all kind of errors are ignored when pipe is closed
> > from reading-end on backends.
>
> OK, I understand your idea.  Thanks for the patch!

+1

> > As the result it doesn't report an error for SELECT * FROM ft2
> > LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >
> > =#  select * from ft limit 1;
> >     a
> > -------
> >   test1
> > (1 row)
> >
> > limit 2 reports the error.
> >
> > =# select * from ft limit 2;
> > ERROR:  program "/home/horiguti/work/exprog" failed
> > DETAIL:  child process exited with exit code 1
>
> I think this would be contrary to users' expectations: if the SELECT
> command works for limit 1, they would expect that the command would work
> for limit 2 as well.  So, I think it would be better to error out that
> command for limit 1 as well, as-is.

I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error.  For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe.  If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.

I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe.  It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all.  Suppose the
program is clever enough to expect EPIPE and not consider that to be
an error, but wants to tell us about serialization failure with a
non-zero exit code.  To handle that, you'd need a way to provide an
option to file_fdw to tell it not to ignore non-zero exit codes after
close.  This seems so exotic and contrived that it's not worth
bothering with for now, but could always be added.

BTW just for curiosity:

perl -e 'for (my $i=0; $i < 1000000; $i++) { print "$i\n"; }' | head -5
Exit code: terminated by SIGPIPE, like seq

ruby -e 'for i in 1..1000000 do puts i; end' | head -5
Exit code: 1, like Python

On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/11/06 19:50), Thomas Munro wrote:
> > On my FreeBSD system, I compared the output of procstat -i (= show
> > signal disposition) for two "sleep 60" processes, one invoked from the
> > shell and the other from COPY ... FROM PROGRAM.  The differences were:
> > PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> > default action would be to terminate the process, but the COPY PROGRAM
> > child ignored them; for TTIN and TTOU, the default action would be to
> > stop the process, but again they are ignored.  Why do bgwriter.c,
> > startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> > regular backends?
>
> So, we should revert SIGUSR2 as well to default processing?

I don't think it matters in practice, but it might be nice to restore
that just for consistency.  I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency.  Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT

From
Thomas Munro
Date:
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
> > At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>  wrote
in<5BE18409.2070004@lab.ntt.co.jp>
> >> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
> >>> even if it comes from another pipe, we can assume that the
> >>> SIGPIPE immediately stopped the program before returning any
> >>> garbage to us.
> >>
> >> Sorry, I don't understand this.
> >
> > Mmm. It looks confusing, sorry. In other words:
> >
> > We don't care the reason for program's ending if we received
> > enough data from the program and we actively (= before receiving
> > EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
> > or something fatal happened on the program before we end reading,
> > we receive EOF just after that and we are interested in the cause
> > of the program's death.
> >
> > # Does the above make sense?
>
> Yeah, thanks for the explanation!

+1

I take back what I said earlier about false positives from other
pipes.  I think it's only traditional Unix programs designed for use
in pipelines and naive programs that let SIGPIPE terminate the
process.   The accepted answer here gives a good way to think about
it:

https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist

A program sophisticated enough to be writing to other pipes is no
longer in that category and should be setting up signal dispositions
itself, so I agree that we should enable the default disposition and
ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
close to the intended purpose of that signal AFAICS.

> > In the sense of "We don't care the reason", negligible reasons
> > are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> > theoretically ignored safely. "theoretically" here means it is
> > another issue whether we rely on the output from a program which
> > causes SEGV (or any reason other than SIGPIPE, which we caused).
>
> For the SIGSEGV case, I think it would be better that we don't rely on
> the output data, IMO, because I think there might be a possibility that
> the program have generated that data incorrectly/unexpectedly.

+1

I don't think we should ignore termination by signals other than
SIGPIPE: that could hide serious problems from users.  I want to know
if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
happens after we read enough data; there is a major problem that a
human needs to investigate!

> >>> Finally in the attached PoC, I set cstate->eof_reached on failure
> >>> of NextCopyFromRawFields then if eof_reached we don't ingore
> >>> SIGPIPE. For a program like
> >>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
> >>> the following behavior. (I got an error without the fflush() but
> >>> it is inevitable).
> >>>
> >>> =# select * from ft2 limit 1;
> >>>      a
> >>> -------
> >>>    test1
> >>>
> >>> =# select * from ft2 limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process was terminated by signal 13: Broken pipe
> >>>
> >>> For the original case:
> >>>
> >>> =# select * from ft1 limit 1;
> >>>     a
> >>> ------
> >>>    test
> >>> =# select * from ft1 limit 2;
> >>>     a
> >>> ------
> >>>    test
> >>> (1 row)
> >>>
> >>>
> >>> I didn't confirmed whether it is complete.
> >>
> >> Sorry, I don't understand this fully, but the reason to add the
> >> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
> >> COPY FROM PROGRAM cases?
> >
> > Yes, if we have received EOF, the program ended before we
> > finished reading from the pipe thus any error should be
> > reported.  Elsewise we don't care at least SIGPIPE. In the
> > attached patch all kind of errors are ignored when pipe is closed
> > from reading-end on backends.
>
> OK, I understand your idea.  Thanks for the patch!

+1

> > As the result it doesn't report an error for SELECT * FROM ft2
> > LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >
> > =#  select * from ft limit 1;
> >     a
> > -------
> >   test1
> > (1 row)
> >
> > limit 2 reports the error.
> >
> > =# select * from ft limit 2;
> > ERROR:  program "/home/horiguti/work/exprog" failed
> > DETAIL:  child process exited with exit code 1
>
> I think this would be contrary to users expectations: if the SELECT
> command works for limit 1, they would expect that the command would work
> for limit 2 as well.  So, I think it would be better to error out that
> command for limit 1 as well, as-is.

I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
error.  For LIMIT 1, we got all the rows we wanted, and then we closed
the pipe.  If we got a non-zero non-signal exit code, or a signal exit
code and it was SIGPIPE (not any other signal!), then we should
consider that to be expected.
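For concreteness, the acceptance rule described above could be sketched like this, assuming "status" is the raw status returned by waitpid()/pclose() for the called program.  This is only a sketch of the policy, not the actual patch:

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Sketch of the proposed policy: after we have read all the rows we
 * wanted and closed the pipe ourselves, a plain exit (zero or not) is
 * expected, and so is death by SIGPIPE -- but death by any other
 * signal still deserves an error.  "status" is a raw wait status.
 */
static bool
exit_ok_after_early_close(int status)
{
    if (WIFEXITED(status))
        return true;                        /* any exit code is expected */
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGPIPE; /* SIGPIPE only, no other signal */
    return false;
}
```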

I tried to think of a scenario where the already-received output is
truly invalidated by a later error that happens after we close the
pipe.  It could be something involving a program that uses something
optimistic like serializable snapshot isolation that can later decide
that whatever it told you earlier is not valid after all.  Suppose the
program is clever enough to expect EPIPE and not consider that to be
an error, but wants to tell us about serialization failure with a
non-zero exit code.  To handle that, you'd need a way to provide an
option to file_fdw to tell it not to ignore non-zero exit codes after
close.  This seems so exotic and contrived that it's not worth
bothering with for now, but could always be added.

BTW just for curiosity:

perl -e 'for (my $i=0; $i < 1000000; $i++) { print "$i\n"; }' | head -5
Exit code: terminated by SIGPIPE, like seq

ruby -e 'for i in 1..1000000 do puts i; end' | head -5
Exit code: 1, like Python

On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> (2018/11/06 19:50), Thomas Munro wrote:
> > On my FreeBSD system, I compared the output of procstat -i (= show
> > signal disposition) for two "sleep 60" processes, one invoked from the
> > shell and the other from COPY ... FROM PROGRAM.  The differences were:
> > PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> > default action would be to terminate the process, but the COPY PROGRAM
> > child ignored them; for TTIN and TTOU, the default action would be to
> > stop the process, but again they are ignored.  Why do bgwriter.c,
> > startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> > regular backends?
>
> So, we should revert SIGUSR2 as well to default processing?

I don't think it matters in practice, but it might be nice to restore
that just for consistency.  I'm not sure what to think about the TTIN,
TTOU stuff; I don't understand job control well right now but I don't
think it really applies to programs run by a PostgreSQL backend, so if
we restore those it'd probably again be only for consistency.  Then
again, there may be a reason someone decided to ignore those in the
postmaster + regular backends but not the various auxiliary processes.
Anyone?
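For the child side, "restoring the default disposition" would amount to something like the following, run in the forked child just before exec.  reset_child_signals is a hypothetical helper for illustration, not the actual OpenPipeStream code:

```c
#include <signal.h>

/*
 * Hypothetical helper: in the forked child, before exec'ing the user's
 * program, put the dispositions discussed above (SIGPIPE, plus
 * SIGUSR2/SIGTTIN/SIGTTOU for consistency) back to their defaults, so
 * the program starts with a conventional Unix signal environment.
 */
static void
reset_child_signals(void)
{
    signal(SIGPIPE, SIG_DFL);
    signal(SIGUSR2, SIG_DFL);
    signal(SIGTTIN, SIG_DFL);
    signal(SIGTTOU, SIG_DFL);
}
```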

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/08 10:50), Thomas Munro wrote:
> On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/11/07 9:22), Kyotaro HORIGUCHI wrote:
>>> At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp>   wrote
in<5BE18409.2070004@lab.ntt.co.jp>
>>>> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote:
>>>>> even if it comes from another pipe, we can assume that the
>>>>> SIGPIPE immediately stopped the program before returning any
>>>>> garbage to us.
>>>>
>>>> Sorry, I don't understand this.
>>>
>>> Mmm. It looks confusing, sorry. In other words:
>>>
>>> We don't care the reason for program's ending if we received
>>> enough data from the program and we actively (= before receiving
>>> EOF) stop reading. On the other hand if SIGPIPE (on another pipe)
>>> or something fatal happened on the program before we end reading,
>>> we receive EOF just after that and we are interested in the cause
>>> of the program's death.
>>>
>>> # Does the above make sense?
>>
>> Yeah, thanks for the explanation!
>
> +1
>
> I take back what I said earlier about false positives from other
> pipes.  I think it's only traditional Unix programs designed for use
> in pipelines and naive programs that let SIGPIPE terminate the
> process.   The accepted answer here gives a good way to think about
> it:
>
> https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist

Thanks for the information!

> A program sophisticated enough to be writing to other pipes is no
> longer in that category and should be setting up signal dispositions
> itself, so I agree that we should enable the default disposition and
> ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
> close to the intended purpose of that signal AFAICS.

Great!

>>> In the sense of "We don't care about the reason", negligible reasons
>>> aren't necessarily restricted to SIGPIPE; even SIGSEGV could
>>> theoretically be ignored safely. "Theoretically" here means it is
>>> another issue whether we rely on the output from a program that
>>> got SIGSEGV (or died for any reason other than the SIGPIPE we caused).
>>
>> For the SIGSEGV case, I think it would be better that we don't rely on
>> the output data, IMO, because I think there might be a possibility that
>> the program have generated that data incorrectly/unexpectedly.
>
> +1
>
> I don't think we should ignore termination by signals other than
> SIGPIPE: that could hide serious problems from users.  I want to know
> if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> happens after we read enough data; there is a major problem that a
> human needs to investigate!

I think so too.

>>>>> Finally in the attached PoC, I set cstate->eof_reached on failure
>>>>> of NextCopyFromRawFields, then if eof_reached we don't ignore
>>>>> SIGPIPE. For a program like
>>>>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had
>>>>> the following behavior. (I got an error without the fflush() but
>>>>> it is inevitable).
>>>>>
>>>>> =# select * from ft2 limit 1;
>>>>>       a
>>>>> -------
>>>>>     test1
>>>>>
>>>>> =# select * from ft2 limit 2;
>>>>> ERROR:  program "/home/horiguti/work/exprog" failed
>>>>> DETAIL:  child process was terminated by signal 13: Broken pipe
>>>>>
>>>>> For the original case:
>>>>>
>>>>> =# select * from ft1 limit 1;
>>>>>      a
>>>>> ------
>>>>>     test
>>>>> =# select * from ft1 limit 2;
>>>>>      a
>>>>> ------
>>>>>     test
>>>>> (1 row)
>>>>>
>>>>>
>>>>> I didn't confirm whether it is complete.
>>>>
>>>> Sorry, I don't understand this fully, but the reason to add the
>>>> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal
>>>> COPY FROM PROGRAM cases?
>>>
>>> Yes, if we have received EOF, the program ended before we
>>> finished reading from the pipe, thus any error should be
>>> reported.  Otherwise we don't care, at least about SIGPIPE.  In the
>>> attached patch all kinds of errors are ignored when the pipe is closed
>>> from the reading end on backends.
>>
>> OK, I understand your idea.  Thanks for the patch!
>
> +1
>
>>> As the result it doesn't report an error for SELECT * FROM ft2
>>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
>>>
>>> =#  select * from ft limit 1;
>>>      a
>>> -------
>>>    test1
>>> (1 row)
>>>
>>> limit 2 reports the error.
>>>
>>> =# select * from ft limit 2;
>>> ERROR:  program "/home/horiguti/work/exprog" failed
>>> DETAIL:  child process exited with exit code 1
>>
>> I think this would be contrary to users expectations: if the SELECT
>> command works for limit 1, they would expect that the command would work
>> for limit 2 as well.  So, I think it would be better to error out that
>> command for limit 1 as well, as-is.
>
> I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> error.  For LIMIT 1, we got all the rows we wanted, and then we closed
> the pipe.  If we got a non-zero non-signal exit code, or a signal exit
> code and it was SIGPIPE (not any other signal!), then we should
> consider that to be expected.

Maybe I'm missing something, but a non-zero non-signal exit code means 
that something went wrong in the called program, so I think a human had 
better investigate that as well, though it would probably be a minor 
problem.  Too restrictive?

> I tried to think of a scenario where the already-received output is
> truly invalidated by a later error that happens after we close the
> pipe.  It could be something involving a program that uses something
> optimistic like serializable snapshot isolation that can later decide
> that whatever it told you earlier is not valid after all.  Suppose the
> program is clever enough to expect EPIPE and not consider that to be
> an error, but wants to tell us about serialization failure with a
> non-zero exit code.  To handle that, you'd need a way to provide an
> option to file_fdw to tell it not to ignore non-zero exit codes after
> close.  This seems so exotic and contrived that it's not worth
> bothering with for now, but could always be added.

Interesting!  I agree that such an option could add more flexibility in 
handling the non-zero-exit-code case.

> BTW just for curiosity:
>
> perl -e 'for (my $i=0; $i<  1000000; $i++) { print "$i\n"; }' | head -5
> Exit code: terminated by SIGPIPE, like seq

Good to know!

> ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> Exit code: 1, like Python

Sad.  Anyway, thanks a lot for these experiments in addition to the 
previous ones!

> On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:
>> (2018/11/06 19:50), Thomas Munro wrote:
>>> On my FreeBSD system, I compared the output of procstat -i (= show
>>> signal disposition) for two "sleep 60" processes, one invoked from the
>>> shell and the other from COPY ... FROM PROGRAM.  The differences were:
>>> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
>>> default action would be to terminate the process, but the COPY PROGRAM
>>> child ignored them; for TTIN and TTOU, the default action would be to
>>> stop the process, but again they are ignored.  Why do bgwriter.c,
>>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
>>> regular backends?
>>
>> So, we should revert SIGUSR2 as well to default processing?
>
> I don't think it matters in practice, but it might be nice to restore
> that just for consistency.

Agreed.

> I'm not sure what to think about the TTIN,
> TTOU stuff; I don't understand job control well right now but I don't
> think it really applies to programs run by a PostgreSQL backend, so if
> we restore those it'd probably again be only for consistency.  Then
> again, there may be a reason someone decided to ignore those in the
> postmaster + regular backends but not the various auxiliary processes.
> Anyone?

I don't have any idea about that.

Best regards,
Etsuro Fujita




Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Kyotaro HORIGUCHI
Date:
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE4318F.4040002@lab.ntt.co.jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes.  I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process.   The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
> 
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
> > close to the intended purpose of that signal AFAICS.
> 
> Great!
>
> >>> In the sense of "We don't care about the reason", negligible reasons
> >>> aren't necessarily restricted to SIGPIPE; even SIGSEGV could
> >>> theoretically be ignored safely. "Theoretically" here means it is
> >>> another issue whether we rely on the output from a program that
> >>> got SIGSEGV (or died for any reason other than the SIGPIPE we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that
> >> the program have generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users.  I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
> 
> I think so too.

Ok, I can live with that with no problem.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =#  select * from ft limit 1;
> >>>      a
> >>> -------
> >>>    test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work
> >> for limit 2 as well.  So, I think it would be better to error out that
> >> command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error.  For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe.  If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
> 
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though.  Too restrictive?

I think Thomas is just saying that reading more lines can surface
problems.  Per the current discussion, we should error out if the
program got SIGSEGV even with LIMIT 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe.  It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all.  Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code.  To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close.  This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
> 
> Interesting!  I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shouldn't output a line until all possible
output is validated.  Once the data source has emitted a line, the
receiver can do nothing but trust that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
> 
> Good to know!

Hmm... I didn't get an error here on either CentOS 7 or High Sierra.

| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> > Exit code: 1, like Python
> 
> Sad.  Anyway, thanks a lot for these experiments in addition to the
> previous ones!

Ruby reported a broken pipe, but its exit status was 0:

create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
 a 
---
 1
...
 5
(5 rows)
(no error)

> > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp>  wrote:
> >> (2018/11/06 19:50), Thomas Munro wrote:
> >>> On my FreeBSD system, I compared the output of procstat -i (= show
> >>> signal disposition) for two "sleep 60" processes, one invoked from the
> >>> shell and the other from COPY ... FROM PROGRAM.  The differences were:
> >>> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> >>> default action would be to terminate the process, but the COPY PROGRAM
> >>> child ignored them; for TTIN and TTOU, the default action would be to
> >>> stop the process, but again they are ignored.  Why do bgwriter.c,
> >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> >>> regular backends?
> >>
> >> So, we should revert SIGUSR2 as well to default processing?
> >
> > I don't think it matters in practice, but it might be nice to restore
> > that just for consistency.
> 
> Agreed.
> 
> > I'm not sure what to think about the TTIN,
> > TTOU stuff; I don't understand job control well right now but I don't
> > think it really applies to programs run by a PostgreSQL backend, so if
> > we restore those it'd probably again be only for consistency.  Then
> > again, there may be a reason someone decided to ignore those in the
> > postmaster + regular backends but not the various auxiliary processes.
> > Anyone?
> 
> I don't have any idea about that.

In my understanding, processes not connected to a terminal
(tty/pts) cannot receive TTIN/TTOU (unless someone sends them
artificially).  Since child processes are detached by setsid()
(on Linux), programs called that way also won't have a
controlling terminal at start time, and I suppose they have no
means of acquiring one, since they are no longer in the same
session as the postmaster.
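That can be seen with a small demo (hypothetical, not the actual COPY
FROM PROGRAM code path): after setsid() in a forked child, opening
/dev/tty fails because the new session has no controlling terminal.

```c
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, make it a session leader with setsid(), and try to
 * open the controlling terminal.  The open() fails (ENXIO) because
 * the new session has none -- the same situation a program launched
 * via COPY FROM PROGRAM is in.  Returns 0 if the child had no ctty.
 */
static int
child_ctty_after_setsid(void)
{
    pid_t pid = fork();

    if (pid == 0)
    {
        setsid();                          /* new session, no ctty */
        int fd = open("/dev/tty", O_RDWR);
        _exit(fd >= 0 ? 1 : 0);
    }

    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```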

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT

From
Kyotaro HORIGUCHI
Date:
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE4318F.4040002@lab.ntt.co.jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes.  I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process.   The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
> 
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
> > close to the intended purpose of that signal AFAICS.
> 
> Great!
>
> >>> In the sense of "We don't care the reason", negligible reasons
> >>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be
> >>> theoretically ignored safely. "theoretically" here means it is
> >>> another issue whether we rely on the output from a program which
> >>> causes SEGV (or any reason other than SIGPIPE, which we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that
> >> the program have generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users.  I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
> 
> I think so too.

Ok, I can live with that with no problem.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =#  select * from ft limit 1;
> >>>      a
> >>> -------
> >>>    test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR:  program "/home/horiguti/work/exprog" failed
> >>> DETAIL:  child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work
> >> for limit 2 as well.  So, I think it would be better to error out that
> >> command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error.  For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe.  If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
> 
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though.  Too restrictive?

I think Thomas just saying that reading more lines can develop
problems. According to the current discussion, we should error
out if we had SEGV when limit 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe.  It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all.  Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code.  To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close.  This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
> 
> Interesting!  I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shoudn't output a line until all possible
output is validated. Once the data source emited a line, the
receiver can't do other than believe that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
> 
> Good to know!

Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.

| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> > Exit code: 1, like Python
> 
> Sad.  Anyway, thanks a lot for these experiments in addition to the
> previous ones!

ruby reported broken pipe but exit status was 0..

create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
 a 
---
 1
...
 5
(5 rows)
(no error)

> > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp>  wrote:
> >> (2018/11/06 19:50), Thomas Munro wrote:
> >>> On my FreeBSD system, I compared the output of procstat -i (= show
> >>> signal disposition) for two "sleep 60" processes, one invoked from the
> >>> shell and the other from COPY ... FROM PROGRAM.  The differences were:
> >>> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> >>> default action would be to terminate the process, but the COPY PROGRAM
> >>> child ignored them; for TTIN and TTOU, the default action would be to
> >>> stop the process, but again they are ignored.  Why do bgwriter.c,
> >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> >>> regular backends?
> >>
> >> So, we should revert SIGUSR2 as well to default processing?
> >
> > I don't think it matters in practice, but it might be nice to restore
> > that just for consistency.
> 
> Agreed.
> 
> > I'm not sure what to think about the TTIN,
> > TTOU stuff; I don't understand job control well right now but I don't
> > think it really applies to programs run by a PostgreSQL backend, so if
> > we restore those it'd probably again be only for consistency.  Then
> > again, there may be a reason someone decided to ignore those in the
> > postmaster + regular backends but not the various auxiliary processes.
> > Anyone?
> 
> I don't have any idea about that.

In my understanding, processes not connected to a
terminal (tty/pts) cannot receive TTIN/TTOU (unless someone sends
them artificially).  Since child processes are detached by setsid()
(on Linux), programs called in that way also won't have a
controlling terminal at start time, and I suppose they have no
means to connect to one since they are no longer in the same
session as the postmaster.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Thomas Munro
Date:
On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.
>
> | $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> ...
> | 4
> | $ echo $?
> | 0

That's the exit code from head.  You can see python or perl's exit
code by adding strace in front (on Linux):

$ strace python -c "for i in range(1000000): print i" | head -5
...
+++ exited with 1 +++

$ strace perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
+++ killed by SIGPIPE +++

> create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
> select * from ft5 limit 5;
>  a
> ---
>  1
> ...
>  5
> (5 rows)
> (no error)

1000 is not enough... due to buffering, it works.  Try 1000000:

postgres=# create foreign table ft5 (a text) server svf1 options
(program 'ruby -e "for i in 1..1000000 do puts i; end"');
CREATE FOREIGN TABLE
postgres=# select * from ft5 limit 5;
ERROR:  program "ruby -e "for i in 1..1000000 do puts i; end"" failed
DETAIL:  child process exited with exit code 1

-- 
Thomas Munro
http://www.enterprisedb.com


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/09 14:39), Kyotaro HORIGUCHI wrote:
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE4318F.4040002@lab.ntt.co.jp>
>> (2018/11/08 10:50), Thomas Munro wrote:
>>> I take back what I said earlier about false positives from other
>>> pipes.  I think it's only traditional Unix programs designed for use
>>> in pipelines and naive programs that let SIGPIPE terminate the
>>> process.   The accepted answer here gives a good way to think about
>>> it:
>>>
>>> https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
>>
>> Thanks for the information!
>>
>>> A program sophisticated enough to be writing to other pipes is no
>>> longer in that category and should be setting up signal dispositions
>>> itself, so I agree that we should enable the default disposition and
>>> ignore WTERMSIG(exit_code) == SIGPIPE, as proposed.  That is pretty
>>> close to the intended purpose of that signal AFAICS.
>>
>> Great!
>>
>>>>> In the sense of "We don't care about the reason", negligible reasons
>>>>> are not necessarily restricted to SIGPIPE; even SIGSEGV could be
>>>>> theoretically ignored safely. "Theoretically" here means it is
>>>>> another issue whether we rely on the output from a program which
>>>>> causes SEGV (or any reason other than SIGPIPE, which we caused).
>>>>
>>>> For the SIGSEGV case, I think it would be better that we don't rely on
>>>> the output data, IMO, because I think there might be a possibility
>>>> that
>>>> the program have generated that data incorrectly/unexpectedly.
>>>
>>> +1
>>>
>>> I don't think we should ignore termination by signals other than
>>> SIGPIPE: that could hide serious problems from users.  I want to know
>>> if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
>>> happens after we read enough data; there is a major problem that a
>>> human needs to investigate!
>>
>> I think so too.
>
> Ok, I can live with that with no problem.

OK

>>>>> As the result it doesn't report an error for SELECT * FROM ft2
>>>>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
>>>>>
>>>>> =#  select * from ft limit 1;
>>>>>       a
>>>>> -------
>>>>>     test1
>>>>> (1 row)
>>>>>
>>>>> limit 2 reports the error.
>>>>>
>>>>> =# select * from ft limit 2;
>>>>> ERROR:  program "/home/horiguti/work/exprog" failed
>>>>> DETAIL:  child process exited with exit code 1
>>>>
>>>> I think this would be contrary to users expectations: if the SELECT
>>>> command works for limit 1, they would expect that the command would
>>>> work
>>>> for limit 2 as well.  So, I think it would be better to error out that
>>>> command for limit 1 as well, as-is.
>>>
>>> I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
>>> error.  For LIMIT 1, we got all the rows we wanted, and then we closed
>>> the pipe.  If we got a non-zero non-signal exit code, or a signal exit
>>> code and it was SIGPIPE (not any other signal!), then we should
>>> consider that to be expected.
>>
>> Maybe I'm missing something, but the non-zero non-signal exit code
>> means that there was something wrong with the called program, so I
>> think a human had better investigate that as well IMO, which would
>> probably be a minor problem, though.  Too restrictive?
>
> I think Thomas just saying that reading more lines can develop
> problems. According to the current discussion, we should error
> out if we had SEGV when limit 1.

Ah, I misread that.  Sorry for the noise.

>>> On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
>>> <fujita.etsuro@lab.ntt.co.jp>   wrote:
>>>> (2018/11/06 19:50), Thomas Munro wrote:
>>>>> On my FreeBSD system, I compared the output of procstat -i (= show
>>>>> signal disposition) for two "sleep 60" processes, one invoked from the
>>>>> shell and the other from COPY ... FROM PROGRAM.  The differences were:
>>>>> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
>>>>> default action would be to terminate the process, but the COPY PROGRAM
>>>>> child ignored them; for TTIN and TTOU, the default action would be to
>>>>> stop the process, but again they are ignored.  Why do bgwriter.c,
>>>>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
>>>>> regular backends?
>>>>
>>>> So, we should revert SIGUSR2 as well to default processing?
>>>
>>> I don't think it matters in practice, but it might be nice to restore
>>> that just for consistency.
>>
>> Agreed.
>>
>>> I'm not sure what to think about the TTIN,
>>> TTOU stuff; I don't understand job control well right now but I don't
>>> think it really applies to programs run by a PostgreSQL backend, so if
>>> we restore those it'd probably again be only for consistency.  Then
>>> again, there may be a reason someone decided to ignore those in the
>>> postmaster + regular backends but not the various auxiliary processes.
>>> Anyone?
>>
>> I don't have any idea about that.
>
> In my understanding processes not connected to a
> terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
> it artifically).  Since child processes are detached by setsid()
> (on Linux), programs called in that way also won't have a
> controlling terminal at the start time and I suppose they have no
> means to connect to one since they are no longer on the same
> session with postmaster.

For TTIN and TTOU, we would first need to make clear the reason for the 
inconsistency Thomas pointed out.  I'm wondering if we should leave the 
TTIN/TTOU stuff for future work.

Best regards,
Etsuro Fujita


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Kyotaro HORIGUCHI
Date:
At Fri, 9 Nov 2018 20:32:54 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=21wvBaKY2cbN62xn8JoPygLLhTawK2TkBac8Suw68YBw@mail.gmail.com>
> On Fri, Nov 9, 2018 at 6:39 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > Mmm..I didn't get an error at hand on both CentOS7 and High Sierra.
> >
> > | $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > ...
> > | 4
> > | $ echo $?
> > | 0
> 
> That's the exit code from head.  You can see python or perl's exit
> code by adding strace in front (on Linux):

Sorry, my mistake.  I had understood it as head ignoring not only
SIGPIPE but also SIGSEGV and any error exit status of the called
program.  I tried head with a program that ends with SEGV and saw
that head ignores it.

$ ./t
line 1
Segmentation fault (core dumped)

$ ./t | head -5  # SEGV before head closes the pipe
line 1
$

This is more tolerant than what I proposed before.  (I think it is
too tolerant for our purposes.)

> > create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
> > select * from ft5 limit 5;
> >  a
> > ---
> >  1
> > ...
> >  5
> > (5 rows)
> > (no error)
> 
> 1000 is not enough... due to buffering, it works.  Try 1000000:

Ah. Understood. Thanks. (Ruby's flush doesn't work for pipes..)

> postgres=# create foreign table ft5 (a text) server svf1 options
> (program 'ruby -e "for i in 1..1000000 do puts i; end"');
> CREATE FOREIGN TABLE
> postgres=# select * from ft5 limit 5;
> ERROR:  program "ruby -e "for i in 1..1000000 do puts i; end"" failed
> DETAIL:  child process exited with exit code 1

I saw the same failure.  Ruby handles SIGPIPE and converts it to
exit(1), so it cannot be handled by just ignoring SIGPIPE.

Ruby seems to be a friend of my second patch :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Kyotaro HORIGUCHI
Date:
At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE552ED.4040304@lab.ntt.co.jp>
> > Ok, I can live with that with no problem.
> 
> OK
...
> > I think Thomas just saying that reading more lines can develop
> > problems. According to the current discussion, we should error
> > out if we had SEGV when limit 1.
> 
> Ah, I misread that.  Sorry for the noise.

That being said, the Ruby case may suggest that we should be more
tolerant of the crash-after-limit case.

> > In my understanding processes not connected to a
> > terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
> > it artifically).  Since child processes are detached by setsid()
> > (on Linux), programs called in that way also won't have a
> > controlling terminal at the start time and I suppose they have no
> > means to connect to one since they are no longer on the same
> > session with postmaster.
> 
> For TTIN and TTOU, we would first need to make clear the reason for
> the inconsistency Thomas pointed out.  I'm wondering if we should
> leave the TTIN/TTOU stuff for future work.

Inconsistency? I read Thomas's messages as "TTIN/TTOU are not
needed for our business, and we have no reason to restore them
before calling external programs other than for surface
consistency."  I agree with that analysis, and I agree with you
that it doesn't need consideration just now.

If the inconsistency means the different behaviors between Perl and
Ruby (as written in another message), I'm inclined to vote for
a bit more tolerance for errors from external programs, as in my
second patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/12 18:52), Kyotaro HORIGUCHI wrote:
At Fri, 09 Nov 2018 18:27:09 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in
<5BE552ED.4040304@lab.ntt.co.jp>
>>> Ok, I can live with that with no problem.
>>
>> OK
> ...
>>> I think Thomas just saying that reading more lines can develop
>>> problems. According to the current discussion, we should error
>>> out if we had SEGV when limit 1.
>>
>> Ah, I misread that.  Sorry for the noise.
>
> Being said that, the ruby case may suggest that we should be more
> tolerant for the crash-after-limit case.

The Ruby case would be sad, but I'm not sure we can handle such a case 
safely in general, because core and file_fdw don't have enough knowledge 
about whether a non-zero exit code returned from pclose is OK or not, 
which would actually depend on the called program.  One approach for 
that would be to add an option to file_fdw for the called program to 
tell it to ignore those exit codes, which would be somewhat similar to 
what Thomas proposed [1].

>>> In my understanding processes not connected to a
>>> terminal(tty/pts) cannot receive TTIN/TTOU (unless someone sent
>>> it artifically).  Since child processes are detached by setsid()
>>> (on Linux), programs called in that way also won't have a
>>> controlling terminal at the start time and I suppose they have no
>>> means to connect to one since they are no longer on the same
>>> session with postmaster.
>>
>> For TTIN and TTOU, we would first need to make clear the reason for
>> the inconsistency Thomas pointed out.  I'm wondering if we should
>> leave the TTIN/TTOU stuff for future work.
>
> Inconsistency?

By "the inconsistency" I mean his words:

Why do bgwriter.c, startup.c, ... set SIGTTIN and SIGTTOU back to 
SIG_DFL, but not regular backends?

> I read the Thomas's messages as "TTIO/TTOU are not
> needed to our busines and we don't have a reason to restore them
> before calling external programs other than just plaster
> seemingly consistency." And I agree to the analysis and I agree
> to you on the point that it doens't need consideration just now.

OK

> If the consistency means the different behaviors between perl and
> ruby, (as written in another message,) I'm inclined to vote for
> having a bit more tolerance for error of external programs as my
> second patch.

Maybe my explanation was not enough, but I don't mean that.

Thanks!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAEepm%3D0fBPiRkSiJ3v4ynm%2BaP-A-dhuHjTFBAxwo59EkE2-E5Q%40mail.gmail.com


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

From
Etsuro Fujita
Date:
(2018/11/12 20:41), Etsuro Fujita wrote:
> (2018/11/12 18:52), Kyotaro HORIGUCHI wrote:
>> I read the Thomas's messages as "TTIO/TTOU are not
>> needed to our busines and we don't have a reason to restore them
>> before calling external programs other than just plaster
>> seemingly consistency." And I agree to the analysis and I agree
>> to you on the point that it doens't need consideration just now.
>
> OK

Attached is an updated version of Tom's POC patch.  Here are the changes:

* I modified his patch so that the called program inherits SIG_DFL for 
SIGUSR2 as well, as discussed upthread.

* I think it's better to ignore the SIGPIPE failure in
ClosePipeToProgram if we were in a COPY FROM PROGRAM that was allowed to
terminate early, and to keep the behavior as-is otherwise.  If we ignored
that failure unconditionally in that function, e.g., COPY TO PROGRAM would
fail to get a (better) error message in CopySendEndOfRow or EndCopy when
the invoked program was terminated on SIGPIPE, as discussed before [1].
To do so, I added a new argument to BeginCopyFrom to specify
whether COPY FROM PROGRAM can terminate early or not.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5BE18409.2070004%40lab.ntt.co.jp

Attachment
