Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT - Mailing list pgsql-bugs

From Etsuro Fujita
Subject Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
Date
Msg-id 5BE25FA1.5070308@lab.ntt.co.jp
Whole thread Raw
In response to Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
(2018/11/06 19:50), Thomas Munro wrote:
> On Wed, Oct 24, 2018 at 1:21 AM Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> I wrote:
>>> =?utf-8?q?PG_Bug_reporting_form?=<noreply@postgresql.org>  writes:
>>>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>>>> /*
>>>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>>>> with exit code 1
>>>> */
>>
>>> Yeah, I can reproduce this on macOS as well as Linux.  Capturing stderr
>>> shows something pretty unsurprising:
>>> sh: line 1: echo: write error: Broken pipe
>>> So the called program is behaving in a somewhat reasonable way: it's
>>> detecting EPIPE on its stdout (after we close the pipe), reporting that,
>>> and doing exit(1).
>>> Unfortunately, it's not clear what we could do about that, short of
>>> always reading the whole program output, which is likely to be a
>>> cure worse than the disease.  If the program were failing thanks to
>>> SIGPIPE, we could recognize that as a case we can ignore ... but with
>>> behavior like this, I don't see a reliable way to tell it apart from
>>> a generic program failure, which surely we'd better report.
>>
>> After a bit of thought, the problem here is blindingly obvious:
>> we generally run the backend with SIGPIPE handing set to SIG_IGN,
>> and evidently popen() allows the called program to inherit that,
>> at least on these platforms.
>>
>> So what we need to do is make sure the called program inherits SIG_DFL
>> handling for SIGPIPE, and then special-case that result as not being
>> a failure.  The attached POC patch does that and successfully makes
>> the file_fdw problem go away for me.
>>
>> It's just a POC because there are some things that need more thought
>> than I've given them:
>>
>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>> launched through OpenPipeStream?  If not, what infrastructure do we
>> need to add to control that?  In particular, is it sane to revert
>> SIGPIPE for a pipe program that we will write to not read from?
>> (I thought probably yes, because that is the normal Unix behavior,
>> but it could be debated.)
>>
>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>> SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
>> we don't intend to terminate that early.)
>
> I'm not sure about that.  It might in theory be telling you about some
> other pipe.  If you're OK with that false positive, why not ignore all
> errors after you've read enough successful input and decided to close
> the pipe?

It's unfortunate to have that false positive, but in my opinion I think 
we had better to error out if there is something wrong with the called 
program, because in that case I think the data that we read from the 
pipe might not be reliable.  IMO I think it would be the responsibility 
of the called program to handle/ignore SIGPIPE properly if necessary.

>> 3. Maybe this should be implemented at some higher level?
>
> It won't work for some programs that ignore or catch the signal, so in
> theory you might want to give users the power/responsibility to say
> "ignore errors that occur after I decide to hang up".  Here are three
> different behaviours I found in popular software, showing termination
> by signal, custom error handling that we can't distinguish, and a
> bonehead strategy:

Interesting!

> $ seq 1 1000000 | head -5
> 1
> 2
> 3
> 4
> 5
> ... exit code indicates killed by signal
>
> $ python -c "for i in range(1000000): print i" | head -5
> 0
> 1
> 2
> 3
> 4
> Traceback (most recent call last):
>    File "<string>", line 1, in<module>
> IOError: [Errno 32] Broken pipe
> ... exit code 1

That's sad.

> $ cat Test.java
> public class Test {
>          public static void main(String[] args) {
>                  for (int i = 0; i<  1000000; ++i) {
>                          System.out.println(Integer.toString(i));
>                  }
>          }
> }
> $ javac Test.java
> $ java Test | head -5
> 0
> 1
> 2
> 3
> 4
> ... wait a really long time with no output, exit code 0
>
> (Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
> exception, except for PrintStreams like System.out which just eat data
> after an error...)

I agree that that is a bonehead strategy, but that seems not that bad to me.

>> 4. Are there any other signals we ought to be reverting to default
>> behavior before launching a COPY TO/FROM PROGRAM?
>
> On my FreeBSD system, I compared the output of procstat -i (= show
> signal disposition) for two "sleep 60" processes, one invoked from the
> shell and the other from COPY ... FROM PROGRAM.  The differences were:
> PIPE, TTIN, TTOU and USR2.  For the first and last of those, the
> default action would be to terminate the process, but the COPY PROGRAM
> child ignored them; for TTIN and TTOU, the default action would be to
> stop the process, but again they are ignored.  Why do bgwriter.c,
> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> regular backends?

So, we should revert SIGUSR2 as well to default processing?

Thanks!

Best regards,
Etsuro Fujita


pgsql-bugs by date:

Previous
From: "M Zav."
Date:
Subject: postgres_fdw
Next
From: PG Bug reporting form
Date:
Subject: BUG #15489: Segfault on DELETE