Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT - Mailing list pgsql-bugs
| From | Etsuro Fujita |
|---|---|
| Subject | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT |
| Date | |
| Msg-id | 5BE25FA1.5070308@lab.ntt.co.jp Whole thread Raw |
| In response to | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT (Thomas Munro <thomas.munro@enterprisedb.com>) |
| List | pgsql-bugs |
(2018/11/06 19:50), Thomas Munro wrote:
> On Wed, Oct 24, 2018 at 1:21 AM Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I wrote:
>>> =?utf-8?q?PG_Bug_reporting_form?=<noreply@postgresql.org> writes:
>>>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>>>> /*
>>>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>>>> with exit code 1
>>>> */
>>
>>> Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
>>> shows something pretty unsurprising:
>>> sh: line 1: echo: write error: Broken pipe
>>> So the called program is behaving in a somewhat reasonable way: it's
>>> detecting EPIPE on its stdout (after we close the pipe), reporting that,
>>> and doing exit(1).
>>> Unfortunately, it's not clear what we could do about that, short of
>>> always reading the whole program output, which is likely to be a
>>> cure worse than the disease. If the program were failing thanks to
>>> SIGPIPE, we could recognize that as a case we can ignore ... but with
>>> behavior like this, I don't see a reliable way to tell it apart from
>>> a generic program failure, which surely we'd better report.
>>
>> After a bit of thought, the problem here is blindingly obvious:
>> we generally run the backend with SIGPIPE handing set to SIG_IGN,
>> and evidently popen() allows the called program to inherit that,
>> at least on these platforms.
>>
>> So what we need to do is make sure the called program inherits SIG_DFL
>> handling for SIGPIPE, and then special-case that result as not being
>> a failure. The attached POC patch does that and successfully makes
>> the file_fdw problem go away for me.
>>
>> It's just a POC because there are some things that need more thought
>> than I've given them:
>>
>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>> launched through OpenPipeStream? If not, what infrastructure do we
>> need to add to control that? In particular, is it sane to revert
>> SIGPIPE for a pipe program that we will write to not read from?
>> (I thought probably yes, because that is the normal Unix behavior,
>> but it could be debated.)
>>
>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>> SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
>> we don't intend to terminate that early.)
>
> I'm not sure about that. It might in theory be telling you about some
> other pipe. If you're OK with that false positive, why not ignore all
> errors after you've read enough successful input and decided to close
> the pipe?
It's unfortunate to have that false positive, but in my opinion I think
we had better to error out if there is something wrong with the called
program, because in that case I think the data that we read from the
pipe might not be reliable. IMO I think it would be the responsibility
of the called program to handle/ignore SIGPIPE properly if necessary.
>> 3. Maybe this should be implemented at some higher level?
>
> It won't work for some programs that ignore or catch the signal, so in
> theory you might want to give users the power/responsibility to say
> "ignore errors that occur after I decide to hang up". Here are three
> different behaviours I found in popular software, showing termination
> by signal, custom error handling that we can't distinguish, and a
> bonehead strategy:
Interesting!
> $ seq 1 1000000 | head -5
> 1
> 2
> 3
> 4
> 5
> ... exit code indicates killed by signal
>
> $ python -c "for i in range(1000000): print i" | head -5
> 0
> 1
> 2
> 3
> 4
> Traceback (most recent call last):
> File "<string>", line 1, in<module>
> IOError: [Errno 32] Broken pipe
> ... exit code 1
That's sad.
> $ cat Test.java
> public class Test {
> public static void main(String[] args) {
> for (int i = 0; i< 1000000; ++i) {
> System.out.println(Integer.toString(i));
> }
> }
> }
> $ javac Test.java
> $ java Test | head -5
> 0
> 1
> 2
> 3
> 4
> ... wait a really long time with no output, exit code 0
>
> (Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
> exception, except for PrintStreams like System.out which just eat data
> after an error...)
I agree that that is a bonehead strategy, but that seems not that bad to me.
>> 4. Are there any other signals we ought to be reverting to default
>> behavior before launching a COPY TO/FROM PROGRAM?
>
> On my FreeBSD system, I compared the output of procstat -i (= show
> signal disposition) for two "sleep 60" processes, one invoked from the
> shell and the other from COPY ... FROM PROGRAM. The differences were:
> PIPE, TTIN, TTOU and USR2. For the first and last of those, the
> default action would be to terminate the process, but the COPY PROGRAM
> child ignored them; for TTIN and TTOU, the default action would be to
> stop the process, but again they are ignored. Why do bgwriter.c,
> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> regular backends?
So, we should revert SIGUSR2 as well to default processing?
Thanks!
Best regards,
Etsuro Fujita
pgsql-bugs by date: