Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT - Mailing list pgsql-bugs
From | Etsuro Fujita |
---|---|
Subject | Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT |
Date | |
Msg-id | 5BE25FA1.5070308@lab.ntt.co.jp |
In response to | Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT (Thomas Munro <thomas.munro@enterprisedb.com>) |
List | pgsql-bugs |
(2018/11/06 19:50), Thomas Munro wrote:
> On Wed, Oct 24, 2018 at 1:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wrote:
>>> PG Bug reporting form <noreply@postgresql.org> writes:
>>>> SELECT * FROM test_file_fdw_program_limit LIMIT 0;
>>>> /*
>>>> [38000] ERROR: program "echo "test"" failed Detail: child process exited
>>>> with exit code 1
>>>> */
>>
>>> Yeah, I can reproduce this on macOS as well as Linux. Capturing stderr
>>> shows something pretty unsurprising:
>>> sh: line 1: echo: write error: Broken pipe
>>> So the called program is behaving in a somewhat reasonable way: it's
>>> detecting EPIPE on its stdout (after we close the pipe), reporting that,
>>> and doing exit(1).
>>> Unfortunately, it's not clear what we could do about that, short of
>>> always reading the whole program output, which is likely to be a
>>> cure worse than the disease. If the program were failing thanks to
>>> SIGPIPE, we could recognize that as a case we can ignore ... but with
>>> behavior like this, I don't see a reliable way to tell it apart from
>>> a generic program failure, which surely we'd better report.
>>
>> After a bit of thought, the problem here is blindingly obvious:
>> we generally run the backend with SIGPIPE handling set to SIG_IGN,
>> and evidently popen() allows the called program to inherit that,
>> at least on these platforms.
>>
>> So what we need to do is make sure the called program inherits SIG_DFL
>> handling for SIGPIPE, and then special-case that result as not being
>> a failure. The attached POC patch does that and successfully makes
>> the file_fdw problem go away for me.
>>
>> It's just a POC because there are some things that need more thought
>> than I've given them:
>>
>> 1. Is it OK to revert SIGPIPE to default processing for *all* programs
>> launched through OpenPipeStream? If not, what infrastructure do we
>> need to add to control that? In particular, is it sane to revert
>> SIGPIPE for a pipe program that we will write to not read from?
>> (I thought probably yes, because that is the normal Unix behavior,
>> but it could be debated.)
>>
>> 2. Likewise, is it always OK for ClosePipeToProgram to ignore a
>> SIGPIPE failure? (For ordinary COPY, maybe it shouldn't, since
>> we don't intend to terminate that early.)
>
> I'm not sure about that. It might in theory be telling you about some
> other pipe. If you're OK with that false positive, why not ignore all
> errors after you've read enough successful input and decided to close
> the pipe?

It's unfortunate to have that false positive, but I think we had better
error out if there is something wrong with the called program, because in
that case the data that we read from the pipe might not be reliable. IMO
it would be the responsibility of the called program to handle/ignore
SIGPIPE properly if necessary.

>> 3. Maybe this should be implemented at some higher level?
>
> It won't work for some programs that ignore or catch the signal, so in
> theory you might want to give users the power/responsibility to say
> "ignore errors that occur after I decide to hang up". Here are three
> different behaviours I found in popular software, showing termination
> by signal, custom error handling that we can't distinguish, and a
> bonehead strategy:

Interesting!

> $ seq 1 1000000 | head -5
> 1
> 2
> 3
> 4
> 5
> ... exit code indicates killed by signal
>
> $ python -c "for i in range(1000000): print i" | head -5
> 0
> 1
> 2
> 3
> 4
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> IOError: [Errno 32] Broken pipe
> ... exit code 1

That's sad.

> $ cat Test.java
> public class Test {
>     public static void main(String[] args) {
>         for (int i = 0; i < 1000000; ++i) {
>             System.out.println(Integer.toString(i));
>         }
>     }
> }
> $ javac Test.java
> $ java Test | head -5
> 0
> 1
> 2
> 3
> 4
> ... wait a really long time with no output, exit code 0
>
> (Explanation: JVMs ignore SIGPIPE and usually convert EPIPE into an IO
> exception, except for PrintStreams like System.out which just eat data
> after an error...)

I agree that that is a bonehead strategy, but it doesn't seem that bad to
me.

>> 4. Are there any other signals we ought to be reverting to default
>> behavior before launching a COPY TO/FROM PROGRAM?
>
> On my FreeBSD system, I compared the output of procstat -i (= show
> signal disposition) for two "sleep 60" processes, one invoked from the
> shell and the other from COPY ... FROM PROGRAM. The differences were:
> PIPE, TTIN, TTOU and USR2. For the first and last of those, the
> default action would be to terminate the process, but the COPY PROGRAM
> child ignored them; for TTIN and TTOU, the default action would be to
> stop the process, but again they are ignored. Why do bgwriter.c,
> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> regular backends?

So, should we revert SIGUSR2 to default processing as well?

Thanks!

Best regards,
Etsuro Fujita
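
For readers who want to reproduce the two outcomes discussed above outside the backend, here is a minimal sketch in Python. It is not the POC patch and does not touch OpenPipeStream or ClosePipeToProgram. The helper names run_and_hang_up and is_benign_early_close are hypothetical. The sketch assumes a Unix system with a `yes` program that exits nonzero on EPIPE when SIGPIPE is ignored, as GNU coreutils does. It starts a child with a chosen SIGPIPE disposition, reads one line, closes the pipe early (roughly what LIMIT does), and then checks how the child ended.

```python
import signal
import subprocess


def run_and_hang_up(cmd, sigpipe_disposition):
    """Start cmd, read one line of its output, close the pipe, and reap it.

    sigpipe_disposition is signal.SIG_DFL or signal.SIG_IGN and is installed
    in the child just before exec. An ignored signal stays ignored across
    exec, so SIG_IGN imitates a child that inherited the ignored SIGPIPE.
    Returns (first_line, returncode).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide "Broken pipe" messages from the child
        preexec_fn=lambda: signal.signal(signal.SIGPIPE, sigpipe_disposition),
    )
    first_line = proc.stdout.readline()
    proc.stdout.close()       # hang up early, before the child is done writing
    returncode = proc.wait()  # negative value means "killed by that signal"
    return first_line, returncode


def is_benign_early_close(returncode):
    """True only if the child was terminated by SIGPIPE (hypothetical policy)."""
    return returncode == -signal.SIGPIPE


if __name__ == "__main__":
    for name, disposition in (("SIG_DFL", signal.SIG_DFL),
                              ("SIG_IGN", signal.SIG_IGN)):
        line, rc = run_and_hang_up(["yes"], disposition)
        print(name, line, rc, is_benign_early_close(rc))
```

With SIG_DFL the child is killed by the signal, which the reader can tell apart from a real failure. With SIG_IGN the child sees EPIPE and reports an ordinary nonzero exit code, which cannot be distinguished from any other failure. Programs that install their own handling, like the Python and Java examples quoted above, are not helped by either setting.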