Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT |
Date | |
Msg-id | CAEepm=0fBPiRkSiJ3v4ynm+aP-A-dhuHjTFBAxwo59EkE2-E5Q@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT |
List | pgsql-bugs |
On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/11/07 9:22), Kyotaro HORIGUCHI wrote: > > At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5BE18409.2070004@lab.ntt.co.jp> > >> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote: > >>> even if it comes from another pipe, we can assume that the > >>> SIGPIPE immediately stopped the program before returning any > >>> garbage to us. > >> > >> Sorry, I don't understand this. > > > > Mmm. It looks confusing, sorry. In other words: > > > > We don't care the reason for program's ending if we received > > enough data from the program and we actively (= before receiving > > EOF) stop reading. On the other hand if SIGPIPE (on another pipe) > > or something fatal happened on the program before we end reading, > > we receive EOF just after that and we are interested in the cause > > of the program's death. > > > > # Does the above make sense? > > Yeah, thanks for the explanation! +1 I take back what I said earlier about false positives from other pipes. I think it's only traditional Unix programs designed for use in pipelines and naive programs that let SIGPIPE terminate the process. The accepted answer here gives a good way to think about it: https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist A program sophisticated enough to be writing to other pipes is no longer in that category and should be setting up signal dispositions itself, so I agree that we should enable the default disposition and ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty close to the intended purpose of that signal AFAICS. > > In the sense of "We don't care the reason", negligible reasons > > are necessariry restricted to SIGPIPE, evan SIGSEGV could be > > theoretically ignored safely. "theoretically" here means it is > > another issue whether we rely on the output from a program which > > causes SEGV (or any reason other than SIGPIPE, which we caused). > > For the SIGSEGV case, I think it would be better that we don't rely on > the output data, IMO, because I think there might be a possibility that > the program have generated that data incorrectly/unexpectedly. +1 I don't think we should ignore termination by signals other than SIGPIPE: that could hide serious problems from users. I want to know if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it happens after we read enough data; there is a major problem that a human needs to investigate! > >>> Finally in the attached PoC, I set cstate->eof_reached on failure > >>> of NextCopyFromRawFields then if eof_reached we don't ingore > >>> SIGPIPE. For a program like > >>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had > >>> the following behavior. (I got an error without the fflush() but > >>> it is inevitable). > >>> > >>> =# select * from ft2 limit 1; > >>> a > >>> ------- > >>> test1 > >>> > >>> =# select * from ft2 limit 2; > >>> ERROR: program "/home/horiguti/work/exprog" failed > >>> DETAIL: child process was terminated by signal 13: Broken pipe > >>> > >>> For the original case: > >>> > >>> =# select * from ft1 limit 1; > >>> a > >>> ------ > >>> test > >>> =# select * from ft1 limit 2; > >>> a > >>> ------ > >>> test > >>> (1 row) > >>> > >>> > >>> I didn't confirmed whether it is complete. > >> > >> Sorry, I don't understand this fully, but the reason to add the > >> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal > >> COPY FROM PROGRAM cases? > > > > Yes, if we have received EOF, the program ended before we > > finished reading from the pipe thus any error should be > > reported. Elsewise we don't care at least SIGPIPE. In the > > attached patch all kind of errors are ignored when pipe is closed > > from reading-end on backends. > > OK, I understand your idea. Thanks for the patch! +1 > > As the result it doesn't report an error for SELECT * FROM ft2 > > LIMIT 1 on "main(void){puts("test1"); return 1;}". > > > > =# select * from ft limit 1; > > a > > ------- > > test1 > > (1 row) > > > > limit 2 reports the error. > > > > =# select * from ft limit 2; > > ERROR: program "/home/horiguti/work/exprog" failed > > DETAIL: child process exited with exit code 1 > > I think this would be contrary to users expectations: if the SELECT > command works for limit 1, they would expect that the command would work > for limit 2 as well. So, I think it would be better to error out that > command for limit 1 as well, as-is. I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an error. For LIMIT 1, we got all the rows we wanted, and then we closed the pipe. If we got a non-zero non-signal exit code, or a signal exit code and it was SIGPIPE (not any other signal!), then we should consider that to be expected. I tried to think of a scenario where the already-received output is truly invalidated by a later error that happens after we close the pipe. It could be something involving a program that uses something optimistic like serializable snapshot isolation that can later decide that whatever it told you earlier is not valid after all. Suppose the program is clever enough to expect EPIPE and not consider that to be an error, but wants to tell us about serialization failure with a non-zero exit code. To handle that, you'd need a way to provide an option to file_fdw to tell it not to ignore non-zero exit codes after close. This seems so exotic and contrived that it's not worth bothering with for now, but could always be added. BTW just for curiosity: perl -e 'for (my $i=0; $i < 1000000; $i++) { print "$i\n"; }' | head -5 Exit code: terminated by SIGPIPE, like seq ruby -e 'for i in 1..1000000 do puts i; end' | head -5 Exit code: 1, like Python On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > (2018/11/06 19:50), Thomas Munro wrote: > > On my FreeBSD system, I compared the output of procstat -i (= show > > signal disposition) for two "sleep 60" processes, one invoked from the > > shell and the other from COPY ... FROM PROGRAM. The differences were: > > PIPE, TTIN, TTOU and USR2. For the first and last of those, the > > default action would be to terminate the process, but the COPY PROGRAM > > child ignored them; for TTIN and TTOU, the default action would be to > > stop the process, but again they are ignored. Why do bgwriter.c, > > startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not > > regular backends? > > So, we should revert SIGUSR2 as well to default processing? I don't think it matters in practice, but it might be nice to restore that just for consistency. I'm not sure what to think about the TTIN, TTOU stuff; I don't understand job control well right now but I don't think it really applies to programs run by a PostgreSQL backend, so if we restore those it'd probably again be only for consistency. Then again, there may be a reason someone decided to ignore those in the postmaster + regular backends but not the various auxiliary processes. Anyone? -- Thomas Munro http://www.enterprisedb.com
pgsql-bugs by date: