Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT - Mailing list pgsql-bugs
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT |
Date | |
Msg-id | 20181109.143931.243136889.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
List | pgsql-bugs |
At Thu, 08 Nov 2018 21:52:31 +0900, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote in <5BE4318F.4040002@lab.ntt.co.jp>
> (2018/11/08 10:50), Thomas Munro wrote:
> > I take back what I said earlier about false positives from other
> > pipes. I think it's only traditional Unix programs designed for use
> > in pipelines and naive programs that let SIGPIPE terminate the
> > process. The accepted answer here gives a good way to think about
> > it:
> >
> > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist
>
> Thanks for the information!
>
> > A program sophisticated enough to be writing to other pipes is no
> > longer in that category and should be setting up signal dispositions
> > itself, so I agree that we should enable the default disposition and
> > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty
> > close to the intended purpose of that signal AFAICS.
>
> Great!
>
> >>> In the sense of "We don't care the reason", negligible reasons
> >>> are necessarily restricted to SIGPIPE, even SIGSEGV could be
> >>> theoretically ignored safely. "theoretically" here means it is
> >>> another issue whether we rely on the output from a program which
> >>> causes SEGV (or any reason other than SIGPIPE, which we caused).
> >>
> >> For the SIGSEGV case, I think it would be better that we don't rely on
> >> the output data, IMO, because I think there might be a possibility
> >> that the program has generated that data incorrectly/unexpectedly.
> >
> > +1
> >
> > I don't think we should ignore termination by signals other than
> > SIGPIPE: that could hide serious problems from users. I want to know
> > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it
> > happens after we read enough data; there is a major problem that a
> > human needs to investigate!
>
> I think so too.

OK, I have no problem with that.

> >>> As the result it doesn't report an error for SELECT * FROM ft2
> >>> LIMIT 1 on "main(void){puts("test1"); return 1;}".
> >>>
> >>> =# select * from ft limit 1;
> >>>    a
> >>> -------
> >>>  test1
> >>> (1 row)
> >>>
> >>> limit 2 reports the error.
> >>>
> >>> =# select * from ft limit 2;
> >>> ERROR: program "/home/horiguti/work/exprog" failed
> >>> DETAIL: child process exited with exit code 1
> >>
> >> I think this would be contrary to users expectations: if the SELECT
> >> command works for limit 1, they would expect that the command would
> >> work for limit 2 as well. So, I think it would be better to error out
> >> that command for limit 1 as well, as-is.
> >
> > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an
> > error. For LIMIT 1, we got all the rows we wanted, and then we closed
> > the pipe. If we got a non-zero non-signal exit code, or a signal exit
> > code and it was SIGPIPE (not any other signal!), then we should
> > consider that to be expected.
>
> Maybe I'm missing something, but the non-zero non-signal exit code
> means that there was something wrong with the called program, so I
> think a human had better investigate that as well IMO, which would
> probably be a minor problem, though. Too restrictive?

I think Thomas is just saying that reading more lines can reveal
problems. According to the current discussion, we should error out if
we got SEGV even with limit 1.

> > I tried to think of a scenario where the already-received output is
> > truly invalidated by a later error that happens after we close the
> > pipe. It could be something involving a program that uses something
> > optimistic like serializable snapshot isolation that can later decide
> > that whatever it told you earlier is not valid after all. Suppose the
> > program is clever enough to expect EPIPE and not consider that to be
> > an error, but wants to tell us about serialization failure with a
> > non-zero exit code. To handle that, you'd need a way to provide an
> > option to file_fdw to tell it not to ignore non-zero exit codes after
> > close. This seems so exotic and contrived that it's not worth
> > bothering with for now, but could always be added.
>
> Interesting! I agree that such an option could add more flexibility
> in handling the non-zero-exit-code case.

I think the program shouldn't output a line until all possible output
is validated. Once the data source has emitted a line, the receiver
can only believe that it won't be withdrawn.

> > BTW just for curiosity:
> >
> > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
> > Exit code: terminated by SIGPIPE, like seq
>
> Good to know!

Mmm.. I didn't get an error here on either CentOS 7 or High Sierra.

| $ perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5
...
| 4
| $ echo $?
| 0

> > ruby -e 'for i in 1..1000000 do puts i; end' | head -5
> > Exit code: 1, like Python
>
> Sad. Anyway, thanks a lot for these experiments in addition to the
> previous ones!

ruby reported a broken pipe, but the exit status was 0.

create foreign table ft5 (a text) server svf1 options (program 'ruby -e "for i in 1..1000 do puts i; end"');
select * from ft5 limit 5;
 a
---
 1
...
 5
(5 rows)
(no error)

> > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita
> > <fujita.etsuro@lab.ntt.co.jp> wrote:
> >> (2018/11/06 19:50), Thomas Munro wrote:
> >>> On my FreeBSD system, I compared the output of procstat -i (= show
> >>> signal disposition) for two "sleep 60" processes, one invoked from the
> >>> shell and the other from COPY ... FROM PROGRAM. The differences were:
> >>> PIPE, TTIN, TTOU and USR2. For the first and last of those, the
> >>> default action would be to terminate the process, but the COPY PROGRAM
> >>> child ignored them; for TTIN and TTOU, the default action would be to
> >>> stop the process, but again they are ignored. Why do bgwriter.c,
> >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not
> >>> regular backends?
> >>
> >> So, we should revert SIGUSR2 as well to default processing?
> >
> > I don't think it matters in practice, but it might be nice to restore
> > that just for consistency.
>
> Agreed.
>
> > I'm not sure what to think about the TTIN,
> > TTOU stuff; I don't understand job control well right now but I don't
> > think it really applies to programs run by a PostgreSQL backend, so if
> > we restore those it'd probably again be only for consistency. Then
> > again, there may be a reason someone decided to ignore those in the
> > postmaster + regular backends but not the various auxiliary processes.
> > Anyone?
>
> I don't have any idea about that.

In my understanding, processes not connected to a terminal (tty/pts)
cannot receive TTIN/TTOU (unless someone sends it artificially). Since
child processes are detached by setsid() (on Linux), programs called in
that way also won't have a controlling terminal at start time, and I
suppose they have no means to connect to one since they are no longer
in the same session as the postmaster.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
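
The exit-status policy debated in this message can be sketched as follows. This is only an illustration in Python, not the file_fdw patch: the names `classify_exit`, `read_first_lines` and `tolerate_exit_code_after_close` are hypothetical, and the last one stands in for the file_fdw option Thomas speculates about. Its default here follows Thomas's position; Fujita-san's position corresponds to passing `False`. The sketch inspects the producer's own status, because a shell's `$?` after `producer | head -5` reports the status of `head`.

```python
import signal
import subprocess
import sys


def classify_exit(returncode, closed_early, tolerate_exit_code_after_close=True):
    """Return "ok" or an error description for a child's exit status.

    returncode follows the subprocess.Popen convention: a negative
    value -N means the child was terminated by signal N.
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        signum = -returncode
        # SIGPIPE is expected only when we closed the pipe ourselves;
        # any other signal (SIGSEGV, SIGBUS, ...) is always reported.
        if closed_early and signum == signal.SIGPIPE:
            return "ok"
        return f"terminated by signal {signum}"
    # Non-zero, non-signal exit: the point still under discussion.
    if closed_early and tolerate_exit_code_after_close:
        return "ok"
    return f"exited with code {returncode}"


def read_first_lines(child_code, limit):
    """Run a Python child, read `limit` lines, then close the pipe early.

    Returns (lines, returncode). Popen restores SIGPIPE to its default
    disposition in the child (restore_signals=True is the default).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    lines = [proc.stdout.readline().rstrip("\n") for _ in range(limit)]
    proc.stdout.close()  # like LIMIT: stop reading before EOF
    return lines, proc.wait()
```

For example, `read_first_lines("import signal\nsignal.signal(signal.SIGPIPE, signal.SIG_DFL)\nwhile True: print('x')", 5)` runs a child that keeps the default SIGPIPE disposition, and its return code can then be passed to `classify_exit(..., closed_early=True)`.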