Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT - Mailing list pgsql-bugs
From | Etsuro Fujita |
---|---|
Subject | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT |
Date | |
Msg-id | 5BE4318F.4040002@lab.ntt.co.jp Whole thread Raw |
In response to | Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT |
List | pgsql-bugs |
(2018/11/08 10:50), Thomas Munro wrote: > On Wed, Nov 7, 2018 at 11:03 PM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/11/07 9:22), Kyotaro HORIGUCHI wrote: >>> At Tue, 06 Nov 2018 21:07:37 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5BE18409.2070004@lab.ntt.co.jp> >>>> (2018/11/06 12:53), Kyotaro HORIGUCHI wrote: >>>>> even if it comes from another pipe, we can assume that the >>>>> SIGPIPE immediately stopped the program before returning any >>>>> garbage to us. >>>> >>>> Sorry, I don't understand this. >>> >>> Mmm. It looks confusing, sorry. In other words: >>> >>> We don't care the reason for program's ending if we received >>> enough data from the program and we actively (= before receiving >>> EOF) stop reading. On the other hand if SIGPIPE (on another pipe) >>> or something fatal happened on the program before we end reading, >>> we receive EOF just after that and we are interested in the cause >>> of the program's death. >>> >>> # Does the above make sense? >> >> Yeah, thanks for the explanation! > > +1 > > I take back what I said earlier about false positives from other > pipes. I think it's only traditional Unix programs designed for use > in pipelines and naive programs that let SIGPIPE terminate the > process. The accepted answer here gives a good way to think about > it: > > https://stackoverflow.com/questions/8369506/why-does-sigpipe-exist Thanks for the information! > A program sophisticated enough to be writing to other pipes is no > longer in that category and should be setting up signal dispositions > itself, so I agree that we should enable the default disposition and > ignore WTERMSIG(exit_code) == SIGPIPE, as proposed. That is pretty > close to the intended purpose of that signal AFAICS. Great! >>> In the sense of "We don't care the reason", negligible reasons >>> are necessariry restricted to SIGPIPE, evan SIGSEGV could be >>> theoretically ignored safely. "theoretically" here means it is >>> another issue whether we rely on the output from a program which >>> causes SEGV (or any reason other than SIGPIPE, which we caused). >> >> For the SIGSEGV case, I think it would be better that we don't rely on >> the output data, IMO, because I think there might be a possibility that >> the program have generated that data incorrectly/unexpectedly. > > +1 > > I don't think we should ignore termination by signals other than > SIGPIPE: that could hide serious problems from users. I want to know > if my program is crashing with SIGBUS, SIGTERM, SIGFPE etc, even if it > happens after we read enough data; there is a major problem that a > human needs to investigate! I think so too. >>>>> Finally in the attached PoC, I set cstate->eof_reached on failure >>>>> of NextCopyFromRawFields then if eof_reached we don't ingore >>>>> SIGPIPE. For a program like >>>>> "puts("test1");fflush(stdout);kill(getpid(), SIGPIPE);", I had >>>>> the following behavior. (I got an error without the fflush() but >>>>> it is inevitable). >>>>> >>>>> =# select * from ft2 limit 1; >>>>> a >>>>> ------- >>>>> test1 >>>>> >>>>> =# select * from ft2 limit 2; >>>>> ERROR: program "/home/horiguti/work/exprog" failed >>>>> DETAIL: child process was terminated by signal 13: Broken pipe >>>>> >>>>> For the original case: >>>>> >>>>> =# select * from ft1 limit 1; >>>>> a >>>>> ------ >>>>> test >>>>> =# select * from ft1 limit 2; >>>>> a >>>>> ------ >>>>> test >>>>> (1 row) >>>>> >>>>> >>>>> I didn't confirmed whether it is complete. >>>> >>>> Sorry, I don't understand this fully, but the reason to add the >>>> eof_reached stuff is to avoid ignoring the SIGPIPE failure in normal >>>> COPY FROM PROGRAM cases? >>> >>> Yes, if we have received EOF, the program ended before we >>> finished reading from the pipe thus any error should be >>> reported. Elsewise we don't care at least SIGPIPE. In the >>> attached patch all kind of errors are ignored when pipe is closed >>> from reading-end on backends. >> >> OK, I understand your idea. Thanks for the patch! > > +1 > >>> As the result it doesn't report an error for SELECT * FROM ft2 >>> LIMIT 1 on "main(void){puts("test1"); return 1;}". >>> >>> =# select * from ft limit 1; >>> a >>> ------- >>> test1 >>> (1 row) >>> >>> limit 2 reports the error. >>> >>> =# select * from ft limit 2; >>> ERROR: program "/home/horiguti/work/exprog" failed >>> DETAIL: child process exited with exit code 1 >> >> I think this would be contrary to users expectations: if the SELECT >> command works for limit 1, they would expect that the command would work >> for limit 2 as well. So, I think it would be better to error out that >> command for limit 1 as well, as-is. > > I think it's correct that LIMIT 1 gives no error but LIMIT 2 gives an > error. For LIMIT 1, we got all the rows we wanted, and then we closed > the pipe. If we got a non-zero non-signal exit code, or a signal exit > code and it was SIGPIPE (not any other signal!), then we should > consider that to be expected. Maybe I'm missing something, but the non-zero non-signal exit code means that there was something wrong with the called program, so I think a human had better investigate that as well IMO, which would probably be a minor problem, though. Too restrictive? > I tried to think of a scenario where the already-received output is > truly invalidated by a later error that happens after we close the > pipe. It could be something involving a program that uses something > optimistic like serializable snapshot isolation that can later decide > that whatever it told you earlier is not valid after all. Suppose the > program is clever enough to expect EPIPE and not consider that to be > an error, but wants to tell us about serialization failure with a > non-zero exit code. To handle that, you'd need a way to provide an > option to file_fdw to tell it not to ignore non-zero exit codes after > close. This seems so exotic and contrived that it's not worth > bothering with for now, but could always be added. Interesting! I agree that such an option could add more flexibility in handling the non-zero-exit-code case. > BTW just for curiosity: > > perl -e 'for (my $i=0; $i< 1000000; $i++) { print "$i\n"; }' | head -5 > Exit code: terminated by SIGPIPE, like seq Good to know! > ruby -e 'for i in 1..1000000 do puts i; end' | head -5 > Exit code: 1, like Python Sad. Anyway, thanks a lot for these experiments in addition to the previous ones! > On Wed, Nov 7, 2018 at 4:44 PM Etsuro Fujita > <fujita.etsuro@lab.ntt.co.jp> wrote: >> (2018/11/06 19:50), Thomas Munro wrote: >>> On my FreeBSD system, I compared the output of procstat -i (= show >>> signal disposition) for two "sleep 60" processes, one invoked from the >>> shell and the other from COPY ... FROM PROGRAM. The differences were: >>> PIPE, TTIN, TTOU and USR2. For the first and last of those, the >>> default action would be to terminate the process, but the COPY PROGRAM >>> child ignored them; for TTIN and TTOU, the default action would be to >>> stop the process, but again they are ignored. Why do bgwriter.c, >>> startup.c, ... set SIGTTIN and SIGTTOU back to SIG_DFL, but not >>> regular backends? >> >> So, we should revert SIGUSR2 as well to default processing? > > I don't think it matters in practice, but it might be nice to restore > that just for consistency. Agreed. > I'm not sure what to think about the TTIN, > TTOU stuff; I don't understand job control well right now but I don't > think it really applies to programs run by a PostgreSQL backend, so if > we restore those it'd probably again be only for consistency. Then > again, there may be a reason someone decided to ignore those in the > postmaster + regular backends but not the various auxiliary processes. > Anyone? I don't have any idea about that. Best regards, Etsuro Fujita
pgsql-bugs by date: