Thread: Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > * I think it's better to ignore the SIGPIPE failure in > ClosePipeToProgram if we were in a COPY FROM PROGRAM that was allowed to > terminate early and keep the behavior as-is otherwise. If we ignore > that failure unconditionally in that function, eg, COPY TO PROGRAM would > fail to get a (better) error message in CopySendEndOfRow or EndCopy when > the invoked program was terminated on SIGPIPE, as discussed before [1]. > And to do so, I added a new argument to BeginCopyFrom to specify > whether COPY FROM PROGRAM can terminate early or not. If we do that, it makes this not back-patchable, I fear --- the fact that file_fdw is calling BeginCopyFrom seems like sufficient evidence that there might be third-party callers who would object to an API break in minor releases. That seems unfortunate for a bug fix. Are we sufficiently convinced that we must have the dont-allow-partial option to not fix this in the back branches? I'm not. regards, tom lane
I wrote: > Are we sufficiently convinced that we must have the dont-allow-partial > option to not fix this in the back branches? I'm not. I just had a thought about that: suppose we add a flag to CopyState to indicate whether we reached EOF while reading. This wouldn't be hugely expensive, just something like switch (cstate->copy_dest) { case COPY_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not read from COPY file: %m"))); + cstate->reached_eof |= (bytesread < maxread); break; case COPY_OLD_FE: Then the logic in ClosePipeToProgram could be "if we did not reach EOF, then a SIGPIPE termination is unsurprising. If we did reach EOF, then SIGPIPE is an error." If the called program gets SIGPIPE for some reason other than us closing the pipe early, then we would see EOF next time we try to read, and correctly report that there's a problem. There are race-ish conditions in cases like the called program getting an unrelated SIGPIPE at about the same time that we decide to stop reading early, but I don't think it's really possible to disambiguate that. regards, tom lane
I wrote: > I just had a thought about that: suppose we add a flag to CopyState > to indicate whether we reached EOF while reading. ... > Then the logic in ClosePipeToProgram could be "if we did not reach > EOF, then a SIGPIPE termination is unsurprising. If we did reach > EOF, then SIGPIPE is an error." If the called program gets SIGPIPE > for some reason other than us closing the pipe early, then we would > see EOF next time we try to read, and correctly report that there's > a problem. Concretely, something like the attached. I'm not quite sure whether the reached_eof test should be "if (bytesread == 0)" or "if (bytesread < maxread)". The former seems more certain to indicate real EOF; are there other cases where the fread might return a short read? On the other hand, if we support in-band EOF indicators (\. for instance) then we might stop without having made an extra call to CopyGetData that would see bytesread == 0. On the third hand, if we do do that it's not quite clear to me whether SIGPIPE is ignorable or not. regards, tom lane diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 6588ebd..8975446 100644 *** a/src/backend/commands/copy.c --- b/src/backend/commands/copy.c *************** typedef struct CopyStateData *** 114,120 **** FILE *copy_file; /* used if copy_dest == COPY_FILE */ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for * dest == COPY_NEW_FE in COPY FROM */ ! bool fe_eof; /* true if detected end of copy data */ EolType eol_type; /* EOL type of input */ int file_encoding; /* file or remote side's character encoding */ bool need_transcoding; /* file encoding diff from server? */ --- 114,122 ---- FILE *copy_file; /* used if copy_dest == COPY_FILE */ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for * dest == COPY_NEW_FE in COPY FROM */ ! bool is_copy_from; /* COPY TO, or COPY FROM? */ ! bool reached_eof; /* true if we read to end of copy data (not ! * all copy_dest types maintain this) */ EolType eol_type; /* EOL type of input */ int file_encoding; /* file or remote side's character encoding */ bool need_transcoding; /* file encoding diff from server? */ *************** CopyGetData(CopyState cstate, void *data *** 575,580 **** --- 577,584 ---- ereport(ERROR, (errcode_for_file_access(), errmsg("could not read from COPY file: %m"))); + if (bytesread == 0) + cstate->reached_eof = true; break; case COPY_OLD_FE: *************** CopyGetData(CopyState cstate, void *data *** 595,601 **** bytesread = minread; break; case COPY_NEW_FE: ! while (maxread > 0 && bytesread < minread && !cstate->fe_eof) { int avail; --- 599,605 ---- bytesread = minread; break; case COPY_NEW_FE: ! while (maxread > 0 && bytesread < minread && !cstate->reached_eof) { int avail; *************** CopyGetData(CopyState cstate, void *data *** 623,629 **** break; case 'c': /* CopyDone */ /* COPY IN correctly terminated by frontend */ ! cstate->fe_eof = true; return bytesread; case 'f': /* CopyFail */ ereport(ERROR, --- 627,633 ---- break; case 'c': /* CopyDone */ /* COPY IN correctly terminated by frontend */ ! cstate->reached_eof = true; return bytesread; case 'f': /* CopyFail */ ereport(ERROR, *************** ProcessCopyOptions(ParseState *pstate, *** 1050,1055 **** --- 1054,1061 ---- if (cstate == NULL) cstate = (CopyStateData *) palloc0(sizeof(CopyStateData)); + cstate->is_copy_from = is_from; + cstate->file_encoding = -1; /* Extract options from the statement node tree */ *************** ClosePipeToProgram(CopyState cstate) *** 1727,1737 **** --- 1733,1755 ---- (errcode_for_file_access(), errmsg("could not close pipe to external command: %m"))); else if (pclose_rc != 0) + { + /* + * If we ended a COPY FROM PROGRAM before reaching EOF, then it's + * expectable for the called program to fail with SIGPIPE, and we + * should not report that as an error. Otherwise, SIGPIPE indicates a + * problem. + */ + if (cstate->is_copy_from && !cstate->reached_eof && + WIFSIGNALED(pclose_rc) && WTERMSIG(pclose_rc) == SIGPIPE) + return; + ereport(ERROR, (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), errmsg("program \"%s\" failed", cstate->filename), errdetail_internal("%s", wait_result_to_str(pclose_rc)))); + } } /* *************** BeginCopyFrom(ParseState *pstate, *** 3169,3175 **** oldcontext = MemoryContextSwitchTo(cstate->copycontext); /* Initialize state variables */ ! cstate->fe_eof = false; cstate->eol_type = EOL_UNKNOWN; cstate->cur_relname = RelationGetRelationName(cstate->rel); cstate->cur_lineno = 0; --- 3187,3193 ---- oldcontext = MemoryContextSwitchTo(cstate->copycontext); /* Initialize state variables */ ! cstate->reached_eof = false; cstate->eol_type = EOL_UNKNOWN; cstate->cur_relname = RelationGetRelationName(cstate->rel); cstate->cur_lineno = 0; diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 2d75773..6bc0244 100644 *** a/src/backend/storage/file/fd.c --- b/src/backend/storage/file/fd.c *************** OpenTransientFilePerm(const char *fileNa *** 2271,2281 **** --- 2271,2287 ---- * Routines that want to initiate a pipe stream should use OpenPipeStream * rather than plain popen(). This lets fd.c deal with freeing FDs if * necessary. When done, call ClosePipeStream rather than pclose. + * + * This function also ensures that the popen'd program is run with default + * SIGPIPE and SIGUSR2 processing, rather than the SIG_IGN settings the + * backend normally uses. This ensures desirable response to, eg, closing + * a read pipe early. */ FILE * OpenPipeStream(const char *command, const char *mode) { FILE *file; + int save_errno; DO_DB(elog(LOG, "OpenPipeStream: Allocated %d (%s)", numAllocatedDescs, command)); *************** OpenPipeStream(const char *command, cons *** 2293,2300 **** TryAgain: fflush(stdout); fflush(stderr); errno = 0; ! if ((file = popen(command, mode)) != NULL) { AllocateDesc *desc = &allocatedDescs[numAllocatedDescs]; --- 2299,2313 ---- TryAgain: fflush(stdout); fflush(stderr); + pqsignal(SIGPIPE, SIG_DFL); + pqsignal(SIGUSR2, SIG_DFL); errno = 0; ! file = popen(command, mode); ! save_errno = errno; ! pqsignal(SIGPIPE, SIG_IGN); ! pqsignal(SIGUSR2, SIG_IGN); ! errno = save_errno; ! if (file != NULL) { AllocateDesc *desc = &allocatedDescs[numAllocatedDescs]; *************** TryAgain: *** 2307,2314 **** if (errno == EMFILE || errno == ENFILE) { - int save_errno = errno; - ereport(LOG, (errcode(ERRCODE_INSUFFICIENT_RESOURCES), errmsg("out of file descriptors: %m; release and retry"))); --- 2320,2325 ---- diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index a3b9757..cfa416a 100644 *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** PostgresMain(int argc, char *argv[], *** 3769,3774 **** --- 3769,3778 ---- * handler in the postmaster to reserve the signal. (Of course, this isn't * an issue for signals that are locally generated, such as SIGALRM and * SIGPIPE.) + * + * Also note: signals that are set to SIG_IGN here should be reset in + * OpenPipeStream, so that exec'd programs see a standard signal + * environment. */ if (am_walsender) WalSndSignals();
I wrote: > I'm not quite sure whether the reached_eof test should be > "if (bytesread == 0)" or "if (bytesread < maxread)". The former > seems more certain to indicate real EOF; are there other cases where > the fread might return a short read? On the other hand, if we > support in-band EOF indicators (\. for instance) then we might > stop without having made an extra call to CopyGetData that would > see bytesread == 0. On the third hand, if we do do that it's > not quite clear to me whether SIGPIPE is ignorable or not. After still further thought, it seems like "if (bytesread == 0)" is the right way to proceed. That protects us against incorrectly setting reached_eof if the pipe is being run in unbuffered or line-buffered mode. (Which copy.c doesn't do, at the moment, but I'd just as soon this code didn't need to assume that. Also, I'm not 100% convinced that Windows or other platforms might not act that way.) In the case where we terminate early because we saw an in-band EOF marker, we would also not set reached_eof, and again that seems like a good outcome: if we see SIGPIPE it must mean that the program kept sending data after the EOF marker, and that seems like a good thing to complain about. (It's only guaranteed to fail if the program sends more than the current pipe buffer-ful, but I don't feel a need to add extra code to check for nonempty buffer contents as we exit.) So I think this version is probably good, although maybe it could use an additional comment explaining the above reasoning. regards, tom lane
I wrote: > After still further thought, it seems like "if (bytesread == 0)" > is the right way to proceed. That protects us against incorrectly > setting reached_eof if the pipe is being run in unbuffered or > line-buffered mode. (Which copy.c doesn't do, at the moment, > but I'd just as soon this code didn't need to assume that. > Also, I'm not 100% convinced that Windows or other platforms > might not act that way.) In the case where we terminate early > because we saw an in-band EOF marker, we would also not set reached_eof, > and again that seems like a good outcome: if we see SIGPIPE it > must mean that the program kept sending data after the EOF marker, > and that seems like a good thing to complain about. (It's only > guaranteed to fail if the program sends more than the current pipe > buffer-ful, but I don't feel a need to add extra code to check for > nonempty buffer contents as we exit.) Oh, wait, that last bit is backwards: if we see an in-band EOF mark, and as a consequence exit without having set reached_eof, then the exit code will think that SIGPIPE is ignorable. So transmitting more data after an EOF mark will not be complained of, whether it's within the same bufferload or not. Still, I can't get very excited about that. Potentially we could improve it by having the places that recognize EOF marks set reached_eof, but I'm unconvinced that it's worth the trouble. I'm thinking that it's better to err in the direction of reporting SIGPIPE less often not more often, considering that up to now we've never reported it at all and there've been so few complaints. regards, tom lane
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
From
Kyotaro HORIGUCHI
Date:
At Sat, 17 Nov 2018 11:14:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <23489.1542471289@sss.pgh.pa.us> > I wrote: > > After still further thought, it seems like "if (bytesread == 0)" > > is the right way to proceed. That protects us against incorrectly > > setting reached_eof if the pipe is being run in unbuffered or > > line-buffered mode. (Which copy.c doesn't do, at the moment, > > but I'd just as soon this code didn't need to assume that. > > Also, I'm not 100% convinced that Windows or other platforms > > might not act that way.) In the case where we terminate early > > because we saw an in-band EOF marker, we would also not set reached_eof, > > and again that seems like a good outcome: if we see SIGPIPE it > > must mean that the program kept sending data after the EOF marker, > > and that seems like a good thing to complain about. (It's only > > guaranteed to fail if the program sends more than the current pipe > > buffer-ful, but I don't feel a need to add extra code to check for > > nonempty buffer contents as we exit.) > > Oh, wait, that last bit is backwards: if we see an in-band EOF mark, > and as a consequence exit without having set reached_eof, then the > exit code will think that SIGPIPE is ignorable. So transmitting > more data after an EOF mark will not be complained of, whether > it's within the same bufferload or not. > > Still, I can't get very excited about that. Potentially we could > improve it by having the places that recognize EOF marks set > reached_eof, but I'm unconvinced that it's worth the trouble. > I'm thinking that it's better to err in the direction of reporting > SIGPIPE less often not more often, considering that up to now > we've never reported it at all and there've been so few complaints. My opinion here is when we execute an external program on the other end of a pipe, we should behave as loosely (tolerantly) as ordinary un*x programs are expected. If we're connecting to another PostgreSQL server, we should be stringent as the current behavior. In other words, we don't need to change the behavior of other than the COPY_FILE case, but ClosePipeToProgram shouldn't complain not only for SIGPIPE but any kinds of error. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #15449: file_fdw using program cause exit code error whenusing LIMIT
From
Etsuro Fujita
Date:
(2018/11/17 8:10), Tom Lane wrote: > I wrote: >> I'm not quite sure whether the reached_eof test should be >> "if (bytesread == 0)" or "if (bytesread< maxread)". The former >> seems more certain to indicate real EOF; are there other cases where >> the fread might return a short read? On the other hand, if we >> support in-band EOF indicators (\. for instance) then we might >> stop without having made an extra call to CopyGetData that would >> see bytesread == 0. On the third hand, if we do do that it's >> not quite clear to me whether SIGPIPE is ignorable or not. > > After still further thought, it seems like "if (bytesread == 0)" > is the right way to proceed. That protects us against incorrectly > setting reached_eof if the pipe is being run in unbuffered or > line-buffered mode. (Which copy.c doesn't do, at the moment, > but I'd just as soon this code didn't need to assume that. > Also, I'm not 100% convinced that Windows or other platforms > might not act that way.) > So I think this version is probably good, although maybe it could > use an additional comment explaining the above reasoning. I agree that it's better to keep the BeginCopyFrom API as-is. Also, I think your version would handle SIGPIPE in COPY FROM PROGRAM more properly than what I proposed. So, +1 from me. Thanks! Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > I agree that it's better to keep the BeginCopyFrom API as-is. Also, I > think your version would handle SIGPIPE in COPY FROM PROGRAM more > properly than what I proposed. So, +1 from me. Thanks for reviewing! I've pushed it now, though at the last minute I reconsidered resetting SIGUSR2 as my previous patch did. The trouble with resetting that is that it results in a small window where receipt of SIGUSR2 would result in backend termination, which we surely don't want. Now, it doesn't look to me like the postmaster will ever send SIGUSR2 to ordinary backends, but it wouldn't be terribly surprising if someone makes a change that relies on the expectation of SIGUSR2 being SIG_IGN'd by backends. I don't see any real upside to resetting SIGUSR2 for the called program that would justify taking any risk there. (Note that this concern doesn't apply to SIGPIPE since we only expect that to be raised synchronously, ie during a write.) As Kyotaro-san said upthread, it's odd that exec() provides no way to reset all the handlers to SIG_DFL on the child side. But since it doesn't, we're not really able to do much more than this. regards, tom lane