Re: OOM in libpq and infinite loop with getCopyStart() - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: OOM in libpq and infinite loop with getCopyStart()
Msg-id: 23387.1459522980@sss.pgh.pa.us
In response to: Re: OOM in libpq and infinite loop with getCopyStart() (Michael Paquier <michael.paquier@gmail.com>)
Responses: Re: OOM in libpq and infinite loop with getCopyStart()
List: pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Apr 1, 2016 at 12:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, the short of my review is that we need more clarity of thought
>> about what state libpq is in after a failure like this, and what that
>> state looks like to the application, and how that should affect how
>> libpq reacts to application calls.

> Hm. I would have thought that returning to the caller a result
> generated by PQmakeEmptyPGresult(PGRES_FATAL_ERROR) with an error
> string marked "out of memory" would be the best thing to do instead of
> NULL...

Sure, we're trying to do that, but the question is what happens
afterward.  An app that is following the documented usage pattern for
PQgetResult will call it again, expecting to get a NULL, but as the
patch stands it might get a PGRES_COPY_XXX instead.

What drove me to decide the patch wasn't committable was realizing that
Robert was right upthread: the change you propose in libpqwalreceiver.c
is not a bug fix.  That code is doing exactly what it's supposed to do
per the libpq.sgml docs, namely iterate till it gets a NULL.[1]  If we
have to change it, that means (a) we've changed the API spec for
PQgetResult and (b) every other existing caller of PQgetResult would
need to change similarly, or else risk corner-case bugs about as bad as
the one we're trying to fix.

So the core of my complaint is that we need to fix things so that,
whether or not we are able to create the PGRES_FATAL_ERROR PGresult
(and we'd better consider the behavior when we cannot), we leave libpq
in a state where subsequent calls to PQgetResult will return NULL.

> If getCopyStart() complains because of a lack of data, we loop
> back into pqParseInput3 to try again. If an OOM happens, we still loop
> into pqParseInput3 but all the pending data received from client needs
> to be discarded so as we don't rely on the next calls of PQexecStart
> to do the cleanup, once all the data is received we get out of the
> loop and generate the result with PGRES_FATAL_ERROR. Does that match
> what you have in mind?

If we could, that would be OK, but (a) you're only considering the
COPY OUT case not the COPY IN case, and (b) a persistent OOM condition
could very well prevent us from ever completing the copy.  For example,
some COPY DATA messages might be too big for our current buffer, but we
won't be able to enlarge the buffer.

I'm inclined to think that reporting the OOM to the client is the
highest-priority goal, and that attempting to clean up during the next
PQexec/PQsendQuery is a reasonable design.  If we fail to do that, the
client won't really understand that it's a cleanup failure and not a
failure to issue the new query, but it's not clear why it needs to know
the difference.

			regards, tom lane

[1] Well, almost.  Really, after dealing with a PGRES_COPY_XXX result
it ought to go back to pumping PQgetResult.  It doesn't, but that's an
expected client behavior, and cleaning up after that is exactly what
PQexecStart is intended for.