Re: PQexec() hangs on OOM - Mailing list pgsql-bugs
From | Heikki Linnakangas |
---|---|
Subject | Re: PQexec() hangs on OOM |
Date | |
Msg-id | 55FC501A.5000409@iki.fi Whole thread Raw |
In response to | Re: PQexec() hangs on OOM (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: PQexec() hangs on OOM
Re: PQexec() hangs on OOM |
List | pgsql-bugs |
On 09/14/2015 04:36 AM, Michael Paquier wrote: > On Sat, Sep 12, 2015 at 6:11 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Mon, Sep 7, 2015 at 1:40 PM, Michael Paquier <michael.paquier@gmail.com> >> wrote: >>> >>> On Sat, Sep 5, 2015 at 9:45 PM, Amit Kapila <amit.kapila16@gmail.com> >>> wrote: >>> >>> Now, to move into the serious things... >>> >>> + /* >>> + * Advance inStart to show that the copy related message has been >>> + * processed. >>> + */ >>> + conn->inStart = conn->inCursor; >>> This change... >>> >>> + /* getCopyStart() moves >>> inStart itself */ >>> conn->asyncStatus = >>> PGASYNC_COPY_IN; >>> - break; >>> + continue; >>> ... And this change are risky for a backpatch. And they actually >>> break the existing replication protocol > > First, many thanks for providing your input! I am really looking > forward into fixing those problems appropriately. > >> Can you please explain how will it break replication protocol? >> I have done the required handling for Copy Both mode as well in attached >> patch similar to what was done for other Copy modes in previous patch. >> Check if you still find it as broken for replication? > > When not enough data has been received from the server, it seems that > we should PQclear the existing result, and leave immediately > pqParseInput3 with the status PGASYNC_BUSY to be able to wait for more > data. Your patch, as-is, is breaking that promise (and this comes from > the first versions of my patches). It seems also that we should not > write an error message on connection in this case to be consistent in > the behavior of back-branches. For the OOM case, we definitely need to > take the error code path though, as your patch is correctly doing, and > what mine legendary failed to do. > > + if (pqGetInt(&result->numAttributes, 2, conn)) > + { > + errmsg = libpq_gettext("extraneous data in COPY start message"); > + goto advance_and_error; > + } > Here an error message is not needed, and this message should have been > formulated as "insufficient data in blah" either way. > >> I have only kept the changes for COPY modes, so that once we settle on >> those, I think similar changes could be done for getParamDescriptions() > > Yeah, I looked into this one as well, resulting in patch 0002 > attached. In this case we have the advantage to only receive data from > the server, so pqPrepareAsyncResult is handling the failure just fine. > Attached as well is the test case I used (similar to previous one, > just extended a bit to report the error message), after getting the > result from PQsendDescribePrepared, PQresultStatus is set correctly on > error and reports "out of memory" to the user. What do you think about > it? Patches 0001 and 0002 look good to me. Two tiny nitpicks: > --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c > +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c > @@ -445,6 +445,7 @@ libpqrcv_PQexec(const char *query) > if (PQresultStatus(lastResult) == PGRES_COPY_IN || > PQresultStatus(lastResult) == PGRES_COPY_OUT || > PQresultStatus(lastResult) == PGRES_COPY_BOTH || > + PQresultStatus(lastResult) == PGRES_FATAL_ERROR || > PQstatus(streamConn) == CONNECTION_BAD) > break; > } Is this really needed? AFAICS, the loop will terminate on PGRES_FATAL_ERROR too. Without this patch, it will always return the last result from the query, whether or not it was an error result. With this patch, it will return the first error encountered, or last non-error result if there is no error. In practice, there is no difference because we don't expect multiple results from the server. > @@ -1343,31 +1348,33 @@ getNotify(PGconn *conn) > * parseInput already read the message type and length. > */ > static int > -getCopyStart(PGconn *conn, ExecStatusType copytype) > +getCopyStart(PGconn *conn, ExecStatusType copytype, int msgLength) > { > PGresult *result; > int nfields; > int i; > + const char *errmsg = NULL; > > result = PQmakeEmptyPGresult(conn, copytype); > if (!result) > - goto failure; > + goto advance_and_error; > > if (pqGetc(&conn->copy_is_binary, conn)) > - goto failure; > + goto not_enough_data; > result->binary = conn->copy_is_binary; > + > /* the next two bytes are the number of fields */ > - if (pqGetInt(&(result->numAttributes), 2, conn)) > - goto failure; > + if (pqGetInt(&result->numAttributes, 2, conn)) > + goto not_enough_data; > nfields = result->numAttributes; > > /* allocate space for the attribute descriptors */ > - if (nfields > 0) > + if (result && nfields > 0) 'result' is never NULL here, so the NULL test is still not needed. The 0003 patch also looks OK to me as far as it goes. I feel that some more refactoring would be in order, though. There are still many different styles for handling a command: sometimes it's done inline in the case-statement, sometimes it calls out to a separate function. Sometimes the function moves inStart, sometimes it does not. On error (e.g. short message), sometimes the function returns 'false', sometimes it sets an error message itself. Patches 0001 and 0002 look small enough that we probably could backpatch them, so that would leave 0003 as a optional master-only cleanup. I spent some time writing a regression test for this. This is pretty difficult to test otherwise, and it's a good candidate for automated, exchaustive testing. The only difficulty is injecting the malloc() failures in a cross-platform way. I ended up with a glibc-only solution that uses glibc's malloc() hooks. Other alternatives I considered were Linux-specific LD_PRELOAD, GNU ld specific -wrap option. One cross-platform solution would be to #define malloc() and friends to wrapper functions, and compile a special version of libpq. Anyway, I came up with the attached test program. It hangs without your fixes, and works with the patches applied. Thoughts? - Heikki
Attachment
pgsql-bugs by date: