Re: OOM in libpq and infinite loop with getCopyStart() - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: OOM in libpq and infinite loop with getCopyStart()
Date
Msg-id CAB7nPqSsZjVfeSueJryjfa9na_itEYXiEGr++J+QQJtBT97xNA@mail.gmail.com
Whole thread Raw
In response to Re: OOM in libpq and infinite loop with getCopyStart()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: OOM in libpq and infinite loop with getCopyStart()  (Amit Kapila <amit.kapila16@gmail.com>)
Re: OOM in libpq and infinite loop with getCopyStart()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Apr 1, 2016 at 12:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I thought about this patch a bit more...
>
> When I first looked at the patch, I didn't believe that it worked at
> all: it failed to return a PGRES_COPY_XXX result to the application,
> and yet nonetheless switched libpq's asyncStatus to PGASYNC_COPY_XXX?
> Wouldn't things be hopelessly confused?  I tried it out and saw that
> indeed it seemed to work in psql, and after tracing through that found
> that psql has no idea what's going on, but when psql issues its next
> command PQexecStart manages to get us out of the copy state (barring
> more OOM failures, anyway).  That seems a bit accidental, though,
> and for sure it wasn't documented in the patch.

From the patch, that's mentioned:
+    * Stop if we are in copy mode and error has occurred, the pending results
+    * will be discarded during next execution in PQexecStart.

> In any case, having
> libpq in that state would have side-effects on how it responds to
> application actions, which is the core of my complaint about
> PQgetResult behavior.  Those side effects only make sense if the app
> knows (or should have known) that the connection is in COPY state.

OK. So we'd want to avoid relying on subsequent PQ* calls and return
into a clean state once the OOM shows up.

> I think it might be possible to get saner behavior if we invent a
> set of new asyncStatus values PGASYNC_COPY_XXX_FAILED, having the
> semantics that we got the relevant CopyStart message from the backend
> but were unable to report it to the application.  PQexecStart would
> treat these the same as the corresponding normal PGASYNC_COPY_XXX
> states and do what's needful to get out of the copy mode.  But in
> PQgetResult, we would *not* treat those states as a reason to return a
> PGRES_COPY_XXX result to the application.  Probably we'd just return
> NULL, with the implication being "we're done so far as you're concerned,
> please give a new command".  I might be underthinking it though.

I raised something like that a couple of months back, based on the
fact that PGASYNC_* are flags internal to libpq on frontend:
http://www.postgresql.org/message-id/CAB7nPqTaeDwpRCqAk-cZZxwWaPdEYr1gUg0vwvG7RhzhmAJy3Q@mail.gmail.com
In this case what was debated is PGASYNC_FATAL... Something that is
quite different with your proposal is that we return NULL or something
else, and it did not discard any pending data from the client. For
applications running PQgetResult in a loop that would work.

> Anyway, the short of my review is that we need more clarity of thought
> about what state libpq is in after a failure like this, and what that
> state looks like to the application, and how that should affect how
> libpq reacts to application calls.

Hm. I would have thought that returning to the caller a result
generated by PQmakeEmptyPGresult(PGRES_FATAL_ERROR) with an error
string marked "out of memory" would be the best thing to do instead of
NULL... If getCopyStart() complains because of a lack of data, we loop
back into pqParseInput3 to try again. If an OOM happens, we still loop
into pqParseInput3 but all the pending data received from client needs
to be discarded so as we don't rely on the next calls of PQexecStart
to do the cleanup, once all the data is received we get out of the
loop and generate the result with PGRES_FATAL_ERROR. Does that match
what you have in mind?
-- 
Michael



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches
Next
From: Craig Ringer
Date:
Subject: Re: Timeline following for logical slots