Re: PQexec() hangs on OOM - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: PQexec() hangs on OOM
Date
Msg-id CAB7nPqRWbL15EFVQdEzcgR4aUrTCtkCJKvUoRkQysgbCjEezKQ@mail.gmail.com
Whole thread Raw
In response to Re: PQexec() hangs on OOM  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: PQexec() hangs on OOM  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs


On Fri, Sep 4, 2015 at 2:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 31, 2015 at 12:55 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Aug 29, 2015 at 10:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

One thing that looks slightly non-elegant about this approach is that
new async status (PGASYNC_FATAL) is used to deal with errors only
in few paths in function pqParseInput3() and not-other paths which should
be okay if there is no other better way.

I considered using this logic in more code paths, but I kept focused on having a not-too-invasive back-patchable patch as the refactoring that would occur may be too heavy for what is usually pushed to the stable branches. Do you think it would be better to get a cleaner refactoring patch first and globalize the approach with PGASYNC_FATAL?

No wait, lets first try to see if this the best fix for Copy path.

Sure. Thanks. (We could remove some dead code of libpq though).
 
The problem with the current approach is that even if we are able to
receive error, it doesn't completely clear the previous command
execution.  As an example, if using debugger, I make getCopyStart()
return OOM error, then the next command execution fails.

postgres=# copy t1 from stdin;
out of memory
postgres=# copy t1 from stdin;
another command is already in progress

Oops. Indeed. Using this new flag is obviously not a good idea because the connection remains in a state that it considers as PGASYNC_FATAL and actually it may not even finish to consume the remaining messages. Perhaps I was too sleepy last time I tested that, well...


It was actually a far cleaner approach to have a failure flag to decide if based on the async state process should move on to a failure code path or not.

I think you have already tried, but it seems to me that if we can handle
it based on result status, that would be better, rather than introducing
new state, but anyway lets first try to sort out above reported problem.

So, looking at that again with a fresh eye, I noticed that getCopyDataMessage has a similar error handling when it fails on pqCheckInBufferSpace. So looking at what I added, it seems that I am trying to duplicate the protocol error handling even if everything is already present, so that's really overdoing it. Hence, let's simply drop this idea of new flag PGASYNC_FATAL and instead treat the OOM as a sync handling error with the server, like in the patch attached.

When emulating an OOM with this patch, I am getting this error instead of the infinite loop, which looks acceptable to me:
=# copy aa to stdout;
out of memory
lost synchronization with server: got message type "H", length 5
The connection to the server was lost. Attempting reset: Succeeded.

The extra message handling I have added at the end of getCopyStart and getParamDescriptions still looks more adapted to me when a failure happens, so I kept it.
Regards,
--
Michael
Attachment

pgsql-bugs by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PQexec() hangs on OOM
Next
From: Teodor Sigaev
Date:
Subject: Re: BUG #13440: unaccent does not remove all diacritics