Re: PQexec() hangs on OOM - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: PQexec() hangs on OOM |
Date | |
Msg-id | CAB7nPqRWbL15EFVQdEzcgR4aUrTCtkCJKvUoRkQysgbCjEezKQ@mail.gmail.com Whole thread Raw |
In response to | Re: PQexec() hangs on OOM (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: PQexec() hangs on OOM
|
List | pgsql-bugs |
On Fri, Sep 4, 2015 at 2:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Aug 31, 2015 at 12:55 PM, Michael Paquier <michael.paquier@gmail.com> wrote:On Sat, Aug 29, 2015 at 10:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:One thing that looks slightly non-elegant about this approach is thatnew async status (PGASYNC_FATAL) is used to deal with errors onlyin few paths in function pqParseInput3() and not-other paths which shouldbe okay if there is no other better way.
I considered using this logic in more code paths, but I kept focused on having a not-too-invasive back-patchable patch as the refactoring that would occur may be too heavy for what is usually pushed to the stable branches. Do you think it would be better to get a cleaner refactoring patch first and globalize the approach with PGASYNC_FATAL?No wait, lets first try to see if this the best fix for Copy path.
Sure. Thanks. (We could remove some dead code of libpq though).
The problem with the current approach is that even if we are able toreceive error, it doesn't completely clear the previous commandexecution. As an example, if using debugger, I make getCopyStart()return OOM error, then the next command execution fails.postgres=# copy t1 from stdin;out of memorypostgres=# copy t1 from stdin;another command is already in progress
Oops. Indeed. Using this new flag is obviously not a good idea because the connection remains in a state that it considers as PGASYNC_FATAL and actually it may not even finish to consume the remaining messages. Perhaps I was too sleepy last time I tested that, well...
It was actually a far cleaner approach to have a failure flag to decide if based on the async state process should move on to a failure code path or not.I think you have already tried, but it seems to me that if we can handleit based on result status, that would be better, rather than introducingnew state, but anyway lets first try to sort out above reported problem.
So, looking at that again with a fresh eye, I noticed that getCopyDataMessage has a similar error handling when it fails on pqCheckInBufferSpace. So looking at what I added, it seems that I am trying to duplicate the protocol error handling even if everything is already present, so that's really overdoing it. Hence, let's simply drop this idea of new flag PGASYNC_FATAL and instead treat the OOM as a sync handling error with the server, like in the patch attached.
When emulating an OOM with this patch, I am getting this error instead of the infinite loop, which looks acceptable to me:
=# copy aa to stdout;
out of memory
lost synchronization with server: got message type "H", length 5
The connection to the server was lost. Attempting reset: Succeeded.
When emulating an OOM with this patch, I am getting this error instead of the infinite loop, which looks acceptable to me:
=# copy aa to stdout;
out of memory
lost synchronization with server: got message type "H", length 5
The connection to the server was lost. Attempting reset: Succeeded.
The extra message handling I have added at the end of getCopyStart and getParamDescriptions still looks more adapted to me when a failure happens, so I kept it.
Regards,
--
Michael
Attachment
pgsql-bugs by date: