Re: PQexec() hangs on OOM - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: PQexec() hangs on OOM
Date
Msg-id 559AB74A.6070602@iki.fi
Whole thread Raw
In response to Re: PQexec() hangs on OOM  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: PQexec() hangs on OOM
List pgsql-bugs
On 07/04/2015 03:40 PM, Michael Paquier wrote:
> On Sat, Jul 4, 2015 at 1:32 AM, Heikki Linnakangas wrote:
>> In short, the idea of returning a status code from parseInput(), instead of
>> just dealing with the error, was a bad one. Sorry I didn't have this
>> epiphany earlier...
>>
>> I came up with the attached. It fixes the few cases where we were currently
>> returning "need more data" when OOM happened, to deal with the OOM error
>> instead, by setting conn->errorMessage. How does this look to you?
>
> So this relies on the fact that when errorMessage is set subsequent
> messages are ignored, right? This looks neat.
>
> At the bottom of getAnotherTuple() in fe-protocol2.c if
> PQmakeEmptyPGresult returns NULL, shouldn't the error message be
> enforced to "out of memory"? It is an error code path, so an error
> will be set, but perhaps the message is incorrect.
>
> -       if (!res->errMsg)
> -               goto failure;
> +       if (res)
> +       {
> +               res->resultStatus = isError ? PGRES_FATAL_ERROR :
> PGRES_NONFATAL_ERROR;
> +               res->errMsg = pqResultStrdup(res, workBuf.data);
> +       }
> If res->errMsg is NULL, we may have a crash later on when calling
> appendPQExpBufferStr on this field. I think that we should add an
> additional check on it.

Yeah, added.

I tested the various error paths this patch modifies with a debugger,
and found out that the getCopyStart changes were not doing much good.
The caller still waits for the COPY-IN result, so it still gets stuck.
So I left out that change.

The getParamDescriptions() changes were slightly broken. It didn't read
the whole input message with pqGetInt() etc., so pqParseInput3() threw
the "message contents do not agree with length in message type" error. I
started fixing that, by changing the error handling in that function to
be more like that in getRowDescriptions(), but then I realized that all
the EOF return cases in the pqParseInput3() subroutines are actually
dead code. pqParseInput3() always reads the whole message, before
passing it on to the right subroutine. That was documented for
getRowDescriptions() and getAnotherTuple(), but the rest of the
functions were more like the protocol version 2 code, prepared to deal
with incomplete messages. I think it would be good to refactor that,
removing the EOF return cases altogether. So I left out that change for
now as well.

That left me with the attached patch. It doesn't handle the
getParamDescription() case, nor the getCopyStart() case, but it's a
start. We don't necessarily need to fix everything in one go. Does this
look correct to you, as far as it goes?

- Heikki


Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #13488: Wrong netmask calculation
Next
From: olivier.gosseaume@free.fr
Date:
Subject: Re: BUG #13484: Performance problem with logical decoding