Re: PQexec() hangs on OOM - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: PQexec() hangs on OOM
Date
Msg-id CAB7nPqR8UtXpTOkKOuxhhXM8PzcPa4dQDKE6rYx+tZhPrQQNWg@mail.gmail.com
In response to PQexec() hangs on OOM  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: PQexec() hangs on OOM  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-bugs
On Tue, Nov 25, 2014 at 10:15 PM, Heikki Linnakangas wrote:
> When that malloc() returns NULL, parseInput returns without reading any
> input. PQgetResult() takes that as a sign that it needs to read more input
> from the server, before calling parseInput() again, and that read never
> returns because there is no more data coming from the server.
>
> I don't have any immediate plans to fix this, or to continue testing this.
> There might well be more cases like this. Patches are welcome.
>
> Attached is the little wrapper library I used to test this. testlibpq hangs
> when run with MALLOC_FAIL_AT=110. It's really quick & dirty, sorry about
> that. I'm sure there are more sophisticated tools to do similar testing out
> there somewhere..
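
For readers without the attachment, here is a minimal sketch of the kind of fault injection described above. It is not the wrapper library from the original mail: the names test_malloc, fault_set and fault_init_from_env are hypothetical, and it assumes that MALLOC_FAIL_AT=N means "fail exactly the Nth allocation", which the thread does not confirm.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fault-injection state; not the wrapper from the thread. */
static long alloc_count = 0;   /* allocations seen so far */
static long fail_at = -1;      /* 1-based call to fail; -1 means never */

/* Set the failure point directly and restart the counter. */
void fault_set(long n)
{
    assert(n == -1 || n >= 1);
    fail_at = n;
    alloc_count = 0;
}

/* Read the failure point from MALLOC_FAIL_AT, if it is set and valid. */
void fault_init_from_env(void)
{
    const char *s = getenv("MALLOC_FAIL_AT");
    long n = (s != NULL) ? strtol(s, NULL, 10) : -1;

    fault_set(n >= 1 ? n : -1);
}

/* Allocate like malloc(), but return NULL on the chosen call. */
void *test_malloc(size_t size)
{
    if (++alloc_count == fail_at)
        return NULL;           /* simulated out-of-memory */
    return malloc(size);
}
```

Code under test would call test_malloc() instead of malloc(); stepping the failure point from 1 upwards then exercises each allocation site in turn.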

With MALLOC_FAIL_AT=84, 86, or 92, the backtrace just before the malloc()
that triggers the OOM looks like this:
#0  0x00007f76316964d0 in __poll_nocancel () from /usr/lib/libc.so.6
#1  0x00007f7631971577 in pqSocketPoll (sock=4, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1133
#2  0x00007f7631971461 in pqSocketCheck (conn=0x1495040, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1075
#3  0x00007f76319712f8 in pqWaitTimed (forRead=1, forWrite=0, conn=0x1495040, finish_time=-1) at fe-misc.c:1007
#4  0x00007f76319712ca in pqWait (forRead=1, forWrite=0, conn=0x1495040) at fe-misc.c:990
#5  0x00007f763196d21d in PQgetResult (conn=0x1495040) at fe-exec.c:1711
#6  0x00007f763196d913 in PQexecFinish (conn=0x1495040) at fe-exec.c:1997
#7  0x00007f763196d576 in PQexec (conn=0x1495040, query=0x400ef2 "BEGIN") at fe-exec.c:1831
#8  0x0000000000400bd8 in main (argc=1, argv=0x7ffd8a2644c8) at testlibpq.c:5

In this case, what happens is an OOM in the allocation of the PGresult. I
think we had better store a status flag in PGconn for this OOM
(PGconn->errorMessage may not be empty, so it cannot serve that purpose),
to resolve the ambiguity that PGresult == NULL does not necessarily mean
"wait for more results". Something like PGResultStatus, to avoid any API
incompatibility. Thoughts?
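
To make the idea concrete, here is a rough, self-contained sketch of the flag approach. It does not use libpq at all: MockConn, mock_parse_input, mock_get_result_step and the StepOutcome values are hypothetical stand-ins, and the real parseInput()/PQgetResult() logic is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of one PQgetResult()-style step. */
typedef enum
{
    STEP_GOT_RESULT,
    STEP_NEED_MORE_INPUT,
    STEP_OUT_OF_MEMORY
} StepOutcome;

/* Hypothetical, heavily simplified stand-in for a connection. */
typedef struct
{
    bool message_buffered;  /* a complete server message is buffered */
    bool alloc_fails;       /* simulate malloc() returning NULL */
    bool result_ready;      /* stands in for a non-NULL result */
    bool oom;               /* the proposed status flag */
} MockConn;

/* Stand-in for parseInput(): on allocation failure it consumes nothing,
 * but records why it gave up. */
void mock_parse_input(MockConn *conn)
{
    assert(conn != NULL);
    if (!conn->message_buffered)
        return;             /* nothing to parse yet */
    if (conn->alloc_fails)
    {
        conn->oom = true;   /* without this flag the caller cannot tell */
        return;
    }
    conn->message_buffered = false;
    conn->result_ready = true;
}

/* Stand-in for one iteration of the result loop. */
StepOutcome mock_get_result_step(MockConn *conn)
{
    assert(conn != NULL);
    mock_parse_input(conn);
    if (conn->result_ready)
        return STEP_GOT_RESULT;
    if (conn->oom)
        return STEP_OUT_OF_MEMORY;   /* report an error instead of waiting */
    return STEP_NEED_MORE_INPUT;     /* real code would block on the socket */
}
```

Without the oom field, the second and third outcomes collapse into one, and the caller waits on a socket that will never become readable.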

Looking at the other malloc() calls in libpq, we do not really have this
ambiguity. For example, makeEmptyPGconn() == NULL means OOM. I am
guessing from the code as well that PQmakeEmptyPGresult() == NULL means
OOM, so the error handling problem comes from parseInput and its underlings.

Also, in pqSaveParameterStatus, shouldn't we have better OOM handling
for pstatus as well?
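
As an illustration of what better handling could look like, here is a small sketch of a save routine that reports allocation failure to its caller. This is not the libpq code: ParamEntry, save_param, AllocFn and always_fail are hypothetical names, and the injectable allocator exists only so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node for one name/value setting. */
typedef struct ParamEntry
{
    struct ParamEntry *next;
    char *name;
    char *value;
} ParamEntry;

/* Allocator signature, so that tests can inject a failing one. */
typedef void *(*AllocFn)(size_t);

/* Prepend a setting to the list. Returns false, leaving the list
 * unchanged, if the allocation fails. */
bool save_param(ParamEntry **head, const char *name, const char *value,
                AllocFn alloc)
{
    assert(head != NULL && name != NULL && value != NULL);

    size_t nlen = strlen(name) + 1;
    size_t vlen = strlen(value) + 1;

    /* One allocation holds the node and both strings. */
    ParamEntry *e = alloc(sizeof(ParamEntry) + nlen + vlen);
    if (e == NULL)
        return false;       /* let the caller report the OOM */

    e->name = (char *) e + sizeof(ParamEntry);
    e->value = e->name + nlen;
    memcpy(e->name, name, nlen);
    memcpy(e->value, value, vlen);
    e->next = *head;
    *head = e;
    return true;
}

/* Allocator that always fails, to simulate OOM. */
void *always_fail(size_t size)
{
    (void) size;
    return NULL;
}
```

The point is only that the caller gets a return value it can act on, rather than the failure being dropped on the floor.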
Regards,
--
Michael
