Re: OOM in libpq and infinite loop with getCopyStart() - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: OOM in libpq and infinite loop with getCopyStart()
Msg-id CAB7nPqRVr=c6pM8tX849io1+CcqvCq2+8X5skJvh26+8xV_5tQ@mail.gmail.com
In response to Re: OOM in libpq and infinite loop with getCopyStart()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: OOM in libpq and infinite loop with getCopyStart()  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Sat, Apr 2, 2016 at 12:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> So the core of my complaint is that we need to fix things so that, whether
>> or not we are able to create the PGRES_FATAL_ERROR PGresult (and we'd
>> better consider the behavior when we cannot), ...
>
> BTW, the real Achilles' heel of any attempt to ensure sane behavior at
> the OOM limit is this possibility of being unable to create a PGresult
> with which to inform the client that we failed.
>
> I wonder if we could make things better by keeping around an emergency
> backup PGresult struct.  Something along these lines:
>
> 1. Add a field "PGresult *emergency_result" to PGconn.
>
> 2. At the very beginning of any PGresult-returning libpq function, check
> to see if we have an emergency_result, and if not make one, ensuring
> there's room in it for a reasonable-size error message; or maybe even
> preload it with "out of memory" if we assume that's the only condition
> it'll ever be used for.  If malloc fails at this point, just return NULL
> without doing anything or changing any libpq state.  (Since a NULL result
> is documented as possibly caused by OOM, this isn't violating any API.)
>
> 3. Subsequent operations never touch the emergency_result unless we're
> up against an OOM, but it can be used to return a failure indication
> to the client so long as we leave libpq in a state where additional
> calls to PQgetResult would return NULL.
>
> Basically this shifts the point where an unreportable OOM could happen
> from somewhere in the depths of libpq to the very start of an operation,
> where we're presumably in a clean state and OOM failure doesn't leave
> us with a mess we can't clean up.
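The three quoted steps can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not libpq code. `Conn`, `Result`, `get_result`, `ensure_emergency_result`, `try_alloc`, the `query_pending` flag and the `allocs_before_oom` test hook are all hypothetical names standing in for PGconn, PGresult and PQgetResult. The emergency result is allocated up front. If that allocation fails, the function returns NULL without touching connection state. A later OOM hands the spare result to the caller and marks the query finished, so further calls return NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for PGresult and PGconn;
 * these are NOT libpq's real definitions. */
typedef enum { RES_TUPLES_OK, RES_FATAL_ERROR } ResultStatus;

typedef struct Result {
    ResultStatus status;
    char errmsg[128];          /* room for a reasonable-size error message */
} Result;

typedef struct Conn {
    Result *emergency_result;  /* step 1: spare result kept on the connection */
    bool    query_pending;     /* hypothetical: a result is still owed to the client */
} Conn;

/* Test hook: number of allocations that succeed before malloc "fails";
 * -1 means never fail. */
static int allocs_before_oom = -1;

static void *try_alloc(size_t n) {
    if (allocs_before_oom == 0)
        return NULL;
    if (allocs_before_oom > 0)
        allocs_before_oom--;
    return malloc(n);
}

/* Step 2: run at the start of every result-returning function.  On failure
 * nothing in *conn has been modified, so the caller can simply return NULL. */
static bool ensure_emergency_result(Conn *conn) {
    if (conn->emergency_result != NULL)
        return true;
    Result *r = try_alloc(sizeof(Result));
    if (r == NULL)
        return false;
    r->status = RES_FATAL_ERROR;
    strcpy(r->errmsg, "out of memory");   /* preloaded, as the proposal suggests */
    conn->emergency_result = r;
    return true;
}

/* Hypothetical analogue of PQgetResult(). */
static Result *get_result(Conn *conn) {
    assert(conn != NULL);
    if (!ensure_emergency_result(conn))
        return NULL;                      /* unreportable OOM, state untouched */
    if (!conn->query_pending)
        return NULL;                      /* no more results */

    Result *res = try_alloc(sizeof(Result));
    conn->query_pending = false;          /* either way, this query is finished */
    if (res == NULL) {
        /* Step 3: OOM mid-operation, so hand over the spare result.  Since
         * query_pending is now false, later calls return NULL. */
        res = conn->emergency_result;
        conn->emergency_result = NULL;    /* caller now owns (and frees) it */
        return res;
    }
    res->status = RES_TUPLES_OK;
    res->errmsg[0] = '\0';
    return res;
}
```

Handing ownership of the spare result to the caller means the next call re-allocates it at its start. That matches the idea of moving any unreportable OOM to the beginning of an operation, where the connection is still in a clean state.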

I have moved this patch to the next CF for the time being. As this is a
legitimate bug and not a feature, it should be fine to keep working on
this item even after the current CF ends.
-- 
Michael


