On 11/08/2025 09:58, Jelte Fennema-Nio wrote:
> On Fri, 8 Aug 2025 at 17:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I think any of the other options would be
>> better. There's no guarantee that more data will ever arrive, the
>> connection might be used just to wait for the notification.
>
> Yeah, I think that's the important thing to realize here. The "try
> again later" only makes sense if we need more data to try again. If we
> don't then we now start waiting on data that might never come.
Here's a new set of patches, to disconnect on OOM instead of hanging or
silently discarding messages:
v4-0001-libpq-Don-t-hang-on-out-of-memory.patch
This is a very minimal fix for the "hang on OOM" issue. It changes the
behavior to silently skip over the message, i.e. discard the async
notification, or continue without cancellation key.
I think we should backpatch this.
v4-0002-libpq-Handle-OOMs-by-disconnecting-instead-of-sil.patch
This changes the behavior of all of the problematic places where we
silently discarded things on OOM to disconnect instead. That is, when
processing a Notify, BackendKeyData, or ParameterStatus message.
Now that I look at this again, we should probably forget about patch #1
and commit and backpatch this straight away. It's a little more changes
than patch #1, but not that much.
Note that there are still places where we discard things on OOM, like
when parsing an error 'E' message. Those are a little iffy too, but
they're less problematic because you still get an error, and the error
is clearly associated with a query, unlike these Notify, BackendData and
ParameterStatus messages.
v4-0003-libpq-Be-strict-about-cancel-key-lengths.patch
Your patch to add more checks, rebased over the first two.
- Heikki