libpq async command processing methods are difficult to use with edge-triggered epoll - Mailing list pgsql-hackers

From Alex Robinson
Subject libpq async command processing methods are difficult to use with edge-triggered epoll
Date
Msg-id CAJN6vNe4D18+nzrB_zkE2uJrZAsB_fK-g5Ht2KQFYWRxpW67Hw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi,

I've been using libpq to access postgres from within a system that uses an edge-triggered epoll in its event-loop. The instructions on https://www.postgresql.org/docs/current/libpq-async.html are pretty good, but I've run into a bit of an edge case that would be much easier to handle if the interfaces exposed by libpq were a little more ergonomic. If it makes a difference, I'm running with single-row mode enabled on the client.

Specifically, PQconsumeInput returns no information that allows the caller to distinguish whether there might be more data available to be read on the network socket if PQconsumeInput were to be called again (it just returns 0 on error and 1 otherwise), and PQisBusy doesn't return false until a full row's worth of data has been read by PQconsumeInput. This is a bad combination if a result set contains rows larger than PGconn's default buffer size, since calling PQconsumeInput followed by PQisBusy can suggest that we need to wait on the socket's readability even if there's already more data available to be read on the socket.

If more detail helps, here's a slightly more detailed summary based on my trawling through the code:

* The recommended pattern for processing responses to async commands is to wait until the socket is readable, then call PQconsumeInput, then check PQisBusy. If PQisBusy returns true, the docs suggest waiting on the socket again.
* When PQconsumeInput is called, it doubles the PGconn's buffer size if less than 8k of space is unused in it.
* PQconsumeInput will then read either until its remaining buffer space fills up or until the socket has no more data in it ready to be read.
* If the buffer fills up, PQconsumeInput will return to the caller even if there's still more data to be read
* PQconsumeInput's return value only indicates whether or not there was an error, not whether any data was read.
* PQisBusy will return true unless the buffer contains an entire row; it does not actually check the status of the socket.
* If the PGconn's buffer wasn't large enough to fit an entire row in it when you called PQconsumeInput, PQisBusy will return true, suggesting that you ought to wait on socket readability, when really the right thing to do would be to call PQconsumeInput again (potentially multiple times) until the buffer finally grows to be large enough to contain the whole row before PQisBusy can return false.

This can be worked around by making a poll() syscall on the socket without timeout 0 before handing the socket off to epoll, but libpq could make this case easier to deal with with a slightly different interface.

The function that PQconsumeInput uses internally, pqReadData, has a much more helpful interface -- it returns 1 if it successfully read at least 1 byte, 0 if no data was available, or -1 if there was an error. If PQconsumeInput had a similar range of return values, this ability to distinguish between whether we read data or not would be enough information to know whether we ought to call PQconsumeInput again when PQisBusy returns true.

As for what we can realistically do about it, I imagine it may be disruptive to add an additional possible return value to PQconsumeInput (e.g. return 1 if at least one byte was read or 2 if no data was available). And I'm not sure edge-triggered polling is a use case the community cares enough about to justify adding a separate method that does basically the same thing as PQconsumeInput but with more information conveyed by its return value.

So maybe there isn't any change worth making here, but I wanted to at least mention the problem, since it was pretty disappointing to me that calling PQconsumeInput followed by PQisBusy and then waiting on readability occasionally caused a hang on large rows. I'd expect other developers using edge-triggered epoll to run into problems like this too. If there's a backwards-compatible way of making it less error prone, it'd be nice to do so.

Alex

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [sqlsmith] Failed assertion during partition pruning
Next
From: Mark Dilger
Date:
Subject: room for improvement in amcheck btree checking?