Thread: libpq support for NegotiateProtocolVersion
We have the NegotiateProtocolVersion protocol message [0], but libpq doesn't actually handle it. Say I increase the protocol number in libpq: - conn->pversion = PG_PROTOCOL(3, 0); + conn->pversion = PG_PROTOCOL(3, 1); Then I get psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: expected authentication request from server, but received v And the same for a protocol option (_pq_.something). Over in the column encryption patch, I'm proposing to add such a protocol option, and the above is currently the behavior when the server doesn't support it. The attached patch adds explicit handling of this protocol message to libpq. So the output in the above case would then be: psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: protocol version not supported by server: client uses 3.1, server supports 3.0 Or to test a protocol option: @@ -2250,6 +2291,8 @@ build_startup_packet(const PGconn *conn, char *packet, if (conn->client_encoding_initial && conn->client_encoding_initial[0]) ADD_STARTUP_OPTION("client_encoding", conn->client_encoding_initial); + ADD_STARTUP_OPTION("_pq_.foobar", "1"); + /* Add any environment-driven GUC settings needed */ for (next_eo = options; next_eo->envName; next_eo++) { Result: psql: error: connection to server on socket "/tmp/.s.PGSQL.65432" failed: protocol extension not supported by server: _pq_.foobar [0]: https://www.postgresql.org/docs/devel/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NEGOTIATEPROTOCOLVERSION
Attachment
On Thu, Oct 13, 2022 at 10:33:01AM +0200, Peter Eisentraut wrote: > + if (their_version != conn->pversion) Shouldn't this be 'their_version < conn->pversion'? If the server supports a later protocol than what is requested but not all the requested protocol extensions, I think libpq would still report "protocol version not supported." > + appendPQExpBuffer(&conn->errorMessage, > + libpq_gettext("protocol version not supported by server: client uses %d.%d, server supports%d.%d\n"), > + PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion), > + PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version)); Should this match the error in postmaster.c and provide the range of versions the server supports? The FATAL in postmaster.c is for the major version, but I believe the same information is relevant when a NegotiateProtocolVersion message is sent. ereport(FATAL, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("unsupported frontend protocol %u.%u: server supports %u.0 to %u.%u", > + else > + appendPQExpBuffer(&conn->errorMessage, > + libpq_gettext("protocol extension not supported by server: %s\n"), buf.data); nitpick: s/extension/extensions What if neither the protocol version nor the requested extensions are supported? Right now, I think only the unsupported protocol version is supported in that case, but presumably we could report both pretty easily. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On 13.10.22 23:00, Nathan Bossart wrote: > On Thu, Oct 13, 2022 at 10:33:01AM +0200, Peter Eisentraut wrote: >> + if (their_version != conn->pversion) > > Shouldn't this be 'their_version < conn->pversion'? If the server supports > a later protocol than what is requested but not all the requested protocol > extensions, I think libpq would still report "protocol version not > supported." Ok, changed. >> + appendPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("protocol version not supported by server: client uses %d.%d, server supports%d.%d\n"), >> + PG_PROTOCOL_MAJOR(conn->pversion), PG_PROTOCOL_MINOR(conn->pversion), >> + PG_PROTOCOL_MAJOR(their_version), PG_PROTOCOL_MINOR(their_version)); > > Should this match the error in postmaster.c and provide the range of > versions the server supports? The FATAL in postmaster.c is for the major > version, but I believe the same information is relevant when a > NegotiateProtocolVersion message is sent. > > ereport(FATAL, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("unsupported frontend protocol %u.%u: server supports %u.0 to %u.%u", If you increase the libpq minor protocol version and connect to an older server, you would get an error like "server supports 3.0 to 3.0", which is probably a bit confusing. I changed it to "up to 3.0" to convey that it could be a range. >> + else >> + appendPQExpBuffer(&conn->errorMessage, >> + libpq_gettext("protocol extension not supported by server: %s\n"), buf.data); > > nitpick: s/extension/extensions Ok, added proper plural support. > What if neither the protocol version nor the requested extensions are > supported? Right now, I think only the unsupported protocol version is > supported in that case, but presumably we could report both pretty easily. Ok, I just appended both error messages in that case.
Attachment
A few notes: > + else if (beresp == 'v') > + { > + if (pqGetNegotiateProtocolVersion3(conn)) > + { > + /* We'll come back when there is more data */ > + return PGRES_POLLING_READING; > + } > + /* OK, we read the message; mark data consumed */ > + conn->inStart = conn->inCursor; > + goto error_return; > + } This new code path doesn't go through the message length checks that are done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3() doesn't take the message length to know where to stop anyway, so a misbehaving server can chew up client resources. It looks like the server is expecting to be able to continue the conversation with a newer client after sending a NegotiateProtocolVersion. Is an actual negotiation planned for the future? I think the documentation on NegotiateProtocolVersion (not introduced in this patch) is misleading/wrong; it says that the version number sent back is the "newest minor protocol version supported by the server for the major protocol version requested by the client" which doesn't seem to match the actual usage seen here. Thanks, --Jacob
On 02.11.22 20:02, Jacob Champion wrote: > A few notes: > >> + else if (beresp == 'v') >> + { >> + if (pqGetNegotiateProtocolVersion3(conn)) >> + { >> + /* We'll come back when there is more data */ >> + return PGRES_POLLING_READING; >> + } >> + /* OK, we read the message; mark data consumed */ >> + conn->inStart = conn->inCursor; >> + goto error_return; >> + } > > This new code path doesn't go through the message length checks that are > done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3() > doesn't take the message length to know where to stop anyway, so a > misbehaving server can chew up client resources. Fixed in new patch. > It looks like the server is expecting to be able to continue the > conversation with a newer client after sending a > NegotiateProtocolVersion. Is an actual negotiation planned for the future? The protocol documentation says: | The client may then choose either | to continue with the connection using the specified protocol version | or to abort the connection. In this case, we are choosing to abort the connection. We could add negotiation in the future, but then we'd have to first have a concrete case of something to negotiate about. For example, if we added an optional performance feature into the protocol, then one could negotiate by falling back to not using that. But for the kinds of features I'm thinking about right now (column encryption), you can't proceed if the feature is not supported. So I think this would need to be considered case by case. > I think the documentation on NegotiateProtocolVersion (not introduced in > this patch) is misleading/wrong; it says that the version number sent > back is the "newest minor protocol version supported by the server for > the major protocol version requested by the client" which doesn't seem > to match the actual usage seen here. I don't follow. If libpq sends a protocol version of 3.1, then the server responds by saying it supports only 3.0. What are you seeing?
Attachment
On 11/8/22 00:40, Peter Eisentraut wrote: > On 02.11.22 20:02, Jacob Champion wrote: >> This new code path doesn't go through the message length checks that are >> done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3() >> doesn't take the message length to know where to stop anyway, so a >> misbehaving server can chew up client resources. > > Fixed in new patch. pqGetNegotiateProtocolVersion3() is still ignoring the message length, though; it won't necessarily stop at the message boundary. > We could add negotiation in the future, but then we'd have to first have > a concrete case of something to negotiate about. For example, if we > added an optional performance feature into the protocol, then one could > negotiate by falling back to not using that. But for the kinds of > features I'm thinking about right now (column encryption), you can't > proceed if the feature is not supported. So I think this would need to > be considered case by case. I guess I'm wondering about the definition of "minor" version if the client treats an increment as incompatible by default. But that's a discussion for the future, and this patch is just improving the existing behavior, so I'll pipe down and watch. >> I think the documentation on NegotiateProtocolVersion (not introduced in >> this patch) is misleading/wrong; it says that the version number sent >> back is the "newest minor protocol version supported by the server for >> the major protocol version requested by the client" which doesn't seem >> to match the actual usage seen here. > > I don't follow. If libpq sends a protocol version of 3.1, then the > server responds by saying it supports only 3.0. What are you seeing? I see what you've described on my end, too. The sentence I quoted seemed to imply that the server should respond with only the minor version (the least significant 16 bits). I think it should probably just say "newest protocol version" in the docs. Thanks, --Jacob
On 09.11.22 00:08, Jacob Champion wrote: > On 11/8/22 00:40, Peter Eisentraut wrote: >> On 02.11.22 20:02, Jacob Champion wrote: >>> This new code path doesn't go through the message length checks that are >>> done for the 'R' and 'E' cases, and pqGetNegotiateProtocolVersion3() >>> doesn't take the message length to know where to stop anyway, so a >>> misbehaving server can chew up client resources. >> >> Fixed in new patch. > > pqGetNegotiateProtocolVersion3() is still ignoring the message length, > though; it won't necessarily stop at the message boundary. I don't follow. The calls to pqGetInt(), pqGets(), etc. check the message length. Do you have something else in mind? Can you give an example or existing code? >>> I think the documentation on NegotiateProtocolVersion (not introduced in >>> this patch) is misleading/wrong; it says that the version number sent >>> back is the "newest minor protocol version supported by the server for >>> the major protocol version requested by the client" which doesn't seem >>> to match the actual usage seen here. >> >> I don't follow. If libpq sends a protocol version of 3.1, then the >> server responds by saying it supports only 3.0. What are you seeing? > > I see what you've described on my end, too. The sentence I quoted seemed > to imply that the server should respond with only the minor version (the > least significant 16 bits). I think it should probably just say "newest > protocol version" in the docs. Ok, I see the distinction.
On 11/11/22 07:13, Peter Eisentraut wrote: > On 09.11.22 00:08, Jacob Champion wrote: >> pqGetNegotiateProtocolVersion3() is still ignoring the message length, >> though; it won't necessarily stop at the message boundary. > > I don't follow. The calls to pqGetInt(), pqGets(), etc. check the > message length. I may be missing something obvious, but I don't see any message length checks in those functions, just bounds checks on the connection buffer. > Do you have something else in mind? Can you give an > example or existing code? Sure. Consider the case where the server sends a NegotiateProtocolVersion with a reasonable length, but then runs over its own message (either by sending an unterminated string as one of the extension names, or by sending a huge extension number). When I test that against a client on my machine, it churns CPU and memory waiting for the end of a message that will never come, even though it had already decided that the maximum length of the message should have been less than 2K. Put another way, why do we loop around and poll for more data when we hit the end of the connection buffer, if we've already checked at this point that we should have the entire message buffered locally? >+ initPQExpBuffer(&buf); >+ if (pqGetInt(&tmp, 4, conn) != 0) >+ return EOF; Tangentially related -- I think the temporary PQExpBuffer is being leaked in the EOF case. --Jacob
On 11.11.22 23:28, Jacob Champion wrote: > Consider the case where the server sends a > NegotiateProtocolVersion with a reasonable length, but then runs over > its own message (either by sending an unterminated string as one of the > extension names, or by sending a huge extension number). When I test > that against a client on my machine, it churns CPU and memory waiting > for the end of a message that will never come, even though it had > already decided that the maximum length of the message should have been > less than 2K. > > Put another way, why do we loop around and poll for more data when we > hit the end of the connection buffer, if we've already checked at this > point that we should have the entire message buffered locally? Isn't that the same behavior for other message types? I don't see anything in the handling of the early 'E' and 'R' messages that would handle this. If we want to address this, maybe this should be handled in the polling loop before we pass off the input buffer to the per-message-type handlers.
On 11/13/22 01:21, Peter Eisentraut wrote: > On 11.11.22 23:28, Jacob Champion wrote: >> Put another way, why do we loop around and poll for more data when we >> hit the end of the connection buffer, if we've already checked at this >> point that we should have the entire message buffered locally? > > Isn't that the same behavior for other message types? I don't see > anything in the handling of the early 'E' and 'R' messages that would > handle this. I agree for the 'E' case. For 'R', I see the msgLength being passed down to pg_fe_sendauth(). > If we want to address this, maybe this should be handled > in the polling loop before we pass off the input buffer to the > per-message-type handlers. I thought it was supposed to be handled by this code: > /* > * Can't process if message body isn't all here yet. > */ > msgLength -= 4; > avail = conn->inEnd - conn->inCursor; > if (avail < msgLength) > { > /* > * Before returning, try to enlarge the input buffer if > * needed to hold the whole message; see notes in > * pqParseInput3. > */ > if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength, > conn)) > goto error_return; > /* We'll come back when there is more data */ > return PGRES_POLLING_READING; > } But after this block, we still treat EOF as if we need to get more data. If we know that the message was supposed to be fully buffered, can we just avoid the return to the pooling loop altogether and error out whenever we see EOF? Thanks, --Jacob
On 14.11.22 19:11, Jacob Champion wrote: >> If we want to address this, maybe this should be handled >> in the polling loop before we pass off the input buffer to the >> per-message-type handlers. > > I thought it was supposed to be handled by this code: > >> /* >> * Can't process if message body isn't all here yet. >> */ >> msgLength -= 4; >> avail = conn->inEnd - conn->inCursor; >> if (avail < msgLength) >> { >> /* >> * Before returning, try to enlarge the input buffer if >> * needed to hold the whole message; see notes in >> * pqParseInput3. >> */ >> if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength, >> conn)) >> goto error_return; >> /* We'll come back when there is more data */ >> return PGRES_POLLING_READING; >> } > > But after this block, we still treat EOF as if we need to get more data. > If we know that the message was supposed to be fully buffered, can we > just avoid the return to the pooling loop altogether and error out > whenever we see EOF? I agree this doesn't make sense together. Digging through the history, this code is ancient and might have come about during the protocol 2/3 transition. (Protocol 2 didn't have length fields in the message IIRC.) I think for the current code, the following would be an appropriate adjustment: diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 746e9b4f1efc..d15fb96572d9 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -3412,8 +3412,7 @@ PQconnectPoll(PGconn *conn) /* Get the type of request. */ if (pqGetInt((int *) &areq, 4, conn)) { - /* We'll come back when there are more data */ - return PGRES_POLLING_READING; + goto error_return; } msgLength -= 4; And then the handling of the 'v' message in my patch would also be adjusted like that.
On Tue, Nov 15, 2022 at 2:19 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > I think for the current code, the following would be an appropriate > adjustment: > > diff --git a/src/interfaces/libpq/fe-connect.c > b/src/interfaces/libpq/fe-connect.c > index 746e9b4f1efc..d15fb96572d9 100644 > --- a/src/interfaces/libpq/fe-connect.c > +++ b/src/interfaces/libpq/fe-connect.c > @@ -3412,8 +3412,7 @@ PQconnectPoll(PGconn *conn) > /* Get the type of request. */ > if (pqGetInt((int *) &areq, 4, conn)) > { > - /* We'll come back when there are more data */ > - return PGRES_POLLING_READING; > + goto error_return; > } > msgLength -= 4; > > And then the handling of the 'v' message in my patch would also be > adjusted like that. Yes -- though that particular example may be dead code, since we should have already checked that there are at least four more bytes in the buffer. --Jacob
On 16.11.22 19:35, Jacob Champion wrote: > On Tue, Nov 15, 2022 at 2:19 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: >> I think for the current code, the following would be an appropriate >> adjustment: >> >> diff --git a/src/interfaces/libpq/fe-connect.c >> b/src/interfaces/libpq/fe-connect.c >> index 746e9b4f1efc..d15fb96572d9 100644 >> --- a/src/interfaces/libpq/fe-connect.c >> +++ b/src/interfaces/libpq/fe-connect.c >> @@ -3412,8 +3412,7 @@ PQconnectPoll(PGconn *conn) >> /* Get the type of request. */ >> if (pqGetInt((int *) &areq, 4, conn)) >> { >> - /* We'll come back when there are more data */ >> - return PGRES_POLLING_READING; >> + goto error_return; >> } >> msgLength -= 4; >> >> And then the handling of the 'v' message in my patch would also be >> adjusted like that. > > Yes -- though that particular example may be dead code, since we > should have already checked that there are at least four more bytes in > the buffer. I have committed this change and the adjusted original patch. Thanks.