From: Robert Haas
Subject: Re: Request for comment on setting binary format output per session
Msg-id: CA+Tgmoa8m_aT049yhL=vXchiX7upW6RziPrwbu2UTdcnvwvH4A@mail.gmail.com
In response to: Re: Request for comment on setting binary format output per session (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Request for comment on setting binary format output per session
List: pgsql-hackers
On Mon, Apr 17, 2023 at 1:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It involves introducing new message types which I didn't really
> consider. We might want to be careful about how many kinds of messages
> we introduce so that the one-letter codes are still manageable. I've
> been frustrated in the past that we don't have separate symbols in the
> source code to refer to the message types (we just use literal 'S',
> etc.).

Right. That was part of the thinking behind the protocol session
parameter thing I was throwing out there.
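
As an aside, that frustration seems fixable independently of anything
else here. A sketch, with invented names -- note that 'S' already
means different things in the two directions, which bare literals do
nothing to flag:

    /* Hypothetical constants; no such header exists today. */
    #define PqMsg_Query            'Q'   /* frontend: simple query */
    #define PqMsg_Sync             'S'   /* frontend: sync */
    #define PqMsg_ParameterStatus  'S'   /* backend: run-time parameter report */
    #define PqMsg_ErrorResponse    'E'   /* backend: error report */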

> Maybe we should have a single new message type 'x' to indicate a
> message for a protocol extension, and then have a sub-message-type? It
> might make error handling better for unexpected messages.

I'm somewhat skeptical that we want every protocol extension in the
universe to use a single message type. I think that could lead to
munging together all sorts of messages that are actually really
different from each other. On the other hand, in a certain sense, we
don't really have a choice. The type byte for a protocol message can
only take on one of 256 possible values, and some of those are already
used, so if we add a bunch of stuff to the protocol, we're eventually
going to run short of byte values. In fact, even if we said, well, 'x'
means that it's an extended message and then there's a type byte as
the first byte of the payload, that only doubles the number of
possible message types before we run out of room, and maybe there's a
world where we eventually have thousands upon thousands of message
types. We'd need a longer type code than 1 byte to really get out from
under the problem, so if we add a message like what you're talking
about, we should probably do that.
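
To make the framing concrete, the kind of wrapper message you're
describing might look something like this -- the layout is purely
illustrative:

    #include <stdint.h>

    /*
     * A hypothetical 'x' wrapper with a one-byte sub-type. As noted
     * above, this only doubles the code space; really escaping the
     * 256-value limit would take a wider sub-type field.
     */
    typedef struct ExtendedMessage
    {
        char     type;     /* always 'x' */
        int32_t  length;   /* message length, as in every v3 message */
        uint8_t  subtype;  /* which extension message this is */
        /* extension-specific payload follows */
    } ExtendedMessage;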

But I don't know if we need to be too paranoid about this. For
example, suppose we were to agree on adding protocol session
parameters and make this the first one. To do that, suppose we add two
new messages to the protocol, ProtocolSessionParameterSet and
ProtocolSessionParameterResponse. And suppose we just pick single
letter codes for those, like we have right now. How much use would
such a mechanism get? It seems possible that we'd add as many as 5 or
10 such parameters in the next half-decade, but they'd all only need
those two new message types. We'd only need a different message type
if somebody wanted to customize something about the protocol that
didn't fit into that model, and that might happen, but I bet it
wouldn't happen that often. I feel like if we're careful to make sure
that the new protocol messages that we add are carefully designed to
be reasonably general, we'd add them very slowly. It seems very
possible that we could go a century or more without running out of
possible values. We could then decide to leave it to future hackers to
decide what to do about it when the remaining bit space starts to get
tight.
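
For illustration only, an encoder for the Set message could be about
this small -- every detail of the format here is made up:

    #include <arpa/inet.h>  /* htonl */
    #include <stdint.h>
    #include <string.h>

    /*
     * Hypothetical ProtocolSessionParameterSet: a type byte, an int32
     * length, then two null-terminated strings for the parameter name
     * and value. The point is how little wire-format surface 5 or 10
     * such parameters would actually need.
     */
    static size_t
    build_psp_set(char *buf, const char *name, const char *value)
    {
        size_t   nlen = strlen(name) + 1;
        size_t   vlen = strlen(value) + 1;
        uint32_t len = htonl((uint32_t) (4 + nlen + vlen));

        buf[0] = 'z';             /* made-up type byte */
        memcpy(buf + 1, &len, 4); /* length includes itself, as usual */
        memcpy(buf + 5, name, nlen);
        memcpy(buf + 5 + nlen, value, vlen);
        return 5 + nlen + vlen;
    }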

The point of this thought experiment is to help us estimate how
careful we need to be. I think that if we added messages with 1-byte
type codes for things as specific as SetTypesWithBinaryOutputAlways,
there would be a significant chance that we would run out of 1-byte
type codes while some of us are still around to be sad about it. Maybe
it wouldn't happen, but it seems risky. Furthermore, such messages are
FAR more specific than existing protocol messages like Query or
Execute or ErrorResponse which cover HUGE amounts of territory. I
think we need to be a level of abstraction removed. Something like
ProtocolSessionParameterSet seems good enough to me - I don't think
we'll run out of codes like that soon enough to matter. I don't think it
would be wrong to take that as far as you propose here, and just add
one new message type to cover all future developments, but it feels
like it might not really help anyone. A lot of code would probably
have to drill down and look at what type of extended message it was
before deciding what to do with it, which seems a bit annoying.

One thing to keep in mind is that it's possible that in the future we
might want protocol extensions for things that are very
performance-sensitive. For instance, I think it might be advantageous
to have something that is intermediate between the simple and extended
query protocol. The simple query protocol doesn't let you set
parameters, but the extended query protocol requires you to send a
whole series of messages (Parse-Bind-Describe-Execute-Sync) which
doesn't seem to be particularly efficient for either the client or the
server. I think it would be nice to have a way to send a single
message that says "run this query with these parameters." But, if we
had that, some clients might use it Really A Lot. They would therefore
want the message to be as short as possible, which means that using up
a single byte code for it would probably be desirable. On the other
hand, the kinds of things we're talking about here really shouldn't be
subjected to that level of use, and so if for this purpose we pick a
message format that is longer and wordier and more extensible, that
should be fine. If your connection pooler is switching your connection
back and forth between a bunch of end clients that all have different
ideas about binary format parameters, it should be running at least
one query after each such change, and probably more than that. And
that query probably has some results, so a few extra bytes of overhead
in the message format shouldn't cost much even in fairly extreme
cases.
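
To illustrate the contrast, such a combined message might look
something like this, in the notation the protocol documentation uses;
the type code and layout are invented, and it's one of many possible
designs:

    /*
     * Hypothetical message replacing Parse-Bind-Describe-Execute-Sync
     * for the common case:
     *
     *   Byte1('?')  made-up type code
     *   Int32       message length
     *   String      query text
     *   Int16       number of parameter values
     *   then, for each parameter:
     *     Int32     value length in bytes (-1 for NULL)
     *     ByteN     value, in text format
     *
     * Every byte matters here, because a client might send one of
     * these for every query it runs -- unlike the session-parameter
     * messages above, which should be comparatively rare.
     */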

> Also, is there any reason we'd want this concept to integrate with
> connection strings/URIs? Probably not a good idea to turn on features
> that way, but perhaps we'd want to support disabling protocol
> extensions from a URI? This could be used to restrict authentication
> methods or sources of authentication information.

I don't really see why the connection string/URI has any business
disabling anything. It might require something to be enabled, though.
For instance, if we added a protocol extension to encrypt all result
sets returned to the client using rot13, we might also add a
connection parameter to control that behavior. If the user requested
that behavior using a connection parameter, libpq might then try to
enable it via a protocol extension -- it would have to, since it would
otherwise be unable to deliver the requested behavior. But the user
shouldn't get to say "please enable the protocol extension that would
enable you to turn on rot13 even though I don't actually want to use
rot13" nor should they be able to say "please give me rot13 without
using the protocol extension that would let you ask for that". Those
requests aren't sensible. The connection parameter interface is a way
for the user to request certain behaviors that they might want, and
then it's up to libpq, or some other connector, to decide what needs
to happen at a protocol level to implement those requests.
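
In libpq terms, that division of labor might look something like this.
The connection option and the extension name are both invented; only
the "_pq_." prefix is real, since the startup packet reserves
parameter names starting with "_pq_." for protocol extensions:

    #include <stdbool.h>

    typedef struct { bool rot13_results; } FakeConn; /* stand-in for PGconn */

    static void
    request_extensions(FakeConn *conn,
                       void (*add_option)(const char *name, const char *value))
    {
        /* The user asked for a behavior; libpq picks the mechanism. */
        if (conn->rot13_results)
            add_option("_pq_.rot13_results", "on");

        /* If the server's NegotiateProtocolVersion response rejects
         * the option, the connection attempt has to fail: the
         * requested behavior can't be delivered without it. */
    }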

And that might change over time. We could introduce a new major
protocol version (v4!) or somebody could eventually say "hey, these
six protocol extensions are now universally supported by literally
every bit of code that we can find that speaks the PG wire protocol,
let's just start sending all these messages unconditionally and the
counterparty can error out if they're a fossil from the Jurassic era".
It's kind of hard to imagine that happening from where we are now, but
times change.

> > The reason why I suggest this is that I feel like there could be a
> > bunch of things like this.
>
> What's the trade-off between having one protocol extension (e.g.
> _pq_protocol_session_parameters) that tries to work for multiple cases
> (e.g. binary_formats and session_user) vs just having two protocol
> extensions (_pq_set_session_user and _pq_set_binary_formats)?

Well, it seems related to the message types issue mentioned above.
Presumably if we were going to have one set of message types for both
features, we'd want one protocol extension to enable that set of
message types. And if we were going to have separate message types for
each feature, we'd want separate protocol extensions to enable them.
There are probably other ways it could work, but that seems like the
most natural idea.
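
In startup-packet terms, using the names from your examples, the
trade-off would look roughly like this:

    /*
     * One extension, one shared pair of message types:
     *   _pq_protocol_session_parameters
     *     -> enables ProtocolSessionParameterSet/Response, carrying
     *        binary_formats, session_user, and whatever comes next
     *
     * Two extensions, two sets of message types:
     *   _pq_set_binary_formats -> enables its own message pair
     *   _pq_set_session_user   -> enables another message pair
     */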

--
Robert Haas
EDB: http://www.enterprisedb.com


