Re: Request for comment on setting binary format output per session - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Request for comment on setting binary format output per session |
| Date | |
| Msg-id | CA+Tgmoa8m_aT049yhL=vXchiX7upW6RziPrwbu2UTdcnvwvH4A@mail.gmail.com |
| In response to | Re: Request for comment on setting binary format output per session (Jeff Davis <pgsql@j-davis.com>) |
| Responses | Re: Request for comment on setting binary format output per session |
| List | pgsql-hackers |
On Mon, Apr 17, 2023 at 1:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It involves introducing new message types which I didn't really
> consider. We might want to be careful about how many kinds of messages
> we introduce so that the one-letter codes are still manageable. I've
> been frustrated in the past that we don't have separate symbols in the
> source code to refer to the message types (we just use literal 'S',
> etc.).

Right. That was part of the thinking behind the protocol session
parameter thing I was throwing out there.

> Maybe we should have a single new message type 'x' to indicate a
> message for a protocol extension, and then have a sub-message-type? It
> might make error handling better for unexpected messages.

I'm somewhat skeptical that we want every protocol extension in the
universe to use a single message type. I think that could lead to
munging together all sorts of messages that are actually really
different from each other. On the other hand, in a certain sense, we
don't really have a choice. The type byte for a protocol message can
only take on one of 256 possible values, and some of those are already
used, so if we add a bunch of stuff to the protocol, we're eventually
going to run short of byte values. In fact, even if we said, well, 'x'
means that it's an extended message and then there's a type byte as
the first byte of the payload, that only doubles the number of
possible message types before we run out of room, and maybe there's a
world where we eventually have thousands upon thousands of message
types. We'd need a type code longer than one byte to really get out
from under the problem, so if we add a message like what you're
talking about, we should probably do that. But I don't know that we
need to be too paranoid about this.

For example, suppose we were to agree on adding protocol session
parameters and make this the first one. To do that, suppose we add two
new messages to the protocol, ProtocolSessionParameterSet and
ProtocolSessionParameterResponse. And suppose we just pick
single-letter codes for those, like we have right now. How much use
would such a mechanism get? It seems possible that we'd add as many as
5 or 10 such parameters in the next half-decade, but they'd all need
only those two new message types. We'd only need a different message
type if somebody wanted to customize something about the protocol that
didn't fit into that model, and that might happen, but I bet it
wouldn't happen often. I feel like if we're careful to design the new
protocol messages we add to be reasonably general, we'd add them very
slowly. It seems very possible that we could go a century or more
without running out of possible values, and we could leave it to
future hackers to decide what to do when the remaining bit space
starts to get tight.

The point of this thought experiment is to help us estimate how
careful we need to be. I think that if we added messages with 1-byte
type codes for things as specific as SetTypesWithBinaryOutputAlways,
there would be a significant chance that we would run out of 1-byte
type codes while some of us are still around to be sad about it. Maybe
it wouldn't happen, but it seems risky. Furthermore, such messages are
FAR more specific than existing protocol messages like Query or
Execute or ErrorResponse, which cover HUGE amounts of territory. I
think we need to be a level of abstraction removed.
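To make "a level of abstraction removed" concrete, here is a minimal
sketch of how such a general message might be serialized, assuming the
framing every existing message already uses (one type byte, then an
Int32 length that counts itself). The type byte 'z', the function name,
and the name/value payload layout are all invented for illustration;
nothing here is settled.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Sketch: serialize a hypothetical ProtocolSessionParameterSet message.
 * One type byte, then an Int32 length in network byte order that
 * includes itself but not the type byte, then the payload: parameter
 * name and value as NUL-terminated strings. Because the name travels
 * in the payload, the same message type can carry any session
 * parameter we invent later, at no cost in new type bytes.
 *
 * Returns the number of bytes written; the caller must supply a buffer
 * of at least 5 + strlen(name) + 1 + strlen(value) + 1 bytes.
 */
static size_t
build_session_parameter_set(char *buf, const char *name, const char *value)
{
    size_t   name_len = strlen(name) + 1;    /* include the NUL */
    size_t   value_len = strlen(value) + 1;
    uint32_t len = (uint32_t) (4 + name_len + value_len);
    uint32_t len_be = htonl(len);

    buf[0] = 'z';                            /* hypothetical type byte */
    memcpy(buf + 1, &len_be, 4);
    memcpy(buf + 5, name, name_len);
    memcpy(buf + 5 + name_len, value, value_len);
    return 1 + (size_t) len;
}
```

A client wanting the behavior discussed in this thread might then send
something like build_session_parameter_set(buf, "binary_formats",
"int8,timestamptz"), where the parameter name and the value syntax are,
again, purely illustrative.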
Something like ProtocolSessionParameterSet seems good enough to me - I
don't think we'll run out of codes like that soon enough to matter. I
don't think it would be wrong to take that as far as you propose here,
and just add one new message type to cover all future developments,
but it feels like it might not really help anyone. A lot of code would
probably have to drill down and look at what type of extended message
it was before deciding what to do with it, which seems a bit annoying.

One thing to keep in mind is that it's possible that in the future we
might want protocol extensions for things that are very
performance-sensitive. For instance, I think it might be advantageous
to have something that is intermediate between the simple and extended
query protocols. The simple query protocol doesn't let you set
parameters, but the extended query protocol requires you to send a
whole series of messages (Parse-Bind-Describe-Execute-Sync), which
doesn't seem particularly efficient for either the client or the
server. I think it would be nice to have a way to send a single
message that says "run this query with these parameters." But, if we
had that, some clients might use it Really A Lot. They would therefore
want the message to be as short as possible, which means that using up
a single-byte code for it would probably be desirable.
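As a thought experiment (no such message exists today), a combined
message of that kind might look roughly like the following, written in
the Byte1/Int32/String notation the protocol documentation uses. The
name, type byte, and field layout are all invented here:

```c
/*
 * Hypothetical "QueryWithParams" message, collapsing
 * Parse-Bind-Describe-Execute-Sync into a single round trip:
 *
 *   Byte1('q')   hypothetical type byte
 *   Int32        message length, including itself
 *   String       query text, with $1, $2, ... placeholders
 *   Int16        number of parameter values
 *   For each parameter:
 *     Int32      length of the value in bytes, or -1 for NULL
 *     ByteN      the value, in text format
 *
 * The server would parse, bind, and execute in one step and behave as
 * though an implicit Sync followed, so a parameterized query costs one
 * short message instead of five.
 */
```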
On the other hand, the kinds of things we're talking about here really
shouldn't be subject to that level of use, so if for this purpose we
pick a message format that is longer and wordier and more extensible,
that should be fine. If your connection pooler is switching your
connection back and forth between a bunch of end clients that all have
different ideas about binary-format parameters, it should be running
at least one query after each such change, and probably more than
that. And that query probably has some results, so a few extra bytes
of overhead in the message format shouldn't cost much even in fairly
extreme cases.

> Also, is there any reason we'd want this concept to integrate with
> connection strings/URIs? Probably not a good idea to turn on features
> that way, but perhaps we'd want to support disabling protocol
> extensions from a URI? This could be used to restrict authentication
> methods or sources of authentication information.

I don't really see why the connection string/URI has any business
disabling anything. It might require something to be enabled, though.
For instance, if we added a protocol extension to encrypt all result
sets returned to the client using rot13, we might also add a
connection parameter to control that behavior. If the user requested
that behavior using a connection parameter, libpq might then try to
enable it via a protocol extension -- it would have to, since
otherwise it would be unable to deliver the requested behavior. But
the user shouldn't get to say "please enable the protocol extension
that would let you turn on rot13 even though I don't actually want to
use rot13," nor should they be able to say "please give me rot13
without using the protocol extension that would let you ask for it."
Those requests aren't sensible. The connection parameter interface is
a way for the user to request certain behaviors that they might want,
and then it's up to libpq, or some other connector, to decide what
needs to happen at a protocol level to implement those requests. And
that might change over time. We could introduce a new major protocol
version (v4!), or somebody could eventually say "hey, these six
protocol extensions are now universally supported by literally every
bit of code we can find that speaks the PG wire protocol, let's just
start sending all these messages unconditionally and the counterparty
can error out if they're a fossil from the Jurassic era." It's kind of
hard to imagine that happening from where we are now, but times
change.

> > The reason why I suggest this is that I feel like there could be a
> > bunch of things like this.
>
> What's the trade-off between having one protocol extension (e.g.
> _pq_protocol_session_parameters) that tries to work for multiple
> cases (e.g. binary_formats and session_user) vs just having two
> protocol extensions (_pq_set_session_user and
> _pq_set_binary_formats)?

Well, it seems related to the message types issue mentioned above.
Presumably if we were going to have one set of message types for both
features, we'd want one protocol extension to enable that set of
message types. And if we were going to have separate message types for
each feature, we'd want separate protocol extensions to enable them.
There are probably other ways it could work, but that seems like the
most natural idea.

--
Robert Haas
EDB: http://www.enterprisedb.com