Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Msg-id CA+TgmoYZQ4N6aJwtaoCUTfjniqvZohgOh9R=EkyUVB+oN413vQ@mail.gmail.com
In response to Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs  (Jelte Fennema-Nio <me@jeltef.nl>)
Responses Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
List pgsql-hackers
On Mon, Apr 22, 2024 at 5:19 PM Jelte Fennema-Nio <me@jeltef.nl> wrote:
> On Mon, 22 Apr 2024 at 16:26, Robert Haas <robertmhaas@gmail.com> wrote:
> > That's a fair point, but I'm still not seeing much practical
> > advantage. It's unlikely that a client is going to set a random bit in
> > a format parameter for no reason.
>
> I think you're missing an important point of mine here. The client
> wouldn't be "setting a random bit in a format parameter for no
> reason". The client would decide it is allowed to set this bit,
> because the PG version it connected to supports column encryption
> (e.g. PG18). But this completely breaks protocol and application layer
> separation.

I can't see what the problem is here. If the client is connected to a
database that contains encrypted columns, and its response to seeing
an encrypted column is to set this bit, that's fine and nothing should
break. If a client doesn't know about encrypted columns and sets that
bit at random, that will break things, and formally I think that's a
risk, because I don't believe we document anywhere that you shouldn't
set unused bits in the format mask. But practically, it's not likely.
(And also, maybe we should document that you shouldn't do that.)

> It doesn't seem completely outside of the realm of possibility for a
> pooler to gather some statistics on the amount of Bind messages that
> use text vs binary query parameters. That's very easily doable now,
> while looking only at the protocol layer. If a client then sets the
> new format parameter bit, this pooler could then get confused and
> close the connection.

Right, this is the kind of risk I was worried about. I think it's
similar to my example of a client setting an unused bit for no reason
and breaking everything. Here, you've hypothesized a pooler that tries
to interpret the bit and just errors out when it sees something it
doesn't understand. I agree that *formally* this is enough to justify
bumping the protocol version, but I think *practically* it isn't,
because the incompatibility is so minor as to inconvenience almost
nobody, whereas changing the protocol version affects everybody.

Let's consider a hypothetical country much like Canada except that
there are three official languages rather than two: English, French,
and Robertish. Robertish is just like English except that the meanings
of the words cabbage and rutabaga are reversed. Shall we mandate that
all signs in the country be printed in three languages rather than
two? Formally, we ought, because the substantial minority of our
hypothetical country that proudly speaks Robertish as their mother
tongue will not want to feel that they are second class citizens. But
practically, there are very few situations where the differences
between the two languages are going to inconvenience anyone. Indeed,
the French speakers might be a bit put out if English is effectively
represented twice on every sign while their mother tongue is there
only once. Of course, people are entitled to organize their countries
politically in any way that works for the people who live in them, but
as a practical matter, English and Robertish are mutually
intelligible.

And so here. If someone codes a connection pooler in the way you
suppose, then it will break. But, first of all, they probably won't do
that, both because it's not particularly likely that someone wants to
gather that particular set of statistics and also because erroring out
seems like an overreaction. And secondly, let's imagine that we do
bump the protocol version and think about whether and how that solves
the problem. A client will request from the pooler a version 3.1
connection and the pooler will say, sorry, no can do, I only
understand 3.0. So the client will now say, oh ok, no problem, I'm
going to refrain from setting that parameter format bit. Cool, right?

Well, no, not really. First, now the client application is probably
broken. If the client is varying its behavior based on the server's
protocol version, that must mean that it cares about accessing
encrypted columns, and that means that the bit in question is not an
optional feature. So actually, the fact that the pooler can force the
client to downgrade hasn't fixed anything at all.

Second, if the connection pooler were written to do something other
than close the connection, like say mask out the one bit that it knows
how to deal with or have an "unknown" bucket to count values that it
doesn't recognize, then it wouldn't have needed to care about the
protocol version in the first place. It would have been better off not
even knowing, because then it wouldn't have forced a downgrade onto
the client application for no real reason. Throwing an error wasn't a
wrong decision on the part of the person writing the pooler, but there
are other things they could have done that would have been less
brittle.
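
To make the difference concrete, here is a minimal sketch of the three
pooler behaviors described above. It is an illustration only, not code
from any real pooler. The function names are made up, and the value 0x11
("binary plus one new bit") is a hypothetical placeholder, since nothing
in this thread fixes which bit would be used.

```python
from collections import Counter

# Format codes defined by protocol 3.0 for Bind parameters.
TEXT, BINARY = 0, 1


def tally_strict(format_codes):
    """Brittle: raise (i.e. close the connection) on any unknown code."""
    stats = Counter()
    for code in format_codes:
        if code not in (TEXT, BINARY):
            raise ValueError(f"unexpected format code {code:#x}")
        stats["text" if code == TEXT else "binary"] += 1
    return stats


def tally_tolerant(format_codes):
    """Less brittle: count unrecognized codes in an 'unknown' bucket."""
    stats = Counter()
    for code in format_codes:
        if code == TEXT:
            stats["text"] += 1
        elif code == BINARY:
            stats["binary"] += 1
        else:
            stats["unknown"] += 1
    return stats


def tally_masked(format_codes):
    """Alternative: look only at the one bit the pooler understands."""
    stats = Counter()
    for code in format_codes:
        stats["binary" if code & 1 else "text"] += 1
    return stats


# Hypothetical Bind with one text, one binary, and one "new bit" parameter.
sample = [0, 1, 0x11]
print(dict(tally_tolerant(sample)))
print(dict(tally_masked(sample)))
```

Only tally_strict needs the protocol version to protect it. The other two
keep working whether or not the client sets a bit they have never heard of.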

Third, applications, drivers, and connection poolers now all need to
worry about handling downgrades smoothly. If a connection pooler
requests a v3.1 connection to the server and gets v3.0, it had better
make sure that it only advertises 3.0 to the client. If the client
requests v3.0, the pooler had better make sure either to request v3.0
from the server or, alternatively, to be prepared to
translate between 3.0 and 3.1 wherever that's needed, in either
direction. But it's not at all clear what that would look like for
something like TCE. Will the pooler arrange to encrypt parameters
destined for encrypted tables if the client doesn't do so? Will it
arrange to decrypt values coming from encrypted tables if the client
doesn't understand encryption? It's possible someone will code that
sort of thing, but I bet a lot of people won't bother. In general, I
think we'll quickly end up with a bunch of different protocol versions
-- say, 3.0 through 3.4 -- but people will thoroughly test with only
one or two of them and support for the others will either be buggy
because it wasn't tested or work anyway because the differences didn't
really matter in the first place.
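
As a rough sketch of the bookkeeping a pooler would have to get right,
assume versions are modeled as (major, minor) tuples and that each peer
simply grants the lower of what was requested and what it supports. That
is a simplification of the real negotiation exchange, and the function
names are hypothetical.

```python
def grant(requested, supported_max):
    """Version a peer grants: the lower of the request and its own maximum."""
    return min(requested, supported_max)


def pooler_negotiate(client_requested, pooler_max, server_max):
    """Return the version the pooler should advertise to the client.

    The pooler never asks the server for more than the client requested
    or more than the pooler itself understands, and it never tells the
    client it got more than the server actually granted.
    """
    upstream_request = min(client_requested, pooler_max)
    upstream_granted = grant(upstream_request, server_max)
    return upstream_granted


# Client wants 3.1, pooler speaks 3.1, server only speaks 3.0.
print(pooler_negotiate((3, 1), (3, 1), (3, 0)))
# Client only speaks 3.0, so the pooler must not request 3.1 upstream.
print(pooler_negotiate((3, 0), (3, 1), (3, 1)))
```

This covers only the pass-through case. A pooler that wants to translate
between versions needs feature-specific logic on top of this, which is
the part I doubt most people will write.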

> 1. I strongly believe minor protocol version bumps after the initial
> 3.1 one can be made painless for clients/poolers (so the ones to
> 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not
> having to worry about breaking TLS 1.2 communication. Once clients and
> poolers implement version negotiation support for 3.1, there's no
> reason for version negotiation support to work for 3.0 and 3.1 to then
> suddenly break on the 3.2 bump. To be clear, I'm talking about the act
> of bumping the version here, not the actual protocol changes. So
> assuming zero/near-zero client implementation effort for the new
> features (like never setting the newly supported bit in a format
> parameter), then bumping the protocol version for these new features
> can never have negative consequences.

I do like the idea of being able to introduce new versions without
breaking things, but I think that if the TLS folks bumped the protocol
version for something as minor as what we're talking about here, there
would quickly be so many TLS versions that the result would be
unmanageable. I suspect that they either never make small changes and
batch everything up for the next rev, or they slip small changes into
existing protocol versions as I propose that we do here. I have zero
objection to bumping the protocol version when there is a real
question of mutual intelligibility, and zero objection to trying to
reduce friction around version bumps. But my current view, which I
reserve the right to revise at a later time, is that a change that
99.99+% of people can safely ignore is not a sufficient reason for a
version bump.

> 2. I very much want to keep a clear split between the protocol layer
> and the application layer of our communication. And these layers merge
> whenever (like you say) "the wire protocol has changed from one
> release to another", but no protocol version bump or protocol
> extension is used to indicate that. When that happens the only way for
> a client to know what valid wire protocol messages are according to
> the server, is by checking the server version. This completely breaks
> the separation between layers. So, while checking the server version
> indeed works for direct client to postgres communication, it starts to
> break down whenever you put a pooler in between (as explained in the
> example earlier in this email). And it breaks down even more when
> connecting to servers that implement the Postgres wire protocol, but
> are not postgres at all, like CockroachDB. Right now libpq and other
> postgres drivers can be used to talk to these other servers and
> poolers, but if we start mixing protocol and application layer stuff
> then eventually that will stop being the case.

In practice, it's already the case. If such databases don't share code
with PostgreSQL, it seems impossible that the replication subprotocol
works in any meaningful way. It seems very likely that there are other
dark corners of the protocol where things don't work either. And TCE
will be another one, but bumping the protocol version doesn't fix
that.

I kind of feel bad arguing so much about this - I don't think the urge
to bump the protocol version when we change the protocol is a bad one
in concept. And it sounds like you've done more work with software
that cares about the protocol outside of PostgreSQL itself than I
have. So maybe you're right and I'm all wet. But I can't understand
why you don't see practical problems with frequent version bumps. It's
not just about the one-time effort of getting everything that doesn't
currently understand how to negotiate a version to do so. It's about
how everyone acts on that information, or doesn't, and whether the end
result of all of those individual decisions is better or worse for the
community as a whole.

--
Robert Haas
EDB: http://www.enterprisedb.com


