Thread: Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut <peter@eisentraut.org> wrote: > This is something I hacked together on the way back from pgconf.eu. > It's highly experimental. > > The idea is to do the equivalent of pg_wal_replay_wait() on the protocol > level, so that it is ideally fully transparent to the application code. > The application just issues queries, and they might be serviced by a > primary or a standby, but there is always a correct ordering of reads > after writes. The idea is great, I have been wanting something like this for a long time. For future proofing it might be a good idea to not require the communicated-waited value to be a LSN. In a sharded database a Lamport timestamp would allow for sequential consistency. Lamport timestamp is just some monotonically increasing value that is eagerly shared between all communicating participants, including clients. For a single cluster LSNs work fine for this purpose. But with multiple shards LSNs will not work, unless arranged as a vector clock which is what I think Matthias proposed. Even without sharding LSN might not be a final choice. Right now on the primary the visibility order is not LSN order. So if a connection does synchronous_commit = off commit, the write location is not even going to see the commit. By publishing the end of the commit record it would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock. I see the patch names the field LSN, but on the protocol level and for the client library this is just an opaque 127 byte token. So basically I'm thinking the naming could be more generic. And for a complete Lamport timestamp implementation we would need the capability of extracting the last seen value and another set-if-greater update operation. -- Ants Aasma www.cybertec-postgresql.com
On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aasma@cybertec.at> wrote: > The idea is great, I have been wanting something like this for a long > time. For future proofing it might be a good idea to not require the > communicated-waited value to be a LSN. Yours and Matthias' feedback make total sense I think. From an implementation perspective I think there are a few things necessary to enable these wider usecases: 1. The token should be considered opaque for clients (should be documented) 2. The token should be defined as variable length in the protocol 3. We should have a hook to allow postgres extensions to override the default token generation 4. We should have a hook to allow postgres extensions to override waiting until the token "timestamp" > Even without sharding LSN might not be a final choice. Right now on > the primary the visibility order is not LSN order. So if a connection > does synchronous_commit = off commit, the write location is not even > going to see the commit. By publishing the end of the commit record it > would be better. But I assume at some point we would like to have a > consistent visibility order, which quite likely means using something > other than LSN as the logical clock. I was going to say that the default could probably still be LSN, but this makes me doubt that. Is there some other token that we can send now that we could "wait" on instead of the LSN, which would work for. If not, I think LSN is still probably a good choice as the default. Or maybe only as a default in case synchronous_commit != off.
Hi, On 10/30/24 1:45 PM, Jelte Fennema-Nio wrote: > On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aasma@cybertec.at> wrote: >> The idea is great, I have been wanting something like this for a long >> time. For future proofing it might be a good idea to not require the >> communicated-waited value to be a LSN. > > Yours and Matthias' feedback make total sense I think. From an > implementation perspective I think there are a few things necessary to > enable these wider usecases: > 1. The token should be considered opaque for clients (should be documented) > 2. The token should be defined as variable length in the protocol > 3. We should have a hook to allow postgres extensions to override the > default token generation > 4. We should have a hook to allow postgres extensions to override > waiting until the token "timestamp" > >> Even without sharding LSN might not be a final choice. Right now on >> the primary the visibility order is not LSN order. So if a connection >> does synchronous_commit = off commit, the write location is not even >> going to see the commit. By publishing the end of the commit record it >> would be better. But I assume at some point we would like to have a >> consistent visibility order, which quite likely means using something >> other than LSN as the logical clock. > > I was going to say that the default could probably still be LSN, but > this makes me doubt that. Is there some other token that we can send > now that we could "wait" on instead of the LSN, which would work for. > If not, I think LSN is still probably a good choice as the default. Or > maybe only as a default in case synchronous_commit != off. > There are known wish-lists for a protocol v4, like https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md and a lot of clean-room implementations in drivers and embedded in projects/products. Having LSN would be nice, but to break all existing implementations, no. Having to specify with startup parameters how a core message format looks like sounds like a bad idea to me, https://www.postgresql.org/docs/devel/protocol-message-formats.html is it. If we want to start on a protocol v4 thing then that is ok - but there are a lot of feature requests for that one. Best regards, Jesper
On Wed, 30 Oct 2024 at 19:04, Jesper Pedersen <jesper.pedersen@comcast.net> wrote: > Having LSN would be nice, but to break all existing implementations, no. > Having to specify with startup parameters how a core message format > looks like sounds like a bad idea to me, It would really help if you would explain why you think it's a bad idea to use a startup parameter for that, instead of simply stating that you think it needs a major protocol version bump. The point of enabling it through a startup parameter (aka protocol option) is exactly so it will not break any existing implementations. If clients request the protocol option (which as the name suggests is optional), then they are expected to be able to parse it. If they don't, then they will get the old message format. So no existing implementation will be broken. If some middleware/proxy gets a request for a startup option it does not support it can advertise that to the client using the NegotiateProtocolVersion message. Allowing the client to continue in a mode where the option is not enabled. So, not bumping the major protocol version and enabling this feature through a protocol option actually causes less breakage in practice. Also regarding the wishlist. I think it's much more likely for any of those to happen in a minor version bump and/or protocol option than it is that we'll bump the major protocol version. P.S. Like I said in another email on this thread: I think for this specific case I'd also prefer a separate new message, because that makes it easier to filter that message out when received by PgBouncer. But I'd still like to understand your viewpoint better on this, because adding fields to existing message types is definitely one of the types of changes that I personally think would be fine for some protocol changes.
Hi, On 10/30/24 3:49 PM, Jelte Fennema-Nio wrote: > On Wed, 30 Oct 2024 at 19:04, Jesper Pedersen > <jesper.pedersen@comcast.net> wrote: >> Having LSN would be nice, but to break all existing implementations, no. >> Having to specify with startup parameters how a core message format >> looks like sounds like a bad idea to me, > > It would really help if you would explain why you think it's a bad > idea to use a startup parameter for that, instead of simply stating > that you think it needs a major protocol version bump. > > The point of enabling it through a startup parameter (aka protocol > option) is exactly so it will not break any existing implementations. > If clients request the protocol option (which as the name suggests is > optional), then they are expected to be able to parse it. If they > don't, then they will get the old message format. So no existing > implementation will be broken. If some middleware/proxy gets a request > for a startup option it does not support it can advertise that to the > client using the NegotiateProtocolVersion message. Allowing the client > to continue in a mode where the option is not enabled. > > So, not bumping the major protocol version and enabling this feature > through a protocol option actually causes less breakage in practice. > Yes, but it opens up for everybody changing all message formats by startup parameters. And, it will be confusing to clean-room implementations: When you have this startup parameter then you get these message formats, when you have this startup parameter then you get these message formats -- and what about combinations ? Like Tatsuo-san stated up-thread. You are also assuming that all PostgreSQL protocol implementations uses the Length (Int32) field very strict - so when one developer adds the startup parameter, but doesn't change the underlying implementation everything will break. The protocol format must be 100% clear and well-documented in all cases. > Also regarding the wishlist. I think it's much more likely for any of > those to happen in a minor version bump and/or protocol option than it > is that we'll bump the major protocol version. > I agree that protocol v4 is likely far out unless somebody want to coordinate the work needed. > P.S. Like I said in another email on this thread: I think for this > specific case I'd also prefer a separate new message, because that > makes it easier to filter that message out when received by PgBouncer. > But I'd still like to understand your viewpoint better on this, > because adding fields to existing message types is definitely one of > the types of changes that I personally think would be fine for some > protocol changes. > If this door is open then it has to very clear how multiple startup parameters are handled at the protocol level, and that is a much bigger fish because what happens if extensions add startup parameters as well. Adding a new message could be the way forward, but that opens the door for the wish-lists for v4. Best regards, Jesper
On Thu, 31 Oct 2024 at 13:59, Jesper Pedersen <jesper.pedersen@comcast.net> wrote: > And, it will be confusing to clean-room implementations: When you have > this startup parameter then you get these message formats, when you have > this startup parameter then you get these message formats -- and what > about combinations ? Like Tatsuo-san stated up-thread. I really don't understand why you think that's so difficult. To be clear, no client is forced to implement any of these protocol options. And all of these protocol options would be documented in the official protocol docs. For instance the ReadyForQuery docs on the "Message Formats" page in the docs could easily be made to look like the following, which imho would be very clear to any implementer of the protocol about ordering of these fields: ReadyForQuery (B) Byte1('Z') Identifies the message type. ReadyForQuery is sent whenever the backend is ready for a new query cycle. Int32 Length of message contents in bytes, including self. Int64: Only present if protocol option wait_for_lsn is set to 1 by the client The LSN at time of commit Int64: Only present if protocol option query_time is set to 1 by the client Time it took to run the query in microseconds Byte1 Current backend transaction status indicator. Possible values are 'I' if idle (not in a transaction block); 'T' if in a transaction block; or 'E' if in a failed transaction block (queries will be rejected until block is ended). > You are also assuming that all PostgreSQL protocol implementations uses > the Length (Int32) field very strict - so when one developer adds the > startup parameter, but doesn't change the underlying implementation > everything will break. Yes... But that seems equivalent to saying: If a developer of a Postgres client advertises that they support protocol v4, but don't actually implement it, then everything will break. i.e. it's the job of the client author to not send protocol options that it doesn't know anything about. Just like it's the job of the client author not to request versions that it does not know anything about. > The protocol format must be 100% clear and well-documented in all cases. Agreed. See above. > If this door is open then it has to very clear how multiple startup > parameters are handled at the protocol level, and that is a much bigger > fish because what happens if extensions add startup parameters as well. Postgres extensions **cannot** add such startup parameters. Heikki already mentioned that the naming was confusing in the docs. At this point in time we're only discussing protocol changes that are coming from Postgres core (which is already a contentious enough topic).
On Wed, 30 Oct 2024, 18:45 Jelte Fennema-Nio, <postgres@jeltef.nl> wrote: > > On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aasma@cybertec.at> wrote: > > The idea is great, I have been wanting something like this for a long > > time. For future proofing it might be a good idea to not require the > > communicated-waited value to be a LSN. > > Yours and Matthias' feedback make total sense I think. From an > implementation perspective I think there are a few things necessary to > enable these wider usecases: > 1. The token should be considered opaque for clients (should be documented) I disagree. It is critical that a consumer knows what to do with the output. Blindly passing it around is not a valid strategy: In my example of keeping track of replication slots the client also has to keep track of every cluster ID to make it work correctly, as every postgres instance may only know about a subset of other PG instances: A client would have to know how to discern and how to merge the returned set of [cluster_id, LSN] pairs into its own view of a global progress: Say, you connect to cluster A, which receives changes from clusters X and Y, cluster B, which receives from X and Z, and cluster C, which receives from all of X, Y, and Z. Cluster B should ignore [Y_ID, Lsn], as keeping the [cluster id, LSN] pair around would be sensitive to resource attacks, but the client will have to merge the response from that scluster to make sure it doesn't accidentally "go back in time" when it switches from cluster A or B to another cluster with the "wait for this minimal replication state" 'token'. > > Even without sharding LSN might not be a final choice. Right now on > > the primary the visibility order is not LSN order. So if a connection > > does synchronous_commit = off commit, the write location is not even > > going to see the commit. By publishing the end of the commit record it > > would be better. But I assume at some point we would like to have a > > consistent visibility order, which quite likely means using something > > other than LSN as the logical clock. Or have CSN=LSN -based snapshots on the primary, too, as that also would solve the unordered visibility issue on the primary, as well as the unacknowledged read issue. > I was going to say that the default could probably still be LSN, but > this makes me doubt that. Is there some other token that we can send > now that we could "wait" on instead of the LSN, which would work for. > If not, I think LSN is still probably a good choice as the default. Or > maybe only as a default in case synchronous_commit != off. I don't see how we can have anything but LSN as 'wait-for-this' condition, as everything else could appear out-of-order in the WAL (we don't allow the record to be modified during XLogInsert()/ReserveXLogInsertLocation()), and WAL is our one source of truth for change capture. PS. I have other complaints about timestamp-based replication/snapshots, but unless someone thinks otherwise and/or it is made relevant I'll consider that off-topic. Kind regards, Matthias van de Meent Neon (https://neon.tech)