Thread: Re: protocol-level wait-for-LSN
On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut <peter@eisentraut.org> wrote:
> This is something I hacked together on the way back from pgconf.eu.
> It's highly experimental.
>
> The idea is to do the equivalent of pg_wal_replay_wait() on the protocol
> level, so that it is ideally fully transparent to the application code.
> The application just issues queries, and they might be serviced by a
> primary or a standby, but there is always a correct ordering of reads
> after writes.

The idea is great, I have been wanting something like this for a long
time. For future proofing it might be a good idea to not require the
communicated-waited value to be a LSN.

In a sharded database a Lamport timestamp would allow for sequential
consistency. Lamport timestamp is just some monotonically increasing
value that is eagerly shared between all communicating participants,
including clients. For a single cluster LSNs work fine for this
purpose. But with multiple shards LSNs will not work, unless arranged
as a vector clock which is what I think Matthias proposed.

Even without sharding LSN might not be a final choice. Right now on
the primary the visibility order is not LSN order. So if a connection
does synchronous_commit = off commit, the write location is not even
going to see the commit. By publishing the end of the commit record it
would be better. But I assume at some point we would like to have a
consistent visibility order, which quite likely means using something
other than LSN as the logical clock.

I see the patch names the field LSN, but on the protocol level and for
the client library this is just an opaque 8 byte token. So basically
I'm thinking the naming could be more generic. And for a complete
Lamport timestamp implementation we would need the capability of
extracting the last seen value and another set-if-greater update
operation.

--
Ants Aasma
www.cybertec-postgresql.com
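To make the two operations mentioned above concrete (extracting the last seen value, and a set-if-greater update), here is a minimal sketch of a Lamport-style clock. The class and method names are hypothetical and do not come from the patch; the integer stands in for whatever opaque, ordered token the server would hand out.

```python
import threading


class LamportClock:
    """Hypothetical logical clock; names are illustrative, not from the patch."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def last_seen(self) -> int:
        """Extract the highest value this participant has observed."""
        with self._lock:
            return self._value

    def set_if_greater(self, observed: int) -> int:
        """Advance the clock only if the observed value is newer."""
        with self._lock:
            if observed > self._value:
                self._value = observed
            return self._value

    def tick(self) -> int:
        """Advance the clock for a local event such as a commit."""
        with self._lock:
            self._value += 1
            return self._value


# A client carries the clock value from one shard to another.
shard_a = LamportClock(start=10)
shard_b = LamportClock(start=3)
client = LamportClock()

client.set_if_greater(shard_a.tick())       # commit on shard A, client learns 11
shard_b.set_if_greater(client.last_seen())  # client presents 11 to shard B
```

Because every participant only ever moves its value forward, a stale value presented later (for example 5) leaves shard B's clock untouched.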
On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aasma@cybertec.at> wrote:
> The idea is great, I have been wanting something like this for a long
> time. For future proofing it might be a good idea to not require the
> communicated-waited value to be a LSN.

Yours and Matthias' feedback make total sense I think. From an
implementation perspective I think there are a few things necessary to
enable these wider usecases:
1. The token should be considered opaque for clients (should be documented)
2. The token should be defined as variable length in the protocol
3. We should have a hook to allow postgres extensions to override the
   default token generation
4. We should have a hook to allow postgres extensions to override
   waiting until the token "timestamp"

> Even without sharding LSN might not be a final choice. Right now on
> the primary the visibility order is not LSN order. So if a connection
> does synchronous_commit = off commit, the write location is not even
> going to see the commit. By publishing the end of the commit record it
> would be better. But I assume at some point we would like to have a
> consistent visibility order, which quite likely means using something
> other than LSN as the logical clock.

I was going to say that the default could probably still be LSN, but
this makes me doubt that. Is there some other token that we can send
now that we could "wait" on instead of the LSN, which would work for.
If not, I think LSN is still probably a good choice as the default. Or
maybe only as a default in case synchronous_commit != off.
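The four points above could be sketched roughly as follows. Everything here is an assumption made for illustration: the 4-byte length prefix, the function names, and the two module-level hook variables are not taken from the patch or from the PostgreSQL protocol. The default implementation treats the token as an 8-byte LSN, while the framing code never looks inside it.

```python
import struct
from typing import Callable, Tuple


def encode_token(token: bytes) -> bytes:
    """Frame an opaque token as: 4-byte big-endian length, then the bytes."""
    return struct.pack("!I", len(token)) + token


def decode_token(buf: bytes) -> Tuple[bytes, bytes]:
    """Return (token, remaining bytes) without interpreting the token."""
    (length,) = struct.unpack_from("!I", buf, 0)
    end = 4 + length
    if end > len(buf):
        raise ValueError("truncated token")
    return buf[4:end], buf[end:]


def default_generate(lsn: int) -> bytes:
    """Default token: the LSN as an 8-byte unsigned integer."""
    return struct.pack("!Q", lsn)


def default_reached(token: bytes, replayed_lsn: int) -> bool:
    """Default wait condition: replay has caught up with the token's LSN."""
    (target,) = struct.unpack("!Q", token)
    return replayed_lsn >= target


# Hypothetical hooks an extension could replace with its own logical clock.
generate_hook: Callable[[int], bytes] = default_generate
reached_hook: Callable[[bytes, int], bool] = default_reached

token = generate_hook(0x16B3748)
wire = encode_token(token) + b"rest"
decoded, rest = decode_token(wire)
```

A client written against this framing would keep working if an extension swapped in a longer token, since it only stores the bytes and echoes them back.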
Hi,

On 10/30/24 1:45 PM, Jelte Fennema-Nio wrote:
> On Wed, 30 Oct 2024 at 18:18, Ants Aasma <ants.aasma@cybertec.at> wrote:
>> The idea is great, I have been wanting something like this for a long
>> time. For future proofing it might be a good idea to not require the
>> communicated-waited value to be a LSN.
>
> Yours and Matthias' feedback make total sense I think. From an
> implementation perspective I think there are a few things necessary to
> enable these wider usecases:
> 1. The token should be considered opaque for clients (should be documented)
> 2. The token should be defined as variable length in the protocol
> 3. We should have a hook to allow postgres extensions to override the
>    default token generation
> 4. We should have a hook to allow postgres extensions to override
>    waiting until the token "timestamp"
>
>> Even without sharding LSN might not be a final choice. Right now on
>> the primary the visibility order is not LSN order. So if a connection
>> does synchronous_commit = off commit, the write location is not even
>> going to see the commit. By publishing the end of the commit record it
>> would be better. But I assume at some point we would like to have a
>> consistent visibility order, which quite likely means using something
>> other than LSN as the logical clock.
>
> I was going to say that the default could probably still be LSN, but
> this makes me doubt that. Is there some other token that we can send
> now that we could "wait" on instead of the LSN, which would work for.
> If not, I think LSN is still probably a good choice as the default. Or
> maybe only as a default in case synchronous_commit != off.
>

There are known wish-lists for a protocol v4, like

https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md

and a lot of clean-room implementations in drivers and embedded in
projects/products.

Having LSN would be nice, but to break all existing implementations, no.
Having to specify with startup parameters how a core message format
looks like sounds like a bad idea to me,

https://www.postgresql.org/docs/devel/protocol-message-formats.html

is it.

If we want to start on a protocol v4 thing then that is ok - but there
are a lot of feature requests for that one.

Best regards,
 Jesper
On Wed, 30 Oct 2024 at 19:04, Jesper Pedersen <jesper.pedersen@comcast.net> wrote:
> Having LSN would be nice, but to break all existing implementations, no.
> Having to specify with startup parameters how a core message format
> looks like sounds like a bad idea to me,

It would really help if you would explain why you think it's a bad
idea to use a startup parameter for that, instead of simply stating
that you think it needs a major protocol version bump.

The point of enabling it through a startup parameter (aka protocol
option) is exactly so it will not break any existing implementations.
If clients request the protocol option (which as the name suggests is
optional), then they are expected to be able to parse it. If they
don't, then they will get the old message format. So no existing
implementation will be broken. If some middleware/proxy gets a request
for a startup option it does not support, it can advertise that to the
client using the NegotiateProtocolVersion message, allowing the client
to continue in a mode where the option is not enabled.

So, not bumping the major protocol version and enabling this feature
through a protocol option actually causes less breakage in practice.

Also regarding the wishlist. I think it's much more likely for any of
those to happen in a minor version bump and/or protocol option than it
is that we'll bump the major protocol version.

P.S. Like I said in another email on this thread: I think for this
specific case I'd also prefer a separate new message, because that
makes it easier to filter that message out when received by PgBouncer.
But I'd still like to understand your viewpoint better on this,
because adding fields to existing message types is definitely one of
the types of changes that I personally think would be fine for some
protocol changes.
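The negotiation argument above can be illustrated with a small simulation. This is only a sketch: the option name `_pq_.wait_for_lsn` and both function names are hypothetical, and no real wire messages are built. The one detail taken from the existing protocol is that protocol options use the `_pq_.` prefix and that NegotiateProtocolVersion reports the option names the server did not recognize.

```python
from typing import Iterable, List, Set

PROTOCOL_OPTION_PREFIX = "_pq_."


def server_unrecognized(requested: Iterable[str], supported: Set[str]) -> List[str]:
    """Option names a NegotiateProtocolVersion reply would list as unrecognized."""
    return [
        name
        for name in requested
        if name.startswith(PROTOCOL_OPTION_PREFIX) and name not in supported
    ]


def client_enabled(requested: Iterable[str], unrecognized: Iterable[str]) -> Set[str]:
    """Options the client may rely on after seeing the server's reply."""
    rejected = set(unrecognized)
    return {name for name in requested if name not in rejected}


requested = ["_pq_.wait_for_lsn"]  # hypothetical option name

# An older server or proxy knows no such option: the client falls back.
old_reply = server_unrecognized(requested, supported=set())
old_mode = client_enabled(requested, old_reply)

# A newer server recognizes it: the new message format is in effect.
new_reply = server_unrecognized(requested, supported={"_pq_.wait_for_lsn"})
new_mode = client_enabled(requested, new_reply)
```

A client that never requests the option sends an empty list, gets an empty reply, and sees the old message format in both cases, which is the "no existing implementation will be broken" property.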