Re: Reporting the commit LSN at commit time - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Reporting the commit LSN at commit time
Date
Msg-id 53E4227E.6030702@2ndquadrant.com
Whole thread Raw
In response to Re: Reporting the commit LSN at commit time  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 08/07/2014 11:42 PM, Robert Haas wrote:
> I doubt whether it makes sense to do this without a broader
> understanding of how the client-side failover mechanism would work.
> If we're going to add something like this, it should include libpq
> support for actually doing something useful with it.

I'm currently interested in targeting PgBouncer and PgJDBC, not libpq,
though I can see that exposing helpers for it in libpq could be useful.

The goal here is to permit a client to safely switch from one server to
another - either in a multimaster async replication system like BDR, or
routing read-only queries to hot standbys with streaming replication -
and know for sure that its last commit is visible on the server it is
now connected to.

For hot standby that means it can avoid running queries that won't see
the latest work it did if the standby is lagging - deciding to run them
on the upstream instead, or wait, as appropriate.

For BDR it'll permit the client to safely perform transparent failover
to another node and resume write operations without risking conflicts
with its own prior transactions . (I wrote some explanations about this
on -general in the thread here:
http://www.postgresql.org/message-id/84184AEF-887D-49DF-8F47-6377B1D6EE9F@gmail.com
).


Broadly, what I'm thinking of is:

* Whenever a client issues a transaction that gets a txid assigned, and
that tx commits, the server reports the LSN that includes the commit.

* The client keeps track of which server it is connected to using the
server's (sysid, timelineid, databaseoid) or a similar identifier -
probably specific to the replication protocol in use, unless something
generic proves practical.

* When the client has to switch to a new server or chooses to do so, it
checks pg_stat_replication or pg_replication_slots, finds the server it
was previously connected to, and checks to see if the new server has
replayed up to the last write transaction this client performed on the
previous server. If not, it can make a policy-driven decision: wait
until replay catchup, wait for a while then bail out, etc.

This is admittedly all a bit hand-wavey. I'm looking at ways to do it,
not a firm implementation plan.

Notably, the LSN (and timelineID) aren't the only way to keep track of
the replay progress of a server and check it from another server. If the
commit timestamps work is merged and the timestamp of the last replayed
commit record is exposed in pg_replication_slots, the client could use
the server-reported commit timestamp to the same effect.


In the above you'll note that the client has to make some choices. The
client might be picking different servers for failover, read load
spreading, or other things I haven't thought of. It might be retaining
the old connection and making new ones it wants to be consistent up to a
certain point on the old connection (read spreading), or it might be
dropping the old connection and making a new one (failover). If the new
server to connect to isn't caught up yet it might want to wait
indefinitely, wait a short while, or bail out immediately and try a
different server. There's a lot of client/application specific policy
going to be involved here, so I'm not sure it makes sense to try to make
it transparent in libpq. I can see it being useful to expose some tools
in libpq for it, without a doubt, so clients can do these sorts of
things usefully.

(There's also another whole new question: how do you pick which
alternative server to connect to? But that's not really within the scope
of this.)

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reporting the commit LSN at commit time
Next
From: David Johnston
Date:
Subject: Re: Fixed redundant i18n strings in json