
From Craig Ringer
Subject Re: Reporting the commit LSN at commit time
Date
Msg-id 53E44269.10809@2ndquadrant.com
In response to Re: Reporting the commit LSN at commit time  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 08/08/2014 09:51 AM, Tom Lane wrote:

>> AFAIK we don't _have_ a fancy negotiation system in the protocol, with
>> back-and-forth exchanges of capabilities information.
> 
> Maybe it's time to invent that.  It would be positively foolish to
> create any such behavior without a protocol version bump anyway.

I was hoping it'd be easier to sneak a new message type in without a
full protocol bump. As you can imagine, that makes it a ... rather
larger job.

Still, if it's unsafe to do it that way...

> Although as I said, I don't think embedding knowledge of LSNs at the
> protocol level is a good thing to begin with. 

As I said upthread, it need not necessarily be an LSN. A commit
timestamp would do the job too, if information about the last-replayed
commit timestamp was accessible on the downstream node.

It needs to be a sequence identifier that can be matched against
pg_replication_slots / pg_stat_replication or passed to a function on
the downstream end to say "wait until we're replayed to this point".
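To put that in concrete terms: on the upstream side the information is
already visible, at least for physical replication. Assuming the
identifier is an LSN ('0/3000060' below is just a made-up value), a
check against the existing pg_stat_replication view could look roughly
like:

-- Sketch only: has each attached standby replayed past the position
-- the client was handed at commit time?
SELECT application_name,
       replay_location >= '0/3000060'::pg_lsn AS replayed_my_commit
FROM pg_stat_replication;

The missing piece is the downstream-side equivalent for logical
replication.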

For streaming replication there's only one upstream, so there's no need
to identify it. For BDR you'd also have to identify the upstream node of
interest - probably by slot ID, or by (sysid, tlid, dboid) tuple.

In the end, it can be an opaque magic cookie. It really doesn't matter,
so long as what the client receives is a value it can pass to another Pg
instance and say "wait until you've replayed up to this, please" or
"have you replayed up to this yet?".

> Is it really necessary that this information be pushed to the client
> on every commit, as opposed to the client asking for it occasionally?

I think so, yes, though I'd be glad to be proved wrong.

For the purpose of transparent failover (BDR) at least, the server
currently being written to can go away at any moment, and the client
needs to know exactly where it's up to in order to make it safe to
continue on another server.



Consider a multi-master configuration where two servers replicate to
each other:

On a connection to server1:

INSERT INTO bird(id, parrot)
VALUES (1, 'African Grey');

[client grabs magic cookie for server replay state]

INSERT INTO bird(id, parrot)
VALUES (2, 'Lorikkeet');

[server1 sends changes to server2, which is behind on replay and still working on catching up]

[server1 dies abruptly]

[client drops connection to server1, connects to server2]

-- Correct spelling
UPDATE bird
SET parrot = 'Lorikeet'
WHERE id = 2;


If the INSERT from server1 hasn't replayed on server2 yet this will
fail. Other anomalies can be worse and cause lost updates, etc.

To protect against this the client needs a way to wait, after connecting
to server2, until it's caught up with the state of server1. That's what
I'm talking about here. In this case, a periodic progress indicator
requested by the client wouldn't help: the last position it captured
was from after the first INSERT, so it would only wait for that to be
replayed, not for the second INSERT, and you'd still get the same error.
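
If the cookie arrives with every commit confirmation, the client also
has one from the second INSERT, so the flow after reconnecting to
server2 becomes something like the sketch below. pg_wait_for_replay_from()
is purely hypothetical (as is the cookie value); it's just to show the
shape of it:

-- Hypothetical: wait until server2 has replayed everything from server1
-- up to the cookie returned with the second INSERT's commit.
SELECT pg_wait_for_replay_from('server1', '0/3000080');

UPDATE bird
SET parrot = 'Lorikeet'
WHERE id = 2;  -- the row inserted on server1 is now known to be visible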

So yes, the client needs this info at every commit.

That means that enabling client-side fail-over won't be free, especially
for lots of small transactions. It'll be cheaper if Pg can push the info
with the commit confirmation instead of the client having to request it
afterwards though.


(Note that the client risks waiting forever if server1 didn't send the
required commits before it died, but that's where application policy
decisions come in).



-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


