Re: Re: Hot Standby query cancellation and Streaming Replication integration - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Re: Hot Standby query cancellation and Streaming Replication integration
Date
Msg-id 4B8A0818.5000100@2ndquadrant.com
In response to Re: Re: Hot Standby query cancellation and Streaming Replication integration  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: Hot Standby query cancellation and Streaming Replication integration
List pgsql-hackers
Robert Haas wrote:
> It seems to me that if we're forced to pass the xmin from the
> slave back to the master, that would be a huge step backward in terms
> of both scalability and performance, so I really hope it doesn't come
> to that.

Not forced to--have the option of.  There are obviously workloads where 
you wouldn't want this.  At the same time, I think there are some pretty 
common ones people are going to expect HS+SR to work on transparently 
where this would obviously be the preferred trade-off to make, were it 
available as one of the options.  The test case I put together shows an 
intentionally pathological but not completely unrealistic example of 
such a workload.

> I wish I understood better exactly what you mean by "the
> notion of synchronizing the WAL stream against slave queries" and why
> you don't think it will work.  Can you elaborate?
>   

There's this constant WAL stream coming in from the master to the 
slave.  Each time the slave is about to apply a change from that stream, 
it considers "will this disrupt one of the queries I'm already 
executing?".  If so, it has to make a decision about what to do; that's 
where the synchronization problem comes from.
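
As a rough sketch of that check (a toy model in Python, not the actual recovery code): assume each cleanup change in the stream carries the newest transaction ID whose dead rows it removes, and each standby query has a snapshot with an oldest transaction ID it still needs. The class and field names here are made up for illustration, and transaction ID wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class StandbyQuery:
    name: str
    snapshot_xmin: int  # oldest transaction ID this query's snapshot still needs


@dataclass
class CleanupRecord:
    latest_removed_xid: int  # newest transaction ID whose dead rows get removed


def conflicting_queries(record, queries):
    """Return the running queries that replaying `record` would disrupt.

    Simplifying assumption: a query conflicts when its snapshot could still
    need rows that the record is about to remove.
    """
    return [q for q in queries if q.snapshot_xmin <= record.latest_removed_xid]


queries = [StandbyQuery("long_report", 100), StandbyQuery("quick_lookup", 250)]
record = CleanupRecord(latest_removed_xid=180)
print([q.name for q in conflicting_queries(record, queries)])  # ['long_report']
```

The long-running report is the one at risk, because its snapshot predates the rows being cleaned up; the recent lookup is unaffected.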

The current two options are "delay applying the change", at which point 
the master and standby will drift out of sync until the query ends and 
it can catch back up, or "cancel the query".  There are tunables for 
each of these, and they all seem to work fine (albeit without too much 
testing in the field yet).  My concern is that the tunable that tries to 
implement the other thing you might want to optimize for--"avoid letting 
the master generate WAL entries that are the most likely ones to 
conflict"--just isn't very usable in its current form.
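
To make the two existing choices concrete, here is a minimal sketch of the decision, assuming a single delay limit in seconds named after max_standby_delay. The function name and the return labels are hypothetical, and treating a negative limit as "wait forever" is a modelling assumption here, not a statement about the real setting's semantics.

```python
def resolve_conflict(has_conflict, waited_seconds, max_standby_delay):
    """Decide what the standby does with the next WAL change.

    Toy model only: `max_standby_delay` is a limit in seconds, and a
    negative value is modelled as "never cancel, keep waiting".
    """
    if not has_conflict:
        return "apply"  # nothing is disrupted, replay immediately
    if max_standby_delay < 0 or waited_seconds < max_standby_delay:
        return "wait"  # standby drifts behind the master for now
    return "cancel_query_then_apply"  # limit exceeded, the query loses


print(resolve_conflict(False, 0, 30))   # apply
print(resolve_conflict(True, 10, 30))   # wait
print(resolve_conflict(True, 45, 30))   # cancel_query_then_apply
```

Both branches only react after the conflicting change has already arrived; neither one stops the master from generating it in the first place.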

Tom and I don't see completely eye to eye on this, in that I'm not so 
sure the current behaviors are "fundamentally wrong and we will never be 
able to make [them] work".  If that's really the case, you may not ever 
get the scalability/performance results you're hoping for from this 
release, and really we're all screwed if those are the only approaches 
available.

What I am sure of is that an SR-based xmin passing approach is simpler, 
easier to explain, more robust for some common workloads, and less 
likely to give surprised "wow, I didn't think *that* would cancel my 
standby query" reports from the field than any way you can configure Hot 
Standby alone right now.  And since I never like to bet against Tom's 
gut feel, having it around as a "plan B" in case he's right about an 
overwhelming round of bug reports piling up against the 
max_standby_delay etc. logic doesn't hurt either.
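
The xmin passing idea can be sketched in a few lines, assuming each standby reports the oldest transaction ID its queries still need and the master simply holds its cleanup horizon back to the minimum. The function names are hypothetical, wraparound is ignored, and this is an illustration of the trade-off rather than a proposed implementation.

```python
def cleanup_horizon(master_oldest_xmin, standby_xmins):
    """Oldest transaction ID whose row versions the master must still keep.

    `standby_xmins` is the (assumed) list of xmin values reported back by
    standbys; an empty list means no feedback is in use.
    """
    return min([master_oldest_xmin, *standby_xmins])


def can_remove(dead_row_xmax, horizon):
    """A dead row may be cleaned up only if its deleter precedes the horizon."""
    return dead_row_xmax < horizon


# Without feedback the master removes the row, and the resulting WAL
# change conflicts with a standby query still reading at xmin 320.
print(can_remove(400, cleanup_horizon(500, [])))     # True
# With feedback the row is kept, so no conflicting change is generated.
print(can_remove(400, cleanup_horizon(500, [320])))  # False
```

The cost is the one Robert is worried about: a long standby query now holds back cleanup on the master, just as a long query running on the master itself would.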

I spent a little time today seeing if there was any interesting code I 
might steal from the early "synchrep" branch at 
http://git.postgresql.org/gitweb?p=users/fujii/postgres.git;a=summary , 
but sadly, when I tried to rebase that against the master to separate out 
just the parts unique to it, the merge conflicts were overwhelming.  I 
hate getting beaten by merge bitrot even when Git is helping.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


