Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Transaction Snapshots and Hot Standby
Date
Msg-id 1222328406.4445.711.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: Transaction Snapshots and Hot Standby  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Wed, 2008-09-24 at 21:19 -0400, Bruce Momjian wrote:
> Simon Riggs wrote:
> > 2. Master ignores Standby's OldestXmin
> > Effects:
> > * Long running queries on standby...
> >    Have no effect on primary
> >    Can delay apply of WAL records on standby
> > * Queries on standby give consistent answers in all cases.
> 
> Just for clarification, if you set a max_slave_delay it means it is the
> maximum amount of time WAL replay can be delayed on the slave, _and_ it
> is the maximum amount of time a query/snapshot can be guaranteed to run
> without the possibility of being canceled.  My point is that these two
> concepts are linked to the same setting.

I didn't even see them as separate, but now you mention it they could
be.

Startup process may need to wait up to max_slave_delay before applying
WAL, if it sees that it must cancel a query to continue. max_slave_delay
is set in postgresql.conf and honoured by the startup process.

What is not yet defined is whether max_slave_delay is a setting
per-blockage event, or a total time that could occur because of a single
long blockage or many smaller ones. The latter makes more sense, but
would only be sensible to calculate it when we have WAL streaming using
calculations similar to the bgwriter delay calculations. That definition
makes sense for the sysadmin because it is the amount of extra time a
standby could take to startup after a failover.

If we take max_slave_delay to mean the latter then there is no guarantee
for user queries, since we may already have used up all our time waiting
on one query and there may be no wait time left for the current user
query. It would depend heavily on the rows accessed, so queries might
easily run to completion even though they exceed the max_slave_delay.
That makes sense because if you are asking a question like "what is the
current total of widgets available" you understand that answer changes
quickly over time, whereas the "how many widgets were reordered on day
X" doesn't change over time at all.

If you define this as a "per wait event" setting it provides a guarantee
to the user queries, but also puts WAL apply into free-fall since you
can't control number of blockage points queries might cause.

>From what Merlin was saying, it would be sensible to have multiple
slaves: one with a max_slave_delay of 30 seconds to be your HA
fast-startup standby and another where we set max_slave_delay to 5 hours
to guarantee execution time for large reporting queries. Anyway, its
clear that max_slave_delay needs to be settable while running.

>From my perspective, all I can say is "all things are possible" and I
appreciate the need for options to satisfy different use cases. I'll
build the basic mechanics and then we can add fine tuning over time. So
I'll get the basic mechanics in place, suggest we observe how that works
and then decide on the control mechanisms because they sound relatively
simple to add.

Another important note is Hannu's suggestion of using snapshot
filesystems with Postgres. We can include that feature very quickly and
it will complement Hot Standby very well.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: Zdenek Kotala
Date:
Subject: Re: FSM, now without WAL-logging
Next
From: Heikki Linnakangas
Date:
Subject: Re: FSM, now without WAL-logging