On Sun, 2009-11-15 at 14:43 +0200, Heikki Linnakangas wrote:
> This isn't absolutely necessary for the first version, but it's
> something to keep in mind...
Do I take that as agreement to the phased plan?
> In general, I'd like to remove as many as possible of those cases
> where the standby starts up, and can't open up for connections. It
> makes the standby a lot less useful if you can't rely on it being
> open. So I'd like to make it so that the standby can *always* open up.
Yes, of course. The only reason for restrictions being acceptable is
that we have 99% of what we want, yet may lose everything if we play for
100% too quickly.
The standby will open quickly in many cases, as is. There is also a
range of other ways of doing this.
> There are currently three cases where that can happen:
>
> 1. If the subxid cache has overflowed.
>
> 2. If there's no running-xacts record after the checkpoint record for
> some reason. For example, one was written but not archived yet, or
> the master crashed before it was written.
>
> 3. If too many AccessExclusiveLocks were being held.
>
> Case 3 should be pretty easy to handle. Just need to WAL log all the
> AccessExclusiveLocks, perhaps as separate WAL records (we already have
> a new WAL record type for logging locks) if we're worried about the
> running-xacts record growing too large. I think we could handle case 2
> if we wrote the running-xacts record *before* the checkpoint record.
> Then it would be always between the REDO pointer of the checkpoint
> record, and the checkpoint record itself, so it would always be seen
> by the WAL recovery. To handle case 1, we could scan pg_subtrans. It
> would add some amount of code and would add some more work to taking
> the running-xacts snapshot, but it could be done.
"Some amount of code" requires some amount of thought, followed by some
amount of review which takes some amount of time.
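For reference, the three cases quoted above amount to a simple gate at
standby startup. A minimal sketch in Python follows. It is not the C code
in the patch; every name in it is hypothetical and only restates the
conditions as described in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyStartState:
    """What the standby learned from WAL at startup.

    All field names are hypothetical, chosen for this sketch only.
    """
    subxid_cache_overflowed: bool    # case 1
    running_xacts_record_seen: bool  # case 2 (False: no record was found)
    lock_list_overflowed: bool       # case 3


def blocking_reasons(state: StandbyStartState) -> List[str]:
    """Return the reasons, if any, why the standby cannot open yet."""
    reasons = []
    if state.subxid_cache_overflowed:
        reasons.append("subxid cache overflowed")
    if not state.running_xacts_record_seen:
        reasons.append("no running-xacts record after the checkpoint record")
    if state.lock_list_overflowed:
        reasons.append("too many AccessExclusiveLocks held")
    return reasons


def can_open_for_connections(state: StandbyStartState) -> bool:
    """The standby may open only when none of the three cases applies."""
    return not blocking_reasons(state)
```

Each of the proposals quoted above is aimed at making one of these three
conditions stop occurring, which is what "always open" would require.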
-- Simon Riggs www.2ndQuadrant.com