Re: Immediate standby promotion - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Immediate standby promotion |
Date | |
Msg-id | 20140925152934.GA21746@alap3.anarazel.de |
In response to | Re: Immediate standby promotion (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Immediate standby promotion |
List | pgsql-hackers |
On 2014-09-24 21:36:50 +0100, Simon Riggs wrote:
> On 18 September 2014 01:22, Robert Haas <robertmhaas@gmail.com> wrote:
>
> >> "fast" promotion was actually a supported option in r8 of Postgres but
> >> this option was removed when we implemented streaming replication in
> >> r9.0
> >>
> >> The *rough* requirement is sane, but that's not the same thing as
> >> saying this exact patch makes sense.
> >
> > Granted. Fair point.
> >
> >> If you are paused and you can see that WAL up ahead is damaged, then
> >> YES, you do want to avoid applying it. That is possible by setting a
> >> PITR target so that recovery stops at a precise location specified by
> >> you. As an existing option is it better than the blunt force trauma
> >> suggested here.
> >
> > You can pause at a recovery target, but then what if you want to go
> > read/write at that point? Or what if you've got a time-delayed
> > standby and you want to break replication so that it doesn't replay
> > the DROP TABLE students that somebody ran on the master? It doesn't
> > have to be that WAL is unreadable or corrupt; it's enough for it to
> > contain changes you wish to avoid replaying.
> >
> >> If you really don't care, just shutdown server, resetxlog and start
> >> her up - again, no need for new option.

I think that should pretty much never be something an admin has to run. It's just about impossible to get this right. In all likelihood, just running pg_resetxlog on a database in recovery will have corrupted your database. That's why pg_resetxlog won't even let you proceed without -f: it checks that the control file state is DB_SHUTDOWNED. Rightly so. pg_resetxlog *removes* *all* existing WAL and sets the current control file state to DB_SHUTDOWNED. Thus there will be no recovery when starting afterwards.

> > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> > seems like a friendlier interface than making somebody shut down the
> > server, run pg_resetxlog, and start it up again.
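The pg_resetxlog behaviour described in the reply above (refuse without -f unless the control file says DB_SHUTDOWNED; when it does run, discard all WAL and mark the cluster cleanly shut down) can be illustrated with a toy model. This is a sketch under stated assumptions, not pg_resetxlog's code: `ClusterModel`, `reset_wal`, `ResetRefused` and `replays_wal_on_start` are hypothetical names, and the state list is a simplified subset whose names only mirror PostgreSQL's control-file states.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class DBState(Enum):
    # Toy subset of control-file states. The names mirror PostgreSQL's
    # DBState values, but this is an illustrative model, not pg_control.h.
    SHUTDOWNED = auto()              # clean shutdown outside recovery
    SHUTDOWNED_IN_RECOVERY = auto()  # clean shutdown of a standby
    IN_ARCHIVE_RECOVERY = auto()     # standby currently replaying WAL
    IN_PRODUCTION = auto()           # normal read/write operation


class ResetRefused(Exception):
    """Raised when a reset is attempted on a non-SHUTDOWNED state without force."""


@dataclass
class ClusterModel:
    state: DBState
    wal_segments: list = field(default_factory=list)


def reset_wal(cluster: ClusterModel, force: bool = False) -> None:
    """Hypothetical model of the guard and effect described for pg_resetxlog."""
    # Refuse unless the control file claims a clean shutdown;
    # force=True plays the role of the -f flag.
    if cluster.state is not DBState.SHUTDOWNED and not force:
        raise ResetRefused("not shut down cleanly; pass force=True to proceed")
    cluster.wal_segments.clear()        # all existing WAL is discarded
    cluster.state = DBState.SHUTDOWNED  # control file now claims a clean shutdown


def replays_wal_on_start(cluster: ClusterModel) -> bool:
    """In this model, any state other than SHUTDOWNED means recovery at startup."""
    return cluster.state is not DBState.SHUTDOWNED
```

In this model a cleanly stopped standby sits in `SHUTDOWNED_IN_RECOVERY`, so the "shut down, resetxlog, start" recipe only goes through with `force=True`, and every segment that had not yet been replayed disappears while the next start performs no recovery at all, which is the hazard the reply points out.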
>
> It makes sense to go from paused --> promoted.
>
> It doesn't make sense to go from normal running --> promoted, since
> that is just random data loss.

Why? I don't see what's random about promoting a node in its current state *iff* it's currently consistent. Just imagine promoting a current standby to a full node because you want to run some tests on it that require writes. There's absolutely no need to investigate the current state for that.

> I very much understand the case where
> somebody is shouting "get the web site up, we are losing business".
> Implementing a feature that allows people to do exactly what they
> asked (go live now), but loses business transactions that we thought
> had been safely recorded is not good. It implements only the exact
> request, not its actual intention.

That seems to be a problem of massive misunderstanding on the part of the user. And I don't see how requiring the user to first issue a pause request makes this any safer. I think we should attempt to solve this by naming the command appropriately. Something like 'abort_replay_and_promote': long, nontrivial to type, and descriptive.

> Any feature that lumps both cases together is wrongly designed and
> will cause data loss.
>
> We go to a lot of trouble to ensure data is successfully on disk and
> in WAL. I won't give that up, nor do I want to make it easier to lose
> data than it already is.

I think that's not really related. Such a promotion doesn't cause data loss in the sense of losing data a *clueful* operator wanted to keep. Yes, it can be used wrongly, but it's far from alone in that.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
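The disagreement in this thread can be reduced to a small decision model of the two promotion policies. It is a hypothetical illustration: `ReplayState`, the two `allow_*` functions and the `consistent` flag are invented for this sketch, and `abort_replay_and_promote` is only the command name Andres proposes, not an existing pg_ctl mode. For both policies the model assumes that promotion additionally requires a consistent standby.

```python
from enum import Enum, auto


class ReplayState(Enum):
    REPLAYING = auto()  # normal standby operation: WAL is still being applied
    PAUSED = auto()     # replay paused, e.g. after reaching a recovery target


def allow_promote_paused_only(state: ReplayState, consistent: bool) -> bool:
    """Policy argued by Simon: only paused --> promoted is permitted."""
    return consistent and state is ReplayState.PAUSED


def allow_abort_replay_and_promote(state: ReplayState, consistent: bool) -> bool:
    """Policy argued by Andres: promote from the current state, whatever it
    is, as long as the standby has reached a consistent state."""
    return consistent
```

The two policies differ only for a standby that is consistent but still replaying, which is exactly Robert's time-delayed standby that should be cut off before it replays the unwanted DROP TABLE.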