On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > > - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and > > XLOG_XACT_COMMIT_COMPACT checks > > Why just those? Why not aborts and restore points also? >
I think make no sense execute the delay after aborts and/or restore points, because it not change data in a standby server.
> > - don't care about clockdrift because it's an admin problem. > > Few minor points on things > > * The code with comment "Clear any previous recovery delay time" is in > wrong place, move down or remove completely. Setting the delay to zero > doesn't prevent calling recoveryDelay(), so that logic looks wrong > anyway. >
Fixed.
> * The loop exit in recoveryDelay() is inelegant, should break if <= 0 >
Fixed.
> * There's a spelling mistake in sample >
Fixed.
> * The patch has whitespace in one place >
Fixed.
> and one important point... > > * The delay loop happens AFTER we check for a pause. Which means if > the user notices a problem on a commit, then hits pause button on the > standby, the pause will have no effect and the next commit will be > applied anyway. Maybe just one commit, but its an off by one error > that removes the benefit of the patch. So I think we need to test this > again after we finish delaying > > if (xlogctl->recoveryPause) > recoveryPausesHere(); >
Fixed.
> > We need to explain in the docs that this is intended only for use in a > live streaming deployment. It will have little or no meaning in a > PITR. >
Fixed.
> I think recovery_time_delay should be called > <something>_apply_delay > to highlight the point that it is the apply of records that is > delayed, not the receipt. And hence the need to document that sync rep > is NOT slowed down by setting this value. >
Fixed.
> And to make the name consistent with other parameters, I suggest > min_standby_apply_delay >
I agree. Fixed!
> We also need to document caveats about the patch, in that it only > delays on timestamped WAL records and other records may be applied > sooner than the delay in some circumstances, so it is not a way to > avoid all cancellations. > > We also need to document the behaviour of the patch is to apply all > data received as quickly as possible once triggered, so the specified > delay does not slow down promoting the server to a master. That might > also be seen as a negative behaviour, since promoting the master > effectively sets recovery_time_delay to zero. > > I will handle the additional documentation, if you can update the > patch with the main review comments. Thanks. >