Thread: issue with track_commit_timestamp and server restart
I just noticed that if track_commit_timestamp is enabled, the oldestCommitTsXid and newestCommitTsXid don't persist after a server restart, so you can't ask for the commit ts of a transaction that committed before the last server start, although the information is still available (unless of course if a freeze occured). AFAICT it always behave this way. I'm not familiar with that code, but it looks like these counters are supposed to be restored in StartupXLOG(), with a call to SetCommitTsLimit(). However, this function doesn't store the new value if ShmemVariableCache->oldestCommitTsXid is InvalidTransactionId, which is the initial value. So the counters are initialized on a subsequent call of ActivateCommitTs(), which does: if (ShmemVariableCache->oldestCommitTsXid == InvalidTransactionId) { ShmemVariableCache->oldestCommitTsXid = ShmemVariableCache->newestCommitTsXid = ReadNewTransactionId(); } (but doesn't try to cleanup the SLRU directory btw) leaving any previous transaction unreachable. Simple attached patch seems to fix the issue. I tried on a master and a replica, enabling and disabling track_commit_timestamp, and everything seemed to work as intended. -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Attachment
On 22 October 2016 at 19:51, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: > I just noticed that if track_commit_timestamp is enabled, the > oldestCommitTsXid and newestCommitTsXid don't persist after a server > restart, so you can't ask for the commit ts of a transaction that > committed before the last server start, although the information is > still available (unless of course if a freeze occured). AFAICT it > always behave this way. I initially could'n't see this when tested on 8f1fb7d with a src/test/recovery/t test script. But it turns out it's OK on immediate shutdown and crash recovery, but not on clean shutdown and restart. The attached patch adds a TAP test to src/test/recovery to show this. If you run the TAP test before recompiling with the fix it'll fail. "make" to apply the fix, then re-run and it'll succeed. Or just temporarily roll back the fix with: git checkout HEAD^1 -- src/backend/access/transam/commit_ts.c git reset src/backend/access/transam/commit_ts.c and rebuild to see it fail. To save time running the recovery suite, just rm src/test/recovery/00[0-8]*.pl (It'd be nice to have a prove_check target to run just one test file). This would explain a few issues I've seen reported with BDR from the community which I've so far been unable to reproduce, so thanks very much for the report. Álvaro, would you mind checking this and committing to HEAD and 9.6? The commits.c change only should also be committed to 9.5, but not the TAP test. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
(thread hijacking) On Mon, Oct 24, 2016 at 1:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > To save time running the recovery suite, just > > rm src/test/recovery/00[0-8]*.pl > > (It'd be nice to have a prove_check target to run just one test file). Agreed! Or multiple chosen files. I usually remove those files manually from time to time when working on a fix just to run one dedicated test repeatedly. -- Michael
On 24 October 2016 at 12:58, Craig Ringer <craig@2ndquadrant.com> wrote: > The attached patch adds a TAP test to src/test/recovery to show this. Added to CF. https://commitfest.postgresql.org/11/834/ ... but IMO this should go in the next bugfix release. I've also applied nearly the same fix for the same bug in the original commit timestamp implementation in the BDR Postgres tree. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24/10/2016 06:58, Craig Ringer wrote: > On 22 October 2016 at 19:51, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: >> I just noticed that if track_commit_timestamp is enabled, the >> oldestCommitTsXid and newestCommitTsXid don't persist after a server >> restart, so you can't ask for the commit ts of a transaction that >> committed before the last server start, although the information is >> still available (unless of course if a freeze occured). AFAICT it >> always behave this way. > > I initially could'n't see this when tested on 8f1fb7d with a > src/test/recovery/t test script. But it turns out it's OK on immediate > shutdown and crash recovery, but not on clean shutdown and restart. > > The attached patch adds a TAP test to src/test/recovery to show this. > If you run the TAP test before recompiling with the fix it'll fail. > "make" to apply the fix, then re-run and it'll succeed. Or just > temporarily roll back the fix with: > > git checkout HEAD^1 -- src/backend/access/transam/commit_ts.c > git reset src/backend/access/transam/commit_ts.c > > and rebuild to see it fail. > > To save time running the recovery suite, just > > rm src/test/recovery/00[0-8]*.pl > > (It'd be nice to have a prove_check target to run just one test file). > > This would explain a few issues I've seen reported with BDR from the > community which I've so far been unable to reproduce, so thanks very > much for the report. > > Álvaro, would you mind checking this and committing to HEAD and 9.6? > The commits.c change only should also be committed to 9.5, but not the > TAP test. > Thanks a lot for the review, and adding the tests! -- Julien Rouhaud http://dalibo.com - http://dalibo.org
Craig Ringer wrote: > I initially could'n't see this when tested on 8f1fb7d with a > src/test/recovery/t test script. But it turns out it's OK on immediate > shutdown and crash recovery, but not on clean shutdown and restart. Oh, oops. > The attached patch adds a TAP test to src/test/recovery to show this. I moved it to src/test/modules/commit_ts and pushed. src/test/recovery is not currently run by buildfarm, except hamster, unless the bf client has been patched since I last looked. But that's not the reason I moved it, but rather because the other location seemed better. I also fixed the wrongly pasted comment at the top of the file while at it. If this disrupts the buildfarm in any way I will just revert it immediately. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Now let's discuss a release note entry for 9.5 and 9.6, shall we? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Now let's discuss a release note entry for 9.5 and 9.6, shall we? The text in your commit message seems sufficient from here. On it now (although if somebody doesn't fix anon git PDQ, we're not releasing today anyway). regards, tom lane