pgsql: Fix longstanding crash-safety bug with newly-created-or-reset se - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix longstanding crash-safety bug with newly-created-or-reset se
Date
Msg-id E1Su9MG-000300-4T@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix longstanding crash-safety bug with newly-created-or-reset sequences.

If a crash occurred immediately after the first nextval() call for a serial
column, WAL replay would restore the sequence to a state in which it
appeared that no nextval() had been done, thus allowing the first sequence
value to be returned again by the next nextval() call; as reported in
bug #6748 from Xiangming Mei.

More generally, the problem would occur if an ALTER SEQUENCE was executed
on a freshly created or reset sequence.  (The manifestation with serial
columns was introduced in 8.2 when we added an ALTER SEQUENCE OWNED BY step
to serial column creation.)  The cause is that sequence creation attempted
to save one WAL entry by writing out a WAL record that made it appear that
the first nextval() had already happened (viz, with is_called = true),
while marking the sequence's in-database state with log_cnt = 1 to show
that the first nextval() need not emit a WAL record.  However, ALTER
SEQUENCE would emit a new WAL entry reflecting the actual in-database state
(with is_called = false).  Then, nextval would allocate the first sequence
value and set is_called = true, but it would trust the log_cnt value and
not emit any WAL record.  A crash at this point would thus restore the
sequence to its post-ALTER state, causing the next nextval() call to return
the first sequence value again.

To fix, get rid of the idea of logging an is_called status different from
reality.  This means that the first nextval-driven WAL record will happen
at the first nextval call not the second, but the marginal cost of that is
pretty negligible.  In addition, make sure that ALTER SEQUENCE resets
log_cnt to zero in any case where it touches sequence parameters that
affect future nextval results.  This will result in some user-visible
changes in the contents of a sequence's log_cnt column, as reflected in the
patch's regression test changes; but no application should be depending on
that anyway, since it was already true that log_cnt changes rather
unpredictably depending on checkpoint timing.

In addition, make some basically-cosmetic improvements to get rid of
sequence.c's undesirable intimacy with page layout details.  It was always
really trying to WAL-log the contents of the sequence tuple, so we should
have it do that directly using a HeapTuple's t_data and t_len, rather than
backing into it with some magic assumptions about where the tuple would be
on the sequence's page.

Back-patch to all supported branches.

Branch
------
REL9_1_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/aa7cd144062bef3baf8fd4548c0f54bb108c3270

Modified Files
--------------
src/backend/commands/sequence.c          |  165 +++++++++++++++++-------------
src/test/regress/expected/sequence.out   |    4 +-
src/test/regress/expected/sequence_1.out |    4 +-
3 files changed, 99 insertions(+), 74 deletions(-)


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: pgsql: Fix longstanding crash-safety bug with newly-created-or-reset se
Next
From: Tom Lane
Date:
Subject: pgsql: Fix longstanding crash-safety bug with newly-created-or-reset se