On Thu, 14 Mar 2002, Tom Lane wrote:
>
> If you do a SELECT nextval() and then use the returned value externally
> *without waiting for a commit acknowledgement*, then I think you are
> risking trouble; there's no guarantee that the WAL record (if one is
> needed) has hit disk yet, and so a crash could roll back the sequence.
>
> This isn't an issue for a SELECT nextval() standing on its own ---
> AFAIK the result will not be transmitted to the client until after the
> commit happens. But it would be an issue for a select executed inside
> a transaction block (begin/commit).
>
The behavior of SELECT nextval() should not be conditional on being in or
out of a transaction block. What you implying by saying that the data is
that it would be possible to rollback an uncommited call to nextval().
Am I missing some terminology?
I think I finally realized why my old patch that forces a log right off
the bat works to fix at least part of the problem. When the database
is shutdown properly all of the sequences that are in memory are written
back to disk in their state at that time. But the problem with that is
that their state at that time can have log_cnt > 0. This is why after
startup that the sequence in memory is 'behind' the one on disk, the
code sees log > fetch and doesn't log. When you really think about it
log_cnt should not be part of the sequence record at all since there
is never a valid case for storing a log_cnt on disk with a value other
than 0.
Maybe the purpose for the on disk value of log_cnt should be changed?
It could be the value used in place of the static SEQ_LOG_VALS, which
could then be definable on a per sequence basis. And then log_cnt
could be moved into elm->log_cnt. Anyway, that's just a thought.
Here's my latest patch to work around the problem. If there is a way
to prevent log_cnt from being written out with a value greater than
zero, that would be better than this. With this behavior log_cnt is
reset to 0 each time a backend accesses a sequence for the first time.
That's probably overkill... But I still believe that the XLogFlush
after XLogInsert is necessary to ensure that the WAL value is written
to disk immediately. In my testing this patch works fine, YMMV.
-- Ben
*** src/backend/commands/sequence.c.orig Tue Mar 12 18:58:55 2002
--- src/backend/commands/sequence.c Thu Mar 14 17:34:25 2002
***************
*** 62,67 ****
--- 62,68 ---- int64 cached; int64 last; int64 increment;
+ bool reset_logcnt; struct SeqTableData *next; } SeqTableData;
***************
*** 270,275 ****
--- 271,277 ----
PageSetLSN(page, recptr); PageSetSUI(page, ThisStartUpID);
+ XLogFlush(recptr); } END_CRIT_SECTION();
***************
*** 314,321 **** PG_RETURN_INT64(elm->last); }
! seq = read_info("nextval", elm, &buf); /* lock page' buffer and
! * read tuple */
last = next = result = seq->last_value; incby = seq->increment_by;
--- 316,322 ---- PG_RETURN_INT64(elm->last); }
! seq = read_info("nextval", elm, &buf); /* lock page' buffer and read tuple */
last = next = result = seq->last_value; incby = seq->increment_by;
***************
*** 331,339 **** log--; }
! if (log < fetch) {
! fetch = log = fetch - log + SEQ_LOG_VALS; logit = true; }
--- 332,340 ---- log--; }
! if (log < fetch) {
! fetch = log = fetch - log + SEQ_LOG_VALS * cache; logit = true; }
***************
*** 403,409 **** rdata[0].data = (char *) &xlrec; rdata[0].len = sizeof(xl_seq_rec);
rdata[0].next = &(rdata[1]);
! seq->last_value = next; seq->is_called = true; seq->log_cnt = 0;
--- 404,410 ---- rdata[0].data = (char *) &xlrec; rdata[0].len = sizeof(xl_seq_rec);
rdata[0].next = &(rdata[1]);
! seq->last_value = next; seq->is_called = true; seq->log_cnt = 0;
***************
*** 417,423 ****
PageSetLSN(page, recptr); PageSetSUI(page, ThisStartUpID);
! if (fetch) /* not all numbers were fetched */ log -=
fetch; }
--- 418,425 ----
PageSetLSN(page, recptr); PageSetSUI(page, ThisStartUpID);
! XLogFlush(recptr);
! if (fetch) /* not all numbers were fetched */ log -=
fetch; }
***************
*** 507,513 **** XLogRecPtr recptr; XLogRecData rdata[2]; Page
page = BufferGetPage(buf);
! xlrec.node = elm->rel->rd_node; rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
--- 509,516 ---- XLogRecPtr recptr; XLogRecData rdata[2]; Page
page = BufferGetPage(buf);
!
! xlrec.node = elm->rel->rd_node; rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
***************
*** 527,532 ****
--- 530,536 ----
PageSetLSN(page, recptr); PageSetSUI(page, ThisStartUpID);
+ XLogFlush(recptr); } /* save info in sequence relation */ seq->last_value = next;
/* last fetched number */
***************
*** 660,665 ****
--- 664,674 ----
seq = (Form_pg_sequence) GETSTRUCT(&tuple);
+ if (elm->reset_logcnt)
+ {
+ seq->log_cnt = 0;
+ elm->reset_logcnt = false;
+ } elm->increment = seq->increment_by;
return seq;
***************
*** 703,708 ****
--- 712,718 ---- name, caller); elm->relid =
RelationGetRelid(seqrel); elm->cached = elm->last = elm->increment = 0;
+ elm->reset_logcnt = true; } } else
***************
*** 721,726 ****
--- 731,737 ---- elm->relid = RelationGetRelid(seqrel); elm->cached = elm->last =
elm->increment= 0; elm->next = (SeqTable) NULL;
+ elm->reset_logcnt = true;
if (seqtab == (SeqTable) NULL) seqtab = elm;