Thread: Re: [BUGS] Bug #613: Sequence values fall back to previously chec

Re: [BUGS] Bug #613: Sequence values fall back to previously chec

From
Tom Lane
Date:
"Tom Pfau" <T.Pfau@emCrit.com> writes:
> I'm concerned that the discussion here has been of the opinion that
> since no records were written to the database using the value retrieved
> from the sequence that no damage has been done.

Um, you certainly didn't hear me saying that ;-)

There are two different bugs involved here.  One is the no-WAL-flush-
if-transaction-is-only-nextval problem.  AFAIK everyone agrees we must
fix that.  The other issue is this business about "logging ahead"
(to reduce the number of WAL records written) not interacting correctly
with checkpoints.  What we're arguing about is exactly how to fix that
part.

> We are using database
> sequences to get unique identifiers for things outside the database.  If
> a sequence could ever under any circumstances reissue a value, this
> could be damaging to the integrity of our software.

If you do a SELECT nextval() and then use the returned value externally
*without waiting for a commit acknowledgement*, then I think you are
risking trouble; there's no guarantee that the WAL record (if one is
needed) has hit disk yet, and so a crash could roll back the sequence.

This isn't an issue for a SELECT nextval() standing on its own ---
AFAIK the result will not be transmitted to the client until after the
commit happens.  But it would be an issue for a select executed inside
a transaction block (begin/commit).
        regards, tom lane


Re: [BUGS] Bug #613: Sequence values fall back to previously chec

From
"Dave Cramer"
Date:
I noticed a message asking if this scenario was consistent with the
other reports, and yes it is. We have seen this occuring on our system
with versions as old as 7.0.

Glad to see someone has finally nailed this one.

Dave



Re: [BUGS] Bug #613: Sequence values fall back to previously chec

From
Tom Lane
Date:
"Dave Cramer" <dave@fastcrypt.com> writes:
> I noticed a message asking if this scenario was consistent with the
> other reports, and yes it is. We have seen this occuring on our system
> with versions as old as 7.0.

Given that these are WAL bugs, they could not predate 7.1.
        regards, tom lane


Re: [BUGS] Bug #613: Sequence values fall back to previously chec

From
Ben Grimm
Date:
On Thu, 14 Mar 2002, Tom Lane wrote:
> 
> If you do a SELECT nextval() and then use the returned value externally
> *without waiting for a commit acknowledgement*, then I think you are
> risking trouble; there's no guarantee that the WAL record (if one is
> needed) has hit disk yet, and so a crash could roll back the sequence.
> 
> This isn't an issue for a SELECT nextval() standing on its own ---
> AFAIK the result will not be transmitted to the client until after the
> commit happens.  But it would be an issue for a select executed inside
> a transaction block (begin/commit).
> 

The behavior of SELECT nextval() should not be conditional on being in or 
out of a transaction block.  What you implying by saying that the data is 
that it would be possible to rollback an uncommited call to nextval().
Am I missing some terminology?  

I think I finally realized why my old patch that forces a log right off 
the bat works to fix at least part of the problem.  When the database 
is shutdown properly all of the sequences that are in memory are written 
back to disk in their state at that time.   But the problem with that is
that their state at that time can have log_cnt > 0.  This is why after
startup that the sequence in memory is 'behind' the one on disk, the 
code sees log > fetch and doesn't log.  When you really think about it
log_cnt should not be part of the sequence record at all since there
is never a valid case for storing a log_cnt on disk with a value other 
than 0.  

Maybe the purpose for the on disk value of log_cnt should be changed?
It could be the value used in place of the static SEQ_LOG_VALS, which 
could then be definable on a per sequence basis.  And then log_cnt 
could be moved into elm->log_cnt.  Anyway, that's just a thought.

Here's my latest patch to work around the problem.  If there is a way 
to prevent log_cnt from being written out with a value greater than
zero, that would be better than this.  With this behavior log_cnt is 
reset to 0 each time a backend accesses a sequence for the first time.
That's probably overkill...  But I still believe that the XLogFlush 
after XLogInsert is necessary to ensure that the WAL value is written 
to disk immediately.  In my testing this patch works fine, YMMV.

-- Ben



*** src/backend/commands/sequence.c.orig        Tue Mar 12 18:58:55 2002
--- src/backend/commands/sequence.c     Thu Mar 14 17:34:25 2002
***************
*** 62,67 ****
--- 62,68 ----       int64           cached;       int64           last;       int64           increment;
+       bool            reset_logcnt;       struct SeqTableData *next; } SeqTableData;

***************
*** 270,275 ****
--- 271,277 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
+               XLogFlush(recptr);       }       END_CRIT_SECTION();

***************
*** 314,321 ****               PG_RETURN_INT64(elm->last);       }

!       seq = read_info("nextval", elm, &buf);          /* lock page' buffer and
!                                                                                                * read tuple */
       last = next = result = seq->last_value;       incby = seq->increment_by;
--- 316,322 ----               PG_RETURN_INT64(elm->last);       }

!       seq = read_info("nextval", elm, &buf);          /* lock page' buffer and read tuple */
       last = next = result = seq->last_value;       incby = seq->increment_by;
***************
*** 331,339 ****               log--;       }

!       if (log < fetch)       {
!               fetch = log = fetch - log + SEQ_LOG_VALS;               logit = true;       }

--- 332,340 ----               log--;       }

!       if (log < fetch)       {
!               fetch = log = fetch - log + SEQ_LOG_VALS * cache;               logit = true;       }

***************
*** 403,409 ****               rdata[0].data = (char *) &xlrec;               rdata[0].len = sizeof(xl_seq_rec);
      rdata[0].next = &(rdata[1]);
 
!               seq->last_value = next;               seq->is_called = true;               seq->log_cnt = 0;
--- 404,410 ----               rdata[0].data = (char *) &xlrec;               rdata[0].len = sizeof(xl_seq_rec);
      rdata[0].next = &(rdata[1]);
 
!               seq->last_value = next;               seq->is_called = true;               seq->log_cnt = 0;
***************
*** 417,423 ****
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
!               if (fetch)                              /* not all numbers were fetched */                       log -=
fetch;      }
 
--- 418,425 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
!               XLogFlush(recptr);
!               if (fetch)                              /* not all numbers were fetched */                       log -=
fetch;      }
 
***************
*** 507,513 ****               XLogRecPtr      recptr;               XLogRecData rdata[2];               Page
page = BufferGetPage(buf);
 
!               xlrec.node = elm->rel->rd_node;               rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
 
--- 509,516 ----               XLogRecPtr      recptr;               XLogRecData rdata[2];               Page
page = BufferGetPage(buf);
 
!
!               xlrec.node = elm->rel->rd_node;               rdata[0].buffer = InvalidBuffer;
rdata[0].data= (char *) &xlrec;
 
***************
*** 527,532 ****
--- 530,536 ----
               PageSetLSN(page, recptr);               PageSetSUI(page, ThisStartUpID);
+               XLogFlush(recptr);       }       /* save info in sequence relation */       seq->last_value = next;
   /* last fetched number */
 
***************
*** 660,665 ****
--- 664,674 ----
       seq = (Form_pg_sequence) GETSTRUCT(&tuple);

+       if (elm->reset_logcnt)
+       {
+               seq->log_cnt = 0;
+               elm->reset_logcnt = false;
+       }       elm->increment = seq->increment_by;
       return seq;
***************
*** 703,708 ****
--- 712,718 ----                                name, caller);                       elm->relid =
RelationGetRelid(seqrel);                      elm->cached = elm->last = elm->increment = 0;
 
+                       elm->reset_logcnt = true;               }       }       else
***************
*** 721,726 ****
--- 731,737 ----               elm->relid = RelationGetRelid(seqrel);               elm->cached = elm->last =
elm->increment= 0;               elm->next = (SeqTable) NULL;
 
+               elm->reset_logcnt = true;
               if (seqtab == (SeqTable) NULL)                       seqtab = elm;