SIGQUIT and lost sequence WAL records - Mailing list pgsql-bugs

From Michael Paquier
Subject SIGQUIT and lost sequence WAL records
Date
Msg-id CAB7nPqQ6+0LNkitj507eDBMh-OBdm_1_X-LmLOQfsjiWXe9SvA@mail.gmail.com
Whole thread Raw
Responses Re: SIGQUIT and lost sequence WAL records  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hi all,

Lately a couple of colleagues have pinged me regarding the fact that a
server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may
ignore sequence WAL records. See for example the script attached
seqtest.py (simplified test case provided from my colleague Nikhil
Deshpande), that will fail while for example the following command is
run in parallel:
for i in {1..1000}; do pg_ctl restart -m immediate -w; sleep 1; done
The failure happening being for example:
AssertionError: DB returned reused sequence value prev=1118 curr=958
Here it means that the last return value of nextval() was 958, even if
the highest value ever fetched was 1118. Note that this test script
uses psycopg2, but that it does *not* commit transactions on purpose.
Also note that those WAL records do not disappear all the time.
Sometimes they are here and the sequence keeps increasing monotically
after server is restarted.

Trying to find out if there were records ignored I added some logs in
the server thanks to the patch attached. In the case of this last
failure there have been a couple of records after the last one found
at recovery:
WARNING:  Logged new sequence value at 0/1505570 for nextval = 957
WARNING:  Logged new sequence value at 0/15066D8 for nextval = 990
WARNING:  Logged new sequence value at 0/1507828 for nextval = 1023
WARNING:  Logged new sequence value at 0/1508990 for nextval = 1056
WARNING:  Logged new sequence value at 0/1509AE0 for nextval = 1089
WARNING:  Logged new sequence value at 0/150AC48 for nextval = 1122
And if I looked at the WAL records those were not of course not
present. That's not really surprising as recovery didn't use them to
restore the sequence up to where it should have.

func.sgml mentions that a nextval() operation is *never* rolled back
even if a transaction is aborted, though in this case this is exactly
what is happening, even if when the server is stopped the existing
transactions running are forcibly aborted.

I am lacking of fuel to look more at that today, but at this point I
wanted to get some input to see if I am missing something obvious. So,
code bug or documentation bug?
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Teodor Sigaev
Date:
Subject: Re: BUG #13440: unaccent does not remove all diacritics
Next
From: Tom Lane
Date:
Subject: Re: SIGQUIT and lost sequence WAL records