On Fri, Jul 22, 2022 at 1:22 AM 王伟(学弈) <rogers.ww@alibaba-inc.com> wrote:
> I recently found this problem while testing PG14 with sysbench.
The line numbers from your stack trace don't match up with
REL_14_STABLE. Is this actually a fork of Postgres 14? (Oh, looks like
it's an old beta release.)
> Then I looked through the emails from pgsql-hackers and found a previous similar bug, which is
> https://www.postgresql.org/message-id/flat/2247102.1618008027%40sss.pgh.pa.us. But the bugfix
> commit (34f581c39e97e2ea237255cf75cccebccc02d477) is already patched into PG14.
It does seem possible that there is another similar bug somewhere --
another case where we were protected by the fact that VACUUM acquired
a full cleanup lock (not just an exclusive buffer lock) during its
second heap pass. That changed in Postgres 14 (commit 8523492d4e). But
I really don't know -- almost anything is possible.
> I'm wondering whether there's another code path that leads to this problem. I dug deeper with gdb, which
> showed that newbuffer is not equal to buffer. In other words, the function RelationGetBufferForTuple must have been
> called just now.
> Besides, why didn't we re-check the flag after RelationGetBufferForTuple was called?
Recheck what flag? And at what point? It's not easy to figure this out
from your stack trace, because of the line number issues.
It would also be helpful if you told us about the specific table
involved. Though the important thing (the essential thing) is to test
today's REL_14_STABLE. There have been *lots* of bug fixes since
Postgres 14 beta2 was current.
--
Peter Geoghegan