Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I started to look at when this nice code was added to determine if this
> was part of the original design or added later and found you wrote it
> yourself, so I guess we don't have to ask anyone to make sure there
> isn't something we are missing.
As far as I can recall my thinking at the time, it went like so:
"We *should* be able to accept a cancel interrupt anywhere we are not
actually in the midst of modifying shared-memory data structures,
because after all the database system is supposed to be robust against
crashes, and those could happen anyplace".
But the fallacy in equating a cancel to a crash is that we have rather
extensive logic for coping with a crash (including reinitializing shared
memory from scratch). A cancel will only provoke elog cleanup, which is
not nearly as thorough. For example, it's not obvious that shared
memory structures that are protected by different locks couldn't get out
of sync.
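
To make the worry concrete, here is a minimal sketch in C of one way to
keep a cancel from being serviced between two related shared-memory
changes: the request is only recorded, and it is acted on once a
hold-off counter drops back to zero. Every name in it (request_cancel,
hold_cancel, resume_cancel, update_both, shared_a, shared_b) is
hypothetical and chosen for illustration; it is not the backend's actual
code, and the "cleanup" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; this is an illustration only. */
static bool cancel_pending = false;  /* a cancel request has been recorded */
static int  holdoff_count = 0;       /* > 0 while a cancel must not be serviced */
static int  cancels_serviced = 0;    /* stands in for "elog cleanup ran" */

/* Two shared structures that are assumed to be protected by different
 * locks but must always change together. */
static int shared_a = 0;
static int shared_b = 0;

/* Record a cancel request; a real system might do this in a signal handler. */
static void request_cancel(void)
{
    cancel_pending = true;
}

/* Service a pending cancel, but only when no hold-off region is open. */
static void check_for_cancel(void)
{
    if (cancel_pending && holdoff_count == 0)
    {
        cancel_pending = false;
        cancels_serviced++;          /* real code would abort the transaction */
    }
}

static void hold_cancel(void)
{
    holdoff_count++;
}

static void resume_cancel(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
    check_for_cancel();              /* act on anything that arrived meanwhile */
}

/* Update both structures; optionally simulate a cancel arriving midway. */
static void update_both(bool cancel_midway)
{
    hold_cancel();
    shared_a++;
    if (cancel_midway)
        request_cancel();
    check_for_cancel();              /* no effect here: cancel is held off */
    shared_b++;
    resume_cancel();                 /* cancel is serviced with a and b in sync */
}
```

Calling update_both(true) leaves shared_a and shared_b both incremented
before the cancel is serviced. Without the hold-off, the check in the
middle would run the (limited) cleanup with only shared_a changed, which
is exactly the out-of-sync state that elog cleanup is not obviously able
to repair.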
BTW, I spent some time yesterday trying to use this worry to explain my
latest favorite bugaboo, the duplicate-rows complaints we've gotten from
a few people. It is easy to see that a cancel being accepted at the
right place (exit from the first WriteBuffer in heap_update) could leave
an updated tuple created and its buffer marked dirty, while the old
tuple's buffer is not yet marked dirty and might therefore be discarded
unwritten. (The WAL entry is correct but will never be consulted unless
there's a crash.) However, this scenario doesn't seem to explain the
failures because the cancel would lead to transaction abort, so the
updated tuple should never be considered good anyway. Back to the
drawing board...
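
The reasoning above can be sketched as a toy model in C. This is only an
illustration under simplifying assumptions: ToyBuffer, ToyUpdate,
toy_update and new_tuple_visible are made-up names, a buffer is reduced
to a dirty flag, and visibility is reduced to "the creating transaction
committed". It does not reproduce heap_update or WriteBuffer themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types exist in the backend. */
typedef struct
{
    bool dirty;                  /* buffer will be written out eventually */
} ToyBuffer;

typedef struct
{
    ToyBuffer old_buf;           /* buffer holding the old tuple version */
    ToyBuffer new_buf;           /* buffer holding the updated tuple */
    bool      new_tuple_exists;  /* updated tuple has been created */
    bool      xact_committed;    /* false if the transaction aborted */
} ToyUpdate;

/* Model an update, optionally cancelled right after the first buffer
 * is marked dirty (the window described in the text). */
static void toy_update(ToyUpdate *u, bool cancel_after_first_write)
{
    assert(u != NULL);
    u->new_tuple_exists = true;
    u->new_buf.dirty = true;         /* first write: new tuple's buffer */
    if (cancel_after_first_write)
    {
        u->xact_committed = false;   /* cancel leads to transaction abort */
        return;                      /* old tuple's buffer never marked dirty */
    }
    u->old_buf.dirty = true;         /* second write: old tuple's buffer */
    u->xact_committed = true;
}

/* The updated tuple counts as good only if its transaction committed. */
static bool new_tuple_visible(const ToyUpdate *u)
{
    return u->new_tuple_exists && u->xact_committed;
}
```

With cancel_after_first_write set, the model ends with new_buf dirty and
old_buf clean, yet new_tuple_visible() returns false, which is why this
window by itself does not produce a duplicate row.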
regards, tom lane