Re: Are we accepting cancel interrupts too often? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Are we accepting cancel interrupts too often?
Date
Msg-id 200112311641.fBVGfuP28721@candle.pha.pa.us
In response to Re: Are we accepting cancel interrupts too often?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Are we accepting cancel interrupts too often?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I started to look at when this nice code was added to determine if this
> > was part of the original design or added later and found you wrote it
> > yourself, so I guess we don't have to ask anyone to make sure there
> > isn't something we are missing.
> 
> As far as I can recall my thinking at the time, it went like so:
> "We *should* be able to accept a cancel interrupt anywhere we are not
> actually in the midst of modifying shared-memory data structures,
> because after all the database system is supposed to be robust against
> crashes, and those could happen anyplace".
> 
> But the fallacy in equating a cancel to a crash is that we have rather
> extensive logic for coping with a crash (including reinitializing shared
> memory from scratch).  A cancel will only provoke elog cleanup, which is
> not nearly as thorough.  For example, it's not obvious that shared
> memory structures that are protected by different locks couldn't get out
> of sync.
> 

Yes, I saw the RESUME_INTERRUPTS in SpinLockRelease().  It seems very
aggressive to allow a query cancel there.
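
To make the concern concrete, here is a minimal sketch of the two policies
being compared: servicing a pending cancel the moment a holdoff ends, versus
only dropping the holdoff and waiting for an explicit check at a known-safe
point.  This is a simplified model, not the PostgreSQL source; every name in
it (hold_cancels, resume_cancels_eager, resume_cancels_lazy, check_for_cancel,
request_cancel) is hypothetical, and a counter stands in for the elog cleanup
path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of deferred cancel handling.
 * Not PostgreSQL source; all names are illustrative assumptions. */
static volatile bool cancel_pending = false; /* set when a cancel arrives */
static int holdoff_count = 0;                /* > 0 means cancels are deferred */
static int cancels_serviced = 0;             /* stands in for the abort/cleanup path */

/* A cancel request only sets a flag; it is never acted on directly. */
static void request_cancel(void)
{
    cancel_pending = true;
}

/* Service a pending cancel, but only when no holdoff is active. */
static void check_for_cancel(void)
{
    if (cancel_pending && holdoff_count == 0)
    {
        cancel_pending = false;
        cancels_serviced++;
    }
}

/* Enter a region in which cancels must not be serviced. */
static void hold_cancels(void)
{
    holdoff_count++;
}

/* Eager policy: leaving the region services a pending cancel at once,
 * so every lock release becomes a possible cancel point. */
static void resume_cancels_eager(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
    check_for_cancel();
}

/* Lazy policy: leaving the region only drops the holdoff; the cancel
 * waits until the caller reaches an explicit check_for_cancel(). */
static void resume_cancels_lazy(void)
{
    assert(holdoff_count > 0);
    holdoff_count--;
}
```

Under the eager policy a cancel can take effect between two updates that are
protected by different locks, which is the out-of-sync risk described above.
Under the lazy policy the cancel stays pending until the code reaches a point
it has chosen itself.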

> 
> BTW, I spent some time yesterday trying to use this worry to explain my
> latest favorite bugaboo, the duplicate-rows complaints we've gotten from
> a few people.  It is easy to see that a cancel being accepted at the
> right place (exit from the first WriteBuffer in heap_update) could leave
> an updated tuple created and its buffer marked dirty, while the old
> tuple's buffer is not yet marked dirty and might therefore be discarded
> unwritten.  (The WAL entry is correct but will never be consulted unless
> there's a crash.)  However, this scenario doesn't seem to explain the
> failures because the cancel would lead to transaction abort, so the
> updated tuple should never be considered good anyway.  Back to the
> drawing board...

I thought we were seeing duplicates in 7.1, which didn't have this code.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 

