Re: Think I see a btree vacuuming bug - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Think I see a btree vacuuming bug |
Date | |
Msg-id | 200208262014.g7QKEel19090@candle.pha.pa.us |
In response to | Think I see a btree vacuuming bug (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Think I see a btree vacuuming bug |
List | pgsql-hackers |
Is this fixed, and if not, can I have some TODO text?

---------------------------------------------------------------------------

Tom Lane wrote:
> If a VACUUM running concurrently with someone else's indexscan were to
> delete the index tuple that the indexscan is currently stopped on, then
> we'd get a failure when the indexscan resumes and tries to re-find its
> place. (This is the infamous "my bits moved right off the end of the
> world" error condition.) What is supposed to prevent that from
> happening is that the indexscan retains a buffer pin (but not a read
> lock) on the index page containing the tuple it's stopped on. VACUUM
> will not delete any tuple until it can get a "super exclusive" lock on
> the page (cf. LockBufferForCleanup), and the pin prevents it from doing
> so.
>
> However: suppose that some other activity causes the index page to be
> split while the indexscan is stopped, and that the tuple it's stopped
> on gets relocated into the new righthand page of the pair. Then the
> indexscan is holding a pin on the wrong page --- not the one its tuple
> is in. It would then be possible for the VACUUM to arrive at the tuple
> and delete it before the indexscan is resumed.
>
> This is a pretty low-probability scenario, especially given the new
> index-tuple-killing mechanism (which renders it less likely that an
> indexscan will stop on a vacuum-able tuple). But it could happen.
>
> The only solution I've thought of is to make btbulkdelete acquire
> "super exclusive" lock on *every* leaf page of the index as it scans,
> rather than only locking the pages it actually needs to delete something
> from. And we'd need to tweak _bt_restscan to chain its pins (pin the
> next page to the right before releasing pin on the previous page).
> This would prevent a btbulkdelete scan from overtaking ordinary
> indexscans, and thereby ensure that it couldn't arrive at the tuple
> on which an indexscan is stopped, even with splitting.
>
> I'm somewhat concerned that the more stringent locking will slow down
> VACUUM a good deal when there's lots of concurrent activity, but I don't
> see another answer. Ideas anyone?
>
>                       regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
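The race Tom describes, and why taking the cleanup lock on every leaf page closes it, can be illustrated with a deliberately simplified, single-threaded Python model. This is a sketch under stated assumptions, not PostgreSQL code: `Page`, `split`, `vacuum_old`, `vacuum_new`, `step_right` and `run` are hypothetical names, buffer pins are plain counters, the vacuum's own pin is not modelled, and the "super exclusive" lock is treated as grantable only when a page's pin count is zero, which is the property of LockBufferForCleanup the message relies on. Where the real code would wait for a pin to be released, the toy vacuum simply stops.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    """Toy btree leaf page: the keys stored on it plus a buffer pin count."""
    keys: list[int]
    pins: int = 0


def cleanup_lock_available(page: Page) -> bool:
    # Assumption modelled on LockBufferForCleanup: the "super exclusive"
    # lock is only granted when nobody else holds a pin on the page.
    return page.pins == 0


def split(pages: list[Page], i: int, move_from: int) -> None:
    """Split page i, moving keys[move_from:] into a new, unpinned right sibling."""
    page = pages[i]
    right = Page(keys=page.keys[move_from:])
    page.keys = page.keys[:move_from]
    pages.insert(i + 1, right)


def vacuum_old(pages: list[Page], dead: set[int]) -> list[int]:
    """Lock only the pages that contain something to delete (pre-fix behaviour)."""
    removed = []
    for page in pages:
        victims = [k for k in page.keys if k in dead]
        if not victims:
            continue  # page is never locked, so its pin is never checked
        if not cleanup_lock_available(page):
            break  # real code would wait here; the toy just stops
        page.keys = [k for k in page.keys if k not in dead]
        removed.extend(victims)
    return removed


def vacuum_new(pages: list[Page], dead: set[int]) -> list[int]:
    """Take the cleanup lock on every leaf page, left to right (proposed fix)."""
    removed = []
    for page in pages:
        if not cleanup_lock_available(page):
            break  # blocked behind a pinned page: cannot overtake the scan
        victims = [k for k in page.keys if k in dead]
        page.keys = [k for k in page.keys if k not in dead]
        removed.extend(victims)
    return removed


def step_right(pages: list[Page], cur: int) -> int:
    """Chain pins when moving right: pin the next page before unpinning this one."""
    pages[cur + 1].pins += 1
    pages[cur].pins -= 1
    return cur + 1


def run(vacuum) -> bool:
    """Return True if the key the indexscan is stopped on survives the vacuum."""
    pages = [Page(keys=[10, 20, 30])]
    pages[0].pins += 1             # indexscan stopped on key 20, pinning page 0
    split(pages, 0, move_from=1)   # concurrent insert splits page 0; 20 moves right
    vacuum(pages, dead={20})       # key 20 is dead, hence vacuumable
    return any(20 in p.keys for p in pages)


if __name__ == "__main__":
    print("old:", run(vacuum_old))
    print("new:", run(vacuum_new))
```

In this model `run(vacuum_old)` returns False: after the split the stopped-on key sits on the unpinned right-hand page, so the old strategy deletes it even though the scan still holds a pin (on the wrong page). `run(vacuum_new)` returns True, because the left-to-right pass must lock page 0 first and cannot get past the scan's pin. `step_right` corresponds to the pin chaining proposed for `_bt_restscan`: the scan never holds zero pins while moving right, so a vacuum behind it stays behind it.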