Re: Think I see a btree vacuuming bug - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Think I see a btree vacuuming bug
Date
Msg-id 200208262014.g7QKEel19090@candle.pha.pa.us
In response to Think I see a btree vacuuming bug  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Think I see a btree vacuuming bug  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Is this fixed, and if not, can I have some TODO text?

---------------------------------------------------------------------------

Tom Lane wrote:
> If a VACUUM running concurrently with someone else's indexscan were to
> delete the index tuple that the indexscan is currently stopped on, then
> we'd get a failure when the indexscan resumes and tries to re-find its
> place.  (This is the infamous "my bits moved right off the end of the
> world" error condition.)  What is supposed to prevent that from
> happening is that the indexscan retains a buffer pin (but not a read
> lock) on the index page containing the tuple it's stopped on.  VACUUM
> will not delete any tuple until it can get a "super exclusive" lock on
> the page (cf. LockBufferForCleanup), and the pin prevents it from doing
> so.
> 
> However: suppose that some other activity causes the index page to be
> split while the indexscan is stopped, and that the tuple it's stopped
> on gets relocated into the new righthand page of the pair.  Then the
> indexscan is holding a pin on the wrong page --- not the one its tuple
> is in.  It would then be possible for the VACUUM to arrive at the tuple
> and delete it before the indexscan is resumed.
> 
> This is a pretty low-probability scenario, especially given the new
> index-tuple-killing mechanism (which renders it less likely that an
> indexscan will stop on a vacuum-able tuple).  But it could happen.
> 
> The only solution I've thought of is to make btbulkdelete acquire
> "super exclusive" lock on *every* leaf page of the index as it scans,
> rather than only locking the pages it actually needs to delete something
> from.  And we'd need to tweak _bt_restscan to chain its pins (pin the
> next page to the right before releasing pin on the previous page).
> This would prevent a btbulkdelete scan from overtaking ordinary
> indexscans, and thereby ensure that it couldn't arrive at the tuple
> on which an indexscan is stopped, even with splitting.
> 
> I'm somewhat concerned that the more stringent locking will slow down
> VACUUM a good deal when there's lots of concurrent activity, but I don't
> see another answer.  Ideas anyone?
> 
>             regards, tom lane
> 
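For readers following the thread, the race Tom describes can be sketched with a toy model. This is only an illustration under simplifying assumptions, not PostgreSQL code: the names Page, split, vacuum, and scenario are hypothetical, a "cleanup lock" is modeled as "no pins on the page", and blocking is modeled by simply stopping the vacuum pass. The pin-chaining change to _bt_restscan is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy leaf page: a list of index tuples, a pin count, and a right link."""
    tuples: list = field(default_factory=list)
    pins: int = 0                 # pins held by stopped index scans
    right: "Page | None" = None   # right sibling in the leaf chain


def split(page):
    """Move the upper half of the tuples into a new right sibling.

    Pins stay on the original (left) page, which is the crux of the bug.
    """
    half = len(page.tuples) // 2
    new_page = Page(tuples=page.tuples[half:], right=page.right)
    page.tuples = page.tuples[:half]
    page.right = new_page
    return new_page


def vacuum(leftmost, dead, lock_every_page):
    """Walk leaf pages left to right and delete tuples found in `dead`.

    A cleanup lock is only obtainable on a page with no pins. Where the
    real VACUUM would wait, this model just stops and returns what it
    deleted so far.
    """
    deleted = []
    page = leftmost
    while page is not None:
        has_dead = any(t in dead for t in page.tuples)
        if lock_every_page or has_dead:
            if page.pins > 0:
                break  # would block here until the scan drops its pin
            deleted += [t for t in page.tuples if t in dead]
            page.tuples = [t for t in page.tuples if t not in dead]
        page = page.right
    return deleted


def scenario(lock_every_page):
    leaf = Page(tuples=[1, 2, 3, 4])
    leaf.pins += 1   # an index scan stops on tuple 4 and keeps its pin
    split(leaf)      # tuple 4 moves right; the pin is now on the wrong page
    return vacuum(leaf, dead={4}, lock_every_page=lock_every_page)


print(scenario(lock_every_page=False))  # tuple 4 is deleted under the scan
print(scenario(lock_every_page=True))   # vacuum stops at the pinned page
```

With lock_every_page=False the vacuum pass skips the pinned left page (nothing to delete there) and removes tuple 4 from the unpinned right page. With lock_every_page=True it cannot get past the pinned page, so it never overtakes the stopped scan.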

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

