Thread: Gin page deletion bug
Gin page deletion fails to take into account that there might be a search in-flight to the page that is deleted. If the page is reused for something else, the search can get very confused. That's pretty difficult to reproduce in a real system, as the window between releasing a lock on page and following its right-link is very tight, but by setting a breakpoint with a debugger it's easy. Here's how I reproduced it: ----------- 1. Put a breakpoint or sleep in entryGetNextItem() function, where it has released lock on one page and is about to read the next one. I used this patch: --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -574,6 +574,9 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry) return; } + elog(NOTICE, "about to move right to page %u", blkno); + sleep(5); + entry->buffer = ReleaseAndReadBuffer(entry->buffer, ginstate->index, blkno); 2. Initialize a page with a gin index in suitable state: create extension btree_gin; create table foo (i int4); create index i_gin_foo on foo using gin (i) with (fastupdate = off); insert into foo select 1 from generate_series(1, 5000); insert into foo select 2 from generate_series(1, 5000); set enable_bitmapscan=off; set enable_seqscan=on; delete from foo where i = 1; 3. Start a query, it will sleep between every page: set enable_bitmapscan=on; set enable_seqscan=off; select * from foo where i = 1; postgres=# select * from foo where i = 1; NOTICE: about to move right to page 3 NOTICE: about to move right to page 5 ... 4. In another session, delete and reuse the pages: vacuum foo; insert into foo select 2 from generate_series(1, 10000) g 5. Let the query run to completion. It will return a lot of tuples with i=2, which should not have matched: ... NOTICE: about to move right to page 24 NOTICE: about to move right to page 25 i --- 2 2 2 ... ----------- The regular b-tree code solves this by stamping deleted pages with the current XID, and only allowing them to be reused once that XID becomes old enough (< RecentGlobalXmin). Another approach might be to grab a cleanup-strength lock on the left and parent pages when deleting a page, and requiring search to keep the pin on the page its coming from, until it has locked the next page. - Heikki
On 07.11.2013 22:49, Heikki Linnakangas wrote: > Gin page deletion fails to take into account that there might be a > search in-flight to the page that is deleted. If the page is reused for > something else, the search can get very confused. > > ... > > The regular b-tree code solves this by stamping deleted pages with the > current XID, and only allowing them to be reused once that XID becomes > old enough (< RecentGlobalXmin). Another approach might be to grab a > cleanup-strength lock on the left and parent pages when deleting a page, > and requiring search to keep the pin on the page its coming from, until > it has locked the next page. I came up with the attached fix. In a nutshell, when walking along a right-link, the new page is locked before releasing the lock on the old one. Also, never delete the leftmost branch of a posting tree. I believe these changes are sufficient to fix the problem, because of the way the posting tree is searched: All searches on a posting tree are "full searches", scanning the whole tree from left to right. Insertions do seek to the middle of the tree, but they are locked out during vacuum by holding a cleanup lock on the posting tree root (even without this patch). Thanks to this property, a search cannot step on a page that's being deleted by following a downlink, if we refrain from deleting the leftmost page on a level, as searches only descend down the leftmost branch. It would be nice to improve that in master - holding an exclusive lock on the root page is pretty horrible - but this is a nice back-patchable patch. I'm not worried about the loss of concurrency because we now have to hold the lock on previous page when stepping to next page. In searches, it's only a share-lock, and in insertions, it's very rare that you have to step right. The patch also adds some sanity checks to stepping right: the next page should be of the same type as the current page, e.g stepping right should not go from leaf to non-leaf or vice versa. - Heikki
Attachment
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > I came up with the attached fix. In a nutshell, when walking along a > right-link, the new page is locked before releasing the lock on the old > one. Also, never delete the leftmost branch of a posting tree. I believe > these changes are sufficient to fix the problem, because of the way the > posting tree is searched: This seems reasonable, but I wonder whether the concurrency discussion shouldn't be in gist/README, rather than buried in a comment inside ginDeletePage (not exactly the first place one would think to look IMO). regards, tom lane
On 08.11.2013 19:14, Tom Lane wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> I came up with the attached fix. In a nutshell, when walking along a >> right-link, the new page is locked before releasing the lock on the old >> one. Also, never delete the leftmost branch of a posting tree. I believe >> these changes are sufficient to fix the problem, because of the way the >> posting tree is searched: > > This seems reasonable, but I wonder whether the concurrency discussion > shouldn't be in gist/README, rather than buried in a comment inside > ginDeletePage (not exactly the first place one would think to look IMO). Yeah, README is probably better. There's zero explanation of concurrency and locking issues there at the moment, so I added a new section there explaining why the page deletion is (now) safe. There's a lot more to say about locking in GIN, but I'll leave that for another day. - Heikki