Re: [WIP] [B-Tree] Retail IndexTuple deletion - Mailing list pgsql-hackers
| From | Andrey V. Lepikhov |
| --- | --- |
| Subject | Re: [WIP] [B-Tree] Retail IndexTuple deletion |
| Date | |
| Msg-id | e4cec53a-ab12-4858-d9d8-5fac9948a4c7@postgrespro.ru |
| In response to | Re: [WIP] [B-Tree] Retail IndexTuple deletion (Peter Geoghegan <pg@bowt.ie>) |
| Responses | Re: [WIP] [B-Tree] Retail IndexTuple deletion; Re: [WIP] [B-Tree] Retail IndexTuple deletion; Re: [WIP] [B-Tree] Retail IndexTuple deletion |
| List | pgsql-hackers |
On 23.06.2018 00:43, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> Based on your feedback, I have developed a second version of the patch.
>> In this version:
>> 1. The high-level functions index_beginscan() and index_rescan() are no
>> longer used. The tree descent is made by _bt_search(), and _bt_binsrch()
>> is used for positioning within the page.
>> 2. A TID list was introduced into the amtargetdelete() interface. Now only
>> one tree descent is needed to delete all TIDs from the list that share an
>> equal scan key value - the logical-duplicates deletion problem.
>> 3. Only one WAL record is written for index tuple deletion per leaf page
>> per amtargetdelete() call.
>
> Cool.
>
> What is this "race" code about?
>
>> +	buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
>> +	LockBuffer(buffer, BUFFER_LOCK_SHARE);
>> +
>> +	page = (Page) BufferGetPage(buffer);
>> +	offnum = ItemPointerGetOffsetNumber(tid);
>> +	lp = PageGetItemId(page, offnum);
>> +
>> +	/*
>> +	 * VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
>> +	 */
>> +	if (!ItemIdIsUsed(lp))
>> +		return NULL;
>
> It looks wrong -- why should another process have set the item as
> unused? And if we assume that that's possible at all, what's to stop a
> third process from actually reusing the item pointer before we arrive
> (at get_tuple_by_tid()), leaving us to find a tuple that is totally
> unrelated to the original tuple to be deleted?
>
> (Also, you're not releasing the buffer lock before you return.)
>
>> 4. VACUUM can sort the TID list beforehand for a quicker search of
>> duplicates.
>
> This version of the patch prevents my own "force unique keys" patch
> from working, since you're not using my proposed new
> _bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
> them a heap TID). It is essential that your patch be able to quickly
> reach any tuple that it needs to kill. Otherwise, the worst case
> performance is totally unacceptable; it will never be okay to go
> through 10%+ of the index to kill a single tuple hundreds or even
> thousands of times per VACUUM. It seems to me that doing this
> tid_list_search() binary search is pointless -- you should instead be
> relying on logical duplicates using their heap TID as a tie-breaker.
> Rather than doing a binary search within tid_list_search(), you should
> instead match the presorted heap TIDs at the leaf level against the
> sorted in-memory TID list. You know, a bit like a merge join.

I agree with you. The binary search was developed while awaiting your patch.
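To illustrate, the matching could look roughly like this -- a minimal sketch,
not the patch's actual code. The function and variable names are hypothetical,
and it assumes leaf tuples are ordered by heap TID within equal keys (which is
what the "force unique keys" patch provides). The comparator also shows how the
dead-TID array can be presorted with qsort(), per point 4 above:

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufpage.h"
#include "storage/itemptr.h"

/* qsort() comparator for presorting the in-memory dead-TID array */
static int
tid_comparator(const void *a, const void *b)
{
	return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

/*
 * Collect the offsets of leaf tuples whose heap TIDs appear in a sorted
 * dead-TID array.  Both inputs are sorted, so a single merge pass over
 * them suffices -- "a bit like a merge join".
 */
static int
merge_match_dead_tids(Page page, ItemPointer deadtids, int ndeadtids,
					  OffsetNumber *deletable)
{
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	OffsetNumber offnum = P_FIRSTDATAKEY(opaque);
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
	int			i = 0;
	int			ndeletable = 0;

	while (offnum <= maxoff && i < ndeadtids)
	{
		ItemId		itemid = PageGetItemId(page, offnum);
		IndexTuple	itup = (IndexTuple) PageGetItem(page, itemid);
		int32		cmp = ItemPointerCompare(&itup->t_tid, &deadtids[i]);

		if (cmp == 0)
		{
			/* Index tuple points to a dead heap tuple: mark it deletable */
			deletable[ndeletable++] = offnum;
			i++;
			offnum = OffsetNumberNext(offnum);
		}
		else if (cmp < 0)
			offnum = OffsetNumberNext(offnum);	/* tuple not in kill list */
		else
			i++;		/* dead TID is not on this page; try the next one */
	}

	return ndeletable;
}

All matches collected for a page could then be removed with a single
PageIndexMultiDelete() call and one WAL record, in line with point 3 above.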
> I suggest that you go even further than this: I think that you should
> just start distributing my patch as part of your patch series. You can
> change my code if you need to.

Good. I am ready to start distributing your patch. At the beginning of this
work I planned to make a patch for physical TID ordering in the btree index;
your patch will make that much easier.

> I also suggest using "git format patch"
> with simple, short commit messages to produce patches. This makes it a
> lot easier to track the version of the patch, changes over time, etc.

Ok.

> I understand why you'd hesitate to take ownership of my code (it has
> big problems!), but the reality is that all the problems that my patch
> has are also problems for your patch. One patch cannot get committed
> without the other, so they are already the same project. As a bonus,
> my patch will probably improve the best case performance for your
> patch, since multi-deletions will now have much better locality of
> access.

I still believe that the patch for physical TID ordering in btree:
1) has its own value, not only for target deletion;
2) will require only a few local changes in my code;
and these patches can be developed independently.

I have prepared a third version of the patches. Summary:
1. DEAD tuples are masked on a page during consistency checking (see the
comments for the mask_dead_tuples() function, and the sketch below).
2. Physical TID ordering is still not used.
3. The index cleanup() call after each quick_vacuum_index() call was removed.
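To sketch the idea behind point 1 (this is only an illustration of the
concept, not the actual mask_dead_tuples() body from the patch): zero the
storage of every tuple whose line pointer is marked dead, so that primary
and standby page images that differ only in dead-tuple contents still
compare equal during WAL consistency checking.

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"
#include "storage/off.h"

/*
 * Illustrative sketch only.  During WAL consistency checking the contents
 * of dead tuples are irrelevant, so zero them out before comparing the
 * primary's page image against the standby's.
 */
static void
mask_dead_tuples(Page page)
{
	OffsetNumber offnum;
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

	for (offnum = FirstOffsetNumber; offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);

		/* Zero the tuple storage of items whose line pointers are dead */
		if (ItemIdIsDead(itemid) && ItemIdHasStorage(itemid))
			memset((char *) page + ItemIdGetOffset(itemid), 0,
				   ItemIdGetLength(itemid));
	}
}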
--
Andrey Lepikhov
Postgres Professional: https://postgrespro.com
The Russian Postgres Company

Attachment