Re: [WIP] [B-Tree] Retail IndexTuple deletion - Mailing list pgsql-hackers

From Andrey V. Lepikhov
Subject Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date
Msg-id e4cec53a-ab12-4858-d9d8-5fac9948a4c7@postgrespro.ru
In response to Re: [WIP] [B-Tree] Retail IndexTuple deletion  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [WIP] [B-Tree] Retail IndexTuple deletion
Re: [WIP] [B-Tree] Retail IndexTuple deletion
Re: [WIP] [B-Tree] Retail IndexTuple deletion
List pgsql-hackers

On 23.06.2018 00:43, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> According to your feedback, i develop second version of the patch.
>> In this version:
>> 1. High-level functions index_beginscan(), index_rescan() not used. Tree
>> descent made by _bt_search(). _bt_binsrch() used for positioning on the
>> page.
>> 2. TID list introduced in amtargetdelete() interface. Now only one tree
>> descent needed for deletion all tid's from the list with equal scan key
>> value - logical duplicates deletion problem.
>> 3. Only one WAL record for index tuple deletion per leaf page per
>> amtargetdelete() call.
> 
> Cool.
> 
> What is this "race" code about?
> 
>> +   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
>> +   LockBuffer(buffer, BUFFER_LOCK_SHARE);
>> +
>> +   page = (Page) BufferGetPage(buffer);
>> +   offnum = ItemPointerGetOffsetNumber(tid);
>> +   lp = PageGetItemId(page, offnum);
>> +
>> +   /*
>> +    * VACUUM races: someone already removed the tuple from the heap. Ignore it.
>> +    */
>> +   if (!ItemIdIsUsed(lp))
>> +       return NULL;
> 
> It looks wrong -- why should another process have set the item as
> unused? And if we assume that that's possible at all, what's to stop a
> third process from actually reusing the item pointer before we arrive
> (at get_tuple_by_tid()), leaving us to find a tuple that is totally
> unrelated to the original tuple to be deleted?
> 
> (Also, you're not releasing the buffer lock before you return.)
> 
>> 4. VACUUM can sort TID list preliminary for more quick search of duplicates.
> 
> This version of the patch prevents my own "force unique keys" patch
> from working, since you're not using my proposed new
> _bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
> them a heap TID). It is essential that your patch be able to quickly
> reach any tuple that it needs to kill. Otherwise, the worst case
> performance is totally unacceptable; it will never be okay to go
> through 10%+ of the index to kill a single tuple hundreds or even
> thousands of times per VACUUM. It seems to me that doing this
> tid_list_search() binary search is pointless -- you should instead be
> relying on logical duplicates using their heap TID as a tie-breaker.
> Rather than doing a binary search within tid_list_search(), you should
> instead match the presorted heap TIDs at the leaf level against the
> sorted in-memory TID list. You know, a bit like a merge join.

I agree with you. The binary search was implemented while awaiting your patch.
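The merge-style matching Peter describes could be sketched roughly as below. The types and names here are standalone stand-ins for illustration (TidStub instead of ItemPointerData, plain arrays instead of page items), not code from either patch; it assumes both the leaf items and the kill-list are sorted by heap TID:

```c
#include <stdint.h>

/* Simplified stand-in for a heap TID (ItemPointerData). */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} TidStub;

/* Compare two TIDs: block number first, then offset. */
static int
tid_cmp(const TidStub *a, const TidStub *b)
{
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Merge-style matching: walk the leaf-page TIDs (sorted by heap TID)
 * against the sorted kill-list in a single pass, recording the indexes
 * of leaf items to delete. Returns the number of matches.
 */
static int
match_dead_tids(const TidStub *leaf, int nleaf,
                const TidStub *kill, int nkill,
                int *deletable)
{
    int i = 0, j = 0, ndel = 0;

    while (i < nleaf && j < nkill)
    {
        int c = tid_cmp(&leaf[i], &kill[j]);

        if (c == 0)
        {
            deletable[ndel++] = i;  /* leaf item i matches a kill TID */
            i++;
            j++;
        }
        else if (c < 0)
            i++;                    /* leaf TID not in kill-list */
        else
            j++;                    /* kill TID not on this page */
    }
    return ndel;
}
```

Both inputs are consumed once, so the cost per leaf page is O(nleaf + nkill) rather than O(nkill * log nleaf) for repeated binary searches.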
> 
> I suggest that you go even further than this: I think that you should
> just start distributing my patch as part of your patch series. You can
> change my code if you need to. 

Good. I am ready to start distributing your patch. At the beginning of 
this work I planned to write a patch for physical TID ordering in the 
btree index; your patch will make that much easier.

> I also suggest using "git format-patch"
> with simple, short commit messages to produce patches. This makes it a
> lot easier to track the version of the patch, changes over time, etc.
> 
Ok

> I understand why you'd hesitate to take ownership of my code (it has
> big problems!), but the reality is that all the problems that my patch
> has are also problems for your patch. One patch cannot get committed
> without the other, so they are already the same project. As a bonus,
> my patch will probably improve the best case performance for your
> patch, since multi-deletions will now have much better locality of
> access.
> 
I still believe that the patch for physical TID ordering in the btree:
1) has its own value, not only for target deletion,
2) will require only a few local changes in my code,
and that these patches can be developed independently.

I have prepared the third version of the patches. Summary:
1. DEAD tuples on a page are masked during consistency checking (see the 
comments for the mask_dead_tuples() function).
2. Physical TID ordering is still not used.
3. The index cleanup after each quick_vacuum_index() call was removed.

-- 
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company
