
From: Amit Kapila
Subject: Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Date:
Msg-id: CAA4eK1JKm=bRZf4WcF9asejEGMi5kQaTHbqHALorVAX3M9w64Q@mail.gmail.com
In response to: [WIP] [B-Tree] Keep indexes sorted by heap physical location (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
On Thu, Aug 18, 2016 at 8:24 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>
> A couple of points make me uneasy about this patch, yet I can think of
> no better alternative, so I seek feedback:
>
>  - introducing a different format for inner index tuples makes for an
> invasive patch and quite difficult-to-reason code (it's easy to forget
> whether a page is leaf or inner and that now causes assertion failures
> or segfaults)
>  - the ascent-descent to check for uniqueness when there are large
> dead tuple runs could have locking subtleties that escape me. Perhaps
> there's better ways to solve this.

I have tried to study this part of your patch, and it seems to be a
somewhat non-trivial and risky part of the proposal.

+    } else {
+        /*
+         * We found the actual first item on another block, so we have to
+         * perform a two-step search - first half, until our write-locked
+         * buffer, then another starting from our write-locked buffer.
+         */
+        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+        LockBuffer(buf, BT_WRITE);
+
+        buf = _bt_moveright(rel, buf, natts, itup_scankey, &(itup->t_tid),
+                            false, true, stack, BT_WRITE, NULL);
+
+        first_offset = _bt_binsrch(rel, buf, natts, itup_scankey, NULL,
+                                   false, NULL);
+
+        xwait = _bt_check_unique(rel, itup, heapRel, nbuf, buf, first_offset,
+                                 itup_scankey, checkUnique, &is_unique,
+                                 &speculativeToken);
+
+        _bt_relbuf(rel, nbuf);
+    }

The idea behind the uniqueness check is that we hold a write lock on the
buffer/page on which we are trying to operate (we take only read locks on
the subsequent pages visited during the check).  Here, in your changed
proposal, you have two buffers (the one the search led you to and the
previous buffer in the chain), and before checking uniqueness you seem to
hold a write lock on one of them and a read lock on the other.  This seems
to change the way the uniqueness check has worked till now; can you explain
how it works (and can we safely assume that all searches for the uniqueness
check will reach the same buffer first)?
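
For reference, here is a rough sketch of my reading of the existing locking
pattern in _bt_check_unique (this is not the actual code; next_blkno and
have_more_equal_keys are placeholders made up for this note):

    Buffer  buf;                  /* insertion target, already write-locked */
    Buffer  nbuf = InvalidBuffer; /* page currently probed for duplicates */

    for (;;)
    {
        /* ... examine entries equal to the new key on the current page ... */

        if (!have_more_equal_keys)   /* placeholder for the real exit test */
            break;

        /* step right under a read lock; the write lock on buf is kept */
        nbuf = _bt_relandgetbuf(rel, nbuf, next_blkno, BT_READ);
    }

    if (BufferIsValid(nbuf))
        _bt_relbuf(rel, nbuf);

In the quoted hunk, by contrast, the lock on buf is dropped and re-acquired
before the second check, which is what makes the two-buffer situation above
hard for me to reason about.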

With this new mechanism, do we have two types of search interface, one that
works on keys (as now) and another that works on key+ctid, or will it be a
single interface that works both ways?  I think there are pros and cons
either way.
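
To make the distinction concrete, here is a sketch of the two call shapes,
based on the _bt_moveright signature visible in the quoted hunk (whether
passing NULL for the TID argument would mean "key-only, as today" is my
assumption, not necessarily what the patch does):

    /* key-only descent, as today: any leaf holding equal keys will do */
    buf = _bt_moveright(rel, buf, natts, itup_scankey, NULL,
                        false, true, stack, BT_WRITE, NULL);

    /* key+ctid descent, as in the patch: equal keys are further ordered
     * by heap TID, so the target leaf also depends on itup->t_tid */
    buf = _bt_moveright(rel, buf, natts, itup_scankey, &(itup->t_tid),
                        false, true, stack, BT_WRITE, NULL);

A single interface with an optional TID would keep callers uniform but push
an "ignore the TID" case into the comparison path; separate interfaces would
avoid that at the cost of some duplicated descent logic.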

Also, in the thread below you talk about using the last free bit in t_info.
I want to bring to your notice that there is a patch of mine [1] in which I
have used it to avoid breaking on-disk compatibility of hash indexes.  I am
not saying that we should not plan to use it for something else, just that
we should keep in mind that there are other proposals that want it.  We can
decide the best way to proceed if we are aware of all the proposals that
want to use that bit.
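
For context, the bit in question, as I understand the current IndexTupleData
layout in src/include/access/itup.h:

    /*
     * IndexTupleData.t_info is 16 bits wide:
     *
     *   0x1FFF   INDEX_SIZE_MASK     index tuple size in bytes
     *   0x2000   (currently unused)  the single free bit both patches want
     *   0x4000   INDEX_VAR_MASK      tuple has variable-width attribute(s)
     *   0x8000   INDEX_NULL_MASK     tuple has null attribute(s)
     */

Both proposals would claim the 0x2000 bit, which is why coordinating the two
patches matters.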


[1] - https://www.postgresql.org/message-id/CAA4eK1LkQ_Udism-Z2Dq6cUvjH3dB5FNFNnEzZBPsRjw0haFqA@mail.gmail.com
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


