Thread: GSoC 2017 Proposal for "Explicitly support predicate locks in indexaccess methods besides btree"

Hi guys,

My name is Shubham Barai and I am a final year student at Maharashtra Institute of Technology, Pune, India. I am very interested in contributing Postgresql this year through GSoC project.

I am particularly interested in working on the project "Explicitly support predicate locks in index access methods besides btree". I have gone through some research papers which were recommended on https://wiki.postgresql.org/wiki/GSoC_2017 to understand the concept of Serializable Snapshot Isolation used in PostgreSQL. I have also browsed through the codebase to get some idea of different access methods for gin, gist, and hash indexes. I want to discuss my proposal to get some feedback before I write my final proposal. Sorry, I am discussing my proposal little late. I was really busy in my academics.

Currently, only B+-trees support page level predicate locking.For other indexes, it acquires relation level lock which can lead to unnecessary serialization failure due to rw dependency caused by any insert into the index. So, the main task of this project is to support page level predicate locking for remaining indexes.

* Gist Index

B+tree index acquires predicate locks only on leaf pages as index tuples with record pointers are stored on leaf pages. But for Gist index, we have to consider locking at each index level as index tuples can be stored in buffers associated with internal pages or in leaf pages.
So, the functions where we need to insert a call for

1. PredicateLockPage() 

->gistScanPage()
after acquiring a shared lock on a buffer

2.CheckForSerializableConflictIn()

-> gistdoinsert()
after acquiring an exclusive lock on a target buffer and before inserting a tuple


3. PredicateLockPageSplit() 

->gistplacetopage()

If there is not enough space for insertion, we need to copy predicate lock from an old page to all new pages generated after a successful split operation.

PredicateLockPageSplit(Relation relation, BlockNumber oldblkno, BlockNumber a lot) is used by b+-tree where two pages are involved in a split operation. For  Gist index, we can define a similar function where more than one page can be generated after split operation.




* Gin Index

Gin index consists of a B-tree index over key values, where each key is an element of some indexed items and each tuple in a leaf page contains either a posting list if the list is small enough or a pointer to posting tree.

1. PredicateLockPage()  

->startscanentry()
   before calling collectMatchBitmap()

->scanPostingTree()
   after acquiring a shared lock on a leaf page

2.CheckForSerializableConflictIn()

-> ginentryinsert()

->gininsertitempointers()
   in case of insertion in a posting tree


3. PredicateLockPageSplit() 

-> dataBeginPlacetoPageLeaf()
   
 after calling dataPlacetoPageLeafSplit()


* Hash Index

1. PredicateLockPage() 

->hashgettuple()
->_hash_first()
->_hash_readnext()
->_hash_readprev()

2.CheckForSerializableConflictIn()

-> _hash_doinsert()

3. PredicateLockPageSplit() 

currently, I am trying to understand how the splitting of bucket works.
should we acquire predicate lock on every page from an old and new bucket in case there is a predicate lock on a page of an old bucket?

There may be a lot of other places where we need to insert function calls for predicate locking that I haven't included yet. I didn't go into details of every index AM.

can anyone help me find existing tests for b-tree?


Regards,
Shubham
On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai
<shubhambaraiss@gmail.com> wrote:

> My name is Shubham Barai and I am a final year student at Maharashtra
> Institute of Technology, Pune, India. I am very interested in contributing
> Postgresql this year through GSoC project.

Welcome!  Sorry I didn't spot this post earlier -- I'm behind on
reading the community lists and this didn't trigger any of the phrases
that pop things out to my attention right away.

> I am particularly interested in working on the project "Explicitly support
> predicate locks in index access methods besides btree". I have gone through
> some research papers which were recommended on
> https://wiki.postgresql.org/wiki/GSoC_2017 to understand the concept of
> Serializable Snapshot Isolation used in PostgreSQL. I have also browsed
> through the codebase to get some idea of different access methods for gin,
> gist, and hash indexes. I want to discuss my proposal to get some feedback
> before I write my final proposal. Sorry, I am discussing my proposal little
> late. I was really busy in my academics.

Understandable, but please be careful to get your final proposal in by
the deadline.  Deadlines in GSoC are not flexible.

> Currently, only B+-trees support page level predicate locking.For other
> indexes, it acquires relation level lock which can lead to unnecessary
> serialization failure due to rw dependency caused by any insert into the
> index. So, the main task of this project is to support page level predicate
> locking for remaining indexes.

Right.

> [calls out several places that specific calls to predicate locking functions are needed]

> There may be a lot of other places where we need to insert function calls
> for predicate locking that I haven't included yet. I didn't go into details
> of every index AM.

That will be about half the work of the project.  It is fine to
identify examples for your proposal, to show that you know what to
look for, but tracking down every last location can be completed after
the proposal is accepted.  The other half of the work will be testing
and responding to issues others might raise.

> can anyone help me find existing tests for b-tree?

I think this should be it:

kgrittn@kevin-desktop:~/pg/master$ find src/backend/access/nbtree
-type f -name '*.c' | grep -v '/tmp_check/' | grep -v '/Debug/' |
xargs egrep -n 'PredicateLock|SerializableConflict'
src/backend/access/nbtree/nbtinsert.c:201:
CheckForSerializableConflictIn(rel, NULL, buf);
src/backend/access/nbtree/nbtinsert.c:402:
         CheckForSerializableConflictIn(rel, NULL, buf);
src/backend/access/nbtree/nbtinsert.c:784:
PredicateLockPageSplit(rel,
src/backend/access/nbtree/nbtsearch.c:1040:
PredicateLockRelation(rel, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1052:
PredicateLockPage(rel, BufferGetBlockNumber(buf),
src/backend/access/nbtree/nbtsearch.c:1483:
 PredicateLockPage(rel, blkno, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1578:
 PredicateLockPage(rel, BufferGetBlockNumber(so->currPos.buf),
scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1869:
PredicateLockRelation(rel, scan->xs_snapshot);
src/backend/access/nbtree/nbtsearch.c:1874:     PredicateLockPage(rel,
BufferGetBlockNumber(buf), scan->xs_snapshot);
src/backend/access/nbtree/nbtpage.c:1410:
PredicateLockPageCombine(rel, leafblkno, leafrightsib);

--
Kevin Grittner


On Tue, Mar 28, 2017 at 12:36 PM, Shubham Barai
<shubhambaraiss@gmail.com> wrote:

> * Hash Index

> currently, I am trying to understand how the splitting of bucket works.
> should we acquire predicate lock on every page from an old and new bucket in
> case there is a predicate lock on a page of an old bucket?

For page level locking in hash indexes, I think it might be fine to
lock the first page of a bucket.

Also, in case you missed it, here are some guidelines I suggested.
There weren't any comments, which means they probably didn't offend
anyone else too badly.  They're just my opinion, but you might want
to consider them:

Each GSoC student proposal should be a PDF file of 6 to 8 pages.  In
the end, Google will publish these documents on a web page, so the
student should make each proposal something which they will be happy
to have future potential employers review.

Some ideas for desirable content:

  - A resume or CV of the student, including any prior GSoC work
  - Their reasons for wanting to participate
  - What else they have planned for the summer, and what their time
    commitment to the GSoC work will be
  - A clear statement that there will be no intellectual property
    problems with the work they will be doing -- that the PostgreSQL
    community will be able to use their work without encumbrances
    (e.g., there should be no agreements related to prior or
    ongoing work which might assign the rights to the work they do
    to someone else)
  - A description of what they will do, and how
  - Milestones with dates
  - What they consider to be the test that they have successfully
    completed the project

--
Kevin Grittner