Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
Date
Msg-id CAEepm=3S+MtY+K20_yjj1h1n3vk2m7d0EYNmxZttb_01y18o8w@mail.gmail.com
In response to Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Hmm.  I notice that this calls PredicateLockPageSplit() after both
>> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
>> one in _hash_expandtable()) instead of doing it inside that function,
>> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
>> What if someone else accesses bucket_nbuf between
>> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
>> PredicateLockPageSplit()?  Doesn't that mean that another session can
>> read a newly created page and miss a predicate lock that is about to
>> be transferred to it?
>
> Yes.  I think you are primarily worried about if there is an insert on
> new bucket from another session as scans will anyway take the
> predicate lock, right?

Yeah.

>>  If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead).  Apart from this,
> I think this patch has missed handling the cases where we scan the
> buckets when the split is in progress.  In such cases, we scan both
> old and new bucket, so I think we need to ensure that we take
> PredicateLock on both the buckets during such scans.

Hmm.  Yeah.

So, in _hash_first(), do you think we might just need this?

      if (H_BUCKET_BEING_POPULATED(opaque))
      {
          ...
          old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
          ...
          old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
+         PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
+                           scan->xs_snapshot);
          TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));

That is, if we begin scanning a 'new' bucket, we remember the old
bucket and go and scan that too, so we'd better predicate-lock both up
front (alternatively we could lock the old bucket later, when we
actually visit that page, but here it can be done in a single place).

What if we begin scanning an 'old' bucket that is being split?  I
think we'll only do that for tuples that actually belong in the old
bucket after the split, so no need to double-lock?  And I don't think
a split could begin while we are scanning.  Do I have that right?

As for inserting, I'm not sure if any special treatment is needed, as
long as the scan code path (above) and the split code path are
correct.  I'm not sure though.

I'm wondering how to test all this.  I'm thinking of a program that
repeatedly creates a hash index and then slowly adds more tuples to it
so that buckets split (maybe using distinct keys carefully crafted to
hash into the same bucket?), while concurrently hammering it with a
ton of scans and then ... somehow checking correctness...

-- 
Thomas Munro
http://www.enterprisedb.com

