Thread: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
[HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Shubham Barai
Date:
Project: Explicitly support predicate locks in index AMs besides b-tree
Hi,
During this week, I continued my work on predicate locking in the hash index and created a patch for it. As I wrote in my proposal for the hash index, every scan operation acquires a predicate lock on the primary page of the bucket.
Hash indexes are used only for equality comparisons, so this can still generate false positives when a scan operation and an insert operation access the same bucket but are looking for different tuples. Let's see that with an example.
setup:
create table hash_tbl(id int4, p integer);
create index hash_pointidx on hash_tbl using hash(p);
insert into hash_tbl (id, p)
select g, g*2 from generate_series(1, 10000000) g;
read operation:
select * from hash_tbl where p=78988658;
insert operation:
insert into hash_tbl values(99999999, 546789888);
If these two hash keys (78988658 and 546789888) mapped to the same bucket, this would result in a false serialization failure.
Please correct me if this assumption about false positives is wrong.
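To make the collision concrete, here is a minimal standalone C sketch; the helper and the hash values below are illustrative and not PostgreSQL's actual bucket-mapping code (see _hash_hashkey2bucket in the hash AM for the real logic):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Illustrative stand-in for the real bucket mapping; a plain modulus
     * is enough to show how two distinct hash values can collide.
     */
    static uint32_t
    key_to_bucket(uint32_t hash_value, uint32_t nbuckets)
    {
        return hash_value % nbuckets;
    }

    int
    main(void)
    {
        /* Hypothetical hash values for the two keys in the example. */
        uint32_t h1 = 0x9e3779b9;
        uint32_t h2 = 0x7f4a7c15;

        if (key_to_bucket(h1, 4) == key_to_bucket(h2, 4))
            printf("same bucket: a scan and an insert for different keys\n"
                   "would conflict under bucket-level predicate locking\n");
        return 0;
    }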
In summary, I have done the following things this week:
1) found appropriate places in the hash index AM to insert calls to existing functions (PredicateLockPage(), CheckForSerializableConflictIn(), etc.)
2) created tests to check serialization failures and false positives
3) tested my patch on the current head
Regards,
Shubham
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Thomas Munro
Date:
Hi Shubham,

On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this would result in a false serialization failure.
> Please correct me if this assumption about false positives is wrong.

I wonder if there is an opportunity to use computed hash values directly in predicate lock tags, rather than doing it on the basis of buckets. Perhaps I'm missing something important about the way that locks need to escalate that would prevent that from working.

> 3) tested my patch on the current head

This no longer applies, but it's in "Needs review" status in the Commitfest. Could you please post a rebased version?

--
Thomas Munro
http://www.enterprisedb.com
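As a rough sketch of the hash-value tagging idea, assuming a hypothetical tag layout (PostgreSQL's actual predicate lock target tags are built by macros in the predicate-locking code, which this does not reproduce):

    #include <stdint.h>

    /*
     * Hypothetical tag layout, invented for illustration only; not
     * PostgreSQL's actual lock tag structure or API.
     */
    typedef struct HashValueLockTag
    {
        uint32_t    dbid;        /* database OID */
        uint32_t    relid;       /* hash index OID */
        uint32_t    hash_value;  /* full computed hash of the search key */
    } HashValueLockTag;

    /*
     * A scan for key k would lock {db, rel, hash(k)}; an insert of a tuple
     * whose key hashes to the same value would collide with exactly that
     * lock, regardless of which bucket the value currently maps to.
     */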
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Shubham Barai
Date:
Hi Thomas,
I have attached the rebased version of the patch here.

On 8 September 2017 at 06:37, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi Shubham,
>
> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
>> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this would result in a false serialization failure.
>> Please correct me if this assumption about false positives is wrong.
>
> I wonder if there is an opportunity to use computed hash values
> directly in predicate lock tags, rather than doing it on the basis of
> buckets. Perhaps I'm missing something important about the way that
> locks need to escalate that would prevent that from working.
>
>> 3) tested my patch on the current head
>
> This no longer applies, but it's in "Needs review" status in the
> Commitfest. Could you please post a rebased version?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Alexander Korotkov
Date:
On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi Shubham,
>
> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
>> If these two hash keys (78988658 and 546789888) mapped to the same bucket, this would result in a false serialization failure.
>> Please correct me if this assumption about false positives is wrong.
>
> I wonder if there is an opportunity to use computed hash values
> directly in predicate lock tags, rather than doing it on the basis of
> buckets. Perhaps I'm missing something important about the way that
> locks need to escalate that would prevent that from working.
+1,
Very nice idea! Locking hash values directly seems to be superior to locking hash index pages.
Shubham, do you have any comment on this?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Michael Paquier
Date:
On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai <shubhambaraiss@gmail.com> wrote:
> I have attached the rebased version of the patch here.

The patch does not apply, and there have been no reviews either. In consequence, I am moving it to the next CF with "waiting on author" as status. Please provide a rebased patch.
--
Michael
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Stephen Frost
Date:
Greetings Shubham, all,

* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai
> <shubhambaraiss@gmail.com> wrote:
>> I have attached the rebased version of the patch here.
>
> The patch does not apply, and there have been no reviews either. In
> consequence, I am moving it to the next CF with "waiting on author" as
> status. Please provide a rebased patch.

Shubham, would you mind providing an updated patch which applies cleanly, so we can change this to Needs Review and hopefully get someone looking at it? Also, it would be good to respond to Alexander's suggestion as to whether it would work or not (and perhaps update the patch accordingly). Otherwise, I'm afraid this patch may end up just getting bumped to the next CF or ending up as 'returned with feedback'. It would be great to get this up to a point where it could be committed.

Thanks!

Stephen
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Shubham Barai
Date:
On 15 January 2018 at 08:03, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings Shubham, all,
>
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Mon, Sep 25, 2017 at 10:34 PM, Shubham Barai
>> <shubhambaraiss@gmail.com> wrote:
>>> I have attached the rebased version of the patch here.
>>
>> The patch does not apply, and there have been no reviews either. In
>> consequence, I am moving it to the next CF with "waiting on author" as
>> status. Please provide a rebased patch.
>
> Shubham, would you mind providing an updated patch which applies
> cleanly, so we can change this to Needs Review and hopefully get someone
> looking at it? Also, it would be good to respond to Alexander's
> suggestion as to whether it would work or not (and perhaps update the
> patch accordingly). Otherwise, I'm afraid this patch may end up just
> getting bumped to the next CF or ending up as 'returned with feedback'.
> It would be great to get this up to a point where it could be committed.
Hi Stephen,
The new approach was suggested after the completion of GSoC, so I didn't get
enough time to implement it. Also, I was constantly updating my
patches for the gist and gin indexes based on reviews.
Here, I am attaching the rebased version of the patch (which is based on the old approach:
acquiring a predicate lock on the primary page of a bucket).
Kind Regards,
Shubham
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
>>
>> Hi Shubham,
>>
>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambaraiss@gmail.com>
>> wrote:
>> > If these two hash keys (78988658 and 546789888) mapped to the same
>> > bucket, this would result in a false serialization failure.
>> > Please correct me if this assumption about false positives is wrong.
>>
>> I wonder if there is an opportunity to use computed hash values
>> directly in predicate lock tags, rather than doing it on the basis of
>> buckets. Perhaps I'm missing something important about the way that
>> locks need to escalate that would prevent that from working.
>
> +1,
> Very nice idea! Locking hash values directly seems to be superior to
> locking hash index pages.
> Shubham, do you have any comment on this?

As Shubham seems to be running out of time, I thought of helping him by looking into the above-suggested idea. I think one way to lock a particular hash value is to use the TID of the heap tuple associated with each index entry (the index entry for the hash index will be the hash value). However, do we really need it for implementing predicate locking for hash indexes? If we look at the "Index AM implementations" section of README-SSI, it doesn't seem to be required. Basically, if we look at the strategy of predicate locks in btree [1], locking at the page level for the hash index seems to be the right direction: as with btree, the corresponding heap tuple read will be locked.

What do you think?

[1] -
* B-tree index searches acquire predicate locks only on the index
*leaf* pages needed to lock the appropriate index range. If,
however, a search discovers that no root page has yet been created,
a predicate lock on the index relation is required.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
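A minimal sketch of the page-level strategy described above, using existing PostgreSQL entry points; the wrapper and its placement are illustrative only (the actual patch inserts the calls directly into the hash AM's scan path):

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "storage/predicate.h"
    #include "utils/rel.h"
    #include "utils/snapshot.h"

    /*
     * Illustrative wrapper: take an SIREAD lock on the bucket's primary
     * page at the start of a hash index scan, mirroring btree's
     * leaf-page strategy. PredicateLockPage() is an existing function;
     * this wrapper and its name are not part of the patch.
     */
    static void
    hash_scan_take_predicate_lock(Relation rel, Buffer bucket_buf,
                                  Snapshot snapshot)
    {
        PredicateLockPage(rel, BufferGetBlockNumber(bucket_buf), snapshot);
    }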
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Alexander Korotkov
Date:
On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <thomas.munro@enterprisedb.com>
>> wrote:
>>>
>>> Hi Shubham,
>>>
>>> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambaraiss@gmail.com>
>>> wrote:
>>> > If these two hash keys (78988658 and 546789888) mapped to the same
>>> > bucket, this would result in a false serialization failure.
>>> > Please correct me if this assumption about false positives is wrong.
>>>
>>> I wonder if there is an opportunity to use computed hash values
>>> directly in predicate lock tags, rather than doing it on the basis of
>>> buckets. Perhaps I'm missing something important about the way that
>>> locks need to escalate that would prevent that from working.
>>
>> +1,
>> Very nice idea! Locking hash values directly seems to be superior to
>> locking hash index pages.
>> Shubham, do you have any comment on this?
>
> As Shubham seems to be running out of time, I thought of helping him
> by looking into the above-suggested idea. I think one way to lock a
> particular hash value is to use the TID of the heap tuple associated
> with each index entry (the index entry for the hash index will be the
> hash value).
Sorry, I didn't get what you particularly mean. If we lock either the TID of
the associated heap tuple or the TID of the hash index tuple, then what will
we lock in the case when nothing is found? Even if we find nothing, we have
to place some lock according to the search key in order to detect cases when
somebody has inserted a row which we might see according to that search key.
> However, do we really need it for implementing predicate locking for
> hash indexes? If we look at the "Index AM implementations" section of
> README-SSI, it doesn't seem to be required. Basically, if we look at
> the strategy of predicate locks in btree [1], locking at the page level
> for the hash index seems to be the right direction: as with btree, the
> corresponding heap tuple read will be locked.
Btree uses leaf-page locking because it supports both range searches
and exact value searches, and it needs to detect overlaps between
these two kinds of searches. Therefore, btree locks leaf pages in both
cases. The hash index case is different: a hash index doesn't support,
and isn't going to support, range searches. Assuming that a hash index
supports only exact value searches, locking hash values would be
superior to page locking (unless I'm missing something), because
it would provide better locality of locks.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>>> +1,
>>> Very nice idea! Locking hash values directly seems to be superior to
>>> locking hash index pages.
>>> Shubham, do you have any comment on this?
>>
>> As Shubham seems to be running out of time, I thought of helping him
>> by looking into the above-suggested idea. I think one way to lock a
>> particular hash value is to use the TID of the heap tuple associated
>> with each index entry (the index entry for the hash index will be the
>> hash value).
>
> Sorry, I didn't get what you particularly mean. If we lock either the TID of
> the associated heap tuple or the TID of the hash index tuple, then what will
> we lock in the case when nothing is found? Even if we find nothing, we have
> to place some lock according to the search key in order to detect cases when
> somebody has inserted a row which we might see according to that search key.

Okay, but if you use the hash value as the lock tag (which is possible), how will we deal with things like page splits? I think even if we use the block number/page/bucket number corresponding to the hash value along with the hash value in the lock tag, it still doesn't appear to work. I think using page-level locks for the index makes sense, especially because it will be convenient to deal with page splits. Also, as predicate locks stay in memory, creating too many such locks doesn't sound like a nice strategy, even though we have a way to upgrade them to the next level (page), as that has a separate cost to it.

>> However, do we really need it for implementing predicate locking for
>> hash indexes? If we look at the "Index AM implementations" section of
>> README-SSI, it doesn't seem to be required. Basically, if we look at
>> the strategy of predicate locks in btree [1], locking at the page level
>> for the hash index seems to be the right direction: as with btree, the
>> corresponding heap tuple read will be locked.
>
> Btree uses leaf-page locking because it supports both range searches
> and exact value searches, and it needs to detect overlaps between
> these two kinds of searches. Therefore, btree locks leaf pages in both
> cases.

Also, using page-level locks probably makes it easier to deal with index operations like page splits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Sun, Jan 28, 2018 at 7:28 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>>> As Shubham seems to be running out of time, I thought of helping him
>>> by looking into the above-suggested idea. I think one way to lock a
>>> particular hash value is to use the TID of the heap tuple associated
>>> with each index entry (the index entry for the hash index will be the
>>> hash value).
>>
>> Sorry, I didn't get what you particularly mean. If we lock either the
>> TID of the associated heap tuple or the TID of the hash index tuple,
>> then what will we lock in the case when nothing is found? Even if we
>> find nothing, we have to place some lock according to the search key in
>> order to detect cases when somebody has inserted a row which we might
>> see according to that search key.
>
> Okay, but if you use the hash value as the lock tag (which is possible),
> how will we deal with things like page splits? I think even if we use
> the block number/page/bucket number corresponding to the hash value
> along with the hash value in the lock tag, it still doesn't appear to
> work. I think using page-level locks for the index makes sense,
> especially because it will be convenient to deal with page splits.

What I intend to say here is that we already have a mechanism like PredicateLockPageSplit() which can deal with predicate locks during a page split if we operate at the page level. However, if we want to go for the hash value locking technique, it can be quite complex and expensive to make it work, as we have to search all the locks that belong to the bucket being split and then move them to the new bucket.

Alexander/Thomas, do you have better ideas to make it work? Otherwise, I think we can proceed to review Shubham's current approach/patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
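For reference, a sketch of how a page-level implementation can lean on the existing machinery mentioned above; PredicateLockPageSplit() is a real function, while the wrapper and its name are illustrative only:

    #include "postgres.h"
    #include "storage/block.h"
    #include "storage/predicate.h"
    #include "utils/rel.h"

    /*
     * Illustrative wrapper: with page-level predicate locks, a bucket
     * split needs a single call to the existing machinery, which
     * transfers predicate locks held on the old bucket's primary page
     * so that they also cover the new bucket's primary page.
     */
    static void
    hash_split_transfer_predicate_locks(Relation rel,
                                        BlockNumber old_bucket_blkno,
                                        BlockNumber new_bucket_blkno)
    {
        PredicateLockPageSplit(rel, old_bucket_blkno, new_bucket_blkno);
    }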
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Thomas Munro
Date:
On Tue, Feb 13, 2018 at 4:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> What I intend to say here is that we already have a mechanism like
> PredicateLockPageSplit() which can deal with predicate locks during a
> page split if we operate at the page level. However, if we want to go
> for the hash value locking technique, it can be quite complex and
> expensive to make it work, as we have to search all the locks that
> belong to the bucket being split and then move them to the new bucket.

True. One way to avoid all that might be to use pseudo page numbers that don't suffer from splits. I don't know how you'd choose the constant, but it could be something like pseudo page number = hash value % 1024. In other words, you'd use the full hash value for the 'tuple' part of the predicate lock tag, and a shorter hash value for the 'page' part of the predicate lock tag, so you'd never have to handle splits, and promotion from fine-grained 'tuple' (= hash value) locks to coarse-grained 'page' (= short hash value) locks would still work automatically when space runs out.

> Alexander/Thomas, do you have better ideas to make it work? Otherwise,
> I think we can proceed to review Shubham's current approach/patch.

I think we should proceed to review the current patch. As far as I can see, adding more fine-grained locking would remain a possibility for future improvement. With this patch we get page-level hash index SIREAD locks, and perhaps in a future patch we could add fine-grained hash value SIREAD locks (maybe as described above, if that works...).

--
Thomas Munro
http://www.enterprisedb.com
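A tiny sketch of the pseudo-page scheme described above; the constant and helper names are invented for illustration:

    #include <stdint.h>

    #define PSEUDO_PAGE_MODULUS 1024    /* example constant from the mail */

    /* 'page' part of the lock tag: a short hash, immune to bucket splits. */
    static uint32_t
    pseudo_page(uint32_t hash_value)
    {
        return hash_value % PSEUDO_PAGE_MODULUS;
    }

    /*
     * 'tuple' part of the lock tag: the full hash value. Promotion from
     * fine-grained (full hash) locks to coarse (pseudo page) locks then
     * works without any split handling.
     */
    static uint32_t
    pseudo_tuple(uint32_t hash_value)
    {
        return hash_value;
    }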
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Thomas Munro
Date:
On Tue, Feb 13, 2018 at 7:47 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> One way to avoid all that might be to use pseudo page numbers that
> don't suffer from splits. I don't know how you'd choose the
> constant, but it could be something like pseudo page number = hash
> value % 1024. In other words, you'd use the full hash value for the
> 'tuple' part of the predicate lock tag, and a shorter hash value for
> the 'page' part of the predicate lock tag, so you'd never have to
> handle splits, and promotion from fine-grained 'tuple' (= hash value)
> locks to coarse-grained 'page' (= short hash value) locks would still
> work automatically when space runs out.

Thinking about how to tune that got me thinking about a simple middle way we could perhaps consider...

What if we just always locked pseudo page numbers using hash_value % max_predicate_locks_per_relation (= effectively 31 by default)? Then you'd have lower collision rates on small hash indexes, you'd never have to deal with page splits, and you'd never exceed max_predicate_locks_per_relation, so you'd never escalate to relation-level locks on busy systems. On the downside, you'd have e.g. a ~3% chance of collision instead of a 1/hash_maxbucket chance of collision, so it gets a bit worse for large indexes on systems that are not busy enough to exceed max_predicate_locks_per_relation. You win some, you lose some...

--
Thomas Munro
http://www.enterprisedb.com
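The collision figures above can be checked with a couple of lines of arithmetic; a standalone sketch, assuming the effective modulus of 31 and a 1024-bucket index for comparison:

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * Chance that two unrelated hash values collide under the two
         * schemes discussed above.
         */
        printf("modulus 31:   %.4f\n", 1.0 / 31);    /* ~0.032, i.e. ~3% */
        printf("1024 buckets: %.4f\n", 1.0 / 1024);  /* ~0.001 */
        return 0;
    }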
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Robert Haas
Date:
On Mon, Feb 26, 2018 at 7:51 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Thinking about how to tune that got me thinking about a simple middle
> way we could perhaps consider...
>
> What if we just always locked pseudo page numbers using hash_value %
> max_predicate_locks_per_relation (= effectively 31 by default)? Then
> you'd have lower collision rates on small hash indexes, you'd never
> have to deal with page splits, and you'd never exceed
> max_predicate_locks_per_relation, so you'd never escalate to
> relation-level locks on busy systems. On the downside, you'd have e.g.
> a ~3% chance of collision instead of a 1/hash_maxbucket chance of
> collision, so it gets a bit worse for large indexes on systems that are
> not busy enough to exceed max_predicate_locks_per_relation. You win
> some, you lose some...

Hmm, yeah, that definitely has some appeal. On the other hand, there's a lot of daylight between locking hv % 2^32 and locking hv % 31; the former is going to potentially blow out the lock table really fast, while the latter is potentially going to create an uncomfortable number of false collisions. One could imagine a modulus larger than 31 and smaller than 4294967296, although I don't have a principled suggestion for how to pick it. On the third hand, people who are using SSI heavily may well have increased max_predicate_locks_per_relation, and with this proposal that just kinda does what you want.

I don't really know how we can judge the merits of any particular modulus (or of committing the patch at all) without some test results showing that it helps reduce rollbacks or increase performance or something. Otherwise we're just guessing.

It does however seem to me that locking the hash value % (something) is better than basing the locking on bucket or page numbers. Basing it on page numbers strictly speaking cannot work, since the same tuple could be present in any page in the bucket chain; you'd have to lock the page number of the head of the bucket chain. There is however no advantage of doing that over locking the bucket number directly. Moreover, locking the bucket number directly forces you to worry about splits, whereas if you lock hv % (something) then you don't have to care.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Andres Freund
Date:
Hi,

Based on this sub-thread, this patch's status of 'needs review' doesn't quite seem accurate; 'waiting on author' and then 'returned with feedback' would be more fitting?

Greetings,

Andres Freund
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Thomas Munro
Date:
On Fri, Mar 2, 2018 at 3:57 PM, Andres Freund <andres@anarazel.de> wrote:
> Based on this sub-thread, this patch's status of 'needs review' doesn't
> quite seem accurate; 'waiting on author' and then 'returned with
> feedback' would be more fitting?

I personally think this patch is really close to RFC. Shubham has fulfilled the project requirement: it's a tidy and short patch, and it has tests. I think we really just need to verify that the split case works correctly.

Hmm. I notice that this calls PredicateLockPageSplit() after both calls to _hash_splitbucket() (the one in _hash_finish_split() and the one in _hash_expandtable()) instead of doing it inside that function, and that _hash_splitbucket() unlocks bucket_nbuf before returning. What if someone else accesses bucket_nbuf between LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and PredicateLockPageSplit()? Doesn't that mean that another session can read a newly created page and miss a predicate lock that is about to be transferred to it? If that is indeed a race, could it be fixed by calling PredicateLockPageSplit() at the start of _hash_splitbucket() instead?

Could we get a few days to mull over this and Shubham's other patches? It'd be really great to get some of these into 11.

My thought experiments about pseudo-pages and avoiding the split stuff were not intended to get the patch kicked out. I thought for a while that hash indexes were a special case and could benefit from dispensing with those trickier problems. Upon further reflection, for interesting-size hash indexes pure hash value predicate tags wouldn't be much better. Furthermore, if we do decide we want to use x % max_predicate_locks_per_relation to avoid having to escalate to relation predicate locks at the cost of a slightly higher collision rate, then we should consider that for the whole system (including heap page predicate locking), not just hash indexes. Please consider those ideas parked for now.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Fri, Mar 2, 2018 at 3:57 PM, Andres Freund <andres@anarazel.de> wrote:
>> Based on this sub-thread, this patch's status of 'needs review' doesn't
>> quite seem accurate; 'waiting on author' and then 'returned with
>> feedback' would be more fitting?
>
> I personally think this patch is really close to RFC. Shubham has
> fulfilled the project requirement: it's a tidy and short patch, and it
> has tests. I think we really just need to verify that the split case
> works correctly.
>
> Hmm. I notice that this calls PredicateLockPageSplit() after both
> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
> one in _hash_expandtable()) instead of doing it inside that function,
> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
> What if someone else accesses bucket_nbuf between
> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
> PredicateLockPageSplit()? Doesn't that mean that another session can
> read a newly created page and miss a predicate lock that is about to
> be transferred to it?

Yes. I think you are primarily worried about an insert into the new bucket from another session, as scans will anyway take the predicate lock, right?

> If that is indeed a race, could it be fixed by
> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
> instead?

Yes, but I think it would be better if we call this once we are sure that at least one tuple from the old bucket has been transferred (consider if all tuples in the old bucket are dead). Apart from this, I think this patch has missed handling the cases where we scan the buckets while a split is in progress. In such cases, we scan both the old and new bucket, so I think we need to ensure that we take a PredicateLock on both buckets during such scans.

> Could we get a few days to mull over this and Shubham's other patches?

I would also like to see this patch go into v11. So, I can try to finish the remaining review comments if Shubham is not able to spare time, and you can help with the review. I am also okay with reviewing if anyone else other than me can handle the remaining points.

> It'd be really great to get some of these into 11.

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Thomas Munro
Date:
On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Hmm. I notice that this calls PredicateLockPageSplit() after both
>> calls to _hash_splitbucket() (the one in _hash_finish_split() and the
>> one in _hash_expandtable()) instead of doing it inside that function,
>> and that _hash_splitbucket() unlocks bucket_nbuf before returning.
>> What if someone else accesses bucket_nbuf between
>> LockBuffer(bucket_nbuf, BUFFER_LOCK_UNLOCK) and
>> PredicateLockPageSplit()? Doesn't that mean that another session can
>> read a newly created page and miss a predicate lock that is about to
>> be transferred to it?
>
> Yes. I think you are primarily worried about an insert into the new
> bucket from another session, as scans will anyway take the predicate
> lock, right?

Yeah.

>> If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead). Apart from this,
> I think this patch has missed handling the cases where we scan the
> buckets while a split is in progress. In such cases, we scan both
> the old and new bucket, so I think we need to ensure that we take
> a PredicateLock on both buckets during such scans.

Hmm. Yeah.

So, in _hash_first(), do you think we might just need this?

    if (H_BUCKET_BEING_POPULATED(opaque))
    {
    ...
        old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
    ...
        old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
+       PredicateLockPage(rel, BufferGetBlockNumber(old_buf), scan->xs_snapshot);
        TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));

That is, if you begin scanning a 'new' bucket, we remember the old bucket and go and scan that too, so we'd better predicate-lock both up front (or I suppose we could do it later when we visit that page, but here it can be done in a single place).

What if we begin scanning an 'old' bucket that is being split? I think we'll only do that for tuples that actually belong in the old bucket after the split, so no need to double-lock? And I don't think a split could begin while we are scanning. Do I have that right?

As for inserting, I'm not sure if any special treatment is needed, as long as the scan code path (above) and the split code path are correct. I'm not sure though.

I'm wondering how to test all this. I'm thinking about a program that repeatedly creates a hash index and then slowly adds more things to it so that buckets split (maybe using distinct keys carefully crafted to hit the same bucket?), while concurrently hammering it with a ton of scans and then ... somehow checking correctness...

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> So, in _hash_first(), do you think we might just need this?
>
>     if (H_BUCKET_BEING_POPULATED(opaque))
>     {
>     ...
>         old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>     ...
>         old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
> +       PredicateLockPage(rel, BufferGetBlockNumber(old_buf), scan->xs_snapshot);
>         TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>
> That is, if you begin scanning a 'new' bucket, we remember the old
> bucket and go and scan that too, so we'd better predicate-lock both up
> front (or I suppose we could do it later when we visit that page, but
> here it can be done in a single place).

Yeah, that can work, but I am slightly worried that we might actually never scan the old bucket (say for queries with a Limit clause), in which case it might give false positives for insertions into old buckets.

> What if we begin scanning an 'old' bucket that is being split? I
> think we'll only do that for tuples that actually belong in the old
> bucket after the split, so no need to double-lock? And I don't think
> a split could begin while we are scanning. Do I have that right?

Right.

> As for inserting, I'm not sure if any special treatment is needed, as
> long as the scan code path (above) and the split code path are
> correct. I'm not sure though.

I also don't think we need any special handling for insertions.

> I'm wondering how to test all this. I'm thinking about a program that
> repeatedly creates a hash index and then slowly adds more things to it
> so that buckets split (maybe using distinct keys carefully crafted to
> hit the same bucket?), while concurrently hammering it with a ton of
> scans and then ... somehow checking correctness...

Yeah, that will generate the required errors, but I am not sure how to verify correctness. One idea could be that when the split is in progress, we somehow stop it in-between (say by a cancel request) and then run targeted selects and inserts on the bucket being scanned and the bucket being populated.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Tue, Mar 6, 2018 at 11:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Yes, but I think it would be better if we call this once we are sure
>>> that at least one tuple from the old bucket has been transferred
>>> (consider if all tuples in the old bucket are dead). Apart from this,
>>> I think this patch has missed handling the cases where we scan the
>>> buckets while a split is in progress. In such cases, we scan both
>>> the old and new bucket, so I think we need to ensure that we take
>>> a PredicateLock on both buckets during such scans.
>>
>> Hmm. Yeah.
>>
>> So, in _hash_first(), do you think we might just need this?
>>
>>     if (H_BUCKET_BEING_POPULATED(opaque))
>>     {
>>     ...
>>         old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>>     ...
>>         old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
>> +       PredicateLockPage(rel, BufferGetBlockNumber(old_buf), scan->xs_snapshot);
>>         TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>>
>> That is, if you begin scanning a 'new' bucket, we remember the old
>> bucket and go and scan that too, so we'd better predicate-lock both up
>> front (or I suppose we could do it later when we visit that page, but
>> here it can be done in a single place).
>
> Yeah, that can work, but I am slightly worried that we might actually
> never scan the old bucket (say for queries with a Limit clause), in
> which case it might give false positives for insertions into old buckets.

I have changed the patch to address this point by acquiring the predicate lock in _hash_readnext, where it will acquire the lock only when it tries to scan the old bucket. I have also addressed another problem related to the transfer of predicate locks during a split, such that it will transfer locks only when at least one tuple is transferred from the old to the new bucket.

>> I'm wondering how to test all this. I'm thinking about a program that
>> repeatedly creates a hash index and then slowly adds more things to it
>> so that buckets split (maybe using distinct keys carefully crafted to
>> hit the same bucket?), while concurrently hammering it with a ton of
>> scans and then ... somehow checking correctness...
>
> Yeah, that will generate the required errors, but I am not sure how to
> verify correctness. One idea could be that when the split is in
> progress, we somehow stop it in-between (say by a cancel request) and
> then run targeted selects and inserts on the bucket being scanned and
> the bucket being populated.

I have verified the bucket split scenario manually, as below:

Setup
-----
create table hash_tbl(id int4, p integer);
insert into hash_tbl (id, p) select g, 10 from generate_series(1, 10) g;
analyze hash_tbl;
create index hash_idx on hash_tbl using hash(p);

Session-1
---------
begin isolation level serializable;
set enable_seqscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = on;
select sum(p) from hash_tbl where p = 10;
 sum
-----
 100
(1 row)

insert into hash_tbl (id, p) select g, 10 from generate_series(10, 1000) g;

Via the debugger, stop in _hash_splitbucket at line 1283:
    LockBuffer(bucket_obuf, BUFFER_LOCK_EXCLUSIVE);
By this time the split of bucket 1 is done but the split flag is not cleared, so the predicate locks from bucket 1 have been transferred to bucket 3 (the new bucket).

Session-2
---------
begin isolation level serializable;
set enable_seqscan = off;
set enable_bitmapscan = off;
set enable_indexonlyscan = on;
select sum(p) from hash_tbl where p = 10;
 sum
-----
 100
(1 row)

Session-1
---------
commit;

Session-2
---------
insert into hash_tbl (id, p) select g, 10 from generate_series(51, 60) g;
ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during write.
HINT:  The transaction might succeed if retried.

It got a conflict while inserting into the new bucket (bucket number 3).

I think this patch now addresses all the open issues. Let me know what you think about it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Sat, Mar 10, 2018 at 1:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have changed the patch to address this point by acquiring the
> predicate lock in _hash_readnext, where it will acquire the lock only
> when it tries to scan the old bucket. I have also addressed another
> problem related to the transfer of predicate locks during a split, such
> that it will transfer locks only when at least one tuple is transferred
> from the old to the new bucket.

Added some additional text to the README in the attached patch to explain the new mechanism.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Alexander Korotkov
Date:
On Fri, Mar 2, 2018 at 6:57 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> My thought experiments about pseudo-pages and avoiding the split stuff
> were not intended to get the patch kicked out. I thought for a while
> that hash indexes were a special case and could benefit from
> dispensing with those trickier problems. Upon further reflection, for
> interesting-size hash indexes pure hash value predicate tags wouldn't
> be much better. Furthermore, if we do decide we want to use x %
> max_predicate_locks_per_relation to avoid having to escalate to
> relation predicate locks at the cost of a slightly higher collision
> rate, then we should consider that for the whole system (including
> heap page predicate locking), not just hash indexes. Please consider
> those ideas parked for now.
OK. While our potential pseudo-pages are identified as
"hash_value % some_constant_modulus", real bucket pages are very roughly
identified as "hash_value % number_of_index_pages". So, the page number
adapts to the index size, although it costs us handling page splits. In
the same way, locking in other index access methods adapts to the index
size, so that should be considered a useful feature which should be
present in hash indexes as well.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Alexander Korotkov
Date:
On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro wrote:
>> If that is indeed a race, could it be fixed by
>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>> instead?
>
> Yes, but I think it would be better if we call this once we are sure
> that at least one tuple from the old bucket has been transferred
> (consider if all tuples in the old bucket are dead).
Is it really fair? For example, a predicate lock can be held by a
session which queried some key but didn't find any corresponding tuple.
Imagine this key should go to the new bucket while all existing tuples
are left in the old bucket. As I understand it, in this case no locks
would be transferred, since no tuples were moved to the new bucket.
So, a further insertion into the new bucket wouldn't conflict with the
session which looked for the non-existing key, while it should. Does
that make sense?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Mon, Mar 12, 2018 at 12:18 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Yes, but I think it would be better if we call this once we are sure
>> that at least one tuple from the old bucket has been transferred
>> (consider if all tuples in the old bucket are dead).
>
> Is it really fair? For example, a predicate lock can be held by a
> session which queried some key but didn't find any corresponding tuple.
> Imagine this key should go to the new bucket while all existing tuples
> are left in the old bucket. As I understand it, in this case no locks
> would be transferred, since no tuples were moved to the new bucket.
> So, a further insertion into the new bucket wouldn't conflict with the
> session which looked for the non-existing key, while it should. Does
> that make sense?

Valid point. I think on a split we should always transfer locks from the old bucket to the new bucket.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Mon, Mar 12, 2018 at 7:18 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 12, 2018 at 12:18 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> Is it really fair? For example, a predicate lock can be held by a
>> session which queried some key but didn't find any corresponding tuple.
>> Imagine this key should go to the new bucket while all existing tuples
>> are left in the old bucket. As I understand it, in this case no locks
>> would be transferred, since no tuples were moved to the new bucket.
>> So, a further insertion into the new bucket wouldn't conflict with the
>> session which looked for the non-existing key, while it should. Does
>> that make sense?
>
> Valid point. I think on a split we should always transfer locks from
> the old bucket to the new bucket.

The attached patch changes it as per the above suggestion.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Alexander Korotkov
Date:
On Tue, Mar 13, 2018 at 4:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 12, 2018 at 7:18 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Mon, Mar 12, 2018 at 12:18 AM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>>> On Sat, Mar 3, 2018 at 2:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> On Fri, Mar 2, 2018 at 9:27 AM, Thomas Munro wrote:
>>>>> If that is indeed a race, could it be fixed by
>>>>> calling PredicateLockPageSplit() at the start of _hash_splitbucket()
>>>>> instead?
>>>>
>>>> Yes, but I think it would be better if we call this once we are sure
>>>> that at least one tuple from the old bucket has been transferred
>>>> (consider if all tuples in the old bucket are dead).
>>>
>>> Is it really fair? For example, a predicate lock can be held by a
>>> session which queried some key but didn't find any corresponding tuple.
>>> Imagine this key should go to the new bucket while all existing tuples
>>> are left in the old bucket. As I understand it, in this case no locks
>>> would be transferred, since no tuples were moved to the new bucket.
>>> So, a further insertion into the new bucket wouldn't conflict with the
>>> session which looked for the non-existing key, while it should. Does
>>> that make sense?
>>
>> Valid point. I think on a split we should always transfer locks from
>> the old bucket to the new bucket.
>
> The attached patch changes it as per the above suggestion.
OK. Now the patch looks good to me.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Thu, Mar 15, 2018 at 4:29 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Tue, Mar 13, 2018 at 4:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Valid point. I think on a split we should always transfer locks from
>>> the old bucket to the new bucket.
>>
>> The attached patch changes it as per the above suggestion.
>
> OK. Now the patch looks good to me.

Thanks, I have marked the patch as RFC in the CF app.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Teodor Sigaev
Date:
Thanks to everyone, pushed.

Alexander Korotkov wrote:
> On Fri, Mar 2, 2018 at 6:57 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> My thought experiments about pseudo-pages and avoiding the split stuff
>> were not intended to get the patch kicked out. I thought for a while
>> that hash indexes were a special case and could benefit from
>> dispensing with those trickier problems. Upon further reflection, for
>> interesting-size hash indexes pure hash value predicate tags wouldn't
>> be much better. Furthermore, if we do decide we want to use x %
>> max_predicate_locks_per_relation to avoid having to escalate to
>> relation predicate locks at the cost of a slightly higher collision
>> rate, then we should consider that for the whole system (including
>> heap page predicate locking), not just hash indexes. Please consider
>> those ideas parked for now.
>
> OK. While our potential pseudo-pages are identified as
> "hash_value % some_constant_modulus", real bucket pages are very roughly
> identified as "hash_value % number_of_index_pages". So, the page number
> adapts to the index size, although it costs us handling page splits. In
> the same way, locking in other index access methods adapts to the index
> size, so that should be considered a useful feature which should be
> present in hash indexes as well.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index
From
Amit Kapila
Date:
On Sat, Apr 7, 2018 at 7:47 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Thanks to everyone, pushed.

Thanks!

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com