Thread: Cache Hash Index meta page.
I have created a patch to cache the meta page of a hash index in backend-private memory. This saves reading the meta page buffer every time we want to find the bucket page. In "_hash_first" we currently read the meta page buffer twice just to make sure the bucket was not split after we found the bucket page. With this patch the meta page buffer read is skipped if the bucket has not been split since we cached the meta page.
The idea is to cache the meta page data in rd_amcache and to store the maxbucket number in hasho_prevblkno of the bucket primary page (which will otherwise always be NULL, so we reuse it here for this purpose). When we do a hash lookup for the bucket page, if the locally cached maxbucket number is greater than or equal to the bucket page's stored maxbucket number, we know the given bucket was not split after we cached the meta page, and we avoid reading the meta page buffer.
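The invalidation test boils down to a single comparison. A minimal standalone sketch of the idea (the types and names below are simplified stand-ins for the real page structures, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t BlockNumber;

/* Miniature of the idea: each primary bucket page records, in its
 * otherwise-unused hasho_prevblkno field, the hashm_maxbucket value in
 * effect when the page was last initialized or split. */
typedef struct BucketPageOpaque
{
	BlockNumber hasho_prevblkno;	/* reused: maxbucket at last split */
} BucketPageOpaque;

/* The bucket we computed from the cached metapage is still valid as long
 * as the bucket has not been split since we cached hashm_maxbucket. */
static bool
bucket_unsplit_since_cache(const BucketPageOpaque *opaque,
						   uint32_t cached_maxbucket)
{
	return opaque->hasho_prevblkno <= cached_maxbucket;
}
```

If the check fails, the backend falls back to reading the real metapage and refreshing its cache, so a stale cache only costs one extra read.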
I have attached the benchmark results and perf stats (see hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2: benchmark results]). There we can see improvements at higher client counts, as lwlock contention due to buffer reads is higher there. If I apply the same patch on top of Amit's concurrent hash index patch [1], we see improvements at lower client counts as well; Amit's patch removed a heavyweight page lock which was the bottleneck at lower client counts.
--
On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I have created a patch to cache the meta page of Hash index in
> backend-private memory. This is to save reading the meta page buffer every
> time when we want to find the bucket page. In "_hash_first" call, we try to
> read meta page buffer twice just to make sure bucket is not split after we
> found bucket page. With this patch meta page buffer read is not done, if the
> bucket is not split after caching the meta page.
>
> Idea is to cache the Meta page data in rd_amcache and store maxbucket number
> in hasho_prevblkno of bucket primary page (which will always be NULL other
> wise, so reusing it here for this cause!!!).

If it is otherwise unused, shouldn't we rename the field to reflect what it is now used for?

What happens on a system which has gone through pg_upgrade? Are we sure that those on-disk representations will always have InvalidBlockNumber in that field? If not, then it seems we can't support pg_upgrade at all. If so, I don't see a provision for properly dealing with pages which still have InvalidBlockNumber in them. Unless I am missing something, the code below will always think it found the right bucket in such cases, won't it?

    if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> I have created a patch to cache the meta page of Hash index in
>> backend-private memory. [...]

Is this really safe? The metapage caching in btree is all right because the algorithm is guaranteed to work even if it starts with a stale idea of where the root page is. I do not think the hash code is equally robust about stale data in its metapage.

>> Idea is to cache the Meta page data in rd_amcache and store maxbucket number
>> in hasho_prevblkno of bucket primary page (which will always be NULL other
>> wise, so reusing it here for this cause!!!).

> If it is otherwise unused, shouldn't we rename the field to reflect
> what it is now used for?

No, because on other pages that still means what it used to. Nonetheless, I agree that's a particularly ugly wart, and probably a dangerous one.

> What happens on a system which has gone through pg_upgrade?

That being one reason why. It might be okay if we add another hasho_flag bit saying that hasho_prevblkno really contains a maxbucket number, and then add tests for that bit everyplace that hasho_prevblkno is referenced.

			regards, tom lane
On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is this really safe? The metapage caching in btree is all right because
> the algorithm is guaranteed to work even if it starts with a stale idea of
> where the root page is. I do not think the hash code is equally robust
> about stale data in its metapage.

I think stale data in the metapage could only cause a problem if it leads to a wrong calculation of the bucket based on the hashkey, and I think that shouldn't happen. It seems to me that the safety comes from the fact that the fields required to calculate the bucket (lowmask/highmask) won't be changed more than once without splitting the current bucket (which we are going to scan). Do you see a problem in the hashkey-to-bucket mapping (_hash_hashkey2bucket) if the lowmask/highmask are changed by one additional table half, or do you have something else in mind?

>> What happens on a system which has gone through pg_upgrade?
>
> That being one reason why. It might be okay if we add another hasho_flag
> bit saying that hasho_prevblkno really contains a maxbucket number, and
> then add tests for that bit everyplace that hasho_prevblkno is referenced.

Good idea.

-	if (retry)
+	if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)

This code seems to be problematic with respect to upgrades, because hasho_prevblkno will be initialized to 0xFFFFFFFF without the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
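The lowmask/highmask question above concerns the key-to-bucket mapping itself. A standalone rendering of the _hash_hashkey2bucket logic (mirroring the function in hashutil.c, with simplified types):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Bucket;

/* Hash key -> bucket mapping used by hash indexes: mask with highmask
 * first, and fall back to lowmask when the result points past the last
 * bucket currently in use. */
static Bucket
hashkey2bucket(uint32_t hashkey, uint32_t maxbucket,
			   uint32_t highmask, uint32_t lowmask)
{
	Bucket		bucket = hashkey & highmask;

	if (bucket > maxbucket)
		bucket &= lowmask;
	return bucket;
}
```

The point being debated is whether this mapping stays correct when maxbucket/highmask/lowmask are read together from a (possibly slightly stale) cached copy of the metapage.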
On 07/22/2016 06:02 AM, Mithun Cy wrote:
> I have created a patch to cache the meta page of Hash index in
> backend-private memory. [...]

Could you provide a rebased patch based on Amit's v5?

Best regards,
Jesper
On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com> wrote:
> Could you provide a rebased patch based on Amit's v5 ?
Please find the patch, based on Amit's V5.

I have fixed the following things:

1. Now in "_hash_first" we check if (opaque->hasho_prevblkno == InvalidBlockNumber) to see whether the bucket is from an older-version hash index and has been upgraded. Since InvalidBlockNumber is one greater than the maximum value that "metap->hashm_maxbucket" can be set to (see _hash_expandtable), we can distinguish it from the rest. I tested the upgrade issue reported by Amit; it is fixed now.

2. One case where a bucket's hasho_prevblkno is used is the backward scan. So now, before testing the previous block number, I test whether the current page is a bucket page; if so, we end the bucket scan (see changes in _hash_readprev). In the other places where hasho_prevblkno is used it is not for a bucket page, so I have not put any extra check there.
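Point 1 can be sketched as a standalone check (the helper name here is hypothetical; in the patch this logic sits inline in _hash_first):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define InvalidBlockNumber 0xFFFFFFFFu

/* Sketch of the upgrade-aware validity test: bucket pages written by a
 * pre-patch server carry InvalidBlockNumber in hasho_prevblkno, and
 * hashm_maxbucket can never reach that value (see _hash_expandtable),
 * so the sentinel is unambiguous. */
static bool
cached_bucket_is_valid(uint32_t hasho_prevblkno, uint32_t cached_maxbucket)
{
	if (hasho_prevblkno == InvalidBlockNumber)
		return false;			/* pre-upgrade page: must re-read metapage */
	return hasho_prevblkno <= cached_maxbucket;
}
```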
On Tue, Sep 6, 2016 at 12:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com> wrote:
>> Could you provide a rebased patch based on Amit's v5 ?
>
> Please find the patch, based on Amit's V5.

I think you want to say based on the patch in the below mail:
https://www.postgresql.org/message-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm%2Bqn0jZX31-obXrJw%40mail.gmail.com

It is better if we can provide the link to the patch on which the current patch is based; that will help people easily identify the dependent patches.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 09/05/2016 02:50 PM, Mithun Cy wrote:
> Please find the patch, based on Amit's V5. [...]

I think that the

    + pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

trick should be documented in the README, as hashm_maxbucket is defined as uint32 whereas hasho_prevblkno is a BlockNumber. (All bucket variables should probably use the Bucket definition instead of uint32.)

For the archives, this patch conflicts with the WAL patch [1].

[1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Best regards,
Jesper
On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> For the archives, this patch conflicts with the WAL patch [1].
> [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch; it applies over Amit's concurrent hash index patch [1] and Amit's WAL for hash index patch [2] together.
On Thu, Sep 29, 2016 at 12:55 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I think that this needs to be updated again for v8 of concurrent and v5 of wal
>
> Adding the rebased patch over [1] + [2]
>
> [1] Concurrent Hash index.
> [2] Wal for hash index.

Moved to next CF.
--
Michael
On 09/28/2016 11:55 AM, Mithun Cy wrote:
> On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I think that this needs to be updated again for v8 of concurrent and v5 of wal
>
> Adding the rebased patch over [1] + [2]

As the concurrent hash index patch was committed in 6d46f4 this patch needs a rebase. I have moved this submission to the next CF.

Thanks for working on this!

Best regards,
Jesper
Clients | Cache Meta Page patch | Base code with Amit's changes | %imp
      1 |          17062.513102 |                  17218.353817 | -0.9050848685
      8 |         138525.808342 |                 128149.381759 | 8.0971335488
     16 |          212278.44762 |                 205870.456661 | 3.1126326054
     32 |         369453.224112 |                 360423.566937 | 2.5052904425
     64 |         576090.293018 |                 510665.044842 | 12.8117733604
     96 |         686813.187117 |                 504950.885867 | 36.0158396272
    104 |          688932.67516 |                  498365.55841 | 38.2384202789
    128 |         730728.526322 |                 409011.008553 | 78.6574226711
There appears to be a good improvement at higher client counts.
On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> As the concurrent hash index patch was committed in 6d46f4 this patch needs a rebase.

Thanks Jesper. Adding the rebased patch.
-       bucket = _hash_hashkey2bucket(hashkey,
-                                     metap->hashm_maxbucket,
+       bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                      metap->hashm_highmask,
                                      metap->hashm_lowmask);

This hunk appears useless.

+       metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

+       /* Cache the metapage data for next time*/

Whitespace.

+       /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Shouldn't _hash_doinsert() be using the cache, too?
Clients | After Meta Page Cache | Base Code    | %imp
      1 |           2276.151633 |  2304.253611 | -1.2195696631
     32 |          36816.596513 | 36439.552652 | 1.0347104549
     64 |          50943.763133 | 51005.236973 | -0.120524565
    128 |          49156.980457 | 48458.275106 | 1.4418700407
I think it's probably not a good idea to cache the entire metapage. The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire HashMetaPageData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it. It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data, we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.
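A sketch of the structure proposed here (the name and members are this mail's proposal, not committed code):

```c
#include <assert.h>
#include <stdint.h>

/* Proposed cache structure: only the fields needed to test whether a
 * previously computed hashkey->bucket mapping is still correct. It is
 * stale-tolerant by design: a failed validity check only means the caller
 * re-reads the real metapage and refreshes this copy. */
typedef struct HashMetaDataCache
{
	uint32_t	hmc_maxbucket;	/* ID of maximum bucket in use */
	uint32_t	hmc_highmask;	/* mask to modulo into entire table */
	uint32_t	hmc_lowmask;	/* mask to modulo into lower half of table */
} HashMetaDataCache;
```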
It is not only those three fields: we also need uint32 hashm_spares[HASH_MAX_SPLITPOINTS] for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".
Note: #define HASH_MAX_SPLITPOINTS 32, so the total is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.
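To illustrate why the spares array matters for the cache, here is a standalone rendering of the bucket-to-block arithmetic, written from the BUCKET_TO_BLKNO/_hash_log2 definitions as I recall them (the real macro lives in the hash headers and may differ in detail):

```c
#include <assert.h>
#include <stdint.h>

#define HASH_MAX_SPLITPOINTS 32

typedef uint32_t BlockNumber;

/* Smallest i such that (1 << i) >= num, as in _hash_log2(). */
static uint32_t
hash_log2(uint32_t num)
{
	uint32_t	i = 0;
	uint32_t	limit = 1;

	while (limit < num)
	{
		limit <<= 1;
		i++;
	}
	return i;
}

/* Rendering of BUCKET_TO_BLKNO: bucket pages never move once allocated,
 * but overflow pages reserved before each splitpoint (the spares array)
 * shift the physical block numbers; block 0 is the metapage itself. */
static BlockNumber
bucket_to_blkno(const uint32_t *spares, uint32_t bucket)
{
	return (BlockNumber) (bucket +
						  (bucket ? spares[hash_log2(bucket + 1) - 1] : 0)) + 1;
}
```

Because the mapping consults spares, a consistent snapshot of maxbucket/highmask/lowmask alone is not enough to turn a key into a block number; the spares entries up to the cached splitpoint must come from the same snapshot.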
The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

+        */
+       if (itemsz > HashMaxItemSize((Page) metap))
+               ereport(ERROR,
+                       (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                        errmsg("index row size %zu exceeds hash maximum %zu",
+                               itemsz, HashMaxItemSize((Page) metap)),
+                        errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures; their member types are not in sync, so we should not blindly cast as above. I think we should remove this part of the code, as we only store hash keys. So I have removed it but kept the assert below as it is.
> Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I added the same in the code. But one difference is that, at the end, we need to read the meta page at least once in order to split and/or write. The performance improvement might not be as much as in the read-only case.
> I'd suggest defining a new structure HashMetaDataCache with members
> hmc_maxbucket, hmc_lowmask, and hmc_highmask. [...]

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s): we also need uint32 hashm_spares[HASH_MAX_SPLITPOINTS] for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)". Note: #define HASH_MAX_SPLITPOINTS 32, so the total is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.
Apart from this, there seems to be a base bug in _hash_doinsert():

+        * XXX this is useless code if we are only storing hash keys.
+        */
+       if (itemsz > HashMaxItemSize((Page) metap))
+               ereport(ERROR,
+                       (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                        errmsg("index row size %zu exceeds hash maximum %zu",
+                               itemsz, HashMaxItemSize((Page) metap)),
+                        errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures; their member types are not in sync, so we should not blindly cast as above. I think we should remove this part of the code, as we only store hash keys. So I have removed it but kept the assert below as it is.
On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Shouldn't _hash_doinsert() be using the cache, too?
>
> Yes, we have an opportunity there, added same in code. But one difference
> is at the end at-least once we need to read the meta page to split and/or
> write. Performance improvement might not be as much as read-only.

Why would you need to read it at least once in one case but not the other?
Well, I guess that makes it more appealing to cache the whole page at least in terms of raw number of bytes, but I suppose my real complaint here is that there don't seem to be any clear rules for where, whether, and to what extent we can rely on the cache to be valid.
Without that, I think this patch is creating an extremely serious maintenance hazard for anyone who wants to try to modify this code in the future. A future patch author needs to be able to tell what data they can use and what data they can't use, and why.
typedef struct HashMetaPageData
{

1. uint32 hashm_magic;    /* magic no. for hash tables */
   -- This should remain the same, so it can be read from the cached meta page.

2. uint32 hashm_version;  /* version ID */
   -- Initialized once, never changed afterwards, so it can be read from the cached metapage.

3. double hashm_ntuples;  /* number of tuples stored in the table */
   -- This changes on every insert, so it should not be read from cached data.

4. uint16 hashm_ffactor;  /* target fill factor (tuples/bucket) */
   -- Initialized once, never changed afterwards, so it can be read from the cached metapage.

5. uint16 hashm_bsize;    /* index page size (bytes) */
   -- Initialized once, never changed afterwards, so it can be read from the cached metapage.

6. uint16 hashm_bmsize;   /* bitmap array size (bytes) - must be a power of 2 */
   -- Initialized once, never changed afterwards, so it can be read from the cached metapage.

7. uint16 hashm_bmshift;  /* log2(bitmap array size in BITS) */
   -- Initialized once, never changed afterwards, so it can be read from the cached metapage.

8. uint32 hashm_maxbucket;  /* ID of maximum bucket in use */
   uint32 hashm_highmask;   /* mask to modulo into entire table */
   uint32 hashm_lowmask;    /* mask to modulo into lower half of table */
   -- If hashm_maxbucket, hashm_highmask, and hashm_lowmask are all read and cached at the same time while the metapage is locked, then the key-to-bucket mapping function _hash_hashkey2bucket() will always produce the same output. If a bucket is split after these are cached (which we can detect because old bucket pages never move once allocated, and we mark bucket pages' hasho_prevblkno with the incremented hashm_maxbucket), we can invalidate them and re-read them from the meta page. If the intention is not to save a metapage read while mapping the key to the bucket page, then do not read them from the cached meta page.

9. uint32 hashm_ovflpoint;  /* splitpoint from which ovflpgs being allocated */
   -- Used for allocation of overflow pages, so the latest value should be read directly from the meta page.

10. uint32 hashm_firstfree;  /* lowest-number free ovflpage (bit#) */
    -- Should always be read from the metapage directly.

11. uint32 hashm_nmaps;  /* number of bitmap pages */
    -- Should always be read from the metapage directly.

12. RegProcedure hashm_procid;  /* hash procedure id from pg_proc */
    -- Never used until now; when we use it, we will need to decide its policy.

13. uint32 hashm_spares[HASH_MAX_SPLITPOINTS];  /* spare pages before each splitpoint */
    -- Spares before a given splitpoint never change, and bucket pages never move. So when combined with cached hashm_maxbucket, hashm_highmask, and hashm_lowmask, all read at the same time under lock, BUCKET_TO_BLKNO will always produce the same output, pointing to the right physical block. It should only be used to save a meta page read when we want to map a key to a bucket block, as said above.

14. BlockNumber hashm_mapp[HASH_MAX_BITMAPS];  /* blknos of ovfl bitmaps */
    -- Always read from the metapage directly.

} HashMetaPageData;
Any existing bugs should be the subject of a separate patch.
--
On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I think if it is okay, I can document for each member of HashMetaPageData
> whether to read it from the cached metapage or directly from the current
> meta page. Below I have briefly commented on each member. If you suggest
> I go with that approach, I will produce a neat patch for same.

Plain text emails are preferred on this list.

I don't have any confidence in this approach. I'm not sure exactly what needs to be changed here, but what you're doing right now is just too error-prone. There's a cached metapage available, and you've got code accessing it directly, and that's OK except when it's not, and maybe we can add some comments to explain, but I don't think that's going to be good enough to really make it clear and maintainable. We need some kind of more substantive safeguard to prevent the cached metapage data from being used in unsafe ways -- and while we're at it, we should try to use it in as many of the places where it *is* safe as possible. My suggestion for a separate structure was one idea; another might be providing some kind of API that's always used to access the metapage cache. Or maybe there's a third option. But this, the way it is right now, is just too ad-hoc, even with more comments. IMHO, anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> We need some kind of more substantive safeguard to prevent the cached
> metapage data from being used in unsafe ways -- and while we're at it,
> we should try to use it in as many of the places where it *is* safe as
> possible. My suggestion for a separate structure was one idea; another
> might be providing some kind of API that's always used to access the
> metapage cache. Or maybe there's a third option.

This metapage cache can be validated only when we have a bucket in which we have stored the maxbucket value. I think what we can do to localize the use of the metapage cache is to write a new API which will return a bucket page locked in the specified mode based on the hashkey, something like:

    Buffer _hash_get_buc_buffer_from_hashkey(hashkey, lockmode);

I think this will make metapage cache access somewhat similar to what we have in btree, where we use the cache to access the root page. Will something like that address your concern?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Plain text emails are preferred on this list.

Sorry, I have set the mail to plain text mode now.

> I think this will make metapage cache access somewhat similar to what we
> have in btree where we use cache to access rootpage. Will something like
> that address your concern?

Thanks. Just like _bt_getroot, I am introducing a new function _hash_getbucketbuf_from_hashkey() which will give the target bucket buffer for the given hashkey. This will use the cached metapage contents instead of reading the meta page buffer if the cached data is valid. There are two places which can use this service: 1. _hash_first and 2. _hash_doinsert.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
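The control flow of such a cache-backed lookup can be sketched abstractly (all names and state below are illustrative, not the actual patch code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Illustrative cache-or-read pattern: use the backend-local metapage copy
 * when present; on a miss (or a failed validity check, not modeled here)
 * the caller reads the real metapage once and refreshes the cache. */
typedef struct MetaCache
{
	bool		valid;
	uint32_t	maxbucket;
} MetaCache;

static int	metapage_reads = 0;	/* counts simulated metapage buffer reads */

static uint32_t
get_maxbucket(MetaCache *cache, uint32_t current_maxbucket)
{
	if (!cache->valid)
	{
		/* cache miss: read the metapage once and remember its contents */
		metapage_reads++;
		cache->maxbucket = current_maxbucket;
		cache->valid = true;
	}
	return cache->maxbucket;
}
```

The point of funneling all lookups through one such entry point, as proposed above, is that the validity rules live in exactly one place instead of being re-derived at every call site.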
On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Oops, the patch number should be 08; re-attaching the same after renaming.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >>>> -- I think if it is okay, I can document same for each member of HashMetaPageData whether to read from cached from metapage or directly from current meta page. Below briefly I have commented for each member. If you suggest I can go withthat approach, I will produce a neat patch for same. >>> >>> Plain text emails are preferred on this list. > > Sorry, I have set the mail to plain text mode now. > >> I think this will make metpage cache access somewhat >> similar to what we have in btree where we use cache to access >> rootpage. Will something like that address your concern? > > Thanks, just like _bt_getroot I am introducing a new function > _hash_getbucketbuf_from_hashkey() which will give the target bucket > buffer for the given hashkey. This will use the cached metapage > contents instead of reading meta page buffer if cached data is valid. > There are 2 places which can use this service 1. _hash_first and 2. > _hash_doinsert. > This version of the patch looks much better than the previous version. I have few review comments: 1. + * pin so we can use the metabuf while writing into to it below. + */ + metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE); usage "into to .." in above comment seems to be wrong. 2. - pageopaque->hasho_prevblkno = InvalidBlockNumber; + + /* + * Setting hasho_prevblkno of bucket page with latest maxbucket number + * to indicate bucket has been initialized and need to reconstruct + * HashMetaCache if it is older. + */ + pageopaque->hasho_prevblkno = metap->hashm_maxbucket; In the above comment, a reference to HashMetaCache is confusing, are your referring to any structure here? 
If you change this, consider to change the similar usage at other places in the patch as well. Also, it is not clear to what do you mean by ".. to indicate bucket has been initialized .."? assigning maxbucket as a special value to prevblkno is not related to initializing a bucket page. 3.typedef struct HashPageOpaqueData{ + /* + * If this is an ovfl page this stores previous ovfl (or bucket) blkno. + * Else if this is a bucket page we use this for a special purpose. We + * store hashm_maxbucket value, whenever this page is initialized or + * split. So this helps us to know whether the bucket has been split after + * caching the some of the meta page data. See _hash_doinsert(), + * _hash_first() to know how to use same. + */ In above comment, where you are saying ".. caching the some of the meta page data .." is slightly confusing, because it appears to me that you are caching whole of the metapage not some part of it. 4. +_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel, Generally, for _hash_* API's, we use rel as the first parameter, so I think it is better to maintain the same with this API as well. 5. _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket, - maxbucket, highmask, lowmask); + usedmetap->hashm_maxbucket, + usedmetap->hashm_highmask, + usedmetap->hashm_lowmask); I think we should add an Assert for the validity of usedmetap before using it. 6. Getting few compilation errors in assert-enabled build. 1>src/backend/access/hash/hashinsert.c(85): error C2065: 'bucket' : undeclared identifier 1>src/backend/access/hash/hashinsert.c(155): error C2065: 'bucket' : undeclared identifier 1>src/backend/access/hash/hashsearch.c(223): warning C4101: 'blkno' : unreferenced local variable 1> hashutil.c 1>\src\backend\access\hash\hashsearch.c(284): warning C4700: uninitialized local variable 'bucket' used 7. + /* + * Check if this bucket is split after we have cached the hash meta + * data. 
+ * To do this we need to check whether cached maxbucket number
+ * is less than or equal to maxbucket number stored in bucket page,
+ * which was set with that times maxbucket number during bucket page
+ * splits. In case of upgrade hashno_prevblkno of old bucket page will
+ * be set with InvalidBlockNumber. And as of now maximum value the
+ * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+ * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+ * hasho_prevblkno will tell whether current bucket has been split
+ * after caching hash meta data.
+ */

I can understand what you want to say in the above comment, but I think you can write it in somewhat shorter form. There is no need to explain the exact check.

8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
_hash_relbuf(rel, *bufp);

*bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+

I don't think the above check is right. Even if it is a bucket page, we might need to scan a bucket being populated; refer to the check else if (so->hashso_buc_populated && so->hashso_buc_split). Apart from that, you can't access a bucket page after releasing the lock on it. Why have you added such a check?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
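To make the invariant in the quoted comment concrete, here is a small self-contained C sketch of the split-detection test under discussion. The type and constant definitions are stand-ins for PostgreSQL's headers, and the function name is hypothetical; only the comparison logic reflects the quoted comment: a bucket page records the hashm_maxbucket value current when it was initialized or last split, and the cached metapage may be trusted only while that recorded value does not exceed the cached hashm_maxbucket. Because hashm_maxbucket is capped one below InvalidBlockNumber, a pre-upgrade page still carrying InvalidBlockNumber can never pass the test, so the metapage is always re-read for such pages.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's definitions (assumptions, not real headers). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Return nonzero if the bucket has NOT been split since the metapage was
 * cached, i.e. the cached metapage data is still usable for this bucket.
 * page_prevblkno is the hasho_prevblkno stored in the bucket's primary
 * page; cached_maxbucket is the hashm_maxbucket from the cached metapage
 * (at most InvalidBlockNumber - 1, per _hash_expandtable).
 */
static int
cached_metap_still_valid(BlockNumber page_prevblkno, uint32_t cached_maxbucket)
{
    return page_prevblkno <= cached_maxbucket;
}
```

Note that no separate explicit test for InvalidBlockNumber is needed in this formulation: since InvalidBlockNumber is the largest uint32 value and cached_maxbucket can never reach it, an upgraded page always fails the comparison and falls through to a fresh metapage read.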
Thanks Amit for the detailed review, and for pointing out various issues in the patch. I have tried to fix all of your comments as below.

On Mon, Jan 2, 2017 at 11:29 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> usage "into to .." in above comment seems to be wrong.

-- Fixed.

> 2.
> In the above comment, a reference to HashMetaCache is confusing, are
> you referring to any structure here? If you change this, consider
> changing the similar usage at other places in the patch as well.

-- Fixed. Removed HashMetaCache everywhere in the code. Wherever needed, added HashMetaPageData.

> Also, it is not clear what you mean by ".. to indicate bucket
> has been initialized .."? assigning maxbucket as a special value to
> prevblkno is not related to initializing a bucket page.

-- To be consistent with our definition of prevblkno, I am setting prevblkno with the current hashm_maxbucket when we initialize the bucket page. I have tried to correct the comments accordingly.

> 3.
> In above comment, where you are saying ".. caching the some of the
> meta page data .." is slightly confusing, because it appears to me
> that you are caching whole of the metapage not some part of it.

-- Fixed. Changed to caching the HashMetaPageData.

> 4.
> +_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
>
> Generally, for _hash_* API's, we use rel as the first parameter, so I
> think it is better to maintain the same with this API as well.

-- Fixed.

> 5.
> _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
> - maxbucket, highmask, lowmask);
> + usedmetap->hashm_maxbucket,
> + usedmetap->hashm_highmask,
> + usedmetap->hashm_lowmask);
> I think we should add an Assert for the validity of usedmetap before using it.

-- Fixed. Added Assert(usedmetap != NULL);

> 6. Getting few compilation errors in assert-enabled build.

-- Fixed. Sorry, I missed handling the bucket number which is needed in the code below. I have fixed same now.

> 7.
> I can understand what you want to say in above comment, but I think
> you can write it in somewhat shorter form. There is no need to
> explain the exact check.

-- Fixed. I have tried to compress it into a few lines.

> 8.
> @@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
> _hash_relbuf(rel, *bufp);
>
> *bufp = InvalidBuffer;
> +
> + /* If it is a bucket page there will not be a prevblkno. */
> + if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
> + return;
> +
>
> I don't think above check is right. Even if it is a bucket page, we
> might need to scan bucket being populated, refer check else if
> (so->hashso_buc_populated && so->hashso_buc_split). Apart from that,
> you can't access bucket page after releasing the lock on it. Why have
> you added such a check?
>

-- Fixed. That was a mistake; now I have fixed it. Actually, if a bucket page is passed to _hash_readprev then there will not be a prevblkno. But from this patch onwards the prevblkno of a bucket page will store hashm_maxbucket, so a check BlockNumberIsValid(blkno) will not be valid anymore. I have fixed it by adding as below.

+ /* If it is a bucket page there will not be a prevblkno.
*/
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
+ {
+ Assert(BlockNumberIsValid(blkno));

There are 2 other places which do the same: _hash_freeovflpage and _hash_squeezebucket. But those will only be called for overflow pages, so I did not make changes there. But I think we should also change them to make it consistent.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Tue, Jan 3, 2017 at 12:05 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

As part of performance/space analysis of the hash index on the varchar data type with this patch, I have run some tests for same with a modified pgbench. Modifications include:
1. Changed the aid column of the pg_accounts table from int to varchar(x).
2. To generate unique values as before, inserted the stringified integer repeatedly to fill x.

I have mainly tested for varchar(30) and varchar(90). The below document has the detailed report on our captured data. New hash indexes have some ~25% improved performance over btree and, as expected, are very space efficient.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Jan 4, 2017 at 4:19 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> As part of performance/space analysis of hash index on varchar data type
> with this patch, I have run some tests for same with modified pgbench.

I forgot to mention that all these tests were run on a power2 machine which has 192 hyperthreads.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have re-based the patch to fix one compilation warning @_hash_doinsert, where the variable bucket was used only for an Assert.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Jan 5, 2017 at 11:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> I have re-based the patch to fix one compilation warning
> @_hash_doinsert where variable bucket was only used for Asserting but
> was not declared about its purpose.
>

Few more comments:

1.
}
+
+
+/*
+ * _hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given

No need for two extra lines; one is sufficient and matches the nearby coding pattern.

2.
+ * old bucket in case of bucket split, see @_hash_doinsert().

Do you see anywhere else in the code the pattern of using the @ symbol in comments before a function name? In general, there is no harm in using it, but maybe it is better to be consistent with usage at other places.

3.
+ /*
+ * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+ * Should be safe to use further.
+ */
+ bucket = pageopaque->hasho_bucket;

/*
* If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
LockBuffer(buf, BUFFER_LOCK_UNLOCK);

_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
- maxbucket, highmask, lowmask);
+ usedmetap->hashm_maxbucket,

After this change, I think you can directly use bucket in _hash_finish_split instead of pageopaque->hasho_bucket.

4.
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,

*bufp = InvalidBuffer;
/* check for interrupts while we're not holding any buffer lock */
CHECK_FOR_INTERRUPTS();
- if (BlockNumberIsValid(blkno))
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))

Won't a check similar to the existing check (if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just above this code suffice the need? If so, then you can check it once and use it in both places.

5. The reader and insertion algorithm needs to be updated in README.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Dec 27, 2016 at 3:06 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > Thanks, just like _bt_getroot I am introducing a new function > _hash_getbucketbuf_from_hashkey() which will give the target bucket > buffer for the given hashkey. This will use the cached metapage > contents instead of reading meta page buffer if cached data is valid. > There are 2 places which can use this service 1. _hash_first and 2. > _hash_doinsert. Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with this new logic? Or at least update the comments? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few more comments:
> 1.
> no need for two extra lines, one is sufficient and matches the nearby
> coding pattern.

-- Fixed.

> 2.
> Do you see anywhere else in the code the pattern of using @ symbol in
> comments before function name?

-- Fixed.

> 3.
> after this change, i think you can directly use bucket in
> _hash_finish_split instead of pageopaque->hasho_bucket

-- Fixed.

> 4.
> Won't the check similar to the existing check (if (*bufp ==
> so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
> above this code will suffice the need? If so, then you can check it
> once and use it in both places.
>

-- Fixed.

> 5. The reader and insertion algorithm needs to be updated in README.

-- Added info in README.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
> Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
> this new logic? Or at least update the comments?
I have introduced a new function _hash_getcachedmetap in patch 11 [1]; with this, hashbulkdelete() can use the metapage cache instead of saving it locally.
[1] cache_hash_index_meta_page_11.patch
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Below are review comments on the latest version of the patch.

1.
/*
- * Read the metapage to fetch original bucket and tuple counts. Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses. This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data.
*/
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(&local_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;

(a) I think you can retain the previous comment or modify it slightly. Just removing the whole comment and replacing it with a single line seems like a step backward.

(b) Another somewhat bigger problem is that with this new change it won't retain the pin on the meta page till the end, which means we might need to perform an I/O again during the operation to fetch the meta page. AFAICS, you have just changed it so that you can call the new API _hash_getcachedmetap; if that's true, then I think you have to find some other way of doing it. BTW, why can't you design your new API such that it always takes a pinned metapage? You can always release the pin in the caller if required. I understand that you don't always need a metapage in that API, but I think the current usage of that API is also not that good.

2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of the above if check. It seems like you will update the metapage when the previous block number is not a valid block number, which will be true at the first split. How will you ensure that there is a re-split and the cached metapage is not relevant? I think if there is && in the above condition, then we can ensure it.

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in the algorithm start with small letters, so why do you need an exception for this sentence? I think you don't need to add an empty line. Also, I think the usage of getbucketbuf_from_hashkey seems out of place. How about writing it as:

The usage of cached metapage is explained later.

4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain the original text of the README and add about the meta page update.

5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.

No need to write "Loop again ..", that is implicit.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> (a) I think you can retain the previous comment or modify it slightly.
> Just removing the whole comment and replacing it with a single line
> seems like a step backward.

-- Fixed, just modified to say it.

> (b) Another somewhat bigger problem is that with this new change it
> won't retain the pin on meta page till the end which means we might
> need to perform an I/O again during operation to fetch the meta page.
> AFAICS, you have just changed it so that you can call new API
> _hash_getcachedmetap, if that's true, then I think you have to find
> some other way of doing it. BTW, why can't you design your new API
> such that it always take pinned metapage? You can always release the
> pin in the caller if required. I understand that you don't always
> need a metapage in that API, but I think the current usage of that API
> is also not that good.

-- Yes, what you say is right. I wanted to make the _hash_getcachedmetap API self-sufficient. But 2 possible consecutive reads of the metapage are always connected, as we pin the page to the buffer to avoid I/O. I have now redesigned the API in such a way that the caller passes a pinned meta page if we want to set the metapage cache; this gives us the flexibility to use the cached meta page data in different places.

> 2.
> + if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
> + bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
> + cachedmetap = _hash_getcachedmetap(rel, true);
>
> I don't understand the meaning of above if check. It seems like you
> will update the metapage when previous block number is not a valid
> block number which will be true at the first split. How will you
> ensure that there is a re-split and cached metapage is not relevant.
> I think if there is && in the above condition, then we can ensure it.
>

-- Oops, that was a mistake; corrected as you stated.

> 3.
> + Given a hashkey get the target bucket page with read lock, using cached
> + metapage. The getbucketbuf_from_hashkey method below explains the same.
> +
>
> All the sentences in algorithm start with small letters, then why do
> you need an exception for this sentence. I think you don't need to
> add an empty line. Also, I think the usage of
> getbucketbuf_from_hashkey seems out of place. How about writing it
> as:
>
> The usage of cached metapage is explained later.
>

-- Fixed as you have asked.

> 4.
> + If target bucket is split before metapage data was cached then we are
> + done.
> + Else first release the bucket page and then update the metapage cache
> + with latest metapage data.
>
> I think it is better to retain original text of readme and add about
> meta page update.
>

-- Fixed. Now, wherever it is meaningful, I have kept the original wording. But the way we get to the right target buffer after the latest split is slightly different from what the original code did, so there is a slight modification to show we use the metapage cache.

> 5.
> + Loop:
> ..
> ..
> + Loop again to reach the new target bucket.
>
> No need to write "Loop again ..", that is implicit.
>

-- Fixed as you have asked.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Jan 18, 2017 at 11:51 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> (b) Another somewhat bigger problem is that with this new change it >> won't retain the pin on meta page till the end which means we might >> need to perform an I/O again during operation to fetch the meta page. >> AFAICS, you have just changed it so that you can call new API >> _hash_getcachedmetap, if that's true, then I think you have to find >> some other way of doing it. BTW, why can't you design your new API >> such that it always take pinned metapage? You can always release the >> pin in the caller if required. I understand that you don't always >> need a metapage in that API, but I think the current usage of that API >> is also not that good. > > > -- Yes what you say is right. I wanted to make _hash_getcachedmetap > API self-sufficient. But always 2 possible consecutive reads for > metapage are connected as we pin the page to buffer to avoid I/O. Now > redesigned the API such a way that caller pass pinned meta page if we > want to set the metapage cache; So this gives us the flexibility to > use the cached meta page data in different places. > 1. @@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats, .. .. + metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE); + cachedmetap = _hash_getcachedmetap(rel, metabuf); In the above flow, do we really need an updated metapage, can't we use the cached one? We are already taking care of bucket split down in that function. 2. +HashMetaPage +_hash_getcachedmetap(Relation rel, Buffer metabuf) +{ .. .. + if (BufferIsInvalid(metabuf)) + return (HashMetaPage) rel->rd_amcache; .. +_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access, + HashMetaPage *cachedmetap) { .. 
+ if (!(metap = _hash_getcachedmetap(rel, InvalidBuffer)))
+ {
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ metap = _hash_getcachedmetap(rel, metabuf);
+ Assert(metap != NULL);
+ }
..
}

The above two chunks of code look worse as compared to your previous patch. I think what we can do is keep the patch ready with both versions of the _hash_getcachedmetap API (as you have in _v11 and _v12) and let the committer take the final call.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 24, 2017 at 3:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> @@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,
> In the above flow, do we really need an updated metapage, can't we use
> the cached one? We are already taking care of bucket split down in
> that function.

Yes, we can use the old cached metap entry; the only reason I decided to use the latest metapage content is that the old code used to do that. Also, the cached metap is used to avoid ad-hoc local saving of same and hence unify the cached metap API. I did not intend to save the metapage read here, which I thought would not be of much use: if new buckets are added we anyway need to read the metapage at the end. I have taken your comments; now I only read the metap cache which is already set.

> 2.
> The above two chunks of code look worse as compare to your previous
> patch. I think what we can do is keep the patch ready with both the
> versions of _hash_getcachedmetap API (as you have in _v11 and _v12)
> and let committer take the final call.

The _v11 API was a self-sustained one, but it does not hold a pin on the metapage buffer, whereas in _v12 we hold the pin for two consecutive reads of the metapage. I have taken your advice and am producing 2 different patches.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Jan 26, 2017 at 1:48 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> _v11 API's was self-sustained one but it does not hold pins on the
> metapage buffer. Whereas in _v12 we hold the pin for two consecutive
> reads of metapage. I have taken your advice and producing 2 different
> patches.

Hmm. I think both of these APIs have some advantages. On the one hand, passing metabuf sometimes allows you to avoid pin/unpin cycles - e.g. in hashbulkdelete it makes a fair amount of sense to keep the metabuf pinned once we've had to read it, just in case we need it again. On the other hand, it's surprising that passing a value for the metabuf forces the cache to be refreshed. I wonder if a good API might be something like this:

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh);

If the cache is initialized and force_refresh is not true, then this just returns the cached data, and the metabuf argument isn't used. Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to the metabuffer, pin and lock it, use it to set the cache, release the lock, and return with the pin still held. If *metabuf != InvalidBuffer, we assume it's pinned and return with it still pinned.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
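As a toy model of the contract described above, the following self-contained C sketch mocks Buffer, Relation, and the metapage read purely for illustration (all of those definitions are stand-ins, not PostgreSQL's real types, and the function name omits the leading underscore); only the control flow of the proposed API is of interest. On a cache hit without force_refresh the metabuf argument is never touched; otherwise the metapage is "pinned" (if the caller did not pass a pinned buffer), copied into the backend-private cache, and the function returns with the pin still held by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mocked stand-ins for PostgreSQL types (assumptions for illustration). */
typedef int Buffer;
#define InvalidBuffer 0

typedef struct HashMetaPageData
{
    unsigned    hashm_maxbucket;
} HashMetaPageData;
typedef HashMetaPageData *HashMetaPage;

typedef struct Relation
{
    void       *rd_amcache;     /* backend-private metapage cache */
} Relation;

static HashMetaPageData on_disk_meta = {42};    /* stands in for the metapage */

static Buffer
pin_meta_buffer(void)           /* stands in for reading+pinning the meta page */
{
    return 1;                   /* any non-invalid buffer id */
}

static HashMetaPage
hash_getcachedmetap(Relation *rel, Buffer *metabuf, int force_refresh)
{
    /* Cache hit and no refresh requested: metabuf is not used at all. */
    if (rel->rd_amcache != NULL && !force_refresh)
        return (HashMetaPage) rel->rd_amcache;

    /* Pin the metapage unless the caller already holds a pin. */
    if (*metabuf == InvalidBuffer)
        *metabuf = pin_meta_buffer();

    /* Lock, copy into the cache, unlock; return with the pin still held. */
    if (rel->rd_amcache == NULL)
        rel->rd_amcache = malloc(sizeof(HashMetaPageData));
    memcpy(rel->rd_amcache, &on_disk_meta, sizeof(HashMetaPageData));
    return (HashMetaPage) rel->rd_amcache;
}
```

The design point of this shape of API is that the caller decides the pin's lifetime: a caller such as hashbulkdelete can keep the buffer pinned across several refreshes, while a caller that only wants a quick cache lookup passes InvalidBuffer and may never incur buffer access at all.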
> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
> force_refresh);
>
> If the cache is initialized and force_refresh is not true, then this
> just returns the cached data, and the metabuf argument isn't used.
> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
> the metabuffer, pin and lock it, use it to set the cache, release the
> lock, and return with the pin still held. If *metabuf !=
> InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks Robert, I have made a new patch which tries to do the same. Now I think the code looks less complicated.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool >> force_refresh); >> >> If the cache is initialized and force_refresh is not true, then this >> just returns the cached data, and the metabuf argument isn't used. >> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to >> the metabuffer, pin and lock it, use it to set the cache, release the >> lock, and return with the pin still held. If *metabuf != >> InvalidBuffer, we assume it's pinned and return with it still pinned. > > Thanks, Robert I have made a new patch which tries to do same. Now I > think code looks less complicated. Moved to CF 2017-03. -- Michael
On Wed, Feb 1, 2017 at 12:23 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool >>> force_refresh); >>> >>> If the cache is initialized and force_refresh is not true, then this >>> just returns the cached data, and the metabuf argument isn't used. >>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to >>> the metabuffer, pin and lock it, use it to set the cache, release the >>> lock, and return with the pin still held. If *metabuf != >>> InvalidBuffer, we assume it's pinned and return with it still pinned. >> >> Thanks, Robert I have made a new patch which tries to do same. Now I >> think code looks less complicated. > > Moved to CF 2017-03. Committed with some changes (which I noted in the commit message). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017-02-07 18:41, Robert Haas wrote: > Committed with some changes (which I noted in the commit message). This has caused a warning with gcc 6.20: hashpage.c: In function ‘_hash_getcachedmetap’: hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this function [-Wmaybe-uninitialized] rel->rd_amcache = cache; ~~~~~~~~~~~~~~~~^~~~~~~ which hopefully can be prevented... thanks, Erik Rijkers
On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote:
> On 2017-02-07 18:41, Robert Haas wrote:
>>
>> Committed with some changes (which I noted in the commit message).

Thanks Robert, and all who have reviewed the patch and given their valuable comments.

> This has caused a warning with gcc 6.20:
>
> hashpage.c: In function ‘_hash_getcachedmetap’:
> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> rel->rd_amcache = cache;
> ~~~~~~~~~~~~~~~~^~~~~~~

Yes, I also see the warning. I think the compiler is not able to see that cache is always initialized and used only under the condition if (rel->rd_amcache == NULL). To make the compiler happy we can initialize cache with NULL when it is defined.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Attachment
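The proposed fix is the usual one-line remedy for this class of warning. The snippet below reproduces the pattern in standalone form (the variable names and flow are simplified stand-ins for illustration, not the actual hashpage.c code): the pointer is assigned only inside a conditional branch, and the later store is guarded by the same condition, but gcc's -Wmaybe-uninitialized analysis cannot always prove the two tests are taken together. Initializing the pointer at its declaration silences the warning without changing behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static void *rd_amcache = NULL;     /* stands in for rel->rd_amcache */

static void
fill_cache(void)
{
    char *cache = NULL;             /* the fix: initialize to NULL here */

    if (rd_amcache == NULL)
        cache = malloc(16);         /* the only path that sets `cache` */

    /* ... intervening work that the compiler cannot see through ... */

    if (rd_amcache == NULL)
        rd_amcache = cache;         /* the store gcc flagged as
                                     * maybe-uninitialized */
}
```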
On Tue, Feb 7, 2017 at 1:52 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote: >> On 2017-02-07 18:41, Robert Haas wrote: >>> >>> Committed with some changes (which I noted in the commit message). > > Thanks, Robert and all who have reviewed the patch and given their > valuable comments. > >> This has caused a warning with gcc 6.20: >> >> hashpage.c: In function ‘_hash_getcachedmetap’: >> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this >> function [-Wmaybe-uninitialized] >> rel->rd_amcache = cache; >> ~~~~~~~~~~~~~~~~^~~~~~~ > > Yes, I also see the warning. I think the compiler is not able to see > cache is always initialized and used under condition if > (rel->rd_amcache == NULL). > I think to make the compiler happy we can initialize the cache with > NULL when it is defined. Thanks for the reports and patch. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company