Re: Cache Hash Index meta page. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Cache Hash Index meta page.
Date
Msg-id CA+Tgmoa4b_34D3vp7nCdyHGR+QCKCBUFGZenNsHDfDFwLEb60Q@mail.gmail.com
Whole thread Raw
In response to Re: Cache Hash Index meta page.  (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses Re: [HACKERS] Cache Hash Index meta page.  (Mithun Cy <mithun.cy@enterprisedb.com>)
List pgsql-hackers
On Mon, Dec 5, 2016 at 2:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
>As the concurrent hash index patch was committed in 6d46f4 this patch needs a rebase.

Thanks Jesper,

Adding the rebased patch.

-        bucket = _hash_hashkey2bucket(hashkey,
-                                      metap->hashm_maxbucket,
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+        metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

+        /*  Cache the metapage data for next time*/

Whitespace.

+        /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Shouldn't _hash_doinsert() be using the cache, too?

I think it's probably not a good idea to cache the entire metapage.  The only things that you are "really" trying to cache, I think, are hashm_maxbucket, hashm_lowmask, and hashm_highmask.  The entire HashPageMetaData structure is 696 bytes on my machine, and it doesn't really make sense to copy the whole thing into memory if you only need 16 bytes of it.  It could even be dangerous -- if somebody tries to rely on the cache for some other bit of data and we're not really guaranteeing that it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members hmc_maxbucket, hmc_lowmask, and hmc_highmask.  The structure can have a comment explaining that we only care about having the data be fresh enough to test whether the bucket mapping we computed for a tuple is still correct, and that for that to be the case we only need to know whether a bucket has suffered a new split since we last refreshed the cache.

The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: Separate connection handling from backends
Next
From: "David G. Johnston"
Date:
Subject: Re: Typmod associated with multi-row VALUES constructs