From: Amit Kapila
Subject: Re: [HACKERS] pgsql 10: hash indexes testing
Date:
Msg-id: CAA4eK1+nywfLyrfQV4LBSe2DMTEdD2wCuKMY-wzSGQROsn5eWg@mail.gmail.com
In response to: Re: [HACKERS] pgsql 10: hash indexes testing (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Fri, Aug 4, 2017 at 8:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Aug 2, 2017 at 9:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> Yes, I also think the same idea can be used; in fact, I mentioned
>>>>> it [1] as soon as you committed that patch.  Do we want to do
>>>>> anything at this stage for PG-10?  I don't think we should attempt
>>>>> something this late unless people feel this is a show-stopper issue
>>>>> for the usage of hash indexes.  If required, I think a separate
>>>>> function can be provided to allow users to perform the squeeze
>>>>> operation.
>>>>
>>>> Sorry, I have no idea how critical this squeeze thing is for the
>>>> newfangled hash indexes, so I cannot comment on that.  Does this make
>>>> the indexes unusable in some way under some circumstances?
>>>
>>> It seems so.  Basically, in the case of a large number of duplicates,
>>> we hit the maximum number of overflow pages.  There is a theoretical
>>> possibility of hitting it, but it could also be that we are not
>>> freeing the existing unused overflow pages, due to which the count
>>> keeps growing until it hits the limit.  I have requested upthread
>>> that this be verified for the case at hand, and I am still waiting
>>> for confirmation.  The squeeze operation does free such unused
>>> overflow pages after cleaning them.  As this is a costly operation
>>> and needs a cleanup lock, we currently perform it only during vacuum
>>> and at the start of the next split of a bucket that may have
>>> redundant overflow pages.
>>
>> Oops.  It was rather short-sighted of us not to increase
>> HASH_MAX_BITMAPS when we bumped HASH_VERSION.  Actually removing that
>> limit is hard, but we could have easily bumped it from 128 to, say,
>> 1024 without (I think) causing any problem, which would have given us
>> quite a bit of headroom here.
>
> Yes, that sounds sensible, but I think it will just postpone the
> problem.  I think the actual problem here is that we are not able to
> perform the squeeze operation often enough to free the overflow
> pages.  Currently, we try to perform the squeeze only at the start of
> the next split of the bucket or during vacuum.  The reason for doing
> it that way was that the squeeze operation needs a cleanup lock, and
> we already hold one at the start of a split and during vacuum.  Now,
> to solve this I have already speculated on a few ways above [1], and
> among those, it is feasible to do the squeeze at the end of a split,
> which can have performance implications in some workloads but will
> work fine for the case reported in this thread.
>

I have implemented the patch with this approach, as the other
approaches require quite extensive changes, which I am not sure is the
right thing to do at this stage.
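
To make the idea concrete, here is a minimal sketch of the approach
(illustrative only; the helper name and exact call site below are
assumptions for this example, not necessarily what the attached patch
does):

/*
 * Illustrative sketch: run the squeeze right after finishing a bucket
 * split, while we still hold the cleanup lock on the old bucket.
 */
static void
hash_finish_split_then_squeeze(Relation rel, Buffer obuf, Bucket obucket,
                               BlockNumber obucket_blkno)
{
    /*
     * By this point the split has moved the relevant tuples to the new
     * bucket, so the old bucket's overflow chain may contain unused
     * pages.  We already hold the cleanup lock on obuf (acquired for
     * the split), which is exactly what the squeeze requires, so we can
     * free those pages now instead of waiting for the next split or
     * vacuum.
     */
    _hash_squeezebucket(rel, obucket, obucket_blkno, obuf, NULL);

    /* The caller releases the lock and pin on obuf as before. */
}

The trade-off is the extra work done while the split still holds the
cleanup lock, which is the performance implication mentioned above.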

>
> I think we can fix it in one of the above ways and increase the value
> of HASH_MAX_BITMAPS as well.
>

I have increased the number of hash bitmap pages as a separate patch.
I am not completely sure it is a good idea to increase it directly to
1024, as that grows sizeof(HashMetaPageData) from 960 bytes to 4544
bytes.  Shall we increase it to 512 instead?
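
For reference, the growth follows from the metapage storing one 4-byte
BlockNumber per bitmap page (the hashm_mapp array), so each extra
HASH_MAX_BITMAPS slot costs 4 bytes.  A quick back-of-the-envelope
check, taking the 960-byte figure above as the baseline for 128 slots:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    const size_t base_size = 960;  /* sizeof(HashMetaPageData) at 128 */
    const size_t slot_size = sizeof(uint32);  /* BlockNumber: 4 bytes */
    const int    old_max = 128;
    const int    candidates[] = {512, 1024};

    for (int i = 0; i < 2; i++)
    {
        size_t grown = base_size + (candidates[i] - old_max) * slot_size;

        printf("HASH_MAX_BITMAPS=%d -> metapage struct ~%zu bytes\n",
               candidates[i], grown);
    }
    /* Prints ~2496 bytes for 512 and ~4544 bytes for 1024, matching
     * the figure above; either way the metapage still fits within an
     * 8K page. */
    return 0;
}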



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


