Re: 10.1: hash index size exploding on vacuum full analyze - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: 10.1: hash index size exploding on vacuum full analyze
Date
Msg-id CAA4eK1KWiXAQrAKNdpiQ6s682s0ywjYYUvZHKGCLrhYddBcLwA@mail.gmail.com
In response to Re: 10.1: hash index size exploding on vacuum full analyze  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-bugs
On Fri, Nov 17, 2017 at 11:58 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> On Fri, Nov 17, 2017 at 7:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Nov 16, 2017 at 10:00 AM, AP <pgsql@inml.weebeastie.net> wrote:
>>> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>>>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql@inml.weebeastie.net> wrote:
>>>> > I have some tables that'll never grow, so I decided to replace a big
>>>> > index with one created with a fillfactor of 100. That went well: the
>>>> > index shrank to 280GB. I then did a VACUUM FULL ANALYZE on the table
>>>> > to get rid of any cruft (as the table will be static for a long time
>>>> > and then only deletes will happen) and the index exploded to 701GB.
>>>> > When it was created with fillfactor 90 (organically, by filling the
>>>> > table) the index was 309GB.
>>>>
>>>> Sounds quite strange.  I suspect the vacuum leads to a larger number
>>>> of splits than occurred when the original data was loaded.  By any
>>>> chance do you have a copy of both indexes (before and after the
>>>> vacuum full)?  Can you check and share the output of pgstattuple's
>>>> pgstathashindex() and pageinspect's hash_metapage_info()?  I want to
>>>> confirm whether the bloat is due to additional splits.
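For reference, a sketch of those checks; the index name idx_hash is a
placeholder, and both extensions need to be installed:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Per-index summary: bucket, overflow, bitmap, and unused page counts,
-- plus live/dead items and free space percentage.
SELECT * FROM pgstathashindex('idx_hash');

-- Raw contents of the hash index metapage (block 0).
SELECT * FROM hash_metapage_info(get_raw_page('idx_hash', 0));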
>>>
>>> I'll see what I can do. I'm currently vacuuming the table without the
>>> index so that I can then do a CREATE INDEX CONCURRENTLY and get back
>>> my 280GB index (that's how I got it in the first place). Namely:
>>>
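A minimal sketch of the rebuild sequence described above; table, column,
and index names are hypothetical:

-- Rebuild the hash index at fillfactor 100 without blocking writes,
-- then drop the bloated copy.
CREATE INDEX CONCURRENTLY t_k_hash_new ON t USING hash (k)
  WITH (fillfactor = 100);
DROP INDEX CONCURRENTLY t_k_hash;

-- Check the index size before and after the table rewrite.
SELECT pg_size_pretty(pg_relation_size('t_k_hash_new'));
VACUUM FULL ANALYZE t;
SELECT pg_size_pretty(pg_relation_size('t_k_hash_new'));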
>>
>> One possible theory is that the initial number of buckets allocated
>> for the index was overestimated.  I think this is possible because we
>> choose the initial number of buckets based on the number of tuples,
>> but while inserting the values we might have created overflow pages
>> rather than using the newly created primary buckets.  The chances of
>> such a misestimation are higher when there are duplicate values.
>> Now, if that is true, then one should see the same index size (as you
>> have seen after the vacuum full) when you create a fresh index on a
>> table with the same values in the index columns.
>>
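If this theory holds, it should be reproducible with a heavily skewed
key; a hypothetical test case:

-- Only 100 distinct values across 1M rows, so insertion should lean on
-- overflow pages while most pre-allocated primary buckets stay unused.
CREATE TABLE dup_test AS
  SELECT (i % 100) AS k FROM generate_series(1, 1000000) AS i;
CREATE INDEX dup_test_k_hash ON dup_test USING hash (k);
SELECT pg_size_pretty(pg_relation_size('dup_test_k_hash'));

-- Per the theory, the rewrite should produce roughly the same size,
-- since the bucket estimation again sees one million tuples.
VACUUM FULL ANALYZE dup_test;
SELECT pg_size_pretty(pg_relation_size('dup_test_k_hash'));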
>
> Amit, I think what you are suggesting here is that the number of hash
> buckets required is estimated based on the number of tuples in the
> base table, but during this calculation we do not know whether the
> table contains many duplicate values. If it does, then during index
> insertion we would start adding overflow pages, and many of the hash
> buckets allocated at the start, i.e. during the hash index size
> estimation, would remain unused. If this is true, then I think a hash
> index would not be the right choice.
>
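One quick way to gauge whether duplicates dominate the indexed column
(table and column names are placeholders):

-- A low distinct-to-total ratio means a hash index will rely heavily
-- on overflow pages for the repeated keys.
SELECT count(*) AS total_rows,
       count(DISTINCT k) AS distinct_keys
FROM t;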

Hmm, I am not sure that is the conclusion we can draw from this
behavior, as we can change it if required.  However, before drawing any
conclusions based on this theory, we should first try to find the
actual problem.

> However, this might not be exactly related to what AP has reported
> here.
>

Yeah, quite possible.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

