Re: 10.1: hash index size exploding on vacuum full analyze - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: 10.1: hash index size exploding on vacuum full analyze
Date
Msg-id CAA4eK1K2ynh=_gDHYydVXHvG+Lk=xyY-Pb9n86QjbtY768jm-A@mail.gmail.com
Whole thread Raw
In response to Re: 10.1: hash index size exploding on vacuum full analyze  (AP <pgsql@inml.weebeastie.net>)
Responses Re: 10.1: hash index size exploding on vacuum full analyze  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-bugs
On Thu, Nov 16, 2017 at 10:00 AM, AP <pgsql@inml.weebeastie.net> wrote:
> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql@inml.weebeastie.net> wrote:
>> > I've some tables that'll never grow so I decided to replace a big index
>> > with one with a fillfactor of 100. That went well. The index shrunk to
>> > 280GB. I then did a vacuum full analyze on the table to get rid of any
>> > cruft (as the table will be static for a long time and then only deletes
>> > will happen) and the index exploded to 701GB. When it was created with
>> > fillfactor 90 (organically by filling the table) the index was 309GB.
>>
>> Sounds quite strange.  I think during vacuum it leads to more number
>> of splits than when the original data was loaded.  By any chance do
>> you have a copy of both the indexes (before vacuum full and after
>> vacuum full)?  Can you once check and share the output of
>> pgstattuple-->pgstathashindex() and pageinspect->hash_metapage_info()?
>>  I wanted to confirm if the bloat is due to additional splits.
>
> I'll see what I can do. Currently vacuuming the table without the index
> so that I can then do a create index concurrently and get back my 280GB
> index (it's how I got it in the first place). Namely:
>

One possible theory could be that the calculation for initial buckets
required for the index has overestimated the number of buckets.  I
think this is possible because we choose the initial number of buckets
based on the number of tuples, but actually while inserting the values
we might have created more of overflow buckets rather than using the
newly created primary buckets.  The chances of such a misestimation
are more when there are duplicate values.  Now, if that is true, then
actually one should see the same size of the index (as you have seen
after vacuum full ..) when you create an index on the table with the
same values in index columns.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgsql-bugs by date:

Previous
From: Michael Paquier
Date:
Subject: Re: BUG #14915: Create sub-partitioning using GENERATED ALWAYS ASIDENTITY will lead to system collapse.
Next
From: Amit Langote
Date:
Subject: Re: BUG #14915: Create sub-partitioning using GENERATED ALWAYS ASIDENTITY will lead to system collapse.