Re: 10.1: hash index size exploding on vacuum full analyze - Mailing list pgsql-bugs

From Amit Kapila
Subject Re: 10.1: hash index size exploding on vacuum full analyze
Date
Msg-id CAA4eK1+PYgxOKr-wJiOF37BWFdPjH6O-OyNN1V+58T=gtyY6tg@mail.gmail.com
In response to Re: 10.1: hash index size exploding on vacuum full analyze  (AP <pgsql@inml.weebeastie.net>)
Responses Re: 10.1: hash index size exploding on vacuum full analyze  (AP <pgsql@inml.weebeastie.net>)
Re: 10.1: hash index size exploding on vacuum full analyze  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
On Mon, Nov 20, 2017 at 5:01 AM, AP <pgsql@inml.weebeastie.net> wrote:
> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql@inml.weebeastie.net> wrote:
>> > I've some tables that'll never grow so I decided to replace a big index
>> > with one with a fillfactor of 100. That went well. The index shrunk to
>> > 280GB. I then did a vacuum full analyze on the table to get rid of any
>> > cruft (as the table will be static for a long time and then only deletes
>> > will happen) and the index exploded to 701GB. When it was created with
>> > fillfactor 90 (organically by filling the table) the index was 309GB.
>
> FYI: Nuking the above and doing a create index run gave me a 280GB index again.
>
>> Sounds quite strange.  I think during vacuum it leads to more number
>> of splits than when the original data was loaded.  By any chance do
>> you have a copy of both the indexes (before vacuum full and after
>> vacuum full)?  Can you once check and share the output of
>> pgstattuple-->pgstathashindex() and pageinspect->hash_metapage_info()?
>>  I wanted to confirm if the bloat is due to additional splits.
>
> Before VACUUM FULL:
>
>     Schema     |                        Name                        | Type  |   Owner   |                 Table                 |  Size  | Description
> ---------------+----------------------------------------------------+-------+-----------+---------------------------------------+--------+-------------
>  bundle_link   | be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx | index | mdkingpin | be5be0ac8_3ba5_407d_8183_2d05b9387e81 | 273 GB |
>
> mdstash=# select * from hash_metapage_info(get_raw_page('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx',0))\gx
> -[ RECORD 1 ]---...---
> magic     | 105121344
> version   | 4
> ntuples   | 9123458028
> ffactor   | 409
> bsize     | 8152
> bmsize    | 4096
> bmshift   | 15
> maxbucket | 25165823
> highmask  | 33554431
> lowmask   | 16777215
> ovflpoint | 71
> firstfree | 10623106
> nmaps     | 325
> procid    | 456
>
> ...
> Time: 0.613 ms
>
> mdstash=# select * from pgstathashindex('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        4 |     25165824 |       10622781 |          325 |            0 | 9123458028 |          0 | 37.4567373970968
> (1 row)
>
> Time: 2002419.406 ms (33:22.419)
>
> After VACUUM FULL:
>
>     Schema     |                        Name                        | Type  |   Owner   |                 Table                 |  Size  | Description
> ---------------+----------------------------------------------------+-------+-----------+---------------------------------------+--------+-------------
>  bundle_link   | be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx | index | mdkingpin | be5be0ac8_3ba5_407d_8183_2d05b9387e81 | 701 GB |
>
> -[ RECORD 1 ]---...---
> magic     | 105121344
> version   | 4
> ntuples   | 9123458028
> ffactor   | 409
> bsize     | 8152
> bmsize    | 4096
> bmshift   | 15
> maxbucket | 83886079
> highmask  | 134217727
> lowmask   | 67108863
> ovflpoint | 78
> firstfree | 7996259
> nmaps     | 245
> procid    | 456
>
> Time: 69.237 ms
>
> mdstash=# select * from pgstathashindex('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        4 |     83886080 |        7996014 |          245 |            0 | 9123458028 |          0 | 75.6390388675015
> (1 row)
>
> Time: 2474290.172 ms (41:14.290)
>

Based on the above data, we can easily see that after vacuum full there
is a huge increase in free_percent, which is mostly due to the
additional bucket_pages.  See the calculation below:

Index size difference = 701 - 273 = 428GB

Bucket page difference = 83886080 - 25165824 = 58720256 pages = 448GB (at 8KB per page)

Overflow page difference = 7996014 - 10622781 = -2626767 pages = -20GB (at 8KB per page)
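
The page-to-size conversion above assumes the default 8KB block size; as a
quick cross-check, the page-count deltas from pgstathashindex can be turned
into sizes like this:

select pg_size_pretty((83886080 - 25165824) * 8192::bigint) as extra_bucket_pages,    -- ~448GB
       pg_size_pretty((10622781 - 7996014) * 8192::bigint)  as fewer_overflow_pages;  -- ~20GB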

So, if we add the bucket page and overflow page differences together, we
get exactly the size difference you are seeing after vacuum full.  This
confirms the theory I was speculating about above: the estimated number
of tuples (from which the number of buckets is computed) is somehow
different when we do a vacuum full.

On looking into it further, I found that the relcache entry for a
relation doesn't have correct values for relpages and reltuples during
vacuum full, due to which estimate_rel_size can return an estimate that
is quite far off, and hashbuild can then create buckets that it never
needs to populate with tuples.  I am not sure whether it is expected for
these attributes to be uninitialized (0) during vacuum full, but I see
that in swap_relation_files, when we swap the statistics, we assume the
new rel has freshly-updated stats, which I think is not true.  This
needs some further investigation.
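
If you want to cross-check from your side, the catalog values that
estimate_rel_size starts from can be inspected directly (relation names
taken from your report; this is just a convenience query, not a fix):

select relname, relpages, reltuples
from pg_class
where relname in ('be5be0ac8_3ba5_407d_8183_2d05b9387e81',
                  'be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx');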

Another angle to look at this is: even if the values of relpages and
reltuples are not updated, why does estimate_rel_size produce such a
wrong estimate?  I think that depends to some extent on the schema of
the table, so would it be possible for you to share the schema of the table?
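
Either of these would be enough, assuming the table lives in the same
bundle_link schema as the index:

mdstash=# \d+ bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81

or, from the shell:

pg_dump --schema-only --table='bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81' mdstash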


> Tell me if you need me to keep the index around.
>

I don't think so, but until we solve the problem there is no harm in
keeping it around if possible, because one might want some information
from it at a later stage to debug this problem.  OTOH, if you have a
space crunch, feel free to delete it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

