Re: 10.1: hash index size exploding on vacuum full analyze - Mailing list pgsql-bugs
| From | Amit Kapila | 
|---|---|
| Subject | Re: 10.1: hash index size exploding on vacuum full analyze | 
| Date | |
| Msg-id | CAA4eK1+6BSGrm+NtUDhx59CNR51Ehbnmch9LpswMEyrLBBjKLg@mail.gmail.com Whole thread Raw | 
| In response to | Re: 10.1: hash index size exploding on vacuum full analyze (Amit Kapila <amit.kapila16@gmail.com>) | 
| Responses | Re: 10.1: hash index size exploding on vacuum full analyze Re: 10.1: hash index size exploding on vacuum full analyze | 
| List | pgsql-bugs | 
On Mon, Nov 20, 2017 at 1:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Nov 20, 2017 at 5:01 AM, AP <pgsql@inml.weebeastie.net> wrote: >> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote: >>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql@inml.weebeastie.net> wrote: >>> > I've some tables that'll never grow so I decided to replace a big index >>> > with one with a fillfactor of 100. That went well. The index shrunk to >>> > 280GB. I then did a vacuum full analyze on the table to get rid of any >>> > cruft (as the table will be static for a long time and then only deletes >>> > will happen) and the index exploded to 701GB. When it was created with >>> > fillfactor 90 (organically by filling the table) the index was 309GB. >> >> FYI: Nuking the above and doing a create index run gave me a 280GB index again. >> >>> Sounds quite strange. I think during vacuum it leads to more number >>> of splits than when the original data was loaded. By any chance do >>> you have a copy of both the indexes (before vacuum full and after >>> vacuum full)? Can you once check and share the output of >>> pgstattuple-->pgstathashindex() and pageinspect->hash_metapage_info()? >>> I wanted to confirm if the bloat is due to additional splits. >> >> > > Based on above data, we can easily see that after vacuum full, there > is a huge increase in free_percent which is mostly due to the > additional bucket_pages after vacuum full. See the below calculation: > > Index size difference = 701 - 273 = 428GB > > Bucket page size difference = 83886080 - 25165824 = 58720256 = 448GB > > Overflow page size difference = 7996014 - 10622781 = -2626767 = -20GB > > So, if we just add the difference of bucket pages and overflow pages > size, it will give us the difference of size you are seeing after > vacuum full. So, this clearly indicates the theory I was speculating > above that somehow the estimated number of tuples (based on which > number of buckets are computed) is different when we do a vacuum full. > On further looking into it, I found that the relcache entry for a > relation doesn't have the correct value for relpages and reltuples > during vacuum full due to which estimate_rel_size can give some size > which might be quite different and then hashbuild can create buckets > which it might not even need to populate the tuples. I am not sure if > it is expected to have uninitialized (0) values for these attributes > during vacuum full, but I see that in swap_relation_files when we swap > the statistics, we are assuming that new rel has freshly-updated stats > which I think is not true. This needs some further investigation. > I think if we update the stats in copy_heap_data after copying the data, then we don't see such problem. Attached patch should fix the issue. You can try this patch to see if it fixes the issue for you. You might want to wait for a day or so to see if anyone else has any opinion on the patch or my analysis. >> Another angle to look at it is that even if the values of relpages and >> reltuples is not updated why we get such a wrong estimation by >> estimate_rel_size. I think to some extent it depends on the schema of >> the table, so is it possible for you to share schema of the table. >> > Schema's simple: > > CREATE TABLE link ( > datum_id BYTEA NOT NULL, > ids BYTEA NOT NULL > ); > ALTER TABLE link ALTER COLUMN datum_id SET STATISTICS 10000; > ALTER TABLE link ALTER COLUMN ids SET STATISTICS 0; > I think the reason for getting totally off stats during estimate_rel_size is that for the second column you have set statistics to 0. I think if you keep it to default or some reasonable number, then you won't get such a behavior. Anyhow, I think irrespective of the value of stats, the relcache entry should also be updated as explained above. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-bugs by date: