Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general
From | Peter Geoghegan |
---|---|
Subject | Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR |
Date | |
Msg-id | CAH2-Wzm226vew2Ho4qkQ6_HoVGmZVutoEPndd9VrgV6XU8JWAw@mail.gmail.com Whole thread Raw |
In response to | Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR (John Lumby <johnlumby@hotmail.com>) |
Responses |
Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
|
List | pgsql-general |
On Tue, Jul 9, 2019 at 12:29 PM John Lumby <johnlumby@hotmail.com> wrote: > I was not thinking of a new command, just an extension of the existing REINDEX > which would apply a fillfactor equal to current average page density, > by adding a preliminary step to sample that first. That would be a very different thing to REINDEX no matter how you spelt it, though. REINDEX creates a new index, from scratch, whereas you're talking about restructuring what's already there. > > I believe that this is a lot more important in systems that generally > > use clustered indexes, such as MS SQL Server. This kind of > > "fragmentation" isn't usually much of a problem when using Postgres. > > > We have found that, for an index which has both experienced large number of page splits > and whose table has a large number of dead tuples (despite autovacuum), > REINDEX with FILLFACTOR set to current page_density does produce a performance improvement, > and also does reduce future growth in number of pages. I don't have numbers to > hand, and in fact not sure if any catalog view or pgstattuple tells me about the proportion > of dead key-tids in the index itself (do you know of any source?) as opposed to the table, > but based on that recollection, yes, REINDEX can reduce fragmentation. This could help the old "getting tired" behavior with many duplicates, by making the free space available in earlier leaf pages (those further to the left) that are full of duplicates -- the original average space utilization may reflect a very uneven distribution of free space overall. Or, it could be that range scan performance benefitted from reduced fragmentation, because your workload happened to be bottlenecked on large range scans. Though that seems unlikely. I believe that the effect that you identified is real, but at a minimum it's not clear why a REINDEX with a fillfactor to match the original leaf space utilization helped. It would be fairly difficult to figure it out for sure. If it was a problem with duplicates/"getting tired", then I'd expect the new v12 code will help a lot. > However we did not run a VACUUM command first. Maybe if we had run VACUUM instead of > the REINDEX commands, we might have obtained the same degree of improvement, I don't know. > I think this was Tom's point earlier on in this thread. It was. Tom's intuition about that matches my own, though I acknowledge that the old behavior with duplicates muddies the waters. > Correct me if I'm wrong but I believe whether an index is "clustered" or not is not relevant for > this discussion because the clustering in that context is referring to ordering of the > table pages, not the index pages. Right. > I believe it is quite possible to have a perfectly > "clustered" table whose clustering index is itself badly disorganized. Technically the two things are separate metrics, so that is theoretically possible, but it doesn't seem all that likely. It could happen with lots of non-HOT updates, where all new index tuples relate to the same logical row as some existing index tuple, causing many page splits despite there being no real change in the logical contents of the index. Even then, the table will itself lose much of its original order, so the index will become "unclustered" as it becomes fragmented. -- Peter Geoghegan
pgsql-general by date: