> From: Peter Geoghegan <pg@bowt.ie>
> Sent: July 9, 2019 3:01 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR
>
> On Tue, Jul 9, 2019 at 11:27 AM John Lumby <johnlumby@hotmail.com> wrote:
> > And the point of the REINDEX at that point (below) is to remove dead tuple keys-tids
> > and reorganize those split pages back into physical order without losing the freespace.
>
> VACUUM already removes the tuples, accounting for all overhead.
>
> We could in principle come up with a way of moving pages around,
> [ ... ]
> That would either necessitate that the command acquire a disruptive lock
> [ ... ]
> Neither of which seem particularly appealing.
I was not thinking of a new command, just an extension of the existing REINDEX
which would apply a fillfactor equal to current average page density,
by adding a preliminary step to sample that first.
Of course, the user can do that for themselves by a series of steps with
ANALYZE, get page_density from pgstattuple, ALTER INDEX, REINDEX
so this new parameter would be a convenience, assuming that this sequence
actually is beneficial, which I believe it is - see my next.
>
> I believe that this is a lot more important in systems that generally
> use clustered indexes, such as MS SQL Server. This kind of
> "fragmentation" isn't usually much of a problem when using Postgres.
>
We have found that, for an index which has both experienced large number of page splits
and whose table has a large number of dead tuples (despite autovacuum),
REINDEX with FILLFACTOR set to current page_density does produce a performance improvement,
and also does reduce future growth in number of pages. I don't have numbers to
hand, and in fact not sure if any catalog view or pgstattuple tells me about the proportion
of dead key-tids in the index itself (do you know of any source?) as opposed to the table,
but based on that recollection, yes, REINDEX can reduce fragmentation.
However we did not run a VACUUM command first. Maybe if we had run VACUUM instead of
the REINDEX commands, we might have obtained the same degree of improvement, I don't know.
I think this was Tom's point earlier on in this thread.
Correct me if I'm wrong but I believe whether an index is "clustered" or not is not relevant for
this discussion because the clustering in that context is referring to ordering of the
table pages, not the index pages. I believe it is quite possible to have a perfectly
"clustered" table whose clustering index is itself badly disorganized.
Cheers, John