Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general

From John Lumby
Subject Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
Date
Msg-id BYAPR06MB55599AB30E29597342AE9C7BA3F10@BYAPR06MB5559.namprd06.prod.outlook.com
Whole thread Raw
In response to Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-general
> From: Peter Geoghegan <pg@bowt.ie>
> Sent: July 9, 2019 3:01 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR
>  
> On Tue, Jul 9, 2019 at 11:27 AM John Lumby <johnlumby@hotmail.com> wrote:
> > And the point of the REINDEX at that point (below) is to remove dead tuple keys-tids
> > and  reorganize those split pages back into physical order without losing the freespace.
>
> VACUUM already removes the tuples, accounting for all overhead.
>
> We could in principle come up with a way of moving pages around,
>  [ ... ]
> That would either necessitate that the command acquire a disruptive lock
>  [ ... ]
> Neither of which seem particularly appealing.

I was not thinking of a new command,  just an extension of the existing REINDEX
which would apply a fillfactor equal to current average page density,
by adding a preliminary step to sample that first.
Of course,   the user can do that for themselves by a series of steps with
ANALYZE, get page_density from pgstattuple,  ALTER INDEX,  REINDEX
so this new parameter would be a convenience,  assuming that this sequence
actually is beneficial,   which I believe it is  -  see my next.

>
> I believe that this is a lot more important in systems that generally
> use clustered indexes, such as MS SQL Server. This kind of
> "fragmentation" isn't usually much of a problem when using Postgres.
>
We have found that, for an index which has both experienced large number of page splits
and whose table has a large number of dead tuples (despite autovacuum),
REINDEX with FILLFACTOR set to current page_density does produce a performance improvement,
and also does reduce future growth in number of pages.    I don't have numbers to
hand,  and in fact not sure if any catalog view or pgstattuple tells me about the proportion
of dead key-tids in the index itself (do you know of any source?) as opposed to the table,
but based on that recollection,  yes,   REINDEX can reduce fragmentation.

However we did not run a VACUUM command first.     Maybe if we had run VACUUM instead of
the REINDEX commands,   we might have obtained the same degree of improvement,  I don't know.
I think this was Tom's point earlier on in this thread.

Correct me if I'm wrong but I believe whether an index is "clustered" or not is not relevant for
this discussion because the clustering in that context is referring to ordering of the
table pages,  not the index pages.    I believe it is quite possible to have a perfectly
"clustered" table whose clustering index is itself badly disorganized.

Cheers,     John

pgsql-general by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
Next
From: Christoph Bilz
Date:
Subject: how to return rows of data via function written by language C strict