Home > mailing lists

Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general

From	John Lumby
Subject	Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
Date	July 9, 2019 22:29:37
Msg-id	BYAPR06MB55599AB30E29597342AE9C7BA3F10@BYAPR06MB5559.namprd06.prod.outlook.com Whole thread Raw
In response to	Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
List	pgsql-general

Tree view

> From: Peter Geoghegan <pg@bowt.ie>

> Sent: July 9, 2019 3:01 PM

> Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

> On Tue, Jul 9, 2019 at 11:27 AM John Lumby <johnlumby@hotmail.com> wrote:

> > And the point of the REINDEX at that point (below) is to remove dead tuple keys-tids

> > and reorganize those split pages back into physical order without losing the freespace.

> VACUUM already removes the tuples, accounting for all overhead.

> We could in principle come up with a way of moving pages around,

> [ ... ]

> That would either necessitate that the command acquire a disruptive lock

> [ ... ]

> Neither of which seem particularly appealing.

I was not thinking of a new command, just an extension of the existing REINDEX

which would apply a fillfactor equal to current average page density,

by adding a preliminary step to sample that first.

Of course, the user can do that for themselves by a series of steps with

ANALYZE, get page_density from pgstattuple, ALTER INDEX, REINDEX

so this new parameter would be a convenience, assuming that this sequence

actually is beneficial, which I believe it is - see my next.

> I believe that this is a lot more important in systems that generally

> use clustered indexes, such as MS SQL Server. This kind of

> "fragmentation" isn't usually much of a problem when using Postgres.

We have found that, for an index which has both experienced large number of page splits

and whose table has a large number of dead tuples (despite autovacuum),

REINDEX with FILLFACTOR set to current page_density does produce a performance improvement,

and also does reduce future growth in number of pages. I don't have numbers to

hand, and in fact not sure if any catalog view or pgstattuple tells me about the proportion

of dead key-tids in the index itself (do you know of any source?) as opposed to the table,

but based on that recollection, yes, REINDEX can reduce fragmentation.

However we did not run a VACUUM command first. Maybe if we had run VACUUM instead of

the REINDEX commands, we might have obtained the same degree of improvement, I don't know.

I think this was Tom's point earlier on in this thread.

Correct me if I'm wrong but I believe whether an index is "clustered" or not is not relevant for

this discussion because the clustering in that context is referring to ordering of the

table pages, not the index pages. I believe it is quite possible to have a perfectly

"clustered" table whose clustering index is itself badly disorganized.

Cheers, John

pgsql-general by date:

From: Peter Geoghegan
Date: 09 July 2019, 22:01:20
Subject: Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR

From: Christoph Bilz
Date: 09 July 2019, 23:00:06
Subject: how to return rows of data via function written by language C strict

Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general

Previous

Next