Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general

From Peter Geoghegan
Subject Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
Date
Msg-id CAH2-Wzm226vew2Ho4qkQ6_HoVGmZVutoEPndd9VrgV6XU8JWAw@mail.gmail.com
Whole thread Raw
In response to Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR  (John Lumby <johnlumby@hotmail.com>)
Responses Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR  (John Lumby <johnlumby@hotmail.com>)
List pgsql-general
On Tue, Jul 9, 2019 at 12:29 PM John Lumby <johnlumby@hotmail.com> wrote:
> I was not thinking of a new command,  just an extension of the existing REINDEX
> which would apply a fillfactor equal to current average page density,
> by adding a preliminary step to sample that first.

That would be a very different thing to REINDEX no matter how you
spelt it, though. REINDEX creates a new index, from scratch, whereas
you're talking about restructuring what's already there.

> > I believe that this is a lot more important in systems that generally
> > use clustered indexes, such as MS SQL Server. This kind of
> > "fragmentation" isn't usually much of a problem when using Postgres.
> >
> We have found that, for an index which has both experienced large number of page splits
> and whose table has a large number of dead tuples (despite autovacuum),
> REINDEX with FILLFACTOR set to current page_density does produce a performance improvement,
> and also does reduce future growth in number of pages.    I don't have numbers to
> hand,  and in fact not sure if any catalog view or pgstattuple tells me about the proportion
> of dead key-tids in the index itself (do you know of any source?) as opposed to the table,
> but based on that recollection,  yes,   REINDEX can reduce fragmentation.

This could help the old "getting tired" behavior with many duplicates,
by making the free space available in earlier leaf pages (those
further to the left) that are full of duplicates -- the original
average space utilization may reflect a very uneven distribution of
free space overall. Or, it could be that range scan performance
benefitted from reduced fragmentation, because your workload happened
to be bottlenecked on large range scans. Though that seems unlikely.

I believe that the effect that you identified is real, but at a
minimum it's not clear why a REINDEX with a fillfactor to match the
original leaf space utilization helped. It would be fairly difficult
to figure it out for sure. If it was a problem with
duplicates/"getting tired", then I'd expect the new v12 code will help
a lot.

> However we did not run a VACUUM command first.     Maybe if we had run VACUUM instead of
> the REINDEX commands,   we might have obtained the same degree of improvement,  I don't know.
> I think this was Tom's point earlier on in this thread.

It was. Tom's intuition about that matches my own, though I
acknowledge that the old behavior with duplicates muddies the waters.

> Correct me if I'm wrong but I believe whether an index is "clustered" or not is not relevant for
> this discussion because the clustering in that context is referring to ordering of the
> table pages,  not the index pages.

Right.

> I believe it is quite possible to have a perfectly
> "clustered" table whose clustering index is itself badly disorganized.

Technically the two things are separate metrics, so that is
theoretically possible, but it doesn't seem all that likely. It could
happen with lots of non-HOT updates, where all new index tuples relate
to the same logical row as some existing index tuple, causing many
page splits despite there being no real change in the logical contents
of the index. Even then, the table will itself lose much of its
original order, so the index will become "unclustered" as it becomes
fragmented.

-- 
Peter Geoghegan



pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: execute_values
Next
From: John Lumby
Date:
Subject: Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR