Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR - Mailing list pgsql-general

From John Lumby
Subject Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
Date
Msg-id DM6PR06MB5562839D0229D832E1AE4742A3F60@DM6PR06MB5562.namprd06.prod.outlook.com
Whole thread Raw
In response to Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: REINDEX : new parameter to preserve current average leaf densityas new implicit FILLFACTOR
List pgsql-general
> From: Peter Geoghegan <pg@bowt.ie>
> Sent: June 26, 2019 12:09 PM
> Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR
>
> > >
> > > I suspect that you might find that the enhancements to B-Tree indexes
> > > that went into Postgres 12 would help with this workload, especially
> > > if you notice that this happens with indexes that have a lot of duplicates
> > >
>
> >
> > I had not noticed that,   thanks for pointing it out.  Yes ,  in my workload most of the indexes in question
> >  are non-unique and some have very low key card.    I will try out the pg-12 beta when I get a chance.
> >
>
>
> It's easy to show problems with very low cardinality indexes in the
> old code. You'll definitely notice a difference there.
>

I've run a comparison of pg-12beta2 with an older release, pg-9.4.6,       running the same intensive-delete-insert
workloadas mentioned before in this thread,    and would appreciate if you could comment on the results,  especially as
towhether they are roughly in line with your expectation. 
I also have one question about a new-in-pg-11 config parm.

Briefly,   the workload involves a repetition of a loop in which ,  on one single table which has 8 indexes,  2 unique
and6 non-unique, 
about 4300 records are deleted,  and for each of those records,   a corresponding record is inserted in which one or
moreof the non-unique key values are modified to values which are not present in the relevant index at that point.  In
otherwords, across all the indexes ,  4300 key-tids are deleted and then 4300 new key-tids are inserted.     At the end
ofeach loop there is zero net change in counts of records and keys but possibly some increase in numbers of pages,
whichis what the test is interested in as well as overall throughput rate. 

For this test,   I did not modify the index default fill factors which therefore remained at 90%,   in order to make a
stabat evaluating not setting explicit fillfactor.   In each case the indexes were either freshly loaded or else
reindexedto have the same starting density.       Here are counts and sizes after 768 iterations 



| tbpages |  tbtuples    |            ixname            | isuniq | livetuples | deadtuples | avg_leaf_density | ixpages


+---------+--------------+------------------------------+--------+------------+------------+------------------+---------

pg-9.4.6
---------------------

|   32160 | 2.55548e+06  | metadata_value_boolean       | f      |    2932852 |          0 |            46.39 |   13535
|   32160 | 2.55548e+06  | metadata_value_field_id      | f      |    2932852 |          0 |            48.58 |   12916
|   32160 | 2.55548e+06  | metadata_value_floatnumber   | f      |    2932852 |          0 |            45.97 |   13658
|   32160 | 2.55548e+06  | metadata_value_longnumber    | f      |    2932852 |          0 |            48.26 |   13009
|   32160 | 2.55548e+06  | metadata_value_owner_field_u | t      |    2932852 |          0 |            58.69 |   14990
|   32160 | 2.55548e+06  | metadata_value_owner_id      | f      |    2932852 |          0 |            53.06 |   11817
|   32160 | 2.55548e+06  | metadata_value_pkey          | t      |    2932852 |          0 |            57.83 |   10842
|   32160 | 2.55548e+06  | metadata_value_timestamp     | f      |    2932852 |          0 |            45.96 |   13663

pg-12beta2
---------------------

|   41814 | 2.519268e+06 | metadata_value_boolean       | f      |    2519268 |       6766 |            63.17 |   10768
|   41814 | 2.519268e+06 | metadata_value_field_id      | f      |    2519268 |       6766 |             68.7 |   12031
|   41814 | 2.519268e+06 | metadata_value_floatnumber   | f      |    2519268 |       6766 |            61.48 |   11225
|   41814 | 2.519268e+06 | metadata_value_longnumber    | f      |    2519268 |       6766 |            58.34 |   12397
|   41814 | 2.519268e+06 | metadata_value_owner_field_u | t      |    2519268 |       6766 |            61.69 |   14780
|   41814 | 2.519268e+06 | metadata_value_owner_id      | f      |    2519268 |       6766 |            48.86 |   12947
|   41814 | 2.519268e+06 | metadata_value_pkey          | t      |    2519268 |       6766 |            59.71 |   11076
|   41814 | 2.519268e+06 | metadata_value_timestamp     | f      |    2519268 |       6766 |            57.81 |   12295

Overall,  pg-12beta2 yielded a 6.7% reduction in sizes (total pages) of indexes,   which was most noticable with the 6
non-uniqueones. 
In fact the primary-key index was larger with pg-12.           Would you have expected better than 6.7%?       Although
awelcome improvement,  I think it is not enough to justify stopping use of setting a lower explicit FILLFACTOR.
Whichthen brings me back to  thinking there is a case for the subject of this thread,  an automatic way to preserve
density.

Secondary points:

I did not expect to see the number of table pages grow so much larger for pg-12 than for pg-9.4.      The number of
tablepages was almost identical at the start of each run.   However this was not the focus of the test.   

Also,  although not shown in those tables,  pg-12 was around 4.5 times faster in completing those 768 iterations,    an
enormousimprovement. 

And one question :
I notice that in some pg-11 release,   a new config parameter appeared  :
      vacuum_cleanup_index_scale_factor
specifies the fraction of the total number of heap tuples counted in the previous statistics collection that can be
insertedwithout incurring an index scan at the VACUUM cleanup stage.  

I have not researched this at all and nor did I set it to anything for my pg-12beta2 run,      but it sounds as though
maybeit could be relevant to this kind of workload  -   Is that so? 

Cheers    John Lumby


pgsql-general by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Measuring the Query Optimizer Effect: Turning off the QO?
Next
From: Tom Lane
Date:
Subject: Re: Measuring the Query Optimizer Effect: Turning off the QO?