Re: Frequetly updated partial index leads to bloat on index for Postresql 11 - Mailing list pgsql-general

From Tom Dearman
Subject Re: Frequetly updated partial index leads to bloat on index for Postresql 11
Date
Msg-id 95D506BA-D1E8-45BB-8167-BD61ED3D9B9C@gmail.com
Whole thread Raw
In response to Re: Frequetly updated partial index leads to bloat on index for Postresql 11  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Frequetly updated partial index leads to bloat on index for Postresql 11  (Michael Lewis <mlewis@entrata.com>)
Re: Frequetly updated partial index leads to bloat on index for Postresql 11  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-general
Hi,

We upgraded to 13 a couple of months ago on production but are still having an issue with bloated partial indexes which have an impact on our partial queries especially towards the end of a quarter when our quarterly-partitioned tables are getting big.  I have built 14 (on macOS catalina, 14 beta 2) and run a test but the index grows fairly large (even though vacuums are running as the table is still relatively small - I put in 2 million inserts, each having one update of the column that makes up the partial index). The table is:

                                                     Table "public.buyer"
    Column     |          Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
---------------+------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 buyer_id      | integer                |           | not null |         | plain    |             |              | 
 first_name    | character varying(35)  |           | not null |         | extended |             |              | 
 last_name     | character varying(35)  |           | not null |         | extended |             |              | 
 email_address | character varying(50)  |           |          |         | extended |             |              | 
 status        | character varying(256) |           | not null |         | extended |             |              | 
Indexes:
    "buyer_pkey" PRIMARY KEY, btree (buyer_id)
    "idex_buyer_inactive" btree (first_name) WHERE status::text = 'IN_PROGRESS'::text
Access method: heap


I run a loop to insert, commit, update, commit one row at a time as this is an emulation of what a similar table would experience in production.  The index never has many rows with status=‘IN_PROGRESS’ as each row is set to CANCEL in the update.  If the index is reindexed it takes 1 page as expected but without the reindexing it keeps growing, currently reaching 3MB - this is with 2 million inserts and updates but our production will have about 300 million inserts and > 300 million updates on the partial index in the quarter.  Should we have seen more of an improvement in 14?  Is it valid to look at the size of the index (\di+) as a measure of whether this latest change to bottom up index deleting has helped?

Thanks,

Tom

On 18 Mar 2021, at 16:30, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Mar 18, 2021 at 6:51 AM Tom Dearman <tom.dearman@gmail.com> wrote:
Is this a known issue, are they any ways around it, and if it is an
issue is there a plan to fix it if a fix is possible?

It's not exactly a known issue per se, but I think the problem here is
related to the fact that you have lots of duplicates, which did
perform rather badly prior to Postgres 12. I bet that you'd benefit
from upgrading to Postgres 12, or especially to Postgres 13. The
B-Tree space management is a lot better now. (Actually, it'll be
better again in Postgres 14.)

--
Peter Geoghegan

pgsql-general by date:

Previous
From: "Markhof, Ingolf"
Date:
Subject: dealing with dependencies
Next
From: Michael Lewis
Date:
Subject: Re: Frequetly updated partial index leads to bloat on index for Postresql 11