Re: Frequetly updated partial index leads to bloat on index for Postresql 11 - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Frequetly updated partial index leads to bloat on index for Postresql 11
Date
Msg-id CA+bJJbwGtoS9BsMq1GC+_9bQjQxprZqMKYzCL1xhFaTD9oYVkA@mail.gmail.com
Whole thread Raw
In response to Re: Frequetly updated partial index leads to bloat on index for Postresql 11  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom-Tom:

On Fri, Jul 16, 2021 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tom Dearman <tom.dearman@gmail.com> writes:
> > We have change autovacuum so that it runs more frequently autovacuum_vacuum_scale_factor=0.035, the reason we have
apartial index on the status is that in a table of 300 million entries, only about 100 or so would have
status=‘IN_PROGRESS’so we think this should be a nice small index and many of our queries want to look up with a where
clausestatus=‘IN_PROGRESS’.  In theory it works well, but we get a lot of index bloat as there is a lot of churn on the
statusvalue, ie each row starts as IN_PROGRESS and then goes to one of 4 possible completed statuses. 
> Is it really the case that only this index is bloating?  In principle, an
> update on a row of the table should result in new entries in every index
> of the table.  A partial index, due to the filter applied to possibly not
> store any index entry, should in theory have less bloat than other
> indexes.

May be not in relative terms. If I understand correctly, lets say you
start unbloated with 300M entries with 100 in progress. You insert, in
small batches, 10K rows "in_progress" and update 10k "in_progress" row
to, let's say, "done" ( may be including the 100 original ones ) (
description seems to fit this kind of flow, something like a task
queue ). Then you will have 10k dead tuples bloating a 100 live
entries index, for an absolutely small but relatively large bloat,
while a full pk index will have 10k for 300m.

I may be misunderstanding some thing and HOT and similar things may
help here, but it seems like a plausible explanation for an apparent
bloat ( on a small index, I had similar things but for task-like
things I use a pending and a done table, and the pending table bloated
a lot on the before-autovacuum times,  small table nearly all dead
tuples, not a problem vaccuming it via cron every some minutes, as in
this case only the index is bloated autovacuum may not fire frequently
enough for it, as the table would have only 10k/30M~333 ppm bloat  )

Francisco Olarte



pgsql-general by date:

Previous
From: Sasha Aliashkevich
Date:
Subject: Re: ERROR: cannot freeze committed xmax
Next
From: Tom Dearman
Date:
Subject: Re: Frequetly updated partial index leads to bloat on index for Postresql 11