Re: GiST index for pgtrgm bloats a lot - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: GiST index for pgtrgm bloats a lot
Date
Msg-id 5559E6ED.2010405@iki.fi
Whole thread Raw
In response to GiST index for pgtrgm bloats a lot  (hubert depesz lubaczewski <depesz@depesz.com>)
Responses Re: GiST index for pgtrgm bloats a lot  (hubert depesz lubaczewski <depesz@depesz.com>)
List pgsql-bugs
On 05/18/2015 03:12 PM, hubert depesz lubaczewski wrote:
> Hi,
> We have this situation on many servers - used versions are 9.1.14 and/or
> 9.3.6, on Linux servers.
>
> There is a table, with ~ 1 million rows.
> There are no deletes there, inserts happen at the rate of ~ 100-200
> daily (but not spread, it's usually a single moment in day where there
> happen the inserts).
> There is also HUGE number of updates - up to 200,000 rows updated per
> hour.

Are the columns that are included in the bloated index also updated that
often? If not, I'd suggest moving those columns to a separate table with
a one-to-one relationship to the main table. Or perhaps just create a
helper table that contains copies of those columns, and keep it
up-to-date with triggers. That would reduce the churn in the index.

> On these databases we have some (not many) 1-2 minute transactions, but usually
> every transaction within 30 seconds
>
> This table is vacuumed (autovacuum) quite a lot, usually at least 10 times a
> day.
>
> Is there anything we could do to help diagnose the problem, and fix it?
> I don't think I can install custom pg version, and downtime would be
> complicated to get approval, but anything about the data/database I can
> check that would allow diagnosing the bug, I'd be happy to do.

GiST doesn't have the same features that e.g B-tree has to avoid bloat:

1. When GiST has multiple equally good choices where it could insert a
tuple, it favours branches that are "earlier" in the index. Always
descending down the same branch is good for cache efficiency when you
insert multiple items with similar keys, but the downside is that the
other branches can easily have a lot of free space that goes unused,
while the "hot" branch just gets split repeatedly. This is explained in
the comments in the gistchoose() function. That leads to bloat, because
the free space isn't being used.

2. When an index page becomes completely empty, GiST doesn't try to
remove and reuse it.

It would be nice to address those in a future version, but that won't
help you right now.

- Heikki

pgsql-bugs by date:

Previous
From: hubert depesz lubaczewski
Date:
Subject: GiST index for pgtrgm bloats a lot
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: GiST index for pgtrgm bloats a lot