Re: Extreme bloating of intarray GiST indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Extreme bloating of intarray GiST indexes
Date
Msg-id 15188.1304533765@sss.pgh.pa.us
Whole thread Raw
In response to Re: Extreme bloating of intarray GiST indexes  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Extreme bloating of intarray GiST indexes  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
>> Can you provide the data in the column that's indexed?

> Attached.  This is for the index which is 90% free space.

I tried loading this data in fresh, and then creating a gist__intbig_ops
index on it.  I got these pgstattuple numbers (in 8.4.8):

table_len          | 8806400
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 46.35
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 4088100
free_percent       | 46.42

On the other hand, loading the data with a pre-existing empty index gave

table_len          | 7798784
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 52.33
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 3183672
free_percent       | 40.82

Neither of those numbers are great, and it's a bit surprising that
CREATE INDEX produces a result notably worse than incremental loading;
but still a darn sight better than 90% free space.  So I think probably
the update pattern has a lot to do with this.

> * If you didn't notice earlier, it's a partial index.  The two columns
> which determine the partial index change more often than the intarray
> column.

Yeah, with only about half of the table actually indexed, since you
showed only 7786 index entries in your results.  But unless there's
reason to think the indexed and unindexed entries are substantially
different in the intarray column, this is probably not very relevant.

> * We've also determined some other unusual patterns from watching the
> application:

> (a) the "listings" table is a very wide table, with about 60 columns

> (b) whenever the table gets updated, the application code updates these
> 60 columns in 4 sections.  So there's 4 updates to the same row, in a
> single transaction.

Hmm.  That is going to lead to four dead index entries for every live
one (unless some of the updates are HOT, which won't happen if you're
changing any indexed columns).  VACUUM will get back the space
eventually, but not before you've caused some index bloat.

I tried doing something similar to my test table:

contrib_regression=# alter table listings add column junk int;
ALTER TABLE
contrib_regression=# create index li on listings(junk);
CREATE INDEX
contrib_regression=# begin;
BEGIN
contrib_regression=# update listings set junk=1;
UPDATE 15005
contrib_regression=# update listings set junk=2;
UPDATE 15005
contrib_regression=# update listings set junk=3;
UPDATE 15005
contrib_regression=# update listings set junk=4;
UPDATE 15005
contrib_regression=# commit;
COMMIT
contrib_regression=# vacuum listings;
VACUUM

and then got these pgstattuple numbers:

table_len          | 39460864
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 10.34
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 32923872
free_percent       | 83.43

which is up in the same ballpark with your problem.  Now probably your
client's app is not updating all rows at once, but still this is a
pretty wasteful update pattern.  Is there a reason not to update all
the columns in a single update?

If you can't change the app, I'd suggest more aggressive autovacuuming
as the least painful fix.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Unfriendly handling of pg_hba SSL options with SSL off
Next
From: Tom Lane
Date:
Subject: Re: Unfriendly handling of pg_hba SSL options with SSL off