Index bloat with "USING GIN(varchar[])" index? - Mailing list pgsql-general

From Eric Ridge
Subject Index bloat with "USING GIN(varchar[])" index?
Date
Msg-id BANLkTi=SNWAeFOKt+Pp2o0fN2eHNmOXNRw@mail.gmail.com
Whole thread Raw
List pgsql-general
PostgreSQL 8.4.8 on i386-apple-darwin10.7.0, compiled by GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664),
64-bit

I'm not sure exactly what's going on, but I've got a table defined like this:

CREATE TABLE foo (
      ...
      tags varchar(1000)[],
      ...
);
CREATE INDEX idxfoo_tags ON foo USING GIN (tags);

I query the tags column quite a bit like so:

   SELECT * FROM foo WHERE tags @> ARRAY['value']::varchar[];

Works great.  Super fast... usually.

"foo" has roughly 50k records, and each row has anywhere from 2 to 5
elements in "tags".   "tags" gets rewritten pretty regularly across
large swaths of records (1k - ~10k at a time) with different-ish
values.  What I see happing is the above SQL going from a few
milliseconds to a few seconds.  Until I REINDEX it.  Then it performs
well again until lots of rewrites happen.

Before or after the REINDEX, the query plan is always the same (and
it's a good plan):

explain analyze SELECT * FROM foo WHERE tags @> ARRAY['CATTLE']::varchar[];
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=8.59..147.76 rows=38 width=496)
(actual time=8.870..10.073 rows=1654 loops=1)
   Recheck Cond: (tags @> '{CATTLE}'::character varying[])
   ->  Bitmap Index Scan on idxfoo_tags87  (cost=0.00..8.58 rows=38
width=0) (actual time=8.806..8.806 rows=5034 loops=1)
         Index Cond: (tags @> '{CATTLE}'::character varying[])
 Total runtime: 10.258 ms
(5 rows)


I haven't had a chance to nail down a standalone test case to
reproduce this, but it smells like index bloat.

Are GIN indexes known for bloating, especially if they're on a
varchar[]?  Any suggestions for how to prove/disprove that it's index
bloat?

eric

pgsql-general by date:

Previous
From: Vlad Romascanu
Date:
Subject: One-off attempt at catalog hacking to turn bytea column into text
Next
From: Thomas Kellerer
Date:
Subject: Documentation suggestion