Thread: tsearch2 and gist index bloat

tsearch2 and gist index bloat

From
George Essig
Date:
I have installed tsearch2 and have noticed that the gist index used to do searches grows and grows
as I update rows, delete rows, or run VACUUM FULL ANALYZE.  Below are some details:

PostgreSQL 7.4RC1
Red Hat 9

                                   Table "public.series"
    Column     |       Type        |                       Modifiers
---------------+-------------------+--------------------------------------------------------
 id            | integer           | not null default nextval('public.series_id_seq'::text)
 database      | character varying | not null
 name          | character varying | not null
 description   | character varying |
 documentation | character varying |
 ts_vec        | tsvector          |
Indexes:
    "series_pkey" primary key, btree (id)
    "series_db_name_un" unique, btree ("database", name)
    "ts_in" gist (ts_vec)
Triggers:
    ts_update BEFORE INSERT OR UPDATE ON series FOR EACH ROW EXECUTE PROCEDURE tsearch2('ts_vec',
'description', 'documentation')

There are 110,873 rows in this table and 13398 unique words indexed by ts_in.  Using oid2name, I
monitored the size of the index ts_in as I performed different operations:

154 MB After the index was created.
190 MB After updating 40,422 rows.
243 MB After VACUUM FULL
275 MB After deleting 40,422 rows & again VACUUM FULL

Below is some output from VACUUM FULL ANALYZE VERBOSE after the 40,422 rows were deleted.

INFO:  index "ts_in" now contains 70451 row versions in 2969 pages
DETAIL:  40422 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.27s/0.44u sec elapsed 1.03 sec.

As the index size grows, search performance slows to a crawl because it's too to fit in RAM.  Is
the only solution to drop and recreate the index after large updates?

Thanks,

George Essig

Re: tsearch2 and gist index bloat

From
George Essig
Date:

--- George Essig <george_essig@yahoo.com> wrote:
> I have installed tsearch2 and have noticed that the gist index used to do searches grows and
> grows
> as I update rows, delete rows, or run VACUUM FULL ANALYZE.  Below are some details:
>

....

>
> There are 110,873 rows in this table and 13398 unique words indexed by ts_in.  Using oid2name, I
> monitored the size of the index ts_in as I performed different operations:
>
> 154 MB After the index was created.
> 190 MB After updating 40,422 rows.
> 243 MB After VACUUM FULL
> 275 MB After deleting 40,422 rows & again VACUUM FULL
>

Sorry, I mis-reported the index sizes.  They are about 1/10 the size:

15 MB After the index was created.
19 MB After updating 40,422 rows.
24 MB After VACUUM FULL
27 MB After deleting 40,422 rows & again VACUUM FULL

I still have a problem that the index size grows and grows and eventually searches slow to a
crawl.

George Essig

Re: tsearch2 and gist index bloat

From
"Joshua D. Drake"
Date:
Hello,

    I don't know if you can do this with a gist index but try using the
REINDEX command.

J


George Essig wrote:

>
> --- George Essig <george_essig@yahoo.com> wrote:
>
>>I have installed tsearch2 and have noticed that the gist index used to do searches grows and
>>grows
>>as I update rows, delete rows, or run VACUUM FULL ANALYZE.  Below are some details:
>>
>
>
> ....
>
>
>>There are 110,873 rows in this table and 13398 unique words indexed by ts_in.  Using oid2name, I
>>monitored the size of the index ts_in as I performed different operations:
>>
>>154 MB After the index was created.
>>190 MB After updating 40,422 rows.
>>243 MB After VACUUM FULL
>>275 MB After deleting 40,422 rows & again VACUUM FULL
>>
>
>
> Sorry, I mis-reported the index sizes.  They are about 1/10 the size:
>
> 15 MB After the index was created.
> 19 MB After updating 40,422 rows.
> 24 MB After VACUUM FULL
> 27 MB After deleting 40,422 rows & again VACUUM FULL
>
> I still have a problem that the index size grows and grows and eventually searches slow to a
> crawl.
>
> George Essig
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly


Re: tsearch2 and gist index bloat

From
George Essig
Date:
Thanks for the reply.  For this project, I can update the data and reindex during off-peak hours.
I was just surprised to see the size of the index double after heavy write activity.

George Essig

--- "Joshua D. Drake" <jd@commandprompt.com> wrote:
> Hello,
>
>     I don't know if you can do this with a gist index but try using the
> REINDEX command.
>
> J
>
>
> George Essig wrote:
>
> >
> > --- George Essig <george_essig@yahoo.com> wrote:
> >
> >>I have installed tsearch2 and have noticed that the gist index used to do searches grows and
> >>grows
> >>as I update rows, delete rows, or run VACUUM FULL ANALYZE.  Below are some details:
> >>
> >
> >
> > ....
> >
> >
> >>There are 110,873 rows in this table and 13398 unique words indexed by ts_in.  Using oid2name,
> I
> >>monitored the size of the index ts_in as I performed different operations:
> >>
> >>154 MB After the index was created.
> >>190 MB After updating 40,422 rows.
> >>243 MB After VACUUM FULL
> >>275 MB After deleting 40,422 rows & again VACUUM FULL
> >>
> >
> >
> > Sorry, I mis-reported the index sizes.  They are about 1/10 the size:
> >
> > 15 MB After the index was created.
> > 19 MB After updating 40,422 rows.
> > 24 MB After VACUUM FULL
> > 27 MB After deleting 40,422 rows & again VACUUM FULL
> >
> > I still have a problem that the index size grows and grows and eventually searches slow to a
> > crawl.
> >
> > George Essig
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> >       subscribe-nomail command to majordomo@postgresql.org so that your
> >       message can get through to the mailing list cleanly
>

Re: tsearch2 and gist index bloat

From
Oleg Bartunov
Date:
On Thu, 6 Nov 2003, George Essig wrote:

> Thanks for the reply.  For this project, I can update the data and reindex during off-peak hours.
> I was just surprised to see the size of the index double after heavy write activity.
>

This is not tsearch specific problem. It was discussed several times, ]
see index bloat subject in archives.

    Oleg

> George Essig
>
> --- "Joshua D. Drake" <jd@commandprompt.com> wrote:
> > Hello,
> >
> >     I don't know if you can do this with a gist index but try using the
> > REINDEX command.
> >
> > J
> >
> >
> > George Essig wrote:
> >
> > >
> > > --- George Essig <george_essig@yahoo.com> wrote:
> > >
> > >>I have installed tsearch2 and have noticed that the gist index used to do searches grows and
> > >>grows
> > >>as I update rows, delete rows, or run VACUUM FULL ANALYZE.  Below are some details:
> > >>
> > >
> > >
> > > ....
> > >
> > >
> > >>There are 110,873 rows in this table and 13398 unique words indexed by ts_in.  Using oid2name,
> > I
> > >>monitored the size of the index ts_in as I performed different operations:
> > >>
> > >>154 MB After the index was created.
> > >>190 MB After updating 40,422 rows.
> > >>243 MB After VACUUM FULL
> > >>275 MB After deleting 40,422 rows & again VACUUM FULL
> > >>
> > >
> > >
> > > Sorry, I mis-reported the index sizes.  They are about 1/10 the size:
> > >
> > > 15 MB After the index was created.
> > > 19 MB After updating 40,422 rows.
> > > 24 MB After VACUUM FULL
> > > 27 MB After deleting 40,422 rows & again VACUUM FULL
> > >
> > > I still have a problem that the index size grows and grows and eventually searches slow to a
> > > crawl.
> > >
> > > George Essig
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 3: if posting/reading through Usenet, please send an appropriate
> > >       subscribe-nomail command to majordomo@postgresql.org so that your
> > >       message can get through to the mailing list cleanly
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 and gist index bloat

From
George Essig
Date:
--- Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Thu, 6 Nov 2003, George Essig wrote:
>
> > Thanks for the reply.  For this project, I can update the data and reindex during off-peak
> hours.
> > I was just surprised to see the size of the index double after heavy write activity.
> >
>
> This is not tsearch specific problem. It was discussed several times, ]
> see index bloat subject in archives.
>
>     Oleg
>

I understand that index bloat is a general problem, but is this particular problem more severe
because of a gist index?  By the way, I'm running PostgreSQL 7.4 Release Candidate 1.

Thanks,
George Essig