Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From	Kevin Grittner
Subject	Re: Queryplan within FTS/GIN index -search.
Date	November 3, 2009 12:49:45
Msg-id	4AF00ABB020000250002C1B3@gw.wicourts.gov
In response to	Re: Queryplan within FTS/GIN index -search. (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Queryplan within FTS/GIN index -search. Re: Queryplan within FTS/GIN index -search.
List	pgsql-performance

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The answer to that clearly is to not index common terms

My understanding is that we don't currently get statistics on how
common the terms in a tsvector column are until we ANALYZE the *index*
created from it.  Seems like sort of a Catch-22.  Also, if we exclude
from the index some words which are in the tsvector, we need to know
which words were excluded, both so we know not to probe the index for
them and so we can force a recheck of the full tsquery (unless that
always happens already?).
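
A toy Python sketch of the bookkeeping described above, under stated assumptions: it is not PostgreSQL's GIN code, and `build_index`, `search_all` and `max_doc_fraction` are hypothetical names invented for the illustration.  It only shows why the set of excluded words has to be remembered: those words must be dropped from the index probe, and their presence in a query forces a recheck against the stored document.

```python
from collections import Counter


def build_index(docs, max_doc_fraction=0.5):
    """Build a toy inverted index, leaving out terms found in too many documents.

    docs maps a document id to the set of terms it contains.
    Returns (index, excluded), where excluded is the set of skipped terms.
    """
    doc_freq = Counter(term for terms in docs.values() for term in terms)
    limit = max_doc_fraction * len(docs)
    excluded = {term for term, n in doc_freq.items() if n > limit}
    index = {}
    for doc_id, terms in docs.items():
        for term in terms - excluded:
            index.setdefault(term, set()).add(doc_id)
    return index, excluded


def search_all(docs, index, excluded, query_terms):
    """Return the ids of documents containing every query term (AND semantics)."""
    query_terms = set(query_terms)
    probe_terms = query_terms - excluded
    if probe_terms:
        # Probe the index only for terms that were actually indexed.
        candidates = set.intersection(*(index.get(t, set()) for t in probe_terms))
    else:
        # Every query term was excluded, so the index cannot help at all.
        candidates = set(docs)
    if query_terms & excluded:
        # The index said nothing about the excluded terms: recheck the full query.
        candidates = {d for d in candidates if query_terms <= docs[d]}
    return candidates


docs = {
    1: {"the", "cat"},
    2: {"the", "dog"},
    3: {"the", "cat", "dog"},
    4: {"bird"},
}
index, excluded = build_index(docs)
print(sorted(excluded))
print(sorted(search_all(docs, index, excluded, ["the", "cat"])))
```

In this toy the recheck is a plain set comparison; in a real system it would mean fetching the row, which is where the cost of excluding terms would show up.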

> It may well be that Jesper's identified a place where the GIN code
> could be improved

My naive assumption has been that it would be possible to get an
improvement without touching the index logic, by changing this part of
the query plan:

                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text))

to something like this:

                     Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1'::text))

and count on this doing the rest:

               Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text))

I'm wondering if anyone has ever confirmed that probing for the more
frequent terms through the index is *ever* a win, versus using the
index for the least common of the top-level AND conditions and doing
the rest on recheck.  That seems like a dangerous assumption from
which to start.
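
The two strategies being compared can be sketched in a few lines of Python.  This is a toy model under stated assumptions, not a description of what GIN or the planner does: the function names are hypothetical, posting lists are plain sets, and the only cost counted is the number of index entries read.  It ignores the heap fetches a real recheck would need, which is exactly the part that would have to be measured.

```python
def intersect_all(index, terms):
    """Probe the index for every term and intersect the posting lists."""
    postings = [index.get(t, set()) for t in terms]
    entries_read = sum(len(p) for p in postings)
    return set.intersection(*postings), entries_read


def probe_rarest_then_recheck(index, docs, terms):
    """Probe only the rarest term, then recheck the full AND query on each hit."""
    rarest = min(terms, key=lambda t: len(index.get(t, set())))
    candidates = index.get(rarest, set())
    wanted = set(terms)
    matches = {d for d in candidates if wanted <= docs[d]}
    return matches, len(candidates)


# Toy data: "common" appears in four documents, "rare" in three.
docs = {
    1: {"common", "rare"},
    2: {"common"},
    3: {"common"},
    4: {"common", "rare"},
    5: {"rare"},
}
index = {}
for doc_id, doc_terms in docs.items():
    for term in doc_terms:
        index.setdefault(term, set()).add(doc_id)

print(intersect_all(index, ["common", "rare"]))
print(probe_rarest_then_recheck(index, docs, ["common", "rare"]))
```

Both functions return the same matches; they differ only in how many index entries they touch.  Picking the rarest term here relies on knowing posting-list sizes, which is the frequency information discussed earlier in this message.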

> But the particular example shown here doesn't make a very good case
> for that, because it's hard to tell how much of a penalty would be
> taken in more realistic examples.

Fair enough.  We're in the early stages of moving to tsearch2 and I
haven't run across this yet in practice.  If I do, I'll follow up.

-Kevin
