Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Queryplan within FTS/GIN index -search.
Date
Msg-id 4AF00ABB020000250002C1B3@gw.wicourts.gov
Whole thread Raw
In response to Re: Queryplan within FTS/GIN index -search.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Queryplan within FTS/GIN index -search.
Re: Queryplan within FTS/GIN index -search.
List pgsql-performance
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The answer to that clearly is to not index common terms

My understanding is that we don't currently get statistics on how
common the terms in a tsvector column are until we ANALYZE the *index*
created from it.  Seems like sort of a Catch 22.  Also, if we exclude
words which are in the tsvector from the index on the tsvector, we
need to know what words were excluded so we know not to search on them
as well as forcing the recheck of the full tsquery (unless this always
happens already?).

> It may well be that Jesper's identified a place where the GIN code
> could be improved

My naive assumption has been that it would be possible to get an
improvement without touching the index logic, by changing this part of
the query plan:

                     Index Cond: (ftsbody_body_fts @@ to_tsquery
('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text))

to something like this:

                     Index Cond: (ftsbody_body_fts @@ to_tsquery
('TERM1'::text))

and count on this doing the rest:

               Recheck Cond: (ftsbody_body_fts @@ to_tsquery
('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text))

I'm wondering if anyone has ever confirmed that probing for the more
frequent term through the index is *ever* a win, versus using the
index for the most common of the top level AND conditions and doing
the rest on recheck.  That seems like a dangerous assumption from
which to start.

> But the particular example shown here doesn't make a very good case
> for that, because it's hard to tell how much of a penalty would be
> taken in more realistic examples.

Fair enough.  We're in the early stages of moving to tsearch2 and I
haven't run across this yet in practice.  If I do, I'll follow up.

-Kevin

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Queryplan within FTS/GIN index -search.
Next
From: "Kevin Grittner"
Date:
Subject: Re: Queryplan within FTS/GIN index -search.