Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From Jesper Krogh
Subject Re: Queryplan within FTS/GIN index -search.
Date
Msg-id 4AEBD740.9060705@krogh.cc
Whole thread Raw
In response to Re: Queryplan within FTS/GIN index -search.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> But having said that, this particular test case is far from compelling.
> Any sane text search application is going to try to filter out
> common words as stopwords; it's only the failure to do that that's
> making this run slow.

Below is tests-runs not with a "commonterm" but and 80% term and a 60%
term.

There are two issues in this, one is the way PG "blows up" when
searching for a stop-word (and it even performs excellent when searching
for a term in the complete doc-base):

ftstest=# select id from ftstest where body_fts @@
to_tsquery('commonterm') limit 10;
 id
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)

Time: 1.004 ms
ftstest=# select id from ftstest where body_fts @@ to_tsquery('the')
limit 10;
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
 id
----
(0 rows)

Time: 0.587 ms

I can definetely effort the index-size for getting the first behavior to
my application. Stop words will first be really useful when searches for
them translates into full results not errors.

I also think you're trying to limit the scope of the problem more than
whats fair.

ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm');
 id
----
(0 rows)

Time: 28.230 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm');
 id
----
(0 rows)

Time: 0.930 ms
(so explain analyze is not a fair measurement .. it seems to make the
problem way worse). This is "only" x28
Time: 22.432 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm80');
 id
----
(0 rows)

Time: 0.992 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm80');
 id
----
(0 rows)

Time: 22.393 ms
ftstest=#
And for a 80% term .. x23

ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm60');
 id
----
(0 rows)

Time: 0.954 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm60');
 id
----
(0 rows)

Time: 17.006 ms

and x17

Just trying to say that the body of the problem isn't a discussion about
stop-words.

That being said, if you coin the term "stopword" to mean "any term that
exists in all or close to all documents" then the way it behaves when
searching for only one of them is a situation that we'll hit all the
time. (when dealing with user typed input).

Jesper
--
Jesper

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Queryplan within FTS/GIN index -search.
Next
From: Greg Stark
Date:
Subject: Re: Queryplan within FTS/GIN index -search.