Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From jesper@krogh.cc
Subject Re: Queryplan within FTS/GIN index -search.
Date
Msg-id d491705be3c4b2463b01fb8113e3e886.squirrel@shrek.krogh.cc
Whole thread Raw
In response to Re: Queryplan within FTS/GIN index -search.  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Queryplan within FTS/GIN index -search.  (Richard Huxton <dev@archonet.com>)
Re: Queryplan within FTS/GIN index -search.  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-performance
> On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
>> > In effect, what you want are words that aren't searched (or stored) in
>> > the index, but are included in the tsvector (so the RECHECK still
>> > works). That sounds like it would solve your problem and it would
>> reduce
>> > index size, improve update performance, etc. I don't know how
>> difficult
< > it would be to implement, but it sounds reasonable to me.
>
>> That sounds like it could require an index rebuild if the distribution
>> changes?
>
> My thought was that the common words could be declared to be common the
> same way stop words are. As long as words are only added to this list,
> it should be OK.
>
>> That would be another plan to pursue, but the MCV is allready there
>
> The problem with MCVs is that the index search can never eliminate
> documents because they don't contain a match, because it might contain a
> match that was previously an MCV, but is no longer.

No, it definately has to go visit the index/table to confirm findings, but
that why I wrote Queryplan in the subject line, because this os only about
the strategy to pursue to obtain the results. And a strategy about
limiting the amout of results as early as possible (as PG usually does)
would be what I'd expect and MCV can help it guess on that.

Similar finding, rewrite the query: (now i took the extreme and made
"raretem" a spellingerror), so result is 0.

ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts
@@ to_tsquery('commonterm & spellerror') limit 100;
                                                             QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=132.63..188.89 rows=28 width=739) (actual
time=862.714..862.714 rows=0 loops=1)
   ->  Bitmap Heap Scan on ftsbody  (cost=132.63..188.89 rows=28
width=739) (actual time=862.711..862.711 rows=0 loops=1)
         Recheck Cond: (ftsbody_body_fts @@ to_tsquery('commonterm &
spellerror'::text))
         ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..132.62
rows=28 width=0) (actual time=862.702..862.702 rows=0 loops=1)
               Index Cond: (ftsbody_body_fts @@ to_tsquery('commonterm &
spellerror'::text))
 Total runtime: 862.771 ms
(6 rows)

ftstest=# explain analyze select body from ftsbody where ftsbody_body_fts
@@ to_tsquery('commonterm') and ftsbody_body_fts @@
to_tsquery('spellerror') limit 100;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=132.70..189.11 rows=28 width=739) (actual time=8.669..8.669
rows=0 loops=1)
   ->  Bitmap Heap Scan on ftsbody  (cost=132.70..189.11 rows=28
width=739) (actual time=8.665..8.665 rows=0 loops=1)
         Recheck Cond: ((ftsbody_body_fts @@
to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
to_tsquery('spellerror'::text)))
         ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..132.70
rows=28 width=0) (actual time=8.658..8.658 rows=0 loops=1)
               Index Cond: ((ftsbody_body_fts @@
to_tsquery('commonterm'::text)) AND (ftsbody_body_fts @@
to_tsquery('spellerror'::text)))
 Total runtime: 8.724 ms
(6 rows)

So getting them with AND inbetween gives x100 better performance. All
queries are run on "hot disk" repeated 3-5 times and the number are from
the last run, so disk-read effects should be filtered away.

Shouldn't it somehow just do what it allready are capable of doing?

--
Jesper


pgsql-performance by date:

Previous
From: Scara Maccai
Date:
Subject: Re: Table Clustering & Time Range Queries
Next
From: Richard Huxton
Date:
Subject: Re: Queryplan within FTS/GIN index -search.