Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From jesper@krogh.cc
Subject Re: Queryplan within FTS/GIN index -search.
Date
Msg-id 6bc6ee9ee43ffe63da2753dad40bd549.squirrel@shrek.krogh.cc
Whole thread Raw
In response to Re: Queryplan within FTS/GIN index -search.  (Richard Huxton <dev@archonet.com>)
Responses Re: Queryplan within FTS/GIN index -search.  (Richard Huxton <dev@archonet.com>)
List pgsql-performance
> jesper@krogh.cc wrote:
>>
>> So getting them with AND inbetween gives x100 better performance. All
>> queries are run on "hot disk" repeated 3-5 times and the number are from
>> the last run, so disk-read effects should be filtered away.
>>
>> Shouldn't it somehow just do what it allready are capable of doing?
>
> I'm guessing to_tsquery(...) will produce a tree of search terms (since
> it allows for quite complex expressions). Presumably there's a standard
> order it gets processed in too, so it should be possible to generate a
> more or less efficient ordering.
>
> That structure isn't exposed to the planner though, so it doesn't
> benefit from any re-ordering the planner would normally do for normal
> (exposed) AND/OR clauses.
>
> Now, to_tsquery() can't re-order the search terms because it doesn't
> know what column it's being compared against. In fact, it might not be a
> simple column at all.

I cant follow this logic based on explain output, but I may have
misunderstood something. The only difference in these two query-plans is
that we have an additional or'd term in the to_tsquery().

What we see is that, the query-planner indeed has knowledge about changes
in the row estimates based on changes in the query to to_tsquery(). My
guess is that it is because to_tsquery actually parses the query and give
the estimates, now how can to_tsquery give estimates without having access
to the statistics for the column?

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare');
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on ftsbody  (cost=132.64..190.91 rows=29 width=4)
   Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
   ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..132.63 rows=29
width=0)
         Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | morerare');
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftsbody  (cost=164.86..279.26 rows=57 width=4)
   Recheck Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
   ->  Bitmap Index Scan on ftsbody_tfs_idx  (cost=0.00..164.84 rows=57
width=0)
         Index Cond: (ftsbody_body_fts @@ to_tsquery('reallyrare |
morerare'::text))
(4 rows)

ftstest=# explain select id from ftsbody where ftsbody_body_fts @@
to_tsquery('reallyrare | reallycommon');
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on ftsbody  (cost=0.00..1023249.39 rows=5509293 width=4)
   Filter: (ftsbody_body_fts @@ to_tsquery('reallyrare |
reallycommon'::text))
(2 rows)


> 2. A variant to_tsquery_with_sorting() which would take the column-name
> or something and look up the stats to work against.

Does above not seem like its there allready?

(sorry.. looking at C-code from my point of view would set me a couple of
weeks back, so I have troble getting closer to the answer than
interpreting the output and guessing the rest).

--
Jesper


pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Queryplan within FTS/GIN index -search.
Next
From: Richard Huxton
Date:
Subject: Re: Queryplan within FTS/GIN index -search.