Thread: tsearch2 queries faster than expected

tsearch2 queries faster than expected

From
Jeffrey Melloy
Date:
I have been using tsearch2 for quite a while with a fair amount of
success.

The other day I was playiing around with a query, and randomly changed
a few things.  I noticed a 10 times speedup and didn't know why.  Both
queries return identical results.

The idea was to do a proximity search, where one word appears within 10
minutes of the other.

I'm not sure if this is a bug or something weird.  Using postgres
7.4.2, and I think the 7.4.2 version of tsearch2.

The two queries:
Fast:
explain analyze select m1.message_date, m1.message_id from messages m1,
messages m2, to_tsquery('haley') q1, to_tsquery('birthday') q2 where
m2.message_date between m1.message_date - '5 minutes'::interval and
m1.message_date + '5 minutes'::interval and m1.idxfti @@ q1 and
m2.idxfti @@ q2 and m1.message_id <> m2.message_id;

         QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
  Nested Loop  (cost=485403.85..549229077.83 rows=1651000057 width=12)
(actual time=190.952..221.859 rows=4 loops=1)
    Join Filter: (("outer".message_date >= ("inner".message_date -
'00:05:00'::interval)) AND ("outer".message_date <=
("inner".message_date + '00:05:00'::interval)) AND ("inner".message_id
<> "outer".message_id))
    ->  Nested Loop  (cost=0.00..484867.85 rows=121898 width=12) (actual
time=0.879..33.273 rows=86 loops=1)
          ->  Function Scan on q2  (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.034..0.037 rows=1 loops=1)
          ->  Index Scan using fti_idx on messages m2
(cost=0.00..483.33 rows=122 width=44) (actual time=0.831..32.828
rows=86 loops=1)
                Index Cond: (m2.idxfti @@ "outer".q2)
                Filter: (m2.idxfti @@ "outer".q2)
    ->  Materialize  (cost=485403.85..487158.83 rows=121898 width=12)
(actual time=0.189..1.477 rows=160 loops=86)
          ->  Nested Loop  (cost=0.00..484867.85 rows=121898 width=12)
(actual time=16.132..110.991 rows=160 loops=1)
                ->  Function Scan on q1  (cost=0.00..12.50 rows=1000
width=32) (actual time=0.061..0.065 rows=1 loops=1)
                ->  Index Scan using fti_idx on messages m1
(cost=0.00..483.33 rows=122 width=44) (actual time=16.048..99.997
rows=160 loops=1)
                      Index Cond: (m1.idxfti @@ "outer".q1)
                      Filter: (m1.idxfti @@ "outer".q1)
  Total runtime: 223.481 ms
(14 rows)

Slow:

explain analyze select m1.message_date, m1.message, m2.message_date
from messages m1, messages m2 where m2.message_date BETWEEN
m1.message_date - '5 minutes'::interval and  m1.message_date + '5
minutes'::interval and m1.idxfti @@ to_tsquery('haley') and m2.idxfti
@@ to_tsquery('birthday') and m1.message_id <> m2.message_id;

                   QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
  Nested Loop  (cost=0.00..59784.68 rows=1654 width=56) (actual
time=746.830..3132.006 rows=4 loops=1)
    Join Filter: (("inner".message_date >= ("outer".message_date -
'00:05:00'::interval)) AND ("inner".message_date <=
("outer".message_date + '00:05:00'::interval)) AND ("outer".message_id
<> "inner".message_id))
    ->  Index Scan using fti_idx on messages m1  (cost=0.00..483.33
rows=122 width=52) (actual time=8.770..69.013 rows=160 loops=1)
          Index Cond: (idxfti @@ '\'haley\''::tsquery)
          Filter: (idxfti @@ '\'haley\''::tsquery)
    ->  Index Scan using fti_idx on messages m2  (cost=0.00..483.33
rows=122 width=12) (actual time=0.112..18.899 rows=86 loops=160)
          Index Cond: (idxfti @@ '\'birthday\''::tsquery)
          Filter: (idxfti @@ '\'birthday\''::tsquery)
  Total runtime: 3132.665 ms
(9 rows)


Re: tsearch2 queries faster than expected

From
Tom Lane
Date:
Jeffrey Melloy <jmelloy@visualdistortion.org> writes:
> The other day I was playiing around with a query, and randomly changed
> a few things.  I noticed a 10 times speedup and didn't know why.  Both
> queries return identical results.

It looks like the planner's overestimate of the number of rows returned
by the function causes it to insert a Materialize step in the inside of
the nestloop join, so as to avoid recomputing the inner scan multiple
times.  Which is a good idea.  I wonder why it didn't do it in both
cases?

            regards, tom lane