I have been using tsearch2 for quite a while with a fair amount of
success.
The other day I was playiing around with a query, and randomly changed
a few things. I noticed a 10 times speedup and didn't know why. Both
queries return identical results.
The idea was to do a proximity search, where one word appears within 10
minutes of the other.
I'm not sure if this is a bug or something weird. Using postgres
7.4.2, and I think the 7.4.2 version of tsearch2.
The two queries:
Fast:
explain analyze select m1.message_date, m1.message_id from messages m1,
messages m2, to_tsquery('haley') q1, to_tsquery('birthday') q2 where
m2.message_date between m1.message_date - '5 minutes'::interval and
m1.message_date + '5 minutes'::interval and m1.idxfti @@ q1 and
m2.idxfti @@ q2 and m1.message_id <> m2.message_id;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
Nested Loop (cost=485403.85..549229077.83 rows=1651000057 width=12)
(actual time=190.952..221.859 rows=4 loops=1)
Join Filter: (("outer".message_date >= ("inner".message_date -
'00:05:00'::interval)) AND ("outer".message_date <=
("inner".message_date + '00:05:00'::interval)) AND ("inner".message_id
<> "outer".message_id))
-> Nested Loop (cost=0.00..484867.85 rows=121898 width=12) (actual
time=0.879..33.273 rows=86 loops=1)
-> Function Scan on q2 (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.034..0.037 rows=1 loops=1)
-> Index Scan using fti_idx on messages m2
(cost=0.00..483.33 rows=122 width=44) (actual time=0.831..32.828
rows=86 loops=1)
Index Cond: (m2.idxfti @@ "outer".q2)
Filter: (m2.idxfti @@ "outer".q2)
-> Materialize (cost=485403.85..487158.83 rows=121898 width=12)
(actual time=0.189..1.477 rows=160 loops=86)
-> Nested Loop (cost=0.00..484867.85 rows=121898 width=12)
(actual time=16.132..110.991 rows=160 loops=1)
-> Function Scan on q1 (cost=0.00..12.50 rows=1000
width=32) (actual time=0.061..0.065 rows=1 loops=1)
-> Index Scan using fti_idx on messages m1
(cost=0.00..483.33 rows=122 width=44) (actual time=16.048..99.997
rows=160 loops=1)
Index Cond: (m1.idxfti @@ "outer".q1)
Filter: (m1.idxfti @@ "outer".q1)
Total runtime: 223.481 ms
(14 rows)
Slow:
explain analyze select m1.message_date, m1.message, m2.message_date
from messages m1, messages m2 where m2.message_date BETWEEN
m1.message_date - '5 minutes'::interval and m1.message_date + '5
minutes'::interval and m1.idxfti @@ to_tsquery('haley') and m2.idxfti
@@ to_tsquery('birthday') and m1.message_id <> m2.message_id;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
---------------------------------------------------------------------
Nested Loop (cost=0.00..59784.68 rows=1654 width=56) (actual
time=746.830..3132.006 rows=4 loops=1)
Join Filter: (("inner".message_date >= ("outer".message_date -
'00:05:00'::interval)) AND ("inner".message_date <=
("outer".message_date + '00:05:00'::interval)) AND ("outer".message_id
<> "inner".message_id))
-> Index Scan using fti_idx on messages m1 (cost=0.00..483.33
rows=122 width=52) (actual time=8.770..69.013 rows=160 loops=1)
Index Cond: (idxfti @@ '\'haley\''::tsquery)
Filter: (idxfti @@ '\'haley\''::tsquery)
-> Index Scan using fti_idx on messages m2 (cost=0.00..483.33
rows=122 width=12) (actual time=0.112..18.899 rows=86 loops=160)
Index Cond: (idxfti @@ '\'birthday\''::tsquery)
Filter: (idxfti @@ '\'birthday\''::tsquery)
Total runtime: 3132.665 ms
(9 rows)