Re: Similarity search with the tsearch2 extension - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Similarity search with the tsearch2 extension
Date
Msg-id 1386341675.6717.YahooMailNeo@web162903.mail.bf1.yahoo.com
Whole thread Raw
In response to Similarity search with the tsearch2 extension  ("Janek Sendrowski" <janek12@web.de>)
List pgsql-performance
Janek Sendrowski <janek12@web.de> wrote:

> I want to realize a Full Text Search with the tsearch2 extension.
> It should find similar sentences.
>  
> I used my own trigger to store the tsvector of the sentences and
> I created a usual gist index on them.
> I have to to use many OR statements with a low set of arguments,
> what heavy damages the performance.
> My former query looked like this:

> SELECT * FROM tablename
>   WHERE vector @@ to_tsquery('speci & tiger & total & weigh')
>     AND vector @@ to_tsquery('largest & length & m & reach')
>     AND vector @@ to_tsquery('3.3 & 306 & bodi & cat & kg');

I don't see any OR operators there.

> And thats very slow.

Are you sure it is using the index?

Anyway, it is better to show an example, with EXPLAIN ANALYZE
output.  Here's mine, involving searches of War and Peace.

test=# -- Create the table.
test=# -- In reality, I would probably make tsv NOT NULL,
test=# -- but I'm keeping the example simple...
test=# CREATE TABLE war_and_peace
test-#   (
test(#     lineno serial PRIMARY KEY,
test(#     linetext text NOT NULL,
test(#     tsv tsvector
test(#   );
CREATE TABLE
test=#
test=# -- Load from downloaded data into database.
test=# COPY war_and_peace (linetext)
test-#   FROM '/home/kgrittn/Downloads/war-and-peace.txt';
COPY 65007
test=#
test=# -- "Digest" data to lexemes.
test=# UPDATE war_and_peace
test-#   SET tsv = to_tsvector('english', linetext);
UPDATE 65007
test=#
test=# -- Index the lexemes using GIN.
test=# CREATE INDEX war_and_peace_tsv
test-#   ON war_and_peace
test-#   USING gin (tsv);
CREATE INDEX
test=#
test=# -- Make sure the database has statistics.
test=# VACUUM ANALYZE war_and_peace;
VACUUM
test=#
test=# -- Find lines with "gentlemen".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'gentlemen');
                                                         QUERY
PLAN                                                          

-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=12.52..240.18 rows=67 width=115) (actual time=0.058..0.130 rows=84 loops=1)
   Recheck Cond: (tsv @@ '''gentlemen'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..12.50 rows=67 width=0) (actual time=0.045..0.045 rows=84
loops=1)
         Index Cond: (tsv @@ '''gentlemen'''::tsquery)
 Total runtime: 0.160 ms
(5 rows)

test=#
test=# -- Find lines with "ladies".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies');
                                                          QUERY
PLAN                                                           

-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=13.39..547.24 rows=180 width=115) (actual time=0.062..0.215 rows=184 loops=1)
   Recheck Cond: (tsv @@ '''ladi'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..13.35 rows=180 width=0) (actual time=0.043..0.043 rows=184
loops=1)
         Index Cond: (tsv @@ '''ladi'''::tsquery)
 Total runtime: 0.247 ms
(5 rows)

test=#
test=# -- Find lines with "ladies" and "gentlemen".
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=20.00..24.01 rows=1 width=115) (actual time=0.062..0.063 rows=1 loops=1)
   Recheck Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.057..0.057 rows=1
loops=1)
         Index Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
 Total runtime: 0.090 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" and "gentlemen") and ("provinces" and "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen')
test-#     AND tsv @@ to_tsquery('english', 'provinces & distance');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=36.00..40.02 rows=1 width=115) (actual time=0.100..0.100 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' & ''distanc'''::tsquery))
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..36.00 rows=1 width=0) (actual time=0.095..0.095 rows=1
loops=1)
         Index Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' & ''distanc'''::tsquery))
 Total runtime: 0.130 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" or "gentlemen") and ("provinces" or "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies | gentlemen')
test-#     AND tsv @@ to_tsquery('english', 'provinces | distance');
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=36.00..40.02 rows=1 width=115) (actual time=0.043..0.043 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' | ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' | ''distanc'''::tsquery))
   ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..36.00 rows=1 width=0) (actual time=0.042..0.042 rows=1
loops=1)
         Index Cond: ((tsv @@ '''ladi'' | ''gentlemen'''::tsquery) AND (tsv @@ '''provinc'' | ''distanc'''::tsquery))
 Total runtime: 0.056 ms
(5 rows)

test=#
test=# -- Find lines with ("ladies" and "gentlemen") or ("provinces" and "distance").
test=# EXPLAIN ANALYZE
test-# SELECT * FROM war_and_peace
test-#   WHERE tsv @@ to_tsquery('english', 'ladies & gentlemen')
test-#      OR tsv @@ to_tsquery('english', 'provinces & distance');
                                                           QUERY
PLAN                                                            

---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on war_and_peace  (cost=40.00..44.02 rows=1 width=115) (actual time=0.080..0.080 rows=1 loops=1)
   Recheck Cond: ((tsv @@ '''ladi'' & ''gentlemen'''::tsquery) OR (tsv @@ '''provinc'' & ''distanc'''::tsquery))
   ->  BitmapOr  (cost=40.00..40.00 rows=1 width=0) (actual time=0.076..0.076 rows=0 loops=1)
         ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.052..0.052 rows=1
loops=1)
               Index Cond: (tsv @@ '''ladi'' & ''gentlemen'''::tsquery)
         ->  Bitmap Index Scan on war_and_peace_tsv  (cost=0.00..20.00 rows=1 width=0) (actual time=0.024..0.024 rows=1
loops=1)
               Index Cond: (tsv @@ '''provinc'' & ''distanc'''::tsquery)
 Total runtime: 0.116 ms
(8 rows)

Can you provide a similar example which slows the slowness you report?

                                                                                                                                                                                                                                                                             
 
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-performance by date:

Previous
From: "Janek Sendrowski"
Date:
Subject: Similarity search with the tsearch2 extension
Next
From: chidamparam muthusamy
Date:
Subject: postgres performance