Home > mailing lists

Re: TSearch2: find a QUERY that does match a single document - Mailing list pgsql-general

From	Dmitry Koterov
Subject	Re: TSearch2: find a QUERY that does match a single document
Date	September 13, 2008 16:12:44
Msg-id	d7df81620809131212x1b79f1b0n81b98e0e2b9f7ad4@mail.gmail.com Whole thread Raw
In response to	Re: TSearch2: find a QUERY that does match a single document (Oleg Bartunov <oleg@sai.msu.su>)
List	pgsql-general

Tree view

explain analyze
select * from test.test_tsq
where to_tsvector('400000x400000') @@ q

why do you need tsvector @@ q ? Much better to use tsquery = tsquery

test=# explain analyze select * from test_tsq where q = '400000x400000'::tsque>
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Seq Scan on test_tsq (cost=0.00..16667.01 rows=1 width=38) (actual time=129.208..341.111 rows=1 loops=1)
Filter: (q = '''400000x400000'''::tsquery)
Total runtime: 341.134 ms
(3 rows)

M-mmm... Seems your understood me incorrectly.

I have to find NOT queries which are exactly equal to another query, BUT queries which MATCH the GIVEN document. '400000x400000' was a sample only, in real cases it will be 1-2K document.

Here is a more realistic sample:

explain analyze
select * from test.test_tsq
where to_tsvector('
Here is a real document text. It may be long, 1-2K.
In this sample it contains a lexem "400000x400000", so there is a tsquery
in test_tsq.q which matches this document. I need to find all such queries fast.
Of course, in real cases the document text is unpredictable.
') @@ q

QUERY PLAN
Seq Scan on test_tsq (cost=0.00..17477.01 rows=800 width=36) (actual
time=68.698..181.458 rows=1 loops=1)
Filter: ('''400000x400000'':1'::tsvector @@ q)
Total runtime: 181.484 ms

'800' is the number of estimated rows, which is not good, since you got only 1 row.

Why 800? The table contains 800000 rows, and seqscan is used. Does it scan the whole table or not? If yes, possibly there is a bug in explain output? (No mater if I create GIST index on test_tsq.q or not, the number of rows is still 800, so it seems to me that GIST index is not used at all.)

pgsql-general by date:

From: Gregory Stark
Date: 13 September 2008, 13:57:12
Subject: Re: "Healing" a table after massive updates

From: "Dmitry Koterov"
Date: 13 September 2008, 16:27:15
Subject: Is there bigintarray?

Re: TSearch2: find a QUERY that does match a single document - Mailing list pgsql-general

Previous

Next