Thread: Index Usage and Running Times by FullTextSearch with prefix matching

Hi

I tested the following:

CREATE TABLE t1
( id serial NOT NULL, a character varying(125), a_tsvector tsvector, CONSTRAINT t1_pkey PRIMARY KEY (id)
);

INSERT INTO t1 (a, a_tsvector) 
VALUES ('ooooo,ppppp,fffff,jjjjj,zzzzz,jjjjj',
to_tsvector('ooooo,ppppp,fffff,jjjjj,zzzzz,jjjjj');

CREATE INDEX a_tsvector_idx  ON t1 USING gin (a_tsvector);

(I have generated 900000 records with random words like this)

Now querying: normal full text search

SELECT count(a)
FROM t1
WHERE a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc & ddddd')

(RESULT: count: 619)
Total query runtime: 353 ms.
Query Plan:
"Aggregate  (cost=6315.22..6315.23 rows=1 width=36)"
"  ->  Bitmap Heap Scan on t1  (cost=811.66..6311.46 rows=1504 width=36)"
"        Recheck Cond: (a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc &
ddddd'::text))"
"        ->  Bitmap Index Scan on a_tsvector_idx  (cost=0.00..811.28
rows=1504 width=0)"
"              Index Cond: (a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc
& ddddd'::text))"

And querying: FTS with prefix matching:

SELECT count(a)
FROM t1
WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*')

(RESULT: count: 619)
Total query runtime: 21266 ms.
Query Plan:
"Aggregate  (cost=804.02..804.03 rows=1 width=36)"
"  ->  Bitmap Heap Scan on t1  (cost=800.00..804.02 rows=1 width=36)"
"        Recheck Cond: (a_tsvector @@ to_tsquery('aaa:* & b:* & c:* &
d:*'::text))"
"        ->  Bitmap Index Scan on a_tsvector_idx  (cost=0.00..800.00 rows=1
width=0)"
"              Index Cond: (a_tsvector @@ to_tsquery('aaa:* & b:* & c:* &
d:*'::text))"

I don't understand the big query time difference, despite the explainig
index usage.
NOnetheless I'd like to simulate LIKE 'aaa%' with full text search. Would I
have a better sollution?

Many thanks in advance!

Rawi



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



rawi <only4com@web.de> writes:
> And querying: FTS with prefix matching:

> SELECT count(a)
> FROM t1
> WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*')

> (RESULT: count: 619)
> Total query runtime: 21266 ms.

FWIW, I get fairly decent performance for cases like this in HEAD
(at least with a GIN index; GIST seems much less able to do well with
short prefixes).  What PG version are you testing?
        regards, tom lane



Tom Lane-2 wrote
> FWIW, I get fairly decent performance for cases like this in HEAD
> (at least with a GIN index; GIST seems much less able to do well with
> short prefixes).  What PG version are you testing?

Thank you Tom,

I'm testing on PG 9.1 on UbuntuServer 12.10, 64bit

I'll update to 9.2 the next days and try again.

Kind Regards
Rawi




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759045.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Tom Lane-2 wrote
> FWIW, I get fairly decent performance for cases like this in HEAD
> (at least with a GIN index; GIST seems much less able to do well with
> short prefixes).

Short or long prefixes seem to be equaly unfavorable. Even with the full
length of the words, but queried as prefix I get a runtime of 25342ms
compared to 353ms without prefixes:

SELECT count(a)
FROM t1
WHERE a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* & ccccc:* & ddddd:*')

Total query runtime: 25342 ms

"Aggregate  (cost=804.02..804.03 rows=1 width=36)"
"  ->  Bitmap Heap Scan on t1  (cost=800.00..804.02 rows=1 width=36)"
"        Recheck Cond: (a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* &
ccccc:* & ddddd:*'::text))"
"        ->  Bitmap Index Scan on a_tsvector_idx  (cost=0.00..800.00 rows=1
width=0)"
"              Index Cond: (a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* &
ccccc:* & ddddd:*'::text))"

Kind Regards
Rawi



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759046.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



rawi wrote
> 
> Tom Lane-2 wrote
>> FWIW, I get fairly decent performance for cases like this in HEAD
>> (at least with a GIN index; GIST seems much less able to do well with
>> short prefixes).  What PG version are you testing?
> Thank you Tom,
> 
> I'm testing on PG 9.1 on UbuntuServer 12.10, 64bit
> 
> I'll update to 9.2 the next days and try again.
> 
> Kind Regards
> Rawi

Just tested with 9.2: pretty much the same




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759645.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Tom Lane-2 wrote
> rawi <

> only4com@

> > writes:
>> And querying: FTS with prefix matching:
> 
>> SELECT count(a)
>> FROM t1
>> WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*')
> 
>> (RESULT: count: 619)
>> Total query runtime: 21266 ms.
> 
> FWIW, I get fairly decent performance for cases like this in HEAD
> (at least with a GIN index; GIST seems much less able to do well with
> short prefixes).  What PG version are you testing?

Last tests with PG 9.2:

Whereas the "whole words" full text search stays at ~350ms for 900,000
records I tryed the number of records, which are becoming to make one
suffer, querying for prefixes:

RECORDS        RUNTIME (ms)
10,000        13
100,000        81
200,000        172
400,000        7,284
900,000        20,888

Regards, Rawi




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5760297.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.