Thread: Index Usage and Running Times by FullTextSearch with prefix matching
Hi I tested the following: CREATE TABLE t1 ( id serial NOT NULL, a character varying(125), a_tsvector tsvector, CONSTRAINT t1_pkey PRIMARY KEY (id) ); INSERT INTO t1 (a, a_tsvector) VALUES ('ooooo,ppppp,fffff,jjjjj,zzzzz,jjjjj', to_tsvector('ooooo,ppppp,fffff,jjjjj,zzzzz,jjjjj'); CREATE INDEX a_tsvector_idx ON t1 USING gin (a_tsvector); (I have generated 900000 records with random words like this) Now querying: normal full text search SELECT count(a) FROM t1 WHERE a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc & ddddd') (RESULT: count: 619) Total query runtime: 353 ms. Query Plan: "Aggregate (cost=6315.22..6315.23 rows=1 width=36)" " -> Bitmap Heap Scan on t1 (cost=811.66..6311.46 rows=1504 width=36)" " Recheck Cond: (a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc & ddddd'::text))" " -> Bitmap Index Scan on a_tsvector_idx (cost=0.00..811.28 rows=1504 width=0)" " Index Cond: (a_tsvector @@ to_tsquery('aaaaa & bbbbb & ccccc & ddddd'::text))" And querying: FTS with prefix matching: SELECT count(a) FROM t1 WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*') (RESULT: count: 619) Total query runtime: 21266 ms. Query Plan: "Aggregate (cost=804.02..804.03 rows=1 width=36)" " -> Bitmap Heap Scan on t1 (cost=800.00..804.02 rows=1 width=36)" " Recheck Cond: (a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*'::text))" " -> Bitmap Index Scan on a_tsvector_idx (cost=0.00..800.00 rows=1 width=0)" " Index Cond: (a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*'::text))" I don't understand the big query time difference, despite the explainig index usage. NOnetheless I'd like to simulate LIKE 'aaa%' with full text search. Would I have a better sollution? Many thanks in advance! Rawi -- View this message in context: http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
rawi <only4com@web.de> writes: > And querying: FTS with prefix matching: > SELECT count(a) > FROM t1 > WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*') > (RESULT: count: 619) > Total query runtime: 21266 ms. FWIW, I get fairly decent performance for cases like this in HEAD (at least with a GIN index; GIST seems much less able to do well with short prefixes). What PG version are you testing? regards, tom lane
Tom Lane-2 wrote > FWIW, I get fairly decent performance for cases like this in HEAD > (at least with a GIN index; GIST seems much less able to do well with > short prefixes). What PG version are you testing? Thank you Tom, I'm testing on PG 9.1 on UbuntuServer 12.10, 64bit I'll update to 9.2 the next days and try again. Kind Regards Rawi -- View this message in context: http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759045.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Tom Lane-2 wrote > FWIW, I get fairly decent performance for cases like this in HEAD > (at least with a GIN index; GIST seems much less able to do well with > short prefixes). Short or long prefixes seem to be equaly unfavorable. Even with the full length of the words, but queried as prefix I get a runtime of 25342ms compared to 353ms without prefixes: SELECT count(a) FROM t1 WHERE a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* & ccccc:* & ddddd:*') Total query runtime: 25342 ms "Aggregate (cost=804.02..804.03 rows=1 width=36)" " -> Bitmap Heap Scan on t1 (cost=800.00..804.02 rows=1 width=36)" " Recheck Cond: (a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* & ccccc:* & ddddd:*'::text))" " -> Bitmap Index Scan on a_tsvector_idx (cost=0.00..800.00 rows=1 width=0)" " Index Cond: (a_tsvector @@ to_tsquery('aaaaa:* & bbbbb:* & ccccc:* & ddddd:*'::text))" Kind Regards Rawi -- View this message in context: http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759046.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
rawi wrote > > Tom Lane-2 wrote >> FWIW, I get fairly decent performance for cases like this in HEAD >> (at least with a GIN index; GIST seems much less able to do well with >> short prefixes). What PG version are you testing? > Thank you Tom, > > I'm testing on PG 9.1 on UbuntuServer 12.10, 64bit > > I'll update to 9.2 the next days and try again. > > Kind Regards > Rawi Just tested with 9.2: pretty much the same -- View this message in context: http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5759645.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Tom Lane-2 wrote > rawi < > only4com@ > > writes: >> And querying: FTS with prefix matching: > >> SELECT count(a) >> FROM t1 >> WHERE a_tsvector @@ to_tsquery('aaa:* & b:* & c:* & d:*') > >> (RESULT: count: 619) >> Total query runtime: 21266 ms. > > FWIW, I get fairly decent performance for cases like this in HEAD > (at least with a GIN index; GIST seems much less able to do well with > short prefixes). What PG version are you testing? Last tests with PG 9.2: Whereas the "whole words" full text search stays at ~350ms for 900,000 records I tryed the number of records, which are becoming to make one suffer, querying for prefixes: RECORDS RUNTIME (ms) 10,000 13 100,000 81 200,000 172 400,000 7,284 900,000 20,888 Regards, Rawi -- View this message in context: http://postgresql.1045698.n5.nabble.com/Index-Usage-and-Running-Times-by-FullTextSearch-with-prefix-matching-tp5759021p5760297.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.