Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance
From | Jesper Krogh |
---|---|
Subject | Queryplan within FTS/GIN index -search. |
Date | |
Msg-id | 4AE0881D.9070201@krogh.cc Whole thread Raw |
Responses |
Re: Queryplan within FTS/GIN index -search.
Re: Queryplan within FTS/GIN index -search. Re: Queryplan within FTS/GIN index -search. |
List | pgsql-performance |
Hi My indexing base is now up to 7.5m documents, I have raise statistics target to 1000 for the tsvector column in order to make the query-planner choose more correctly. That works excellent. Table structure is still: ftstest=# \d ftsbody Table "public.ftsbody" Column | Type | Modifiers ------------------+----------+------------------------------------------------------ id | integer | not null default nextval('ftsbody_id_seq'::regclass) body | text | not null default ''::text ftsbody_body_fts | tsvector | Indexes: "ftsbody_body_md5" UNIQUE, btree (md5(body)) "ftsbody_id_pri_idx" UNIQUE, btree (id) "ftsbody_tfs_idx" gin (ftsbody_body_fts) Triggers: tsvectorupdate BEFORE INSERT OR UPDATE ON uniprot FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('ftsbody_body_fts', 'pg_catalog.english', 'body') I'm searching the gin-index for 1-5 terms, where all of them matches the same document. TERM1 is unique by itself, TERM2 is a bit more common (52 rows), TERM3 more common, TERM4 close to all and TERM5 all records. Just quering for a unique value and add in several values that match everything makes the run-time go significantly up. I somehow would expect the index-search to take advantage of the MCV's informations in the statistics that sort of translate it into a search and post-filtering (as PG's queryplanner usually does at the SQL-level). QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=102.45..102.45 rows=1 width=751) (actual time=3.726..3.729 rows=1 loops=1) -> Sort (cost=102.45..102.45 rows=1 width=751) (actual time=3.722..3.723 rows=1 loops=1) Sort Key: id Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1 width=751) (actual time=3.700..3.702 rows=1 loops=1) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2'::text)) -> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..100.42 rows=1 width=0) (actual time=3.683..3.683 rows=1 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2'::text)) Total runtime: 3.790 ms (9 rows) QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=102.45..102.45 rows=1 width=751) (actual time=850.017..850.020 rows=1 loops=1) -> Sort (cost=102.45..102.45 rows=1 width=751) (actual time=850.013..850.015 rows=1 loops=1) Sort Key: id Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1 width=751) (actual time=849.991..849.993 rows=1 loops=1) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3'::text)) -> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..100.42 rows=1 width=0) (actual time=849.970..849.970 rows=1 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3'::text)) Total runtime: 850.084 ms (9 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=102.45..102.45 rows=1 width=751) (actual time=1152.065..1152.068 rows=1 loops=1) -> Sort (cost=102.45..102.45 rows=1 width=751) (actual time=1152.061..1152.062 rows=1 loops=1) Sort Key: id Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1 width=751) (actual time=1152.039..1152.041 rows=1 loops=1) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4'::text)) -> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..100.42 rows=1 width=0) (actual time=1152.020..1152.020 rows=1 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4'::text)) Total runtime: 1152.129 ms (9 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=102.45..102.45 rows=1 width=751) (actual time=1509.043..1509.046 rows=1 loops=1) -> Sort (cost=102.45..102.45 rows=1 width=751) (actual time=1509.040..1509.040 rows=1 loops=1) Sort Key: id Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1 width=751) (actual time=1509.018..1509.020 rows=1 loops=1) Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text)) -> Bitmap Index Scan on ftsbody_tfs_idx (cost=0.00..100.42 rows=1 width=0) (actual time=1508.998..1508.998 rows=1 loops=1) Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text)) Total runtime: 1509.109 ms (9 rows) Can (perhaps more readable) be found at http://krogh.cc/~jesper/test.out Can this be optimized? (I cannot really prevent users from typing stuff in that are common). -- Jesper
pgsql-performance by date: