Hi.
There seems to be an "unimplemented" area around getting statistics for
wildcard searches done. Wildcards anchored to the left can be matched
up by the gin-index and the ts_match_vq operator:
testdb=# select to_tsvector('project') @@ to_tsquery('proj:*'); ?column?
---------- t
(1 row)
Searching for project gives me this estimate:
testdb=# explain select id,document_tsvector from efam.reference where
document_tsvector @@ to_tsquery('projects') order by id limit 50; QUERY PLAN
------------------------------------------------------------------------------------------------- Limit
(cost=0.00..3008.54rows=50 width=76) -> Index Scan using reference_pkey on reference
(cost=0.00..3762544.72 rows=62531 width=76) Filter: (document_tsvector @@ to_tsquery('projects'::text))
(3 rows)
whereas searching for proj:* gives:
testdb=# explain select id,document_tsvector from efam.reference where
document_tsvector @@ to_tsquery('proj:*') order by id limit 50; QUERY PLAN
--------------------------------------------------------------------------------------------- Limit (cost=73.56..73.58
rows=6width=76) -> Sort (cost=73.56..73.58 rows=6 width=76) Sort Key: id -> Bitmap Heap Scan on
reference (cost=34.55..73.49 rows=6
width=76) Recheck Cond: (document_tsvector @@
to_tsquery('proj:*'::text)) -> Bitmap Index Scan on reference_fts_idx
(cost=0.00..34.54 rows=6 width=0) Index Cond: (document_tsvector @@
to_tsquery('proj:*'::text))
(7 rows)
There are abouvios challenges in getting statistics for submatches where
there are no real information
in the pg_stats table, but there will also be a huge amount of cases
where a fairly reliable
guess can be extracted.
--
Jesper