Strange ranking with dict_xsyn - Mailing list pgsql-general
From | F Wolff |
---|---|
Subject | Strange ranking with dict_xsyn |
Date | |
Msg-id | CAFziTtG=BYgP21mqz6jiJvV06F7wpTzV5Di1jVYs7DpshEMMtA@mail.gmail.com |
List | pgsql-general |
Hi everybody

I'm trying to use the dict_xsyn contrib module to implement query expansion. I'm baffled by what seems like incorrect behaviour, and would appreciate some help. Here is a simple example using the packaged example "xsyn_sample.rules":

speel=# CREATE EXTENSION dict_xsyn;
CREATE EXTENSION
speel=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='xsyn_sample');
ALTER TEXT SEARCH DICTIONARY
speel=# ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR asciiword WITH xsyn,english_stem;
ALTER TEXT SEARCH CONFIGURATION

speel=# SELECT to_tsquery('english', 'supernova'); --dict_xsyn is working
 to_tsquery
----------------------
 'supernova' & 'sn' & 'sne' & '1987a'
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'), to_tsquery('english', 'cute|supernova')); --no surprise
 ts_rank
---------
 0.0486342
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'), to_tsquery('english', 'supernova')); --unexpected
 ts_rank
-------
 1e-20
(1 row)

speel=# select to_tsvector('english', 'supernova') @@ to_tsquery('english', 'supernova'); --no surprise
 ?column?
-------
 t
(1 row)

speel=# select ts_rank(to_tsvector('english', 'cute supernova'), to_tsquery('english', 'supernova')); --unexpected
 ts_rank
-------
 1e-20
(1 row)

speel=# select ts_rank(to_tsvector('english', 'cute supernova'), to_tsquery('english', 'cute')); --no surprise
 ts_rank
-------
 0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'), to_tsquery('english', 'cute & supernova')); --unexpected, was expecting 0
 ts_rank
-------
 1e-20
(1 row)

speel=# select to_tsvector('english', 'supernova') @@ to_tsquery('english', 'cute & supernova'); --no surprise
 ?column?
-------
 f
(1 row)

The ranking seems like a bug, and has been causing some problems for my application. I would expect the ranking to be the same (or similar) if a single word query matches one of the two words in the document.
It seems like a query with an entry in the extended synonym dictionary somehow creates this extremely small rank value, which is below any reasonable threshold that might be used to reduce the number of results in a full text query.

I was also expecting a rank of 0 when the @@ operator returns false. In the above examples there are ranks of 1e-20 where the @@ operator gave true and false respectively, which seems unintuitive (and wrong) to me. I'm not sure if 1e-20 is supposed to mean "as good as 0".

As a comparison, for a search configuration without dict_xsyn, the results are mostly as expected (but of course without the query expansion):

speel=# select ts_rank(to_tsvector('simple', 'supernova'), to_tsquery('simple', 'supernova'));
 ts_rank
-------
 0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'cute supernova'), to_tsquery('simple', 'supernova'));
 ts_rank
-------
 0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'supernova'), to_tsquery('simple', 'cute|supernova')); --only slightly different; that's ok
 ts_rank
-------
 0.0303964
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'supernova'), to_tsquery('simple', 'cute & supernova')); --mm, that 1e-20 again :-(
 ts_rank
-------
 1e-20
(1 row)

Thank you for any help anyone can provide.

Friedel
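
One way to narrow this down, offered as a sketch and not as part of the session above, is to look at what the altered 'english' configuration produces on the document side as well as the query side, and to rank only rows that pass the @@ test. The queries use only the built-in functions ts_lexize, ts_debug, to_tsvector, to_tsquery and ts_rank; they assume the xsyn dictionary and the altered english mapping set up at the start of this message. No output is shown because it depends on the dictionary options in effect.

```sql
-- Lexemes the xsyn dictionary returns for the word on its own.
SELECT ts_lexize('xsyn', 'supernova');

-- Which dictionary handles the token under the altered configuration.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english', 'supernova');

-- Document side versus query side: compare the lexemes and their positions.
SELECT to_tsvector('english', 'supernova') AS doc,
       to_tsquery('english', 'supernova')  AS query;

-- Rank only when the document matches, so a non-matching document
-- is filtered out instead of being returned with a rank of 1e-20.
SELECT ts_rank(s.doc, s.q) AS rank
FROM (SELECT to_tsvector('english', 'supernova')       AS doc,
             to_tsquery('english', 'cute & supernova') AS q) AS s
WHERE s.doc @@ s.q;
```

The last query only addresses the case where @@ is false; it does not change the 1e-20 rank reported for the matching single-word query.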