Strange ranking with dict_xsyn - Mailing list pgsql-general

From F Wolff
Subject Strange ranking with dict_xsyn
Date
Msg-id CAFziTtG=BYgP21mqz6jiJvV06F7wpTzV5Di1jVYs7DpshEMMtA@mail.gmail.com
List pgsql-general
Hi everybody

I'm trying to use the dict_xsyn contrib module to implement query
expansion. I'm baffled by what seems like incorrect behaviour, and
would appreciate some help. Here is a simple example using the
packaged example "xsyn_sample.rules":


speel=# CREATE EXTENSION dict_xsyn;
CREATE EXTENSION

speel=# ALTER TEXT SEARCH DICTIONARY xsyn (RULES='xsyn_sample');
ALTER TEXT SEARCH DICTIONARY

speel=# ALTER TEXT SEARCH CONFIGURATION english ALTER MAPPING FOR
asciiword WITH xsyn,english_stem;
ALTER TEXT SEARCH CONFIGURATION

speel=# SELECT to_tsquery('english', 'supernova'); --dict_xsyn is working
      to_tsquery
----------------------
'supernova' & 'sn' & 'sne' & '1987a'
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'),
to_tsquery('english', 'cute|supernova')); --no surprise
 ts_rank
---------
0.0486342
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'),
to_tsquery('english', 'supernova'));  --unexpected
ts_rank
-------
  1e-20
(1 row)

speel=# select to_tsvector('english', 'supernova') @@
to_tsquery('english', 'supernova');  --no surprise
?column?
-------
  t
(1 row)

speel=# select ts_rank(to_tsvector('english', 'cute supernova'),
to_tsquery('english', 'supernova'));  --unexpected
ts_rank
-------
  1e-20
(1 row)

speel=# select ts_rank(to_tsvector('english', 'cute supernova'),
to_tsquery('english', 'cute')); --no surprise
ts_rank
-------
  0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('english', 'supernova'),
to_tsquery('english', 'cute & supernova'));  --unexpected, was expecting 0
ts_rank
-------
  1e-20
(1 row)

speel=# select to_tsvector('english', 'supernova') @@
to_tsquery('english', 'cute & supernova');  --no surprise
?column?
-------
  f
(1 row)


The ranking seems like a bug, and it has been causing problems for my
application. I would expect the rank to be the same (or similar)
whenever a single-word query matches one of the two words in the
document. Instead, a query term that has an entry in the extended
synonym dictionary seems to produce this extremely small rank value,
which falls below any reasonable threshold that might be used to trim
the results of a full text query. I was also expecting a rank of 0
when the @@ operator returns false. In the examples above, a rank of
1e-20 appears both where @@ returned true and where it returned false,
which seems unintuitive (and wrong) to me. I'm not sure whether 1e-20
is supposed to mean "as good as 0".
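
One way to narrow this down might be to look at the lexemes each side
actually produces. The queries below are only a sketch, assuming the
same xsyn dictionary and english configuration as set up above.
ts_lexize and ts_debug are standard PostgreSQL text search functions;
no particular output is claimed here.

```sql
-- What the xsyn dictionary emits for a single token
SELECT ts_lexize('xsyn', 'supernova');

-- Lexemes and positions stored on the document side
SELECT to_tsvector('english', 'cute supernova');

-- Per-token breakdown: which dictionary handled each token
-- and which lexemes it returned
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'cute supernova');
```

Comparing the positions in the tsvector with the terms in the expanded
tsquery should show whether the synonyms are indexed alongside the
original word or only added on the query side.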

As a comparison, for a search configuration without dict_xsyn, the
results are mostly as expected (but of course without the query
expansion):

speel=# select ts_rank(to_tsvector('simple', 'supernova'),
to_tsquery('simple', 'supernova'));
ts_rank
-------
  0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'cute supernova'),
to_tsquery('simple', 'supernova'));
ts_rank
-------
  0.0607927
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'supernova'),
to_tsquery('simple', 'cute|supernova')); --only slightly different; that's ok
ts_rank
-------
  0.0303964
(1 row)

speel=# select ts_rank(to_tsvector('simple', 'supernova'),
to_tsquery('simple', 'cute & supernova')); --mm, that 1e-20 again :-(
ts_rank
-------
  1e-20
(1 row)
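
Since 1e-20 shows up for a non-matching document even without
dict_xsyn, one possible pattern is to filter with @@ first and rank
only the rows that match. This is a minimal sketch and not a fix for
the ranking itself; the docs and q names and the sample rows are
hypothetical, made up for illustration.

```sql
-- Hypothetical inline sample data; a real table would replace "docs"
WITH docs(body) AS (
    VALUES ('supernova'), ('cute supernova'), ('cute')
),
q AS (
    SELECT to_tsquery('simple', 'cute & supernova') AS query
)
SELECT body,
       ts_rank(to_tsvector('simple', body), query) AS rank
FROM docs, q
WHERE to_tsvector('simple', body) @@ query   -- drop non-matches before ranking
ORDER BY rank DESC;
```

With this shape the rank is only ever used to order rows that already
satisfy @@, so a tiny rank on a non-matching row never reaches the
result set.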


Thank you for any help anyone can provide.

Friedel

