Hi
I have a question about ts_headline: when the query includes a word like
'on-line', only the 'line' part is highlighted, even though the whole
hyphenated word is indexed too. Some details below.
PostgreSQL 9.1.6
select
token, dictionary, lexemes
from
ts_debug('play on-line') where alias <> 'blank';
  token  |  dictionary  | lexemes
---------+--------------+----------
 play    | english_stem | {play}
 on-line | english_stem | {on-lin}
 on      | english_stem | {}
 line    | english_stem | {line}
select to_tsquery('play & on-line');
to_tsquery
----------------------------
'play' & 'on-lin' & 'line'
select ts_headline('play on-line', to_tsquery('play & on-line'));
ts_headline
----------------------------
<b>play</b> on-<b>line</b>
The result is the same as for:
select ts_headline('play on-line', to_tsquery('play & line'));
ts_headline
----------------------------
<b>play</b> on-<b>line</b>
Is that the intended behaviour? I guess the problem here is that 'on'
produces no lexeme (presumably it is dropped as a stop word), but then
what about 'on-lin'?
In another example, I expected a hyphenated match to take some kind of
precedence:
select token, dictionary, lexemes from ts_debug('custom-built query')
where alias <> 'blank';
    token     |  dictionary  |    lexemes
--------------+--------------+----------------
 custom-built | english_stem | {custom-built}
 custom       | english_stem | {custom}
 built        | english_stem | {built}
 query        | english_stem | {queri}
select to_tsquery('query & custom-built');
to_tsquery
-----------------------------------------------
'queri' & 'custom-built' & 'custom' & 'built'
select ts_headline('custom-built query', to_tsquery('query & custom-built'));
ts_headline
-----------------------------------------
<b>custom</b>-<b>built</b> <b>query</b>
This works better, but the two parts of 'custom-built' are still
highlighted separately. Or does ts_headline only operate on single
words, not on hyphenated ones?
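In case it helps, a query like the one below (just a sketch, its output
is not included here) should show which token type the parser assigns to
the whole hyphenated word and to each of its parts. Naming the 'english'
configuration explicitly is my assumption; the queries above rely on the
default configuration.
-- alias is the token type reported by the parser
select alias, token, lexemes
from ts_debug('english', 'on-line');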
thanks
daniel