This bug still exists in my testing.
---------------------------------------------------------------------------
Tom Lane wrote:
> "Denis Monsieur" <dmonsieur@gmail.com> writes:
> > The problem is a space being added to text in the form of
> > http://some.url/path
> > Compare the output:
>
> > shs=# SELECT ts_headline('http://some.url', to_tsquery('sometext'));
> > ts_headline
> > -----------------
> > http://some.url
> > (1 row)
>
> > shs=# SELECT ts_headline('http://some.url/path', to_tsquery('sometext'));
> > ts_headline
> > -----------------------
> > http:// some.url/path
> > (1 row)
>
> I looked into this, and it seems that the problem is that
> generateHeadline() emits a space for any token marked as replace = 1.
> I think it probably shouldn't emit anything at all. AFAICS the cases
> where replace will get set are token types URL, TAG, NUMHWORD,
> ASCIIHWORD, HWORD. For URL and the HWORD variants the space is
> certainly undesirable, because these token types are just respecifying
> text that is also covered by their component tokens. The only case
> where you could make an argument that the space is useful is TAG,
> as in
>
> regression=# SELECT ts_headline('http<foo>blah', to_tsquery('sometext'));
> ts_headline
> -------------
> http blah
> (1 row)
>
> But it seems to me to be at least as plausible that you should get
> nothing as that you should get a space for a removed tag.
>
> Comments?
>
> regards, tom lane
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
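
For anyone following along, here is a rough stand-alone sketch of the behavior Tom describes above. It is not the real generateHeadline() code: SketchToken, sketch_headline(), and the space_for_replaced flag are names invented for illustration, and the token breakdowns in the two arrays are assumptions about how the parser splits these inputs (a URL token that repeats the text of the host and path tokens, and a TAG token between two words).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed headline token. */
typedef struct
{
    const char *text;    /* source text covered by the token */
    int         replace; /* 1 = token is replaced instead of copied */
} SketchToken;

/*
 * Concatenate the tokens into out.  A token with replace = 1 is emitted
 * as a single space when space_for_replaced is nonzero (the behavior in
 * the report) and as nothing when it is zero (the proposed behavior).
 * Output is truncated if it does not fit in out_size bytes.
 */
char *
sketch_headline(const SketchToken *tokens, size_t ntokens,
                int space_for_replaced, char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);

    for (size_t i = 0; i < ntokens; i++)
    {
        const char *piece = tokens[i].text;
        size_t      len;

        if (tokens[i].replace)
            piece = space_for_replaced ? " " : "";

        len = strlen(piece);
        if (len > out_size - 1 - used)
            len = out_size - 1 - used;   /* truncate to the space left */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return out;
}

/* Assumed breakdown of 'http://some.url/path'. */
const SketchToken url_with_path[] = {
    {"http://", 0},        /* protocol */
    {"some.url/path", 1},  /* URL: repeats the text of the next two tokens */
    {"some.url", 0},       /* host */
    {"/path", 0},          /* URL path */
};

/* Assumed breakdown of 'http<foo>blah'. */
const SketchToken text_with_tag[] = {
    {"http", 0},           /* word */
    {"<foo>", 1},          /* TAG: removed from the headline */
    {"blah", 0},           /* word */
};
```

With space_for_replaced = 1 the first array comes out as "http:// some.url/path" and the second as "http blah", matching the output quoted above. With space_for_replaced = 0 they come out as "http://some.url/path" and "httpblah", which is the trade-off Tom raises for TAG.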
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +