
Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled - Mailing list pgsql-bugs

From	Dan Book
Subject	Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled
Date	July 12, 2018 18:33:52
Msg-id	CABMkAVUjc7Bh4WWTnF_US95_t8L6hpPFV8yJQJ51YQmWjG=Spg@mail.gmail.com
In response to	Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled (Arthur Zakirov <a.zakirov@postgrespro.ru>)
List	pgsql-bugs


On Thu, Jul 12, 2018 at 5:22 AM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

Hello,

On Thu, Jul 12, 2018 at 07:59:40AM +0000, PG Bug reporting form wrote:
> I have text that is not HTML and contains things that look like HTML tags.
> The headlines are HTML escaped when output. It is very odd to have this text
> missing from the resulting headlines and no way to control the behavior.

Strings that look like HTML or XML tags are recognized by the default parser as "tag" tokens, and by default they are
ignored. You need to modify the existing configuration or create a new one:

=# CREATE TEXT SEARCH CONFIGURATION english_tag (COPY = english);
=# alter text search configuration english_tag
add mapping for tag with simple;

Then tags aren't skipped:

=# select * from ts_debug('english_tag', 'query test');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | query | {english_stem} | english_stem | {queri}
 blank     | Space symbols   |       | {}             | (null)       | (null)
 tag       | XML tag         |       | {simple}       | simple       | {}
 asciiword | Word, all ASCII | test  | {english_stem} | english_stem | {test}
 tag       | XML tag         |       | {simple}       | simple       | {}

But even in this case ts_headline will skip tags, because this behaviour
is hardcoded [1].

I don't think it is a good idea to change this behaviour in existing
PostgreSQL versions. There is a workaround, though, if it suits your
case: create your own text search parser extension (see the example
in [2]) and change

#define HLIDREPLACE(x) ( (x)==TAG_T )

to

#define HLIDREPLACE(x) ( false )

Thanks for the response. It's good to know this is possible, but defining a custom parser is not ideal.

-Dan

