BUG #8750: 'simple' parser in to_tsvector() splits words on underscores - Mailing list pgsql-bugs

From:    drx@a-blast.org
Subject: BUG #8750: 'simple' parser in to_tsvector() splits words on underscores
Date:
Msg-id:  E1W0z20-0007Gz-9W@wrigleys.postgresql.org
List:    pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8750
Logged by:          Dragan Espenschied
Email address:      drx@a-blast.org
PostgreSQL version: 9.3.2
Operating system:   Ubuntu 12.04 x86_64
Description:

When text is converted to a tsvector with the 'simple' parser, words are
split on underscores. For example:


select to_tsvector('simple', 'light_bulb');
    to_tsvector
--------------------
 'bulb':2 'light':1
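
To see where the split happens, ts_debug() can list the tokens produced for
the same input. This is only a minimal sketch; it assumes a stock
installation in which the 'simple' configuration uses the default parser:

```sql
-- ts_debug() returns one row per token the parser emits for the given
-- configuration, with the token type (alias) and the resulting lexemes.
SELECT alias, token, lexemes
FROM ts_debug('simple', 'light_bulb');
```

If the underscore is reported as a token of its own rather than as part of
a word, the split is made by the parser before any dictionary sees the text.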


An underscore is typically used in place of a space in a term that should be
kept together, so it is an explicit signal that the term should not be
split.


At least, this is how I understand it.


I suggest that words not be split on underscores by default. That would make
typical tasks such as tagging easy to implement, without much need to modify
the parser.


Thanks for considering my suggestion!
Dragan
