On Wed, Aug 16, 2023 at 09:00:43AM +0900, Michael Paquier wrote:
> Agreed that this looks incorrect as-is. This goes back as far as 9a206d0,
> when these were introduced, and it looks like the culprit is
> around initTrie() where the entries are loaded. See around t_isspace,
> for example.
I was looking at the code, and my first impression was right. All
leading and trailing whitespace between the two characters listed in
the rule file is discarded. The thing is that we clearly document
the parsing rules for the sake of any custom files one can feed to the
extension:
https://www.postgresql.org/docs/devel/unaccent.html
I am not sure what we can do here. Doing nothing is certainly an
option, but I am wondering if we could put in place an extra rule
where whitespace can be part of the translated character if it is
enclosed in double quotes, for example. Thoughts?
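
To make the idea a bit more concrete, here is a rough standalone sketch of
what such a quoting rule could look like. This is not the code in
initTrie(): parse_rule() and its arguments are made up for illustration, it
uses plain isspace() rather than t_isspace(), it does not handle escaped
quotes, and it ignores anything after the closing quote.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of unaccent): split one rule line into a
 * source and a target string.  An unquoted target ends at the first
 * whitespace; a target enclosed in double quotes keeps its whitespace.
 * Returns 0 on success, -1 on a malformed or oversized entry.
 */
int
parse_rule(const char *line, char *src, size_t srclen,
           char *trg, size_t trglen)
{
    const char *p = line;
    const char *start;
    size_t      len;

    /* Skip leading whitespace. */
    while (*p && isspace((unsigned char) *p))
        p++;

    /* The source is the first run of non-whitespace bytes. */
    start = p;
    while (*p && !isspace((unsigned char) *p))
        p++;
    len = (size_t) (p - start);
    if (len == 0 || len >= srclen)
        return -1;
    memcpy(src, start, len);
    src[len] = '\0';

    /* Skip the whitespace separating source and target. */
    while (*p && isspace((unsigned char) *p))
        p++;

    if (*p == '"')
    {
        /* Quoted target: take everything up to the closing quote. */
        const char *end;

        start = p + 1;
        end = strchr(start, '"');
        if (end == NULL)
            return -1;          /* unterminated quote */
        len = (size_t) (end - start);
    }
    else
    {
        /* Unquoted target: ends at whitespace, may be empty. */
        start = p;
        while (*p && !isspace((unsigned char) *p))
            p++;
        len = (size_t) (p - start);
    }

    if (len >= trglen)
        return -1;
    memcpy(trg, start, len);
    trg[len] = '\0';
    return 0;
}
```

With something like this, a line whose second field is written as " b "
(quotes included) would yield a target with the surrounding spaces kept,
while unquoted entries would keep being trimmed as they are today.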
--
Michael