Re: PATCH: Allow empty targets in unaccent dictionary - Mailing list pgsql-hackers

From Abhijit Menon-Sen
Subject Re: PATCH: Allow empty targets in unaccent dictionary
Msg-id 20140630201039.GA11973@toroid.org
In response to Re: PATCH: Allow empty targets in unaccent dictionary  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PATCH: Allow empty targets in unaccent dictionary  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
At 2014-06-30 15:19:17 -0400, tgl@sss.pgh.pa.us wrote:
>
> Anyway, this raises the question of whether the current patch is
> actually a desirable way to do things, or whether it would be better
> if the unaccenting rules were like "base-char accent-char" ->
> "base-char".

It might be useful to be able to write such rules, but it would be
highly impractical to have to write them in place of rules that simply
single out accent-chars for removal.

In all the languages I'm familiar with that use such accent-chars, any
accent-char would form a valid combination with nearly every base-char,
unlike European languages where you don't have to worry about k-umlaut,
say. Also, a standalone accent-char would always be meaningless.

(These accent-chars don't actually exist independently in the syllabary
that a Hindi speaker might learn in school: they're combining forms of
vowels and are treated differently from characters in practice.)
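
To put a rough number on "impractical", here is a minimal sketch that
counts the rules each style would need for Devanagari. The code point
ranges are taken from the Unicode Devanagari block; the two rule tables
and the strip_signs helper are hypothetical illustrations, not the
unaccent dictionary's actual file format or code.

```python
# Ranges come from the Unicode Devanagari block; everything else here
# (table names, helper) is a hypothetical illustration.
CONSONANTS = [chr(cp) for cp in range(0x0915, 0x0939 + 1)]   # KA .. HA
VOWEL_SIGNS = [chr(cp) for cp in range(0x093E, 0x094C + 1)]  # AA .. AU

# Style 1: one rule per accent-char, with an empty target.
single_rules = {sign: "" for sign in VOWEL_SIGNS}

# Style 2: one rule per "base-char accent-char" pair -> "base-char".
pair_rules = {base + sign: base
              for base in CONSONANTS
              for sign in VOWEL_SIGNS}


def strip_signs(text, rules):
    """Apply single-character rules; unmatched characters pass through."""
    return "".join(rules.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(len(single_rules), len(pair_rules))  # 15 555
```

With these ranges, 15 single-character rules cover what would otherwise
take 555 pairwise rules, and the pairwise count still ignores other
combining signs and independent vowels.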

> Also, if there are any contexts where the right translation of an
> accent-char depends on the base-char, you couldn't do it with the
> patch as it stands.

I can't think of a satisfactory example at the moment, but that sounds
entirely plausible.

> It's not unlikely that we want this patch *and* an improvement that
> allows multi-character src strings

I think it's enough to apply just this patch, but I wouldn't object to
doing both if it were easy. A quick glance at the code didn't tell me
whether it is, but I'll look again when I'm properly awake.

> Lastly, I didn't especially like the coding details of either proposed
> patch, and rewrote it as attached.

:-)

-- Abhijit


