On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong <kakalot49@gmail.com> wrote:
Thanks for the report and the explanation of Unicode.
I have attached a patch following Thomas's instructions. Could you confirm it?
- is_plain_letter(table[codepoint.combining_ids[0]]) and \
+ (is_plain_letter(table[codepoint.combining_ids[0]]) or\
+ len(table[codepoint.combining_ids[0]].combining_ids) > 1) and \
Shouldn't you use "or is_letter_with_marks()", instead of "or len(...)
> 1"? Your test might catch something that isn't based on a 'letter'
(according to is_plain_letter). Otherwise this looks pretty good to
me. Please add it to the next commitfest.
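
To illustrate the suggestion above, here is a minimal sketch of a recursive is_letter_with_marks() check next to the length-based test. It is not the code from the patch: it uses Python's standard unicodedata module instead of the script's own codepoint table, and the helper decomposition_ids(), the function length_based_test(), and all function bodies are assumptions made for illustration. It also assumes that is_plain_letter() means "plain ASCII letter".

```python
import unicodedata


def decomposition_ids(ch):
    """Canonical decomposition of ch as a list of code points ([] if none)."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>"; skip them.
    if not fields or fields[0].startswith("<"):
        return []
    return [int(f, 16) for f in fields]


def is_mark(ch):
    """True if ch is a combining mark."""
    return unicodedata.category(ch) in ("Mn", "Me", "Mc")


def is_plain_letter(ch):
    """Assumption: a 'plain letter' is an unaccented ASCII letter."""
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def is_letter_with_marks(ch):
    """True if ch is a letter built from a plain letter plus one or more
    marks, possibly through several levels of decomposition."""
    ids = decomposition_ids(ch)
    if len(ids) != 2 or not unicodedata.category(ch).startswith("L"):
        return False
    base, mark = chr(ids[0]), chr(ids[1])
    if not is_mark(mark):
        return False
    # Recurse on the base so that every level is checked, not just its length.
    return is_plain_letter(base) or is_letter_with_marks(base)


def length_based_test(ch):
    """The 'or len(...) > 1' variant: accepts any base that itself decomposes
    into two or more parts, whether or not it ends in a plain letter."""
    ids = decomposition_ids(ch)
    if len(ids) != 2 or not is_mark(chr(ids[1])):
        return False
    base = chr(ids[0])
    return is_plain_letter(base) or len(decomposition_ids(base)) > 1


def get_plain_letter(ch):
    """Follow the first decomposition element down to the undecomposable base."""
    ids = decomposition_ids(ch)
    while ids:
        ch = chr(ids[0])
        ids = decomposition_ids(ch)
    return ch


# U+1EBF (Vietnamese e with circumflex and acute) decomposes in two steps.
print(is_letter_with_marks("\u1ebf"), get_plain_letter("\u1ebf"))  # True e
# U+1F82 (Greek alpha with three marks) bottoms out at a non-ASCII letter.
print(length_based_test("\u1f82"), is_letter_with_marks("\u1f82"))  # True False
```

Under these assumptions the two tests agree for the Vietnamese letters but differ for U+1F82, whose base is not a plain letter in the ASCII sense; that is the kind of case the recursive check is meant to exclude.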
Thanks for confirming, sir.
I will add it to the next CF soon.
I expect that some users in Vietnam will consider this to be a bugfix,
which raises the question of whether to backpatch it. Perhaps we
could consider fixing it for 10. Then users of older versions could
grab the rules file from 10 to use with 9.whatever if they want to do
that and reindex their data as appropriate.
I am also inclined to fix it for 10, because that will not affect current users.
But would you like it back-patched to all supported versions, Kha Nguyen?
# I would also like to note that Vietnamese characters are not the only ones missing from the rules list.
---
Thanks and best regards,
Dang Minh Huong