Zeugswetter Andreas ADI SD wrote:
>
>> You have to be on a first byte before you can meaningfully
>> apply NextChar, and you have to use NextChar or else you
>> don't count characters correctly (eg "__" must match 2 chars
>> not 2 bytes).
>>
>
> Well, for utf8 NextChar could advance to the next char even if the
> current byte position is in the middle of a multibyte char (skip over
> all 10xxxxxx).
>
>
>
It doesn't matter - we are satisfied that it won't happen. However, this
might well be a useful optimisation of NextChar() for the UTF8 case,
something like:

    do { (t)++; (tlen)--; } while ((tlen) > 0 && (*(t) & 0xC0) == 0x80)

In fact, I'm wondering if that might make the other UTF8 stuff redundant
- the whole point of what we're doing is to avoid expensive calls to
NextChar.
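
For illustration, here is a minimal standalone sketch of that loop written
as a C function rather than a macro. The names utf8_next_char and
utf8_char_count are made up for this example and are not the actual
NextChar from the tree. It assumes valid UTF-8 input and tlen > 0 on
entry, and it tests the remaining length before dereferencing so it never
reads past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL's NextChar macro): advance *t to the
 * first byte of the next UTF-8 character, or to the end of the buffer.
 */
static void
utf8_next_char(const unsigned char **t, size_t *tlen)
{
    assert(*tlen > 0);          /* caller must not already be at the end */
    do
    {
        (*t)++;
        (*tlen)--;
    } while (*tlen > 0 && (**t & 0xC0) == 0x80);    /* skip 10xxxxxx bytes */
}

/*
 * Count characters by stepping one character at a time.
 * Assumes the input is valid UTF-8.
 */
static size_t
utf8_char_count(const unsigned char *s, size_t len)
{
    size_t      n = 0;

    while (len > 0)
    {
        utf8_next_char(&s, &len);
        n++;
    }
    return n;
}
```

Because the loop only looks for the next byte that is not of the form
10xxxxxx, a call that starts in the middle of a multibyte char still lands
on the next first byte, which is the behaviour suggested in the quoted
text.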
cheers
andrew