> > However, I have just about convinced myself that we don't need
> > IsFirstByte for matching "_" for UTF8, either preceded by "%" or
not,
> > as it should always be true. Can anyone come up with a counter
example?
>
> You have to be on a first byte before you can meaningfully
> apply NextChar, and you have to use NextChar or else you
> don't count characters correctly (eg "__" must match 2 chars
> not 2 bytes).
Well, for utf8 NextChar could advance to the next char even if the
current byte
position is in the middle of a multibyte char (skip over all 10xxxxxx).
(Assuming utf16 surrogate pairs are not encoded as 2 x 3bytes, which is
not valid utf8 anyway)
Andreas