Andrew Dunstan <andrew@dunslane.net> writes:
> do { (t)++; (tlen)--; } while ((*(t) & 0xC0) == 0x80 && tlen > 0)
The while *must* test those two conditions in the other order: tlen > 0
first, so that *t is never fetched once the buffer is exhausted.
(Don't laugh --- we've had reproducible bugs before in which the backend
dumped core because of running off the end of memory due to this type
of mistake.)
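
For illustration, a minimal sketch of the loop with the length check
first. The function name advance_utf8_char, its signature, and the
tlen == 0 guard on entry are assumptions made for this example, not
code from the tree.

```c
#include <stddef.h>

/*
 * Advance *t past one UTF-8 character without reading beyond the buffer.
 * Hypothetical helper: name, signature and the entry guard are illustrative.
 */
static void
advance_utf8_char(const unsigned char **t, size_t *tlen)
{
    /* Assumed guard: nothing to do on an empty buffer. */
    if (*tlen == 0)
        return;

    do
    {
        (*t)++;
        (*tlen)--;
        /* Length is tested first, so **t is only read while bytes remain. */
    } while (*tlen > 0 && (**t & 0xC0) == 0x80);
}
```

Because && short-circuits, the continuation-byte test is skipped as
soon as tlen reaches zero, so the byte just past the end is never read.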
> In fact, I'm wondering if that might make the other UTF8 stuff redundant
> - the whole point of what we're doing is to avoid expensive calls to
> NextChar;
+1, I think. This test will be approximately the same expense as what
the outer loop would otherwise be (tlen > 0 and *t != firstpat), and
doing it this way removes an entire layer of intellectual complexity.
Even though the code is hardly different, we are no longer dealing in
misaligned pointers anywhere in the match algorithm.
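
To make the cost comparison concrete, here is a rough sketch of the two
loop shapes side by side. Both function names are hypothetical and the
code is only an approximation of the idea, not the actual matching
code; each loop does one length test and one byte test per byte
visited.

```c
#include <stddef.h>

/*
 * Byte-wise skip (hypothetical name): count the bytes before the first
 * one equal to firstpat, or the whole length if there is none.
 */
static size_t
skip_to_first_byte(const unsigned char *t, size_t tlen, unsigned char firstpat)
{
    size_t skipped = 0;

    while (tlen > 0 && *t != firstpat)
    {
        t++;
        tlen--;
        skipped++;
    }
    return skipped;
}

/*
 * Character-wise step (hypothetical name): count the bytes in the UTF-8
 * character starting at t, so t is always left on a character boundary.
 */
static size_t
step_one_char(const unsigned char *t, size_t tlen)
{
    size_t stepped = 0;

    if (tlen == 0)              /* assumed guard for an empty buffer */
        return 0;
    do
    {
        t++;
        tlen--;
        stepped++;
    } while (tlen > 0 && (*t & 0xC0) == 0x80);
    return stepped;
}
```

The second form only ever stops on a character boundary, which is what
lets the rest of the match code stop worrying about pointers into the
middle of a character.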
regards, tom lane