Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Nov 19, 2024 at 06:09:44PM -0500, Tom Lane wrote:
>> Also, we could bypass the multiple lookups unless both the
>> NAMEDATALEN-1'th and NAMEDATALEN-2'th bytes are non-ASCII, which
>> should be rare enough to make it not much of a performance issue.
> I'm admittedly not an expert in the multi-byte code, but since there are
> encodings like LATIN1 that use a byte per character, don't we need to do
> multiple lookups any time the NAMEDATALEN-1'th byte is non-ASCII?
I don't think so, but maybe I'm missing something. An important
property of backend-legal encodings is that all bytes of a multibyte
character have their high bits set. Thus if the NAMEDATALEN-2'th
byte does not have that, it is not part of a multibyte character.
That's also the reason we can stop if we reach a high-bit-clear
byte while backing up to earlier bytes.
regards, tom lane