Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Date
Msg-id Zz9B3KQGXFCGVPXy@momjian.us
In response to Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
List pgsql-bugs
On Thu, Nov 21, 2024 at 07:27:22AM +0000, Bertrand Drouvot wrote:
> +        /*
> +         * If the original name is too long and we see two consecutive bytes
> +         * with their high bits set at the truncation point, we might have
> +         * truncated in the middle of a multibyte character. In multibyte
> +         * encodings, every byte of a multibyte character has its high bit
> +         * set. So if IS_HIGHBIT_SET is true for both NAMEDATALEN-1 and
> +         * NAMEDATALEN-2, we know we're in the middle of a multibyte
> +         * character. We need to try truncating one more byte back to find the
> +         * start of the next character.
> +         */
...
> +                /*
> +                 * If we've hit a byte with high bit clear (an ASCII byte), we
> +                 * know we can't be in the middle of a multibyte character,
> +                 * because all bytes of a multibyte character must have their
> +                 * high bits set. Any following byte must therefore be the
> +                 * start of a new character, so we can stop looking for
> +                 * earlier truncation points.
> +                 */

I don't understand this logic.  Why are two bytes important?  If we knew
it was UTF8 we could check for non-first bytes always starting with
bits 10, but we can't know that.
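
To make the UTF-8 case concrete, here is a minimal sketch of the check described above, assuming the input is known to be valid UTF-8 (which, as noted, cannot be assumed here). The names is_utf8_continuation and utf8_clip_len are hypothetical and are not taken from the patch or from the PostgreSQL source.

```c
#include <stddef.h>

/*
 * Hypothetical helper: true if the byte is a UTF-8 continuation byte,
 * i.e. its top two bits are 10 (10xxxxxx).
 */
static int
is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: return the largest length <= limit at which the
 * string s (of length len bytes) can be cut without splitting a
 * character.  Assumes s is valid UTF-8; other multibyte encodings do
 * not mark non-first bytes this way.
 */
static size_t
utf8_clip_len(const char *s, size_t len, size_t limit)
{
    size_t      n = limit;

    if (len <= limit)
        return len;             /* nothing to truncate */

    /*
     * s[n] is the first byte that would be dropped.  If it is a
     * continuation byte, the cut falls inside a character, so back up
     * until the first dropped byte starts a character.
     */
    while (n > 0 && is_utf8_continuation((unsigned char) s[n]))
        n--;

    return n;
}
```

With this approach a single byte is enough to decide whether the cut point is inside a character, which is why the two-byte test in the patch is puzzling. It works only because UTF-8 distinguishes first bytes from non-first bytes; an encoding that sets the high bit on every byte of a multibyte character without that distinction gives no such signal.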

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"


