On Tue, Nov 19, 2024 at 02:33:27PM -0500, Tom Lane wrote:
> I did think of a way that we could approximate encoding-correct
> truncation here, relying on the fact that what's in pg_database
> is encoding-correct according to somebody:
>
> 1. If NAMEDATALEN-1'th byte is ASCII (high bit clear), just truncate
> there and look up as usual.
>
> 2. If it's non-ASCII, truncate there and try to look up. On success,
> we're good. On failure, if the next-to-last byte is non-ASCII,
> truncate that too and try to look up. Repeat a maximum of
> MAX_MULTIBYTE_CHAR_LEN-1 times before failing.
>
> I think this works unconditionally so long as all entries in
> pg_database.datname are in the same encoding. If there's a
> mixture of encodings (which we don't forbid) then in principle
> you could probably select a database other than the one the
> client thought it was asking for. But that seems mighty
> improbable, and the answer can always be "so connect using
> the name as it appears in the catalog".

That's an interesting idea. That code would probably need to live in
GetDatabaseTuple(), but it seems doable. We might also be able to avoid
the "mighty improbable" case by always truncating up to
MAX_MULTIBYTE_CHAR_LEN-1 times, rather than stopping at the first match,
and failing if there are multiple matches.
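
A standalone sketch of that variant, which tries every candidate
truncation and refuses to pick between multiple matches. NAMEDATALEN
and MAX_MULTIBYTE_CHAR_LEN use their usual values of 64 and 4; catalog[],
catalog_has(), and lookup_truncated_unique() are hypothetical names for
illustration only, not anything in GetDatabaseTuple() today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Usual PostgreSQL values, assumed for this sketch. */
#define NAMEDATALEN 64
#define MAX_MULTIBYTE_CHAR_LEN 4

#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Hypothetical stand-in for the datname column of pg_database. */
static const char *catalog[8];
static int catalog_len = 0;

/* Hypothetical exact-match lookup of the first len bytes of name. */
static bool
catalog_has(const char *name, size_t len)
{
    for (int i = 0; i < catalog_len; i++)
        if (strlen(catalog[i]) == len && memcmp(catalog[i], name, len) == 0)
            return true;
    return false;
}

/*
 * Try the NAMEDATALEN-1 byte cut plus up to MAX_MULTIBYTE_CHAR_LEN-1
 * shorter ones, and return the matched length only if exactly one
 * candidate matches; otherwise return -1.
 */
static int
lookup_truncated_unique(const char *name)
{
    size_t len = strlen(name);
    int found = -1;
    int nmatches = 0;

    if (len < NAMEDATALEN)
        return catalog_has(name, len) ? (int) len : -1;

    len = NAMEDATALEN - 1;
    for (int drops = 0; drops < MAX_MULTIBYTE_CHAR_LEN; drops++)
    {
        if (catalog_has(name, len))
        {
            nmatches++;
            found = (int) len;
        }
        /* An ASCII trailing byte means this cut is on a character boundary. */
        if (!IS_HIGHBIT_SET(name[len - 1]))
            break;
        len--;
    }
    return nmatches == 1 ? found : -1;
}
```

If the catalog held both a 62-byte name and a 63-byte name ending in
0xC3 (say, one created in LATIN1, where 0xC3 is "Ã"), both candidates
would match and the lookup would fail instead of silently picking one.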
--
nathan