Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
Msg-id Zz92cka2VlBDsat3@momjian.us
In response to Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: BUG #18711: Attempting a connection with a database name longer than 63 characters now fails
List pgsql-bugs
On Thu, Nov 21, 2024 at 11:09:14AM -0600, Nathan Bossart wrote:
> On Thu, Nov 21, 2024 at 11:44:44AM -0500, Bruce Momjian wrote:
> > On Thu, Nov 21, 2024 at 09:14:23AM -0600, Nathan Bossart wrote:
> >> Tom provided a concise explanation upthread [0].  My understanding is the
> >> same as Bertrand's, i.e., this is an easy way to rule out a bunch of cases
> >> where we know that we couldn't possibly have truncated in the middle of a
> >> multi-byte character.  This allows us to avoid doing multiple pg_database
> >> lookups.
> > 
> > Where does Tom mention anything about checking two bytes?
> 
> Here [0].  And he further elaborated on this idea here [1].
> 
> > He is
> > basically saying remove all trailing high-bit characters until you get a
> > match, because once you get a match, you have found the point of
> > valid truncation for the encoding.
> 
> Yes, we still need to do that if it's possible the truncation wiped out
> part of a multi-byte character.  But it's not possible that we truncated
> part of a multi-byte character if the NAMEDATALEN-1'th or NAMEDATALEN-2'th
> byte is ASCII, in which case we can avoid doing extra lookups.

Why would you check for two characters at the end rather than just a
normal check in the main loop?

> > needs to be fixed, at a minimum, specifically, "So if IS_HIGHBIT_SET is
> > true for both NAMEDATALEN-1 and NAMEDATALEN-2, we know we're in the
> > middle of a multibyte character."
> 
> Agreed, the second-to-last sentence should be adjusted to something like
> "we might be in the middle of a multibyte character."  We don't know for
> sure.
> 
> >> * Try to do multibyte-aware truncation (the patch at hand).
> > 
> > Yes, I am fine with that, but we need to do more than the patch does to
> > accomplish this, unless I am totally confused.
> 
> What more do you think is required?

I think the IS_HIGHBIT_SET test needs to be integrated into the 'for' loop
more clearly; the 'if' check plus the comment above it is just
confusing.
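
To make the idea concrete, here is a minimal sketch of one way the
high-bit test could live inside the loop itself. It is not the patch
under discussion. resolve_truncated_name(), name_exists_fn, and
fake_lookup() are hypothetical names used only for illustration, the
lookup callback stands in for a pg_database lookup, and NAMEDATALEN and
IS_HIGHBIT_SET are redefined locally to mirror their usual definitions.
It assumes, as is true of server-side encodings, that every byte of a
multibyte character has its high bit set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64          /* assumed default build value */
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/* Hypothetical stand-in for a pg_database lookup. */
typedef bool (*name_exists_fn) (const char *name);

/*
 * Hypothetical helper: find the stored name matching an over-length
 * input.  "out" must have room for NAMEDATALEN bytes.  Returns true and
 * leaves the matching name in "out" if a lookup succeeds.
 */
bool
resolve_truncated_name(const char *in, char *out, name_exists_fn exists)
{
    size_t      len = strlen(in);

    if (len < NAMEDATALEN)
    {
        /* Nothing was truncated, so a single lookup is enough. */
        strcpy(out, in);
        return exists(out);
    }

    memcpy(out, in, NAMEDATALEN - 1);

    for (len = NAMEDATALEN - 1; len > 0; len--)
    {
        out[len] = '\0';
        if (exists(out))
            return true;

        /*
         * in[len] is the first byte cut off and in[len - 1] the last
         * byte kept.  If either is ASCII, the cut fell on a character
         * boundary, so trimming further cannot help and we stop.
         */
        if (!IS_HIGHBIT_SET(in[len]) || !IS_HIGHBIT_SET(in[len - 1]))
            break;
    }
    return false;
}

/* Hypothetical test double: one stored name plus a lookup counter. */
const char *stored_name = "";
int         lookups = 0;

bool
fake_lookup(const char *name)
{
    lookups++;
    return strcmp(name, stored_name) == 0;
}
```

With an all-ASCII name the loop performs exactly one lookup whether or
not it matches.  Extra lookups happen only while both bytes around the
cut have the high bit set, which is the case where the truncation might
have split a multibyte character.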

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"


