Thread: Re: Decision by Monday: PQescapeString() vs. encoding violation

Re: Decision by Monday: PQescapeString() vs. encoding violation

From
Laurenz Albe
Date:
On Fri, 2025-02-14 at 17:27 -0800, Noah Misch wrote:
> Commit 5dc1e42b4fa6a4434afa7d7cdcf0291351a7b873 for this week's CVE-2025-1094
> changed how PQescapeString()[1] reacts to input that is not valid in the
> client encoding.  Before that commit, the function would ignore encoding
> problems except at the end of the string.  Now, it replaces the bad sequence
> up to the length implied by the first byte.  For example, if UTF8 input has
> 0xc2 followed by an ASCII byte, the function removes both bytes.
>
> Jeff Davis reported to the security team that
> http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences forbids something
> like this, saying a UTF-8 converter "must not consume the [second byte] if it
> continues".  (That's my summary, not his.  He might reply to add color here.)
> While PQescapeString() is not a UTF-8 converter, that standard still felt to
> multiple security team members like a decent argument for removing only the
> invalid 0xc2, not the byte following it.
>
> Security team members disagreed on whether next week's releases are the last
> reasonable chance to change this, or whether changing it in e.g. May would be
> reasonable.  If applications make changes to cope with the new behavior, that
> could be an argument against further change.
>
> Question for all: would you switch to the "remove fewer bytes" behavior in
> next week's releases, switch later, or switch never?  Why so?  Please answer
> in the next 24hr if possible, given the little time until we wrap next week's
> releases on Monday.  I regret the late notice.

I don't have a very strong opinion, but I'm leaning towards removing fewer bytes.

My reasoning is that most encoding confusion comes from Microsoft's tenacious
clinging to non-Unicode encodings, and in my part of the world single-byte
encodings are prevalent on Windows systems.  Removing more bytes from the string
would mean removing more characters, which would mutilate the string more than
necessary.

About when to make that change: having fewer backward-incompatible behavior
changes is desirable, which would speak for doing it now.  If you feel that the
short time frame is long enough for you to write good, reliable code, great.
If you feel that it would be better for the quality of the code to have more
time, take the time.  Better one more subtle change than another security bug.

Yours,
Laurenz Albe

--

*E-Mail Disclaimer*
Der Inhalt dieser E-Mail ist ausschliesslich fuer den
bezeichneten Adressaten bestimmt. Wenn Sie nicht der vorgesehene Adressat
dieser E-Mail oder dessen Vertreter sein sollten, so beachten Sie bitte,
dass jede Form der Kenntnisnahme, Veroeffentlichung, Vervielfaeltigung oder
Weitergabe des Inhalts dieser E-Mail unzulaessig ist. Wir bitten Sie, sich
in diesem Fall mit dem Absender der E-Mail in Verbindung zu setzen.

*CONFIDENTIALITY NOTICE & DISCLAIMER
*This message and any attachment are
confidential and may be privileged or otherwise protected from disclosure
and solely for the use of the person(s) or entity to whom it is intended.
If you have received this message in error and are not the intended
recipient, please notify the sender immediately and delete this message and
any attachment from your system. If you are not the intended recipient, be
advised that any use of this message is prohibited and may be unlawful, and
you must not copy this message or attachment or disclose the contents to
any other person.