Re: Decision by Monday: PQescapeString() vs. encoding violation - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Decision by Monday: PQescapeString() vs. encoding violation
Date
Msg-id 20250215183852.6f@rfd.leadboat.com
Whole thread Raw
In response to Re: Decision by Monday: PQescapeString() vs. encoding violation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Decision by Monday: PQescapeString() vs. encoding violation
List pgsql-hackers
On Sat, Feb 15, 2025 at 01:23:51PM -0500, Andres Freund wrote:
> On 2025-02-15 13:08:03 -0500, Tom Lane wrote:
> > I studied the v3 patch a little and realized that it only fixes the
> > behavior for the case of a complete-but-invalid multibyte character.
> > If we have an incomplete character at the end of the string, the
> > whole thing still gets dropped.  Surely that's not what we want if
> > we are going to adopt this behavior.

Thanks for doing that.

> Good catch! The different behaviour made sense when we were dropping the
> entire multi-byte character, but not anymore.
> 
> 
> > Here's a v4 that fixes that.  As a bonus, we can get rid of
> > duplicative coding since the "incomplete" and "invalid" cases
> > are now treated identically.
> 
> > One minor point is that the error messages from PQescapeStringInternal
> > no longer distinguish "incomplete" from "invalid".  I don't find that
> > to be a terribly useful distinction, so that's fine with me.  But if
> > someone feels that's important to preserve, we could make it do
> > something like
> 
> I think there's some value in the distinction, because incomplete characters
> can happen a lot more easily due to truncating longer strings. I don't know if
> it's worth the code complexity though.

While I agree it doesn't feel like a terribly-important distinction, I'd
definitely keep that distinction today rather than drop it with so little
notice.  (I'd feel fine about dropping it in a major release.  I'd probably
not drop it in any minor release, but especially not this close to wrap.)

On Sat, Feb 15, 2025 at 01:35:10PM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It seems that nobody is arguing against the "just skip one byte" behaviour, so
> > I'm inclined to push this fairly soon, even if Noah's "24 hours" haven't quite
> > elapsed.  A few more cycles in the buildfarm wouldn't hurt.

Works for me; we can always revert in the event of further discussion.  For
what it's worth, I didn't intend this 24hr as a hard period for gating any
particular action.

> Agreed.  I thought there would be more discussion, but it seems
> nobody really objects to changing this.
> 
> The other thing that was discussed in the security thread was
> modifying PQescapeStringInternal and PQescapeInternal to produce
> no more than one complaint about invalid multibyte characters,
> on the grounds that input that's just plain in some other encoding
> would otherwise produce a ton of repetitive messages.  That seems
> trivial enough to mechanize with a bool already_complained flag,
> so I think we should incorporate that refinement while we're here.

+1



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation
Next
From: Tom Lane
Date:
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation