On Sat, Feb 15, 2025 at 01:23:51PM -0500, Andres Freund wrote:
> On 2025-02-15 13:08:03 -0500, Tom Lane wrote:
> > I studied the v3 patch a little and realized that it only fixes the
> > behavior for the case of a complete-but-invalid multibyte character.
> > If we have an incomplete character at the end of the string, the
> > whole thing still gets dropped. Surely that's not what we want if
> > we are going to adopt this behavior.
Thanks for doing that.
> Good catch! The different behaviour made sense when we were dropping the
> entire multi-byte character, but not anymore.
>
>
> > Here's a v4 that fixes that. As a bonus, we can get rid of
> > duplicative coding since the "incomplete" and "invalid" cases
> > are now treated identically.
>
> > One minor point is that the error messages from PQescapeStringInternal
> > no longer distinguish "incomplete" from "invalid". I don't find that
> > to be a terribly useful distinction, so that's fine with me. But if
> > someone feels that's important to preserve, we could make it do
> > something like
>
> I think there's some value in the distinction, because incomplete characters
> can happen a lot more easily due to truncating longer strings. I don't know if
> it's worth the code complexity though.
While I agree it doesn't feel like a terribly-important distinction, I'd
definitely keep that distinction today rather than drop it with so little
notice. (I'd feel fine about dropping it in a major release. I'd probably
not drop it in any minor release, but especially not this close to wrap.)
On Sat, Feb 15, 2025 at 01:35:10PM -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It seems that nobody is arguing against the "just skip one byte" behaviour, so
> > I'm inclined to push this fairly soon, even if Noah's "24 hours" haven't quite
> > elapsed. A few more cycles in the buildfarm wouldn't hurt.
Works for me; we can always revert in the event of further discussion. For
what it's worth, I didn't intend this 24hr as a hard period for gating any
particular action.
> Agreed. I thought there would be more discussion, but it seems
> nobody really objects to changing this.
>
> The other thing that was discussed in the security thread was
> modifying PQescapeStringInternal and PQescapeInternal to produce
> no more than one complaint about invalid multibyte characters,
> on the grounds that input that's just plain in some other encoding
> would otherwise produce a ton of repetitive messages. That seems
> trivial enough to mechanize with a bool already_complained flag,
> so I think we should incorporate that refinement while we're here.
+1