Re: Decision by Monday: PQescapeString() vs. encoding violation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Decision by Monday: PQescapeString() vs. encoding violation
Date
Msg-id 7vmfuavitlhjgruidll67y5cd4s3m66m4x7c4yxezkxlo4dxvv@4haeja2xdn2q
Whole thread Raw
In response to Re: Decision by Monday: PQescapeString() vs. encoding violation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Decision by Monday: PQescapeString() vs. encoding violation
List pgsql-hackers
Hi,

On 2025-02-15 13:08:03 -0500, Tom Lane wrote:
> I studied the v3 patch a little and realized that it only fixes the
> behavior for the case of a complete-but-invalid multibyte character.
> If we have an incomplete character at the end of the string, the
> whole thing still gets dropped.  Surely that's not what we want if
> we are going to adopt this behavior.

Good catch! The different behaviour made sense when we were dropping the
entire multi-byte character, but not anymore.


> Here's a v4 that fixes that.  As a bonus, we can get rid of
> duplicative coding since the "incomplete" and "invalid" cases
> are now treated identically.

> One minor point is that the error messages from PQescapeStringInternal
> no longer distinguish "incomplete" from "invalid".  I don't find that
> to be a terribly useful distinction, so that's fine with me.  But if
> someone feels that's important to preserve, we could make it do
> something like

I think there's some value in the distinction, because incomplete characters
can happen a lot more easily due to truncating longer strings. I don't know if
it's worth the code complexity though.


> From 935ccea6da82d0150f65672825c8f5e6f86bca89 Mon Sep 17 00:00:00 2001
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sat, 15 Feb 2025 12:58:22 -0500
> Subject: [PATCH v4] Make escaping functions retain trailing bytes of an
>  invalid character.
> 
> Instead of dropping the trailing byte(s) of an invalid or incomplete
> multibyte character, replace only the first byte with a known-invalid
> sequence, and process the rest normally.  This seems less likely to
> confuse incautious callers than the behavior adopted in 5dc1e42b4.
> 
> Author: Andres Freund <andres@anarazel.de>
> Reviewed-by: Jeff Davis <pgsql@j-davis.com>

This should have been a Reported-by, not a Reviewed-by, my mistake...


It seems that nobody is arguing against the "just skip one byte" behaviour, so
I'm inclined to push this fairly soon, even if Noah's "24 hours" haven't quite
elapsed.  A few more cycles in the buildfarm wouldn't hurt.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation
Next
From: Tom Lane
Date:
Subject: Re: Decision by Monday: PQescapeString() vs. encoding violation