Hi,
On 2025-02-15 13:08:03 -0500, Tom Lane wrote:
> I studied the v3 patch a little and realized that it only fixes the
> behavior for the case of a complete-but-invalid multibyte character.
> If we have an incomplete character at the end of the string, the
> whole thing still gets dropped. Surely that's not what we want if
> we are going to adopt this behavior.
Good catch! The different behaviour made sense when we were dropping the
entire multi-byte character, but not anymore.
> Here's a v4 that fixes that. As a bonus, we can get rid of
> duplicative coding since the "incomplete" and "invalid" cases
> are now treated identically.
> One minor point is that the error messages from PQescapeStringInternal
> no longer distinguish "incomplete" from "invalid". I don't find that
> to be a terribly useful distinction, so that's fine with me. But if
> someone feels that's important to preserve, we could make it do
> something like
I think there's some value in the distinction, because incomplete characters
can happen a lot more easily due to truncating longer strings. I don't know if
it's worth the code complexity though.
> From 935ccea6da82d0150f65672825c8f5e6f86bca89 Mon Sep 17 00:00:00 2001
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sat, 15 Feb 2025 12:58:22 -0500
> Subject: [PATCH v4] Make escaping functions retain trailing bytes of an
> invalid character.
>
> Instead of dropping the trailing byte(s) of an invalid or incomplete
> multibyte character, replace only the first byte with a known-invalid
> sequence, and process the rest normally. This seems less likely to
> confuse incautious callers than the behavior adopted in 5dc1e42b4.
>
> Author: Andres Freund <andres@anarazel.de>
> Reviewed-by: Jeff Davis <pgsql@j-davis.com>
This should have been a Reported-by, not a Reviewed-by, my mistake...
It seems that nobody is arguing against the "just skip one byte" behaviour, so
I'm inclined to push this fairly soon, even if Noah's "24 hours" haven't quite
elapsed. A few more cycles in the buildfarm wouldn't hurt.
Greetings,
Andres Freund