Re: [PATCH] json_lex_string: don't overread on bad UTF8 - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date
Msg-id ZjxQnOD1OoCkEeMN@paquier.xyz
In response to Re: [PATCH] json_lex_string: don't overread on bad UTF8  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses Re: [PATCH] json_lex_string: don't overread on bad UTF8
List pgsql-hackers
On Wed, May 08, 2024 at 07:01:08AM -0700, Jacob Champion wrote:
> On Tue, May 7, 2024 at 10:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>> But looking closer, I can see that in the JSON_INVALID_TOKEN case,
>> when !tok_done, we set token_terminator to point to the end of the
>> token, and that would include an incomplete byte sequence like in your
>> case.  :/
>
> Ah, I see what you're saying. Yeah, that approach would need some more
> invasive changes.

My first feeling was actually to do that, and report the location in
the input string where we are seeing issues.  All code paths playing
with token_terminator would need to track that.
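
To illustrate the idea of keeping an error location from covering a partial
byte sequence, here is a minimal sketch. It is not the patch discussed in this
thread and does not use the PostgreSQL JSON parser's API; utf8_seq_len() and
clamp_to_complete_chars() are hypothetical names. It only checks sequence
lengths implied by lead bytes, not the validity of continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of a UTF-8 sequence as implied by its lead byte.
 * Invalid lead bytes (including stray continuation bytes) count as 1
 * so that the caller always makes progress.
 */
static size_t
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Hypothetical helper: return the length of the longest prefix of
 * s[0..len) that ends on a character boundary, so that a caller
 * reporting that prefix never reads or prints a truncated sequence.
 */
static size_t
clamp_to_complete_chars(const char *s, size_t len)
{
    size_t      i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        size_t      n = utf8_seq_len((unsigned char) s[i]);

        if (n > len - i)
            break;              /* sequence would run past the input end */
        i += n;
    }
    return i;
}
```

A caller that has a token start and a tentative end could pass the distance
between them and use the returned length as the end of what gets reported.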

> Agreed. Fortunately (or unfortunately?) I think the JSON
> client-encoding work is now a prerequisite for OAuth in libpq, so
> hopefully some improvements can fall out of that work too.

I'm afraid so.  I don't quite see how this would be OK to tweak on
stable branches, but all areas that could report error states with
partial byte sequence contents would benefit from such a change.

>> Thoughts and/or objections?
>
> None here.

This is a bit mitigated by the fact that d6607016c738 is recent, but
the behavior has been incorrect since v13, so I have backpatched the
fix down to that.
--
Michael
