Re: [PATCH] json_lex_string: don't overread on bad UTF8 - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date	May 8 08:30:43
Msg-id	ZjsOA6R0q-BXcDc1@paquier.xyz
In response to	Re: [PATCH] json_lex_string: don't overread on bad UTF8 (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses	Re: [PATCH] json_lex_string: don't overread on bad UTF8
List	pgsql-hackers

On Tue, May 07, 2024 at 02:06:10PM -0700, Jacob Champion wrote:
> Maybe I've misunderstood, but isn't that what's being done in v2?

Something a bit different.  I was wondering if it would be possible
to tweak this code to truncate the data in the generated error string
so that the incomplete multi-byte sequence is cut out entirely.  That
would come down to setting token_terminator to "s" (last byte before
the incomplete byte sequence) rather than "term" (last byte available,
even if incomplete):
#define FAIL_AT_CHAR_END(code) \
do { \
   char       *term = s + pg_encoding_mblen(lex->input_encoding, s); \
   lex->token_terminator = (term <= end) ? term : s; \
   return code; \
} while (0)

But looking closer, I can see that in the JSON_INVALID_TOKEN case,
when !tok_done, we set token_terminator to point to the end of the
token, and that would include an incomplete byte sequence like in your
case.  :/
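
For illustration only, the truncation idea in the macro above can be
tried outside the lexer with a small standalone sketch.  It assumes
UTF-8 input; utf8_seq_len() and safe_terminator() are hypothetical
helpers that stand in for pg_encoding_mblen() and the macro's pointer
logic, and are not part of PostgreSQL:

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen(): length of a UTF-8
 * sequence as announced by its lead byte.  Unrecognized lead bytes are
 * treated as one byte long (an assumption made for this sketch).
 */
static int
utf8_seq_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if ((c & 0x80) == 0x00)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Hypothetical helper mirroring the macro: return the position just
 * past the character at "s" when it fits entirely before "end", and
 * "s" itself when the sequence is cut short.  The caller must ensure
 * s < end.  Lengths are compared rather than pointers so that no
 * pointer beyond the end of the buffer is ever formed.
 */
static const char *
safe_terminator(const char *s, const char *end)
{
    ptrdiff_t   remaining = end - s;
    int         len = utf8_seq_len(s);

    return (len <= remaining) ? s + len : s;
}
```

With a buffer ending in the first two bytes of a three-byte sequence,
safe_terminator() returns "s" unchanged, so the partial character would
be left out of the reported token.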

At the end of the day, I think that I'm OK with your patch to avoid
the overread for now in the back-branches.  This situation makes me
uncomfortable and we should put more effort into printing error messages
in a readable format, but that can always be tackled later as a
separate problem.  And I don't see anything backpatchable in the short
term for v16.

Thoughts and/or objections?
--
Michael
