On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier <michael@paquier.xyz> wrote:
> I'm not sure I like the fact that this advances token_terminator
> first. Wouldn't it be better to calculate pg_encoding_mblen() first,
> then save token_terminator? I feel a bit uneasy about saving a value
> in token_terminator past the end of the string. That's a nit in this
> context, still..
v2 tries it that way; see what you think. Is the concern that someone
might add code later that escapes that macro early?
> Hmm. Keeping it around as currently designed means that it could
> cause more harm than good in the long term, even in the stable
> branches if new code uses it. There is a risk of new code
> incorrectly using it again, even if its top comment says that we rely
> on the string being nul-terminated. A safer alternative would be to
> redesign it so that the length of the string is provided as input,
> removing the dependency on strlen() in its internals, perhaps. Anyway,
> without any callers of it, I'd be tempted to wipe it from HEAD and
> call it a day.
Removed in v2.
> > The new test needs to record two versions of the error message, one
> > for invalid token and one for invalid escape sequence. This is
> > because, for smaller chunk sizes, the partial-token logic in the
> > incremental JSON parser skips the affected code entirely when it can't
> > find an ending double-quote.
>
> Ah, that makes sense. That looks OK here. A comment around the test
> would be appropriate to document that, I guess.
Done.
> Advancing the tracking pointer 's' before reporting an error related
> to the end of the string is a bad practice, IMO, and we should avoid
> that. json_lex_string() does not offer a warm feeling regarding that
> with escape characters, at least :/
Yeah... I think some expansion of the json_errdetail test coverage is
probably in my future. :)
Thanks,
--Jacob