Re: pg_parse_json() should not leak token copies on failure - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: pg_parse_json() should not leak token copies on failure
Date
Msg-id CAOYmi+=toTcJ7BS+=74-7=2GARU4sPW_JRxZzp=ur=i9sV2jUQ@mail.gmail.com
Whole thread Raw
In response to Re: pg_parse_json() should not leak token copies on failure  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: pg_parse_json() should not leak token copies on failure
List pgsql-hackers
On Mon, Jun 3, 2024 at 9:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Hi,

Thanks for the review!

> I understand that the part enclosed in parentheses refers to the "if
> (ptr) pfree(ptr)" structure. I believe we rely on the behavior of
> free(NULL), which safely does nothing. (I couldn't find the related
> discussion due to a timeout error on the ML search page.)

For the frontend, right. For the backend, pfree(NULL) causes a crash.
I think [1] is a related discussion on the list, maybe the one you
were looking for?

> Although I don't fully understand the entire parser code, it seems
> that the owner transfer only occurs in JSON_TOKEN_STRING cases. That
> is, the memory pointed by scalar_val would become dangling in
> JSON_TOKEN_NUBMER cases.

It should happen for both cases, I think. I see that the
JSON_SEM_SCALAR_CALL case passes scalar_val into the callback
regardless of whether we're parsing a string or a number, and
parse_scalar() does the same thing over in the recursive
implementation. Have I missed a code path?

> Even if this is not the case, the ownership
> transition apperas quite callenging to follow.

I agree!

> It might be safer or clearer to pstrdup the token in jsonb_in_scalar()
> and avoid NULLifying scalar_val after calling callbacks, or to let
> jsonb_in_sclar() NULLify the pointer.

Maybe. jsonb_in_scalar isn't the only scalar callback implementation,
though. It might be a fairly cruel change for dependent extensions,
since there will be no compiler complaint.

If a compatibility break is acceptable, maybe we should make the API
zero-copy for the recursive case, and just pass the length of the
parsed token? Either way, we'd have to change the object field
callbacks, too, but maybe an API change makes even more sense there,
given how convoluted it is right now.

Thanks,
--Jacob

[1] https://www.postgresql.org/message-id/flat/cf26e970-8e92-59f1-247a-aa265235075b%40enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: meson "experimental"?
Next
From: Bruce Momjian
Date:
Subject: Re: Partial aggregates pushdown