Re: jsonpath - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: jsonpath
Date
Msg-id CAPpHfdtYvffCJfChnFMcOOP6O0RgJVkXy1-_1=teFc8aRJUY7g@mail.gmail.com
Whole thread Raw
In response to Re: jsonpath  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: jsonpath  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
On Sat, Jan 19, 2019 at 1:32 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 1/18/19 6:58 PM, Alexander Korotkov wrote:
> > So, it looks like we can just remove then.  But I think we're very
> > likely can commit jsonpath patch to PostgreSQL 12, but I'm not sure
> > about other patches.  So, having those functions exposed to user can
> > be extremely useful until we have separate nodes for SQL/JSON.  So, I
> > vote to document these functions.
>
> That seems a bit strange. If those functions are meant to be used by
> other patches (which ones?) then why should not we make them part of
> those future patches?

No, these functions aren't used by other patches (it was my original
wrong idea).  Other patches provides SQL expressions making these
functions not that necessary.  But I think we should keep these
functions in jsonpath patch.

> But it seems more like those functions are actually meant to be used by
> users in the first place, in cases when we need to provide a third
> parameter (which operators can't do). In that case yes - we should have
> them documented properly, but also tested. Which is not the case
> currently, because the regression tests only use the operators.

+1
Thank you for noticing.  Tests should be provided as well.

> Two more comments:
>
> 1) I'm wondering why we even need separate functions for the different
> numbers of arguments at the C level, as both simply call to the same
> function anyway with a PG_NARGS() condition in it. Can't we ditch those
> wrappers and reference that target function directly?

That was surprising for me too.  Technically, it's OK to do this.  And
we do this for extensions.  But for in-core functions we have
following sanity check.

-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields).  It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of.  (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

SELECT p1.oid, p1.proname, p2.oid, p2.proname
FROM pg_proc AS p1, pg_proc AS p2
WHERE p1.oid < p2.oid AND
    p1.prosrc = p2.prosrc AND
    p1.prolang = 12 AND p2.prolang = 12 AND
    (p1.prokind != 'a' OR p2.prokind != 'a') AND
    (p1.prolang != p2.prolang OR
     p1.prokind != p2.prokind OR
     p1.prosecdef != p2.prosecdef OR
     p1.proleakproof != p2.proleakproof OR
     p1.proisstrict != p2.proisstrict OR
     p1.proretset != p2.proretset OR
     p1.provolatile != p2.provolatile OR
     p1.pronargs != p2.pronargs);

And we already have some code written especially to make this check happy.

/* This is separate to keep the opr_sanity regression test from complaining */
Datum
regexp_split_to_table_no_flags(PG_FUNCTION_ARGS)
{
    return regexp_split_to_table(fcinfo);
}

Personally I'm not fan of this approach, and I would rather relax this
sanity check.  But that doesn't seem to be a subject of this patch.
For jsonpath, it's OK to just keep this tradition.

> 2) I once again ran into the jsonb->json business, which essentially
> means that when you do this:
>
>     select json '{"a": { "b" : 10}}' @? '$.a.b';
>
> it ends up calling jsonb_jsonpath_exists(), which then does this:
>
>     Jsonb *jb = PG_GETARG_JSONB_P(0);
>
> I and am not sure why/how it seems to work, but I find it confusing
> because the stack still looks like this:
>
> #0  jsonb_jsonpath_exists (fcinfo=0x162f800) at jsonpath_exec.c:2857
> #1  0x000000000096d721 in json_jsonpath_exists2 (fcinfo=0x162f800) at
> jsonpath_exec.c:2882
> #2  0x00000000006c790a in ExecInterpExpr (state=0x162f300,
> econtext=0x162ee18, isnull=0x7ffcea4c3857) at execExprInterp.c:648
> ...
>
> and of course, the fcinfo->flinfo still points to the json-specific
> function, which say the first parameter type is 'json'.
>
> (gdb) print *fcinfo->flinfo
> $23 = {fn_addr = 0x96d709 <json_jsonpath_exists2>, fn_oid = 6043,
> fn_nargs = 2, fn_strict = true, fn_retset = false, fn_stats = 2 '\002',
> fn_extra = 0x0, fn_mcxt = 0x162e990, fn_expr = 0x15db378}
>
>
> test=# select proname, prosrc, proargtypes from pg_proc where oid = 6043;
>      proname     |        prosrc         | proargtypes
> -----------------+-----------------------+-------------
>  jsonpath_exists | json_jsonpath_exists2 | 114 6050
> (1 row)
>
> test=# select typname from pg_type where oid = 114;
>  typname
> ---------
>  json
> (1 row)

It's tricky.  There is jsonpath_json.c, which includes jsonpath_exec.c
with set of macro definitions making that code work with json type.
I'm going to refactor that.  But general approach to include same
functions more than once seems OK for me.

Some more notes:

1) It seems that @* and @# are not going to be supported by any
indexes.  I think we should remove these operators and let users use
functions instead.
2) I propose to rename @~ operator to @@.  We already use @@ as
"satisfies" in multiple places, and I thinks this case fits too.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Next
From: Michael Paquier
Date:
Subject: Re: pgsql: Remove references to Majordomo