Re: Is it possible to index "deep" into a JSONB column? - Mailing list pgsql-general

From Shaheed Haque
Subject Re: Is it possible to index "deep" into a JSONB column?
Date
Msg-id CAHAc2jewEk8Zm572o_0Qm9-uETfZNLKJOxCooTiRwowgoQXMoA@mail.gmail.com
Whole thread Raw
In response to Re: Is it possible to index "deep" into a JSONB column?  (Shaheed Haque <shaheedhaque@gmail.com>)
List pgsql-general
I forgot to say...

I see the documentation on jsonpath indexing says:

"GIN index extracts statements of following form out of jsonpath:
accessors_chain = const. Accessors chain may consist of .key, [*], and
[index] accessors. jsonb_ops additionally supports .* and .**
accessors"

But I'm unable to make "CREATE INDEX...USING gin ((snapshot ->
'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true
|| @.state.employment[last][2] == 0)'))" trigger that.

On Tue, 31 May 2022 at 10:16, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> OK, I was able to translate your excellent note into this:
>
>     CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.works_id'));
>
> and query using "@>" to see it in use:
>
>    SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...
>
>    EXPLAIN ANALYSE...
>         ->  Bitmap Index Scan on foo  (cost=0.00..8.22 rows=29
> width=0) (actual time=0.005..0.005 rows=0 loops=1)
>               Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
> @> '1091'::jsonb)
>
> Unfortunately, my real query involved multiple AND clauses which (as
> per another recent thread) seems to end up fetching the large
> (probably TOASTed) JSONB once for each AND clause at a cost of 150ms
> each. So, I got rid of the multiple ANDs by staying inside the
> jsonpath like this:
>
>     SELECT ... WHERE (... AND
>         (snapshot @? '$.employee."2209" ? (
>             @.pay_graph <> 0 || @.last_run_of_employment == true ||
> @.state.employment[last][2] == 0)'
>         )
>     );
>
> But I have not been able to find an index formulation the new jsonpath
> can use. I tried adding
>
>     CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
>     CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
>     CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.pay_graph'));
>
> Any thoughts on indexing for this case? If it makes any difference,
> I'm presently on PG12 and PG13 but looking to move to PG14.
>
> Thanks, Shaheed
>
> On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque@gmail.com> wrote:
> >
> > Imre,
> >
> > I'm gradually working my way into the combination of SQL, JSONB and
> > jsonpath that this involves even without the indexing, but this looks
> > very helpful/promising, especially on the indexing. Thanks a lot for
> > the input,
> >
> > Shaheed
> >
> > On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote:
> > >
> > > > In principle, I believe this allows index-assisted access to keys and
> > > > values nested in arrays and inner objects but in practice, it seems
> > > > the planner "often" decides to ignore the index in favour of a table scan.
> > >
> > > part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
> > >
> > > SET enable_seqscan = OFF;
> > > select * from jsonb_table
> > >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > >
+----+---------------------------------------------------------------------------------------------------------------+
> > > | id |                                                   jsonb_col
  | 
> > >
+----+---------------------------------------------------------------------------------------------------------------+
> > > |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth":
"1971-02-02"}}}| 
> > >
+----+---------------------------------------------------------------------------------------------------------------+
> > > (1 row)
> > >
> > >
> > > EXPLAIN ANALYZE select * from jsonb_table
> > >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > |                                                                       QUERY PLAN
                                       | 
> > >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > | Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1)
                                       | 
> > > |   Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb,
false)@> to_jsonb('1971-02-02'::text))     | 
> > > |   Heap Blocks: exact=1
                                       | 
> > > |   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1
loops=1)                                      | 
> > > |         Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb,
false)@> to_jsonb('1971-02-02'::text)) | 
> > > | Planning Time: 0.255 ms
                                       | 
> > > | Execution Time: 0.122 ms
                                       | 
> > >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > (7 rows)
> > >
> > > regards,
> > >   Imre
> > >
> > >
> > > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
> > >>
> > >> Hi Shaheed,
> > >>
> > >> > WHAT GOES HERE
> > >>
> > >> imho check the:   jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )
> > >>
> > >> may example:
> > >>
> > >> CREATE TABLE jsonb_table (
> > >>     id serial primary key,
> > >>     jsonb_col JSONB
> > >> );
> > >>
> > >> INSERT INTO jsonb_table(jsonb_col)
> > >> VALUES
> > >>   ('{"stuff": {},"employee": {"8011":  {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth":
"1982-02-02"}}}'),
> > >>   ('{"stuff": {},"employee": {"7011":  {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth":
"1971-02-02"}}}'),
> > >>   ('{"stuff": {},"employee": {"a12":   {"date_of_birth": "2000-01-01"},"b56":  {"date_of_birth":
"2000-02-02"}}}')
> > >> ;
> > >>
> > >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  from jsonb_table;
> > >>
> > >> -- create index
> > >> create index jpqarr_idx
> > >>   on jsonb_table
> > >>   using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
> > >> -- tests:
> > >> select id from jsonb_table
> > >>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
> > >> ;
> > >> DROP TABLE
> > >> CREATE TABLE
> > >> INSERT 0 3
> > >> +------------------------------+
> > >> |    jsonb_path_query_array    |
> > >> +------------------------------+
> > >> | ["1980-01-01", "1982-02-02"] |
> > >> | ["1970-01-01", "1971-02-02"] |
> > >> | ["2000-01-01", "2000-02-02"] |
> > >> +------------------------------+
> > >> (3 rows)
> > >>
> > >> CREATE INDEX
> > >> +----+
> > >> | id |
> > >> +----+
> > >> |  3 |
> > >> +----+
> > >> (1 row)
> > >>
> > >>
> > >> Regards,
> > >>  Imre
> > >>
> > >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
> > >>>
> > >>> Suppose I have a JSONB field called "snapshot". I can create a GIN
> > >>> index on it like this:
> > >>>
> > >>>   create index idx1 on mytable using gin (snapshot);
> > >>>
> > >>> In principle, I believe this allows index-assisted access to keys and
> > >>> values nested in arrays and inner objects but in practice, it seems
> > >>> the planner "often" decides to ignore the index in favour of a table
> > >>> scan. (As discussed elsewhere, this is influenced by the number of
> > >>> rows, and possibly other criteria too).
> > >>>
> > >>> Now, I know it is possible to index inner objects, so that is snapshot
> > >>> looks like this:
> > >>>
> > >>> {
> > >>>     "stuff": {},
> > >>>     "more other stuff": {},
> > >>>     "employee": {
> > >>>          "1234": {"date_of_birth": "1970-01-01"},
> > >>>          "56B789": {"date_of_birth": "1971-02-02"},
> > >>>     }
> > >>> }
> > >>>
> > >>> I can say:
> > >>>
> > >>>   create index idx2 on mytable using gin ((snapshot -> 'employee'));
> > >>>
> > >>> But what is the syntax to index only on date_of_birth? I assume a
> > >>> btree would work since it is a primitive value, but WHAT GOES HERE in
> > >>> this:
> > >>>
> > >>>   create index idx3 on mytable using btree ((snapshot ->'employee' ->
> > >>> WHAT GOES HERE -> 'date_of_birth'));
> > >>>
> > >>> I believe an asterisk "*" would work if 'employee' was an array, but
> > >>> here it is  nested object with keys. If it helps, the keys are
> > >>> invariably numbers (in quoted string form, as per JSON).
> > >>>
> > >>> Thanks, Shaheed
> > >>>
> > >>>



pgsql-general by date:

Previous
From: Shaheed Haque
Date:
Subject: Re: Is it possible to index "deep" into a JSONB column?
Next
From: Danny Shemesh
Date:
Subject: The use of partial, expressional indices in pg < 14