Re: Is it possible to index "deep" into a JSONB column? - Mailing list pgsql-general

From Shaheed Haque
Subject Re: Is it possible to index "deep" into a JSONB column?
Date
Msg-id CAHAc2jeMQe=uc2meF5f=RGbwPe10+Wz66fOoBxVu4B=Uy1noiQ@mail.gmail.com
Whole thread Raw
In response to Re: Is it possible to index "deep" into a JSONB column?  (Shaheed Haque <shaheedhaque@gmail.com>)
Responses Re: Is it possible to index "deep" into a JSONB column?
List pgsql-general
OK, I was able to translate your excellent note into this:

    CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.works_id'));

and query using "@>" to see it in use:

   SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...

   EXPLAIN ANALYSE...
        ->  Bitmap Index Scan on foo  (cost=0.00..8.22 rows=29
width=0) (actual time=0.005..0.005 rows=0 loops=1)
              Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
@> '1091'::jsonb)

Unfortunately, my real query involved multiple AND clauses which (as
per another recent thread) seems to end up fetching the large
(probably TOASTed) JSONB once for each AND clause at a cost of 150ms
each. So, I got rid of the multiple ANDs by staying inside the
jsonpath like this:

    SELECT ... WHERE (... AND
        (snapshot @? '$.employee."2209" ? (
            @.pay_graph <> 0 || @.last_run_of_employment == true ||
@.state.employment[last][2] == 0)'
        )
    );

But I have not been able to find an index formulation the new jsonpath
can use. I tried adding

    CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
    CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
    CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
'$.employee.*.pay_graph'));

Any thoughts on indexing for this case? If it makes any difference,
I'm presently on PG12 and PG13 but looking to move to PG14.

Thanks, Shaheed

On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> Imre,
>
> I'm gradually working my way into the combination of SQL, JSONB and
> jsonpath that this involves even without the indexing, but this looks
> very helpful/promising, especially on the indexing. Thanks a lot for
> the input,
>
> Shaheed
>
> On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote:
> >
> > > In principle, I believe this allows index-assisted access to keys and
> > > values nested in arrays and inner objects but in practice, it seems
> > > the planner "often" decides to ignore the index in favour of a table scan.
> >
> > part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
> >
> > SET enable_seqscan = OFF;
> > select * from jsonb_table
> >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > ;
> >
+----+---------------------------------------------------------------------------------------------------------------+
> > | id |                                                   jsonb_col
| 
> >
+----+---------------------------------------------------------------------------------------------------------------+
> > |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth":
"1971-02-02"}}}| 
> >
+----+---------------------------------------------------------------------------------------------------------------+
> > (1 row)
> >
> >
> > EXPLAIN ANALYZE select * from jsonb_table
> >   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > ;
> >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > |                                                                       QUERY PLAN
                                     | 
> >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > | Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1)
                                     | 
> > |   Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb,
false)@> to_jsonb('1971-02-02'::text))     | 
> > |   Heap Blocks: exact=1
                                     | 
> > |   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1)
                                     | 
> > |         Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb,
false)@> to_jsonb('1971-02-02'::text)) | 
> > | Planning Time: 0.255 ms
                                     | 
> > | Execution Time: 0.122 ms
                                     | 
> >
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > (7 rows)
> >
> > regards,
> >   Imre
> >
> >
> > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
> >>
> >> Hi Shaheed,
> >>
> >> > WHAT GOES HERE
> >>
> >> imho check the:   jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )
> >>
> >> may example:
> >>
> >> CREATE TABLE jsonb_table (
> >>     id serial primary key,
> >>     jsonb_col JSONB
> >> );
> >>
> >> INSERT INTO jsonb_table(jsonb_col)
> >> VALUES
> >>   ('{"stuff": {},"employee": {"8011":  {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth":
"1982-02-02"}}}'),
> >>   ('{"stuff": {},"employee": {"7011":  {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth":
"1971-02-02"}}}'),
> >>   ('{"stuff": {},"employee": {"a12":   {"date_of_birth": "2000-01-01"},"b56":  {"date_of_birth": "2000-02-02"}}}')
> >> ;
> >>
> >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  from jsonb_table;
> >>
> >> -- create index
> >> create index jpqarr_idx
> >>   on jsonb_table
> >>   using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
> >> -- tests:
> >> select id from jsonb_table
> >>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
> >> ;
> >> DROP TABLE
> >> CREATE TABLE
> >> INSERT 0 3
> >> +------------------------------+
> >> |    jsonb_path_query_array    |
> >> +------------------------------+
> >> | ["1980-01-01", "1982-02-02"] |
> >> | ["1970-01-01", "1971-02-02"] |
> >> | ["2000-01-01", "2000-02-02"] |
> >> +------------------------------+
> >> (3 rows)
> >>
> >> CREATE INDEX
> >> +----+
> >> | id |
> >> +----+
> >> |  3 |
> >> +----+
> >> (1 row)
> >>
> >>
> >> Regards,
> >>  Imre
> >>
> >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
> >>>
> >>> Suppose I have a JSONB field called "snapshot". I can create a GIN
> >>> index on it like this:
> >>>
> >>>   create index idx1 on mytable using gin (snapshot);
> >>>
> >>> In principle, I believe this allows index-assisted access to keys and
> >>> values nested in arrays and inner objects but in practice, it seems
> >>> the planner "often" decides to ignore the index in favour of a table
> >>> scan. (As discussed elsewhere, this is influenced by the number of
> >>> rows, and possibly other criteria too).
> >>>
> >>> Now, I know it is possible to index inner objects, so that is snapshot
> >>> looks like this:
> >>>
> >>> {
> >>>     "stuff": {},
> >>>     "more other stuff": {},
> >>>     "employee": {
> >>>          "1234": {"date_of_birth": "1970-01-01"},
> >>>          "56B789": {"date_of_birth": "1971-02-02"},
> >>>     }
> >>> }
> >>>
> >>> I can say:
> >>>
> >>>   create index idx2 on mytable using gin ((snapshot -> 'employee'));
> >>>
> >>> But what is the syntax to index only on date_of_birth? I assume a
> >>> btree would work since it is a primitive value, but WHAT GOES HERE in
> >>> this:
> >>>
> >>>   create index idx3 on mytable using btree ((snapshot ->'employee' ->
> >>> WHAT GOES HERE -> 'date_of_birth'));
> >>>
> >>> I believe an asterisk "*" would work if 'employee' was an array, but
> >>> here it is  nested object with keys. If it helps, the keys are
> >>> invariably numbers (in quoted string form, as per JSON).
> >>>
> >>> Thanks, Shaheed
> >>>
> >>>



pgsql-general by date:

Previous
From: Matthias Apitz
Date:
Subject: Re: existing row not found by SELECT ... WHERE CTID = ?
Next
From: Shaheed Haque
Date:
Subject: Re: Is it possible to index "deep" into a JSONB column?