Re: Is it possible to index "deep" into a JSONB column? - Mailing list pgsql-general

From Shaheed Haque
Subject Re: Is it possible to index "deep" into a JSONB column?
Date
Msg-id CAHAc2jdnrMe0N6hTjM2tzWR9C2m4MiqpdqJ0P36Bjm-6WCAxtA@mail.gmail.com
Whole thread Raw
In response to Re: Is it possible to index "deep" into a JSONB column?  (Imre Samu <pella.samu@gmail.com>)
Responses Re: Is it possible to index "deep" into a JSONB column?
List pgsql-general
Imre,

I'm gradually working my way into the combination of SQL, JSONB and
jsonpath that this involves even without the indexing, but this looks
very helpful/promising, especially on the indexing. Thanks a lot for
the input,

Shaheed

On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote:
>
> > In principle, I believe this allows index-assisted access to keys and
> > values nested in arrays and inner objects but in practice, it seems
> > the planner "often" decides to ignore the index in favour of a table scan.
>
> part II.  index usage ;  see  the "Bitmap Index Scan on jpqarr_idx"
>
> SET enable_seqscan = OFF;
> select * from jsonb_table
>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> ;
>
+----+---------------------------------------------------------------------------------------------------------------+
> | id |                                                   jsonb_col
|
>
+----+---------------------------------------------------------------------------------------------------------------+
> |  2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}}
|
>
+----+---------------------------------------------------------------------------------------------------------------+
> (1 row)
>
>
> EXPLAIN ANALYZE select * from jsonb_table
>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> ;
>
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> |                                                                       QUERY PLAN
                                   | 
>
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Bitmap Heap Scan on jsonb_table  (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1)
                                   | 
> |   Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)
@>to_jsonb('1971-02-02'::text))     | 
> |   Heap Blocks: exact=1
                                   | 
> |   ->  Bitmap Index Scan on jpqarr_idx  (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1)
                                   | 
> |         Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb,
false)@> to_jsonb('1971-02-02'::text)) | 
> | Planning Time: 0.255 ms
                                   | 
> | Execution Time: 0.122 ms
                                   | 
>
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
> (7 rows)
>
> regards,
>   Imre
>
>
> Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
>>
>> Hi Shaheed,
>>
>> > WHAT GOES HERE
>>
>> imho check the:   jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )
>>
>> may example:
>>
>> CREATE TABLE jsonb_table (
>>     id serial primary key,
>>     jsonb_col JSONB
>> );
>>
>> INSERT INTO jsonb_table(jsonb_col)
>> VALUES
>>   ('{"stuff": {},"employee": {"8011":  {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
>>   ('{"stuff": {},"employee": {"7011":  {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
>>   ('{"stuff": {},"employee": {"a12":   {"date_of_birth": "2000-01-01"},"b56":  {"date_of_birth": "2000-02-02"}}}')
>> ;
>>
>> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )  from jsonb_table;
>>
>> -- create index
>> create index jpqarr_idx
>>   on jsonb_table
>>   using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
>> -- tests:
>> select id from jsonb_table
>>   where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
>> ;
>> DROP TABLE
>> CREATE TABLE
>> INSERT 0 3
>> +------------------------------+
>> |    jsonb_path_query_array    |
>> +------------------------------+
>> | ["1980-01-01", "1982-02-02"] |
>> | ["1970-01-01", "1971-02-02"] |
>> | ["2000-01-01", "2000-02-02"] |
>> +------------------------------+
>> (3 rows)
>>
>> CREATE INDEX
>> +----+
>> | id |
>> +----+
>> |  3 |
>> +----+
>> (1 row)
>>
>>
>> Regards,
>>  Imre
>>
>> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
>>>
>>> Suppose I have a JSONB field called "snapshot". I can create a GIN
>>> index on it like this:
>>>
>>>   create index idx1 on mytable using gin (snapshot);
>>>
>>> In principle, I believe this allows index-assisted access to keys and
>>> values nested in arrays and inner objects but in practice, it seems
>>> the planner "often" decides to ignore the index in favour of a table
>>> scan. (As discussed elsewhere, this is influenced by the number of
>>> rows, and possibly other criteria too).
>>>
>>> Now, I know it is possible to index inner objects, so that is snapshot
>>> looks like this:
>>>
>>> {
>>>     "stuff": {},
>>>     "more other stuff": {},
>>>     "employee": {
>>>          "1234": {"date_of_birth": "1970-01-01"},
>>>          "56B789": {"date_of_birth": "1971-02-02"},
>>>     }
>>> }
>>>
>>> I can say:
>>>
>>>   create index idx2 on mytable using gin ((snapshot -> 'employee'));
>>>
>>> But what is the syntax to index only on date_of_birth? I assume a
>>> btree would work since it is a primitive value, but WHAT GOES HERE in
>>> this:
>>>
>>>   create index idx3 on mytable using btree ((snapshot ->'employee' ->
>>> WHAT GOES HERE -> 'date_of_birth'));
>>>
>>> I believe an asterisk "*" would work if 'employee' was an array, but
>>> here it is  nested object with keys. If it helps, the keys are
>>> invariably numbers (in quoted string form, as per JSON).
>>>
>>> Thanks, Shaheed
>>>
>>>



pgsql-general by date:

Previous
From: andrew cooke
Date:
Subject: Appending data locally to a logical replication subscriber
Next
From: Jeff Janes
Date:
Subject: Re: JSONB index not in use, but is TOAST the real cause of slow query?