Re: Is it possible to index "deep" into a JSONB column? - Mailing list pgsql-general
From | Shaheed Haque |
---|---|
Subject | Re: Is it possible to index "deep" into a JSONB column? |
Date | |
Msg-id | CAHAc2jewEk8Zm572o_0Qm9-uETfZNLKJOxCooTiRwowgoQXMoA@mail.gmail.com Whole thread Raw |
In response to | Re: Is it possible to index "deep" into a JSONB column? (Shaheed Haque <shaheedhaque@gmail.com>) |
List | pgsql-general |
I forgot to say... I see the documentation on jsonpath indexing says: "GIN index extracts statements of following form out of jsonpath: accessors_chain = const. Accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports .* and .** accessors" But I'm unable to make "CREATE INDEX...USING gin ((snapshot -> '$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)'))" trigger that. On Tue, 31 May 2022 at 10:16, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > OK, I was able to translate your excellent note into this: > > CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot -> > '$.employee.*.works_id')); > > and query using "@>" to see it in use: > > SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')... > > EXPLAIN ANALYSE... > -> Bitmap Index Scan on foo (cost=0.00..8.22 rows=29 > width=0) (actual time=0.005..0.005 rows=0 loops=1) > Index Cond: ((snapshot -> '$.employee.*.works_id'::text) > @> '1091'::jsonb) > > Unfortunately, my real query involved multiple AND clauses which (as > per another recent thread) seems to end up fetching the large > (probably TOASTed) JSONB once for each AND clause at a cost of 150ms > each. So, I got rid of the multiple ANDs by staying inside the > jsonpath like this: > > SELECT ... WHERE (... AND > (snapshot @? '$.employee."2209" ? ( > @.pay_graph <> 0 || @.last_run_of_employment == true || > @.state.employment[last][2] == 0)' > ) > ); > > But I have not been able to find an index formulation the new jsonpath > can use. I tried adding > > CREATE INDEX ... USING gin ((snapshot -> '$.employee.*')); > CREATE INDEX ... USING gin ((snapshot -> '$.employee')); > CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot -> > '$.employee.*.pay_graph')); > > Any thoughts on indexing for this case? If it makes any difference, > I'm presently on PG12 and PG13 but looking to move to PG14. > > Thanks, Shaheed > > On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > > > Imre, > > > > I'm gradually working my way into the combination of SQL, JSONB and > > jsonpath that this involves even without the indexing, but this looks > > very helpful/promising, especially on the indexing. Thanks a lot for > > the input, > > > > Shaheed > > > > On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote: > > > > > > > In principle, I believe this allows index-assisted access to keys and > > > > values nested in arrays and inner objects but in practice, it seems > > > > the planner "often" decides to ignore the index in favour of a table scan. > > > > > > part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx" > > > > > > SET enable_seqscan = OFF; > > > select * from jsonb_table > > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > > ; > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > | id | jsonb_col | > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > | 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}}| > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > (1 row) > > > > > > > > > EXPLAIN ANALYZE select * from jsonb_table > > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > > ; > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > | QUERY PLAN | > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > | Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) | > > > | Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > > | Heap Blocks: exact=1 | > > > | -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) | > > > | Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > > | Planning Time: 0.255 ms | > > > | Execution Time: 0.122 ms | > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > (7 rows) > > > > > > regards, > > > Imre > > > > > > > > > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30): > > >> > > >> Hi Shaheed, > > >> > > >> > WHAT GOES HERE > > >> > > >> imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) > > >> > > >> may example: > > >> > > >> CREATE TABLE jsonb_table ( > > >> id serial primary key, > > >> jsonb_col JSONB > > >> ); > > >> > > >> INSERT INTO jsonb_table(jsonb_col) > > >> VALUES > > >> ('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'), > > >> ('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'), > > >> ('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}') > > >> ; > > >> > > >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table; > > >> > > >> -- create index > > >> create index jpqarr_idx > > >> on jsonb_table > > >> using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) ); > > >> -- tests: > > >> select id from jsonb_table > > >> where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT); > > >> ; > > >> DROP TABLE > > >> CREATE TABLE > > >> INSERT 0 3 > > >> +------------------------------+ > > >> | jsonb_path_query_array | > > >> +------------------------------+ > > >> | ["1980-01-01", "1982-02-02"] | > > >> | ["1970-01-01", "1971-02-02"] | > > >> | ["2000-01-01", "2000-02-02"] | > > >> +------------------------------+ > > >> (3 rows) > > >> > > >> CREATE INDEX > > >> +----+ > > >> | id | > > >> +----+ > > >> | 3 | > > >> +----+ > > >> (1 row) > > >> > > >> > > >> Regards, > > >> Imre > > >> > > >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53): > > >>> > > >>> Suppose I have a JSONB field called "snapshot". I can create a GIN > > >>> index on it like this: > > >>> > > >>> create index idx1 on mytable using gin (snapshot); > > >>> > > >>> In principle, I believe this allows index-assisted access to keys and > > >>> values nested in arrays and inner objects but in practice, it seems > > >>> the planner "often" decides to ignore the index in favour of a table > > >>> scan. (As discussed elsewhere, this is influenced by the number of > > >>> rows, and possibly other criteria too). > > >>> > > >>> Now, I know it is possible to index inner objects, so that is snapshot > > >>> looks like this: > > >>> > > >>> { > > >>> "stuff": {}, > > >>> "more other stuff": {}, > > >>> "employee": { > > >>> "1234": {"date_of_birth": "1970-01-01"}, > > >>> "56B789": {"date_of_birth": "1971-02-02"}, > > >>> } > > >>> } > > >>> > > >>> I can say: > > >>> > > >>> create index idx2 on mytable using gin ((snapshot -> 'employee')); > > >>> > > >>> But what is the syntax to index only on date_of_birth? I assume a > > >>> btree would work since it is a primitive value, but WHAT GOES HERE in > > >>> this: > > >>> > > >>> create index idx3 on mytable using btree ((snapshot ->'employee' -> > > >>> WHAT GOES HERE -> 'date_of_birth')); > > >>> > > >>> I believe an asterisk "*" would work if 'employee' was an array, but > > >>> here it is nested object with keys. If it helps, the keys are > > >>> invariably numbers (in quoted string form, as per JSON). > > >>> > > >>> Thanks, Shaheed > > >>> > > >>>
pgsql-general by date: