Thread: JSONB index not in use, but is TOAST the real cause of slow query?
Hi, I have a database table with a modest number of rows (<1000) but where one column in the table is a JSONB "snapshot" which can be a few MB in size. Generally, this is a great fit for the read-write access patterns involved, but there is a read-query which is VERY slow. I've searched via Google and in this mailing list archive and found lots of material, but nothing that explains what is going on...or how to fix it. The table columns look like this: ...a primary key... ...various other keys including a couple of FKs... snapshot JSONB and before I did anything, the indices looked like this: "paiyroll_payrun_pkey" PRIMARY KEY, btree (process_ptr_id) "paiyroll_payrun_company_id_ce341888" btree (company_id) "paiyroll_payrun_schedule_id_1593f55f" btree (schedule_id) The data in one row's "snapshot" looks a bit like this: { "stuff": {}, "more other stuff": {}, "employee": { "1234": {}, "56789": {}, } } The query that is slow can be approximated like this: SELECT snapshot ->'stuff'->'item' FROM paiyroll_payrun WHERE snapshot ->'employee' ? '2209'; When I add this index: CREATE INDEX idx1 ON paiyroll_payrun USING gin ((snapshot ->'employee')); the analyser says this: ====================== foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM paiyroll_payrun WHERE snapshot ->'employee' ? '2209'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on paiyroll_payrun (cost=0.00..29.13 rows=9 width=32) (actual time=50.185..2520.983 rows=104 loops=1) Filter: ((snapshot -> 'employee'::text) ? '2209'::text) Rows Removed by Filter: 835 Planning Time: 0.075 ms Execution Time: 2521.004 ms (5 rows) ====================== So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to presume the INDEX is not used because of the number of rows? Is there a way to verify that? And how can I understand the dreadful amount of time (of course, this is just on my dev machine, but still...)? Is there a way to see/tweak what TOAST costs or indeed to confirm if it is even in use? Any help appreciated. Thanks, Shaheed
Shaheed Haque <shaheedhaque@gmail.com> writes: > ====================== > foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM > paiyroll_payrun WHERE snapshot ->'employee' ? '2209'; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------- > Seq Scan on paiyroll_payrun (cost=0.00..29.13 rows=9 width=32) > (actual time=50.185..2520.983 rows=104 loops=1) > Filter: ((snapshot -> 'employee'::text) ? '2209'::text) > Rows Removed by Filter: 835 > Planning Time: 0.075 ms > Execution Time: 2521.004 ms > (5 rows) > ====================== > So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to > presume the INDEX is not used because of the number of rows? Is there > a way to verify that? You could do "set enable_seqscan = off" and see if the EXPLAIN results change. My guess is that you'll find that the indexscan alternative is costed at a bit more than 29.13 units and thus the planner thinks seqscan is cheaper. > And how can I understand the dreadful amount of > time (of course, this is just on my dev machine, but still...)? In the seqscan case, the -> operator is going to retrieve the whole JSONB value from each row, which of course is pretty darn expensive if it's a few megabytes. Unfortunately the planner doesn't account for detoasting costs when making such estimates, so it doesn't realize that the seqscan case is going to be expensive. (Fixing that has been on the to-do list for a long time, but we seldom see cases where it matters this much, so it hasn't gotten done.) The problem would likely go away by itself if your table had more than a few hundred rows, but if you don't anticipate that happening then you need some sort of band-aid. I don't recommend turning enable_seqscan off as a production fix; it'd likely have negative effects on other queries. Personally I'd experiment with reducing random_page_cost a bit to see if I could encourage use of the index that way. The default value of 4.0 is tuned for spinning-rust storage and is not too appropriate for a lot of modern hardware, so there's probably room to fix it that way without detuning your setup for other queries. You should probably also rethink whether you really want to store your data in this format, because anything at all that you do with that big JSONB column is going to be expensive. (Another thing that's been on the to-do list for awhile is enabling partial retrieval of large JSONB values, but AFAIK that hasn't happened yet either.) regards, tom lane
Tom, Thanks for the considered advice and insights. My takeaway is that based on what I've said,you are mostly unsurprised by the results I see. In the longer term, the number of rows will increase but I will have to ponder options for the immediate future. I'll have a play with the knobs you suggested and will report back with anything of note. One last thought about TOAST. If the cost of the -> retrieving the data cannot be obviated, is there any way to tweak how that works? Thanks, Shaheed On Sat, 28 May 2022 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Shaheed Haque <shaheedhaque@gmail.com> writes: > > ====================== > > foo =# explain analyse SELECT snapshot ->'company'->'legal_name' FROM > > paiyroll_payrun WHERE snapshot ->'employee' ? '2209'; > > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------- > > Seq Scan on paiyroll_payrun (cost=0.00..29.13 rows=9 width=32) > > (actual time=50.185..2520.983 rows=104 loops=1) > > Filter: ((snapshot -> 'employee'::text) ? '2209'::text) > > Rows Removed by Filter: 835 > > Planning Time: 0.075 ms > > Execution Time: 2521.004 ms > > (5 rows) > > ====================== > > > So, over 2.5 seconds to sequentially scan ~850 rows. Am I right to > > presume the INDEX is not used because of the number of rows? Is there > > a way to verify that? > > You could do "set enable_seqscan = off" and see if the EXPLAIN > results change. My guess is that you'll find that the indexscan > alternative is costed at a bit more than 29.13 units and thus > the planner thinks seqscan is cheaper. > > > And how can I understand the dreadful amount of > > time (of course, this is just on my dev machine, but still...)? > > In the seqscan case, the -> operator is going to retrieve the whole > JSONB value from each row, which of course is pretty darn expensive > if it's a few megabytes. Unfortunately the planner doesn't account > for detoasting costs when making such estimates, so it doesn't > realize that the seqscan case is going to be expensive. (Fixing > that has been on the to-do list for a long time, but we seldom > see cases where it matters this much, so it hasn't gotten done.) > > The problem would likely go away by itself if your table had more > than a few hundred rows, but if you don't anticipate that happening > then you need some sort of band-aid. I don't recommend turning > enable_seqscan off as a production fix; it'd likely have negative > effects on other queries. Personally I'd experiment with reducing > random_page_cost a bit to see if I could encourage use of the index > that way. The default value of 4.0 is tuned for spinning-rust > storage and is not too appropriate for a lot of modern hardware, > so there's probably room to fix it that way without detuning your > setup for other queries. > > You should probably also rethink whether you really want to store > your data in this format, because anything at all that you do with > that big JSONB column is going to be expensive. (Another thing > that's been on the to-do list for awhile is enabling partial > retrieval of large JSONB values, but AFAIK that hasn't happened > yet either.) > > regards, tom lane
On 5/28/22 12:38, Shaheed Haque wrote: > Tom, > > Thanks for the considered advice and insights. My takeaway is that > based on what I've said,you are mostly unsurprised by the results I > see. In the longer term, the number of rows will increase but I will > have to ponder options for the immediate future. I'll have a play with > the knobs you suggested and will report back with anything of note. > > One last thought about TOAST. If the cost of the -> retrieving the > data cannot be obviated, is there any way to tweak how that works? See: https://www.postgresql.org/docs/current/storage-toast.html and from there: https://www.postgresql.org/docs/current/sql-altertable.html Search for SET STORAGE. It might help with(from Tom Lane): "Unfortunately the planner doesn't account for detoasting costs when making such estimates, ..." but probably not much with: " ...because anything at all that you do with that big JSONB column is going to be expensive. (Another thing that's been on the to-do list for awhile is enabling partial retrieval of large JSONB values, but AFAIK that hasn't happened yet either.) > > Thanks, Shaheed > -- Adrian Klaver adrian.klaver@aklaver.com
Shaheed Haque <shaheedhaque@gmail.com> writes: > One last thought about TOAST. If the cost of the -> retrieving the > data cannot be obviated, is there any way to tweak how that works? The only thing that's readily tweakable is to disable data compression for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note that you have to rebuild the table for already-stored values to be changed). It seems unlikely that that will help you much though, since doing that would save CPU at the cost of more disk I/O, and it seems that the I/O side is your bottleneck already. It *would* help if jsonb had logic for partial fetches, because that'd require uncompressed storage to work. But AFAICS that's not there yet. I distinctly recall that that was planned for when we were designing jsonb's on-disk layout, but I see no code actually doing anything of that sort ... the -> operator certainly doesn't. regards, tom lane
Adrian, Tom, thanks for the input. Based on that, it occurred to me to create some "dummy" rows and - almost magically - the index kicks in! Before is 2500ms: ============ Seq Scan on paiyroll_payrun (cost=0.00..52.43 rows=17 width=32) (actual time=53.127..2567.024 rows=104 loops=1) Filter: ((snapshot -> 'employee'::text) ? '2209'::text) Rows Removed by Filter: 1835 Planning Time: 0.060 ms Execution Time: 2567.044 ms (5 rows) ============ After is 300ms: ============ Bitmap Heap Scan on paiyroll_payrun (cost=36.11..64.67 rows=14 width=32) (actual time=4.189..311.932 rows=104 loops=1) Recheck Cond: ((snapshot -> 'employee'::text) ? '2209'::text) Rows Removed by Index Recheck: 1 Heap Blocks: exact=8 -> Bitmap Index Scan on idx1 (cost=0.00..36.10 rows=14 width=0) (actual time=0.087..0.087 rows=105 loops=1) Index Cond: ((snapshot -> 'employee'::text) ? '2209'::text) Planning Time: 0.167 ms Execution Time: 311.962 ms (8 rows) ============ Woot! Unfortunately, the real query which I think should behave very similarly is still at the several-seconds level despite using the index. Before 3600ms: ============ SELECT "paiyroll_payrun"."actual_t" FROM "paiyroll_payrun" WHERE ("paiyroll_payrun"."company_id" = 173 AND ("paiyroll_payrun"."snapshot" -> 'employee') ? '16376' AND NOT (("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','last_run_of_employment']) = 'true' AND ("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','pay_graph']) = '0' AND ("paiyroll_payrun"."snapshot" #> ARRAY['employee','16376','state','employment','-1','2']) > '0')) ORDER BY "paiyroll_payrun"."actual_t" DESC LIMIT 1 ============ ============ Limit (cost=31.33..31.33 rows=1 width=4) (actual time=3595.174..3595.176 rows=1 loops=1) -> Sort (cost=31.33..31.33 rows=3 width=4) (actual time=3595.174..3595.174 rows=1 loops=1) Sort Key: actual_t DESC Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on paiyroll_payrun (cost=6.43..31.31 rows=3 width=4) (actual time=44.575..3595.082 rows=62 loops=1) Recheck Cond: (company_id = 173) Filter: (((snapshot -> 'employee'::text) ? '16376'::text) AND (((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb))) Rows Removed by Filter: 242 Heap Blocks: exact=9 -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 (cost=0.00..6.43 rows=304 width=0) (actual time=0.013..0.013 rows=304 loops=1) Index Cond: (company_id = 173) Planning Time: 0.258 ms Execution Time: 3595.195 ms ============ After, with the index in use, 2200ms: ================ Limit (cost=30.92..30.93 rows=1 width=4) (actual time=2258.989..2258.990 rows=1 loops=1) -> Sort (cost=30.92..30.93 rows=1 width=4) (actual time=2258.988..2258.989 rows=1 loops=1) Sort Key: actual_t DESC Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1 width=4) (actual time=32.488..2258.891 rows=62 loops=1) Recheck Cond: ((company_id = 173) AND ((snapshot -> 'employee'::text) ? '16376'::text)) Filter: (((snapshot #> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= '0'::jsonb)) Heap Blocks: exact=5 -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual time=0.038..0.039 rows=0 loops=1) -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304 loops=1) Index Cond: (company_id = 173) -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9 width=0) (actual time=0.021..0.021 rows=62 loops=1) Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text) Planning Time: 0.245 ms Execution Time: 2259.019 ms =================== IIUC, at the bottom, the indices are doing their thing, but a couple of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I cannot quite see why. Have I missed a needed index or what? I'm pretty much a novice at SQL, so it is quite possible I've missed something very basic. Thanks, Shaheed On Sat, 28 May 2022 at 20:59, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Shaheed Haque <shaheedhaque@gmail.com> writes: > > One last thought about TOAST. If the cost of the -> retrieving the > > data cannot be obviated, is there any way to tweak how that works? > > The only thing that's readily tweakable is to disable data compression > for the out-of-line values (see ALTER TABLE ... SET STORAGE, and note > that you have to rebuild the table for already-stored values to be > changed). It seems unlikely that that will help you much though, > since doing that would save CPU at the cost of more disk I/O, and > it seems that the I/O side is your bottleneck already. It *would* > help if jsonb had logic for partial fetches, because that'd require > uncompressed storage to work. But AFAICS that's not there yet. > I distinctly recall that that was planned for when we were designing > jsonb's on-disk layout, but I see no code actually doing anything > of that sort ... the -> operator certainly doesn't. > > regards, tom lane
Shaheed Haque <shaheedhaque@gmail.com> writes: > Unfortunately, the real query which I think should behave very > similarly is still at the several-seconds level despite using the > index. ... > -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1 > width=4) (actual time=32.488..2258.891 rows=62 loops=1) > Recheck Cond: ((company_id = 173) AND ((snapshot -> > 'employee'::text) ? '16376'::text)) > Filter: (((snapshot #> > '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb) > OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb) > OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <= > '0'::jsonb)) > Heap Blocks: exact=5 > -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual > time=0.038..0.039 rows=0 loops=1) > -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888 > (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304 > loops=1) > Index Cond: (company_id = 173) > -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9 > width=0) (actual time=0.021..0.021 rows=62 loops=1) > Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text) > IIUC, at the bottom, the indices are doing their thing, but a couple > of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I > cannot quite see why. I suppose it's the execution of that "Filter" condition, which will require perhaps as many as three fetches of the "snapshot" column. You really need to rethink that data structure. Sure, you can store tons of unorganized data in a jsonb column, but you pay for that convenience with slow access. Normalizing the bits you need frequently into a more traditional relational schema is the route to better-performing queries. regards, tom lane
On Sun, 29 May 2022, 15:58 Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Shaheed Haque <shaheedhaque@gmail.com> writes:
> Unfortunately, the real query which I think should behave very
> similarly is still at the several-seconds level despite using the
> index. ...
> -> Bitmap Heap Scan on paiyroll_payrun (cost=26.88..30.91 rows=1
> width=4) (actual time=32.488..2258.891 rows=62 loops=1)
> Recheck Cond: ((company_id = 173) AND ((snapshot ->
> 'employee'::text) ? '16376'::text))
> Filter: (((snapshot #>
> '{employee,16376,last_run_of_employment}'::text[]) <> 'true'::jsonb)
> OR ((snapshot #> '{employee,16376,pay_graph}'::text[]) <> '0'::jsonb)
> OR ((snapshot #> '{employee,16376,state,employment,-1,2}'::text[]) <=
> '0'::jsonb))
> Heap Blocks: exact=5
> -> BitmapAnd (cost=26.88..26.88 rows=1 width=0) (actual
> time=0.038..0.039 rows=0 loops=1)
> -> Bitmap Index Scan on paiyroll_payrun_company_id_ce341888
> (cost=0.00..6.56 rows=304 width=0) (actual time=0.016..0.016 rows=304
> loops=1)
> Index Cond: (company_id = 173)
> -> Bitmap Index Scan on idx1 (cost=0.00..20.07 rows=9
> width=0) (actual time=0.021..0.021 rows=62 loops=1)
> Index Cond: ((snapshot -> 'employee'::text) ? '16376'::text)
> IIUC, at the bottom, the indices are doing their thing, but a couple
> of layers up, the "Bitmap Heap Scan" jumps from ~30ms to 2200ms. But I
> cannot quite see why.
I suppose it's the execution of that "Filter" condition, which will
require perhaps as many as three fetches of the "snapshot" column.
Thanks, that's clearly in the frame.
You really need to rethink that data structure. Sure, you can store tons
of unorganized data in a jsonb column, but you pay for that convenience
with slow access. Normalizing the bits you need frequently into a more
traditional relational schema is the route to better-performing queries.
Ack. Indeed, the current design works very well for all of the access patterns other than this one, which only recently came into view as a problem.
Ahead of contemplating a design change I have been looking at how to optimise this bit. I'm currently mired in a crash course on SQL syntax as pertains to JSONB, jsonpath et. al. And the equally mysterious side effects of "?" and "@>" and so on in terms of the amount of data being fetched etc. (and all wrapped in a dose of ORM for good measure).
I'll write separately with more specific questions if needed on those details.
Thanks again for the kind help.
Shaheed
regards, tom lane
Suppose I have a JSONB field called "snapshot". I can create a GIN index on it like this: create index idx1 on mytable using gin (snapshot); In principle, I believe this allows index-assisted access to keys and values nested in arrays and inner objects but in practice, it seems the planner "often" decides to ignore the index in favour of a table scan. (As discussed elsewhere, this is influenced by the number of rows, and possibly other criteria too). Now, I know it is possible to index inner objects, so that is snapshot looks like this: { "stuff": {}, "more other stuff": {}, "employee": { "1234": {"date_of_birth": "1970-01-01"}, "56B789": {"date_of_birth": "1971-02-02"}, } } I can say: create index idx2 on mytable using gin ((snapshot -> 'employee')); But what is the syntax to index only on date_of_birth? I assume a btree would work since it is a primitive value, but WHAT GOES HERE in this: create index idx3 on mytable using btree ((snapshot ->'employee' -> WHAT GOES HERE -> 'date_of_birth')); I believe an asterisk "*" would work if 'employee' was an array, but here it is nested object with keys. If it helps, the keys are invariably numbers (in quoted string form, as per JSON). Thanks, Shaheed
shaheedhaque@gmail.com wrote:Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:
create index idx1 on mytable using gin (snapshot);
In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).
Now, I know it is possible to index inner objects, so that is snapshot
looks like this:
{
"stuff": {},
"more other stuff": {},
"employee": {
"1234": {"date_of_birth": "1970-01-01"},
"56B789": {"date_of_birth": "1971-02-02"},
}
}
I can say:
create index idx2 on mytable using gin ((snapshot -> 'employee'));
But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:
create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));
I believe an asterisk "*" would work if 'employee' was an array, but
here it is nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).
Try this:
snapshot -> ‘employee’->>’date_of_birth’
Hi Bryn, On Mon, 30 May 2022 at 03:12, Bryn Llewellyn <bryn@yugabyte.com> wrote: ... > > Try this: > > snapshot -> ‘employee’->>’date_of_birth’ > Syntactically, that works: create index bryn on paiyroll_payrun using btree ((snapshot -> 'employee'->>'date_of_birth')); But IIUC it is looking for 'date_of_birth' in the wrong level. it would need to do something like this: create index bryn on paiyroll_payrun using btree ((snapshot -> 'employee'->'2209'->>'date_of_birth')); To paraphrase, my question is about how to replace the '2209' with all possible object keys. For the employee-is-an-array, the documentation says this should work: create index bryn2 on paiyroll_payrun using btree ((snapshot -> 'employee[*]'->>'date_of_birth')); but is silent on the employee-is-an-object case. (As I said, in case it helps, in my case the keys are in fact stringified numbers). Thanks, Shaheed
Hi Shaheed,
> WHAT GOES HERE
imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )
CREATE TABLE jsonb_table (
id serial primary key,
jsonb_col JSONB
);
INSERT INTO jsonb_table(jsonb_col)
VALUES
('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}')
;
select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table;
-- create index
create index jpqarr_idx
on jsonb_table
using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
-- tests:
select id from jsonb_table
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
;
DROP TABLE
CREATE TABLE
INSERT 0 3
+------------------------------+
| jsonb_path_query_array |
+------------------------------+
| ["1980-01-01", "1982-02-02"] |
| ["1970-01-01", "1971-02-02"] |
| ["2000-01-01", "2000-02-02"] |
+------------------------------+
(3 rows)
CREATE INDEX
+----+
| id |
+----+
| 3 |
+----+
(1 row)
id serial primary key,
jsonb_col JSONB
);
INSERT INTO jsonb_table(jsonb_col)
VALUES
('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}')
;
select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table;
-- create index
create index jpqarr_idx
on jsonb_table
using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
-- tests:
select id from jsonb_table
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
;
DROP TABLE
CREATE TABLE
INSERT 0 3
+------------------------------+
| jsonb_path_query_array |
+------------------------------+
| ["1980-01-01", "1982-02-02"] |
| ["1970-01-01", "1971-02-02"] |
| ["2000-01-01", "2000-02-02"] |
+------------------------------+
(3 rows)
CREATE INDEX
+----+
| id |
+----+
| 3 |
+----+
(1 row)
Regards,
Imre
Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:
create index idx1 on mytable using gin (snapshot);
In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).
Now, I know it is possible to index inner objects, so that is snapshot
looks like this:
{
"stuff": {},
"more other stuff": {},
"employee": {
"1234": {"date_of_birth": "1970-01-01"},
"56B789": {"date_of_birth": "1971-02-02"},
}
}
I can say:
create index idx2 on mytable using gin ((snapshot -> 'employee'));
But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:
create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));
I believe an asterisk "*" would work if 'employee' was an array, but
here it is nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).
Thanks, Shaheed
> In principle, I believe this allows index-assisted access to keys and
> values nested in arrays and inner objects but in practice, it seems
> the planner "often" decides to ignore the index in favour of a table scan.
> values nested in arrays and inner objects but in practice, it seems
> the planner "often" decides to ignore the index in favour of a table scan.
part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx"
select * from jsonb_table
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
;
+----+---------------------------------------------------------------------------------------------------------------+
| id | jsonb_col |
+----+---------------------------------------------------------------------------------------------------------------+
| 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}} |
+----+---------------------------------------------------------------------------------------------------------------+
(1 row)
EXPLAIN ANALYZE select * from jsonb_table
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) |
| Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
| Heap Blocks: exact=1 |
| -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) |
| Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
| Planning Time: 0.255 ms |
| Execution Time: 0.122 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
(7 rows)
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) |
| Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
| Heap Blocks: exact=1 |
| -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) |
| Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
| Planning Time: 0.255 ms |
| Execution Time: 0.122 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
(7 rows)
regards,
Imre
Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
Hi Shaheed,> WHAT GOES HEREimho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )may example:CREATE TABLE jsonb_table (
id serial primary key,
jsonb_col JSONB
);
INSERT INTO jsonb_table(jsonb_col)
VALUES
('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}')
;
select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table;
-- create index
create index jpqarr_idx
on jsonb_table
using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
-- tests:
select id from jsonb_table
where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
;
DROP TABLE
CREATE TABLE
INSERT 0 3
+------------------------------+
| jsonb_path_query_array |
+------------------------------+
| ["1980-01-01", "1982-02-02"] |
| ["1970-01-01", "1971-02-02"] |
| ["2000-01-01", "2000-02-02"] |
+------------------------------+
(3 rows)
CREATE INDEX
+----+
| id |
+----+
| 3 |
+----+
(1 row)Regards,ImreShaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53):Suppose I have a JSONB field called "snapshot". I can create a GIN
index on it like this:
create index idx1 on mytable using gin (snapshot);
In principle, I believe this allows index-assisted access to keys and
values nested in arrays and inner objects but in practice, it seems
the planner "often" decides to ignore the index in favour of a table
scan. (As discussed elsewhere, this is influenced by the number of
rows, and possibly other criteria too).
Now, I know it is possible to index inner objects, so that is snapshot
looks like this:
{
"stuff": {},
"more other stuff": {},
"employee": {
"1234": {"date_of_birth": "1970-01-01"},
"56B789": {"date_of_birth": "1971-02-02"},
}
}
I can say:
create index idx2 on mytable using gin ((snapshot -> 'employee'));
But what is the syntax to index only on date_of_birth? I assume a
btree would work since it is a primitive value, but WHAT GOES HERE in
this:
create index idx3 on mytable using btree ((snapshot ->'employee' ->
WHAT GOES HERE -> 'date_of_birth'));
I believe an asterisk "*" would work if 'employee' was an array, but
here it is nested object with keys. If it helps, the keys are
invariably numbers (in quoted string form, as per JSON).
Thanks, Shaheed
Imre, I'm gradually working my way into the combination of SQL, JSONB and jsonpath that this involves even without the indexing, but this looks very helpful/promising, especially on the indexing. Thanks a lot for the input, Shaheed On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote: > > > In principle, I believe this allows index-assisted access to keys and > > values nested in arrays and inner objects but in practice, it seems > > the planner "often" decides to ignore the index in favour of a table scan. > > part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx" > > SET enable_seqscan = OFF; > select * from jsonb_table > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > ; > +----+---------------------------------------------------------------------------------------------------------------+ > | id | jsonb_col | > +----+---------------------------------------------------------------------------------------------------------------+ > | 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}} | > +----+---------------------------------------------------------------------------------------------------------------+ > (1 row) > > > EXPLAIN ANALYZE select * from jsonb_table > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > ; > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > | QUERY PLAN | > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > | Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) | > | Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @>to_jsonb('1971-02-02'::text)) | > | Heap Blocks: exact=1 | > | -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) | > | Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > | Planning Time: 0.255 ms | > | Execution Time: 0.122 ms | > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > (7 rows) > > regards, > Imre > > > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30): >> >> Hi Shaheed, >> >> > WHAT GOES HERE >> >> imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) >> >> may example: >> >> CREATE TABLE jsonb_table ( >> id serial primary key, >> jsonb_col JSONB >> ); >> >> INSERT INTO jsonb_table(jsonb_col) >> VALUES >> ('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'), >> ('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'), >> ('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}') >> ; >> >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table; >> >> -- create index >> create index jpqarr_idx >> on jsonb_table >> using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) ); >> -- tests: >> select id from jsonb_table >> where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT); >> ; >> DROP TABLE >> CREATE TABLE >> INSERT 0 3 >> +------------------------------+ >> | jsonb_path_query_array | >> +------------------------------+ >> | ["1980-01-01", "1982-02-02"] | >> | ["1970-01-01", "1971-02-02"] | >> | ["2000-01-01", "2000-02-02"] | >> +------------------------------+ >> (3 rows) >> >> CREATE INDEX >> +----+ >> | id | >> +----+ >> | 3 | >> +----+ >> (1 row) >> >> >> Regards, >> Imre >> >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53): >>> >>> Suppose I have a JSONB field called "snapshot". I can create a GIN >>> index on it like this: >>> >>> create index idx1 on mytable using gin (snapshot); >>> >>> In principle, I believe this allows index-assisted access to keys and >>> values nested in arrays and inner objects but in practice, it seems >>> the planner "often" decides to ignore the index in favour of a table >>> scan. (As discussed elsewhere, this is influenced by the number of >>> rows, and possibly other criteria too). >>> >>> Now, I know it is possible to index inner objects, so that is snapshot >>> looks like this: >>> >>> { >>> "stuff": {}, >>> "more other stuff": {}, >>> "employee": { >>> "1234": {"date_of_birth": "1970-01-01"}, >>> "56B789": {"date_of_birth": "1971-02-02"}, >>> } >>> } >>> >>> I can say: >>> >>> create index idx2 on mytable using gin ((snapshot -> 'employee')); >>> >>> But what is the syntax to index only on date_of_birth? I assume a >>> btree would work since it is a primitive value, but WHAT GOES HERE in >>> this: >>> >>> create index idx3 on mytable using btree ((snapshot ->'employee' -> >>> WHAT GOES HERE -> 'date_of_birth')); >>> >>> I believe an asterisk "*" would work if 'employee' was an array, but >>> here it is nested object with keys. If it helps, the keys are >>> invariably numbers (in quoted string form, as per JSON). >>> >>> Thanks, Shaheed >>> >>>
On Sat, May 28, 2022 at 1:54 PM Shaheed Haque <shaheedhaque@gmail.com> wrote:
And how can I understand the dreadful amount of
time (of course, this is just on my dev machine, but still...)? Is
there a way to see/tweak what TOAST costs or indeed to confirm if it
is even in use?
Turn on track_io_timing, and then do explain (analyse, buffers). This won't distinguish TOAST from main table IO, but will at least confirm if a large amount of IO is happening and how long it is taking.
You can look at pg_statio_user_tables before the after the query to distinguish TOAST, but this won't give you the timing, just the block counts. And the counts might also get incremented by concurrent users, so on a busy system it is hard to interpret.
Are these queries fast upon repeat execution? A few meg time less than 1000 it just not that much with modern hardware, you should be able to just cache it in RAM.
Cheers,
Jeff
OK, I was able to translate your excellent note into this: CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot -> '$.employee.*.works_id')); and query using "@>" to see it in use: SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')... EXPLAIN ANALYSE... -> Bitmap Index Scan on foo (cost=0.00..8.22 rows=29 width=0) (actual time=0.005..0.005 rows=0 loops=1) Index Cond: ((snapshot -> '$.employee.*.works_id'::text) @> '1091'::jsonb) Unfortunately, my real query involved multiple AND clauses which (as per another recent thread) seems to end up fetching the large (probably TOASTed) JSONB once for each AND clause at a cost of 150ms each. So, I got rid of the multiple ANDs by staying inside the jsonpath like this: SELECT ... WHERE (... AND (snapshot @? '$.employee."2209" ? ( @.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)' ) ); But I have not been able to find an index formulation the new jsonpath can use. I tried adding CREATE INDEX ... USING gin ((snapshot -> '$.employee.*')); CREATE INDEX ... USING gin ((snapshot -> '$.employee')); CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot -> '$.employee.*.pay_graph')); Any thoughts on indexing for this case? If it makes any difference, I'm presently on PG12 and PG13 but looking to move to PG14. Thanks, Shaheed On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > Imre, > > I'm gradually working my way into the combination of SQL, JSONB and > jsonpath that this involves even without the indexing, but this looks > very helpful/promising, especially on the indexing. Thanks a lot for > the input, > > Shaheed > > On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote: > > > > > In principle, I believe this allows index-assisted access to keys and > > > values nested in arrays and inner objects but in practice, it seems > > > the planner "often" decides to ignore the index in favour of a table scan. > > > > part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx" > > > > SET enable_seqscan = OFF; > > select * from jsonb_table > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > ; > > +----+---------------------------------------------------------------------------------------------------------------+ > > | id | jsonb_col | > > +----+---------------------------------------------------------------------------------------------------------------+ > > | 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}}| > > +----+---------------------------------------------------------------------------------------------------------------+ > > (1 row) > > > > > > EXPLAIN ANALYZE select * from jsonb_table > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > ; > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > | QUERY PLAN | > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > | Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) | > > | Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > | Heap Blocks: exact=1 | > > | -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) | > > | Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > | Planning Time: 0.255 ms | > > | Execution Time: 0.122 ms | > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > (7 rows) > > > > regards, > > Imre > > > > > > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30): > >> > >> Hi Shaheed, > >> > >> > WHAT GOES HERE > >> > >> imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) > >> > >> may example: > >> > >> CREATE TABLE jsonb_table ( > >> id serial primary key, > >> jsonb_col JSONB > >> ); > >> > >> INSERT INTO jsonb_table(jsonb_col) > >> VALUES > >> ('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'), > >> ('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'), > >> ('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}') > >> ; > >> > >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table; > >> > >> -- create index > >> create index jpqarr_idx > >> on jsonb_table > >> using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) ); > >> -- tests: > >> select id from jsonb_table > >> where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT); > >> ; > >> DROP TABLE > >> CREATE TABLE > >> INSERT 0 3 > >> +------------------------------+ > >> | jsonb_path_query_array | > >> +------------------------------+ > >> | ["1980-01-01", "1982-02-02"] | > >> | ["1970-01-01", "1971-02-02"] | > >> | ["2000-01-01", "2000-02-02"] | > >> +------------------------------+ > >> (3 rows) > >> > >> CREATE INDEX > >> +----+ > >> | id | > >> +----+ > >> | 3 | > >> +----+ > >> (1 row) > >> > >> > >> Regards, > >> Imre > >> > >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53): > >>> > >>> Suppose I have a JSONB field called "snapshot". I can create a GIN > >>> index on it like this: > >>> > >>> create index idx1 on mytable using gin (snapshot); > >>> > >>> In principle, I believe this allows index-assisted access to keys and > >>> values nested in arrays and inner objects but in practice, it seems > >>> the planner "often" decides to ignore the index in favour of a table > >>> scan. (As discussed elsewhere, this is influenced by the number of > >>> rows, and possibly other criteria too). > >>> > >>> Now, I know it is possible to index inner objects, so that is snapshot > >>> looks like this: > >>> > >>> { > >>> "stuff": {}, > >>> "more other stuff": {}, > >>> "employee": { > >>> "1234": {"date_of_birth": "1970-01-01"}, > >>> "56B789": {"date_of_birth": "1971-02-02"}, > >>> } > >>> } > >>> > >>> I can say: > >>> > >>> create index idx2 on mytable using gin ((snapshot -> 'employee')); > >>> > >>> But what is the syntax to index only on date_of_birth? I assume a > >>> btree would work since it is a primitive value, but WHAT GOES HERE in > >>> this: > >>> > >>> create index idx3 on mytable using btree ((snapshot ->'employee' -> > >>> WHAT GOES HERE -> 'date_of_birth')); > >>> > >>> I believe an asterisk "*" would work if 'employee' was an array, but > >>> here it is nested object with keys. If it helps, the keys are > >>> invariably numbers (in quoted string form, as per JSON). > >>> > >>> Thanks, Shaheed > >>> > >>>
I forgot to say... I see the documentation on jsonpath indexing says: "GIN index extracts statements of following form out of jsonpath: accessors_chain = const. Accessors chain may consist of .key, [*], and [index] accessors. jsonb_ops additionally supports .* and .** accessors" But I'm unable to make "CREATE INDEX...USING gin ((snapshot -> '$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true || @.state.employment[last][2] == 0)'))" trigger that. On Tue, 31 May 2022 at 10:16, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > OK, I was able to translate your excellent note into this: > > CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot -> > '$.employee.*.works_id')); > > and query using "@>" to see it in use: > > SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')... > > EXPLAIN ANALYSE... > -> Bitmap Index Scan on foo (cost=0.00..8.22 rows=29 > width=0) (actual time=0.005..0.005 rows=0 loops=1) > Index Cond: ((snapshot -> '$.employee.*.works_id'::text) > @> '1091'::jsonb) > > Unfortunately, my real query involved multiple AND clauses which (as > per another recent thread) seems to end up fetching the large > (probably TOASTed) JSONB once for each AND clause at a cost of 150ms > each. So, I got rid of the multiple ANDs by staying inside the > jsonpath like this: > > SELECT ... WHERE (... AND > (snapshot @? '$.employee."2209" ? ( > @.pay_graph <> 0 || @.last_run_of_employment == true || > @.state.employment[last][2] == 0)' > ) > ); > > But I have not been able to find an index formulation the new jsonpath > can use. I tried adding > > CREATE INDEX ... USING gin ((snapshot -> '$.employee.*')); > CREATE INDEX ... USING gin ((snapshot -> '$.employee')); > CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot -> > '$.employee.*.pay_graph')); > > Any thoughts on indexing for this case? If it makes any difference, > I'm presently on PG12 and PG13 but looking to move to PG14. > > Thanks, Shaheed > > On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > > > Imre, > > > > I'm gradually working my way into the combination of SQL, JSONB and > > jsonpath that this involves even without the indexing, but this looks > > very helpful/promising, especially on the indexing. Thanks a lot for > > the input, > > > > Shaheed > > > > On Mon, 30 May 2022 at 11:50, Imre Samu <pella.samu@gmail.com> wrote: > > > > > > > In principle, I believe this allows index-assisted access to keys and > > > > values nested in arrays and inner objects but in practice, it seems > > > > the planner "often" decides to ignore the index in favour of a table scan. > > > > > > part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx" > > > > > > SET enable_seqscan = OFF; > > > select * from jsonb_table > > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > > ; > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > | id | jsonb_col | > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > | 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}}| > > > +----+---------------------------------------------------------------------------------------------------------------+ > > > (1 row) > > > > > > > > > EXPLAIN ANALYZE select * from jsonb_table > > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT); > > > ; > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > | QUERY PLAN | > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > | Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) | > > > | Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > > | Heap Blocks: exact=1 | > > > | -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) | > > > | Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false)@> to_jsonb('1971-02-02'::text)) | > > > | Planning Time: 0.255 ms | > > > | Execution Time: 0.122 ms | > > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > (7 rows) > > > > > > regards, > > > Imre > > > > > > > > > Imre Samu <pella.samu@gmail.com> ezt írta (időpont: 2022. máj. 30., H, 12:30): > > >> > > >> Hi Shaheed, > > >> > > >> > WHAT GOES HERE > > >> > > >> imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) > > >> > > >> may example: > > >> > > >> CREATE TABLE jsonb_table ( > > >> id serial primary key, > > >> jsonb_col JSONB > > >> ); > > >> > > >> INSERT INTO jsonb_table(jsonb_col) > > >> VALUES > > >> ('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'), > > >> ('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'), > > >> ('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}') > > >> ; > > >> > > >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table; > > >> > > >> -- create index > > >> create index jpqarr_idx > > >> on jsonb_table > > >> using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) ); > > >> -- tests: > > >> select id from jsonb_table > > >> where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT); > > >> ; > > >> DROP TABLE > > >> CREATE TABLE > > >> INSERT 0 3 > > >> +------------------------------+ > > >> | jsonb_path_query_array | > > >> +------------------------------+ > > >> | ["1980-01-01", "1982-02-02"] | > > >> | ["1970-01-01", "1971-02-02"] | > > >> | ["2000-01-01", "2000-02-02"] | > > >> +------------------------------+ > > >> (3 rows) > > >> > > >> CREATE INDEX > > >> +----+ > > >> | id | > > >> +----+ > > >> | 3 | > > >> +----+ > > >> (1 row) > > >> > > >> > > >> Regards, > > >> Imre > > >> > > >> Shaheed Haque <shaheedhaque@gmail.com> ezt írta (időpont: 2022. máj. 29., V, 22:53): > > >>> > > >>> Suppose I have a JSONB field called "snapshot". I can create a GIN > > >>> index on it like this: > > >>> > > >>> create index idx1 on mytable using gin (snapshot); > > >>> > > >>> In principle, I believe this allows index-assisted access to keys and > > >>> values nested in arrays and inner objects but in practice, it seems > > >>> the planner "often" decides to ignore the index in favour of a table > > >>> scan. (As discussed elsewhere, this is influenced by the number of > > >>> rows, and possibly other criteria too). > > >>> > > >>> Now, I know it is possible to index inner objects, so that is snapshot > > >>> looks like this: > > >>> > > >>> { > > >>> "stuff": {}, > > >>> "more other stuff": {}, > > >>> "employee": { > > >>> "1234": {"date_of_birth": "1970-01-01"}, > > >>> "56B789": {"date_of_birth": "1971-02-02"}, > > >>> } > > >>> } > > >>> > > >>> I can say: > > >>> > > >>> create index idx2 on mytable using gin ((snapshot -> 'employee')); > > >>> > > >>> But what is the syntax to index only on date_of_birth? I assume a > > >>> btree would work since it is a primitive value, but WHAT GOES HERE in > > >>> this: > > >>> > > >>> create index idx3 on mytable using btree ((snapshot ->'employee' -> > > >>> WHAT GOES HERE -> 'date_of_birth')); > > >>> > > >>> I believe an asterisk "*" would work if 'employee' was an array, but > > >>> here it is nested object with keys. If it helps, the keys are > > >>> invariably numbers (in quoted string form, as per JSON). > > >>> > > >>> Thanks, Shaheed > > >>> > > >>>