Thread: JSON Path and GIN Questions
Greetings Hackers, Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption thatJSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run intowhat appear to be some inconsistencies and oddities I’m hoping to figure out with your help. For the examples in this email, I’m using this simple table: CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"'; create index on movies using gin (movie); analyze movies; That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues. Issue 1: @@ vs @? ----------------- I have been confused as to the difference between @@ vs @?: Why do these return different results? david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")'; id ---- (0 rows) david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")'; id ---- 10 (1 row) I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got there,it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists(). Isthat right? If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test pathsto see if they return a boolean or not. Issue 2: @? Index Use --------------------- From Oleg’s (happy belated birthday!) notes (https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators): > Operators @? and @@ are interchangeable: > > js @? '$.a' <=> js @@ 'exists($.a)’ > js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’ For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a result: david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))'; id ---- 10 (1 row) Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not. Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index withan an explicit exists(), but @? was not…because I was still using an explicit exists. In other words: * @@ 'exists($ ?($.year == 1944))' Uses the index * @? '$ ?(@.year == 1944)' Uses the index * @? 'exists($ ?($.year == 1944))' Does not use the index That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it `exists(exists($?($.year == 1944)))`, which returns true for every row (true and false both exists)! 🤦🏻♂️. Anyway, if I have this right, I’d like to flesh out the docs a bit. Issue 3: Index Use for Comparison --------------------------------- From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path querywould be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage ofthe GIN index, apparently the >= operator cannot: david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)'; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1) Filter: (movie @? '$?($."year" >= 2023)'::jsonpath) Rows Removed by Filter: 36081 Planning Time: 1.864 ms Execution Time: 36.338 ms (5 rows) Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths,which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie-> 'year'`? Thanks your your patience with my questions! Best, David
Attachment
Hi David, On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote: > CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); > \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"'; > create index on movies using gin (movie); > analyze movies; > > I have been confused as to the difference between @@ vs @?: Why do these > return different results? > > david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")'; > id > ---- > (0 rows) > > david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")'; > id > ---- > 10 > (1 row) > > I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), > and from the suggestion I got there, it seems that @@ expects a boolean to be > returned by the path query, while @? wraps it in an implicit exists(). Is that > right? That's also my understanding. We had a discussion about the docs on @@, @?, and jsonb_path_query on -general a while back [1]. Maybe it's useful also. > If so, I’d like to submit a patch to the docs talking about this, and > suggesting the use of jsonb_path_query() to test paths to see if they return > a boolean or not. +1 [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com -- Erik
Op 9/13/23 om 03:00 schreef Erik Wienhold: > Hi David, > > On 13/09/2023 02:16 CEST David E. Wheeler <david@justatheory.com> wrote: > >> CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL); >> \copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json| jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"'; >> create index on movies using gin (movie); >> analyze movies; >> >> I have been confused as to the difference between @@ vs @?: Why do these >> return different results? >> >> david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")'; >> id >> ---- >> (0 rows) >> >> david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")'; >> id >> ---- >> 10 >> (1 row) >> >> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), >> and from the suggestion I got there, it seems that @@ expects a boolean to be >> returned by the path query, while @? wraps it in an implicit exists(). Is that >> right? > > That's also my understanding. We had a discussion about the docs on @@, @?, and > jsonb_path_query on -general a while back [1]. Maybe it's useful also. > >> If so, I’d like to submit a patch to the docs talking about this, and >> suggesting the use of jsonb_path_query() to test paths to see if they return >> a boolean or not. > > +1 > > [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com > > -- > Erik "All use of json*() functions preclude index usage." That sentence is missing from the documentation. Erik Rijkers
On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote: > "All use of json*() functions preclude index usage." > > That sentence is missing from the documentation. Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why wouldthe corresponding functions behave the same? D
Attachment
p 9/13/23 om 22:01 schreef David E. Wheeler: > On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote: > >> "All use of json*() functions preclude index usage." >> >> That sentence is missing from the documentation. > > Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why wouldthe corresponding functions behave the same? > > D Sorry, perhaps my reply was a bit off-topic. But you mentioned perhaps touching the docs and the not-use-of-index is just so unexpected. Compare these two statements: select count(id) from movies where movie @? '$ ? (@.year == 2023)' Time: 1.259 ms (index used) select count(id) from movies where jsonb_path_match(movie, '$.year == 2023'); Time: 17.260 ms (no index used - unexpectedly slower) With these two indexes available: using gin (movie); using gin (movie jsonb_path_ops); (REL_15_STABLE; but it's the same in HEAD and the not-yet-committed SQL/JSON patches.) Erik Rijkers
Erik Rijkers <er@xs4all.nl> writes: > p 9/13/23 om 22:01 schreef David E. Wheeler: >> On Sep 13, 2023, at 01:11, Erik Rijkers <er@xs4all.nl> wrote: >>> "All use of json*() functions preclude index usage." >> Where did that come from? Why wouldn’t JSON* functions use indexes? I see that the docs only mention operators; why wouldthe corresponding functions behave the same? > Sorry, perhaps my reply was a bit off-topic. > But you mentioned perhaps touching the docs and > the not-use-of-index is just so unexpected. Unexpected to who? I think the docs make it pretty plain that only operators on indexed columns are considered as index qualifications. Admittedly, 11.2 Index Types [1] makes the point only by not discussing any other case, but when you get to 11.10 Operator Classes and Operator Families [2] and discover that the entire index definition mechanism is based around operators not functions, you should be able to reach that conclusion. The point is made even more directly in 38.16 Interfacing Extensions to Indexes [3], though I'll concede that that's not material I'd expect the average PG user to read. As far as json in particular is concerned, 8.14.4 jsonb Indexing [4] is pretty clear about what is or is not supported. regards, tom lane [1] https://www.postgresql.org/docs/current/indexes-types.html [2] https://www.postgresql.org/docs/current/indexes-opclass.html [3] https://www.postgresql.org/docs/current/xindex.html [4] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > As far as json in particular is concerned, 8.14.4 jsonb Indexing [4] > is pretty clear about what is or is not supported. How do you feel about this note, then? diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml index b6c2ddbf55..7dda727f0d 100644 --- a/doc/src/sgml/json.sgml +++ b/doc/src/sgml/json.sgml @@ -413,6 +413,13 @@ SELECT doc->'site_name' FROM websites Two GIN <quote>operator classes</quote> are provided, offering different performance and flexibility trade-offs. </para> + <note> + <para> + As with all indexes, only operators on indexed columns are considered as + index qualifications. In other words, only <type>jsonb</type> operators can + take advantage of GIN indexes; <type>jsonb</type> functions cannot. + </para> + </note> <para> The default GIN operator class for <type>jsonb</type> supports queries with the key-exists operators <literal>?</literal>, <literal>?|</literal> Best, David
Attachment
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote: > That's also my understanding. We had a discussion about the docs on @@, @?, and > jsonb_path_query on -general a while back [1]. Maybe it's useful also. Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be useful. Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators. On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote: > Issue 3: Index Use for Comparison > --------------------------------- > > From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path querywould be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage ofthe GIN index, apparently the >= operator cannot: > > david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)'; > QUERY PLAN --------------------------------------------------------------------------------------------------------- > Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1) > Filter: (movie @? '$?($."year" >= 2023)'::jsonpath) > Rows Removed by Filter: 36081 > Planning Time: 1.864 ms > Execution Time: 36.338 ms > (5 rows) > > Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes ofpaths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie-> 'year'`? Thanks, David
Attachment
"David E. Wheeler" <david@justatheory.com> writes: > On Sep 14, 2023, at 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4] >> is pretty clear about what is or is not supported. > How do you feel about this note, then? I think it's unnecessary. If we did consider it necessary, why wouldn't just about every subsection in chapter 8 need similar wording? regards, tom lane
Op 9/15/23 om 22:27 schreef David E. Wheeler: > On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote: > >> That's also my understanding. We had a discussion about the docs on @@, @?, and >> jsonb_path_query on -general a while back [1]. Maybe it's useful also. > > Okay, I’ll take a pass at expanding the docs on this. I think a little mini-tutorial on these two operators would be useful. > > Meanwhile, I’d like to re-up this question about the index qualification of non-equality JSON Path operators. > > On Sep 12, 2023, at 20:16, David E. Wheeler <david@justatheory.com> wrote: > >> Issue 3: Index Use for Comparison >> --------------------------------- >> >> From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Pathquery would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantageof the GIN index, apparently the >= operator cannot: >> >> david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)'; >> QUERY PLAN --------------------------------------------------------------------------------------------------------- >> Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1) >> Filter: (movie @? '$?($."year" >= 2023)'::jsonpath) >> Rows Removed by Filter: 36081 >> Planning Time: 1.864 ms >> Execution Time: 36.338 ms >> (5 rows) >> >> Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes ofpaths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is thereperhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie-> 'year'`? > movie @? '$ ?($.year >= 2023)' I believe it is indeed not possible to have such a unequality-search use the GIN index. It is another weakness of JSON that can be unexpected to those not in the fullness of Knowledge of the manual. Yes, this too would be good to explain in the doc where JSON indexes are explained. Erik Rijkers > Thanks, > > David >
On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think that that indicates that you're putting the info in the > wrong place. Perhaps the right answer is to insert something > more explicit in section 11.2, which is the first place where > we really spend any effort discussing what can be indexed. Fair enough. How ’bout this? --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -120,7 +120,7 @@ CREATE INDEX test1_id_index ON test1 (id); B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link linkend="bloom">bloom</link>. Each index type uses a different - algorithm that is best suited to different types of queries. + algorithm that is best suited to different types of queries and operators. By default, the <link linkend="sql-createindex"><command>CREATE INDEX</command></link> command creates B-tree indexes, which fit the most common situations. @@ -132,6 +132,14 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> </programlisting> </para> + <note> + <para> + Only operators on indexed columns are considered as index qualifications. + Functions never qualify for index usage, aside from + <link linkend="indexes-expressional">indexes on expressions</link>. + </para> + </note> + <sect2 id="indexes-types-btree"> <title>B-Tree</title>
Attachment
On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote: > movie @? '$ ?($.year >= 2023)' > > I believe it is indeed not possible to have such a unequality-search use the GIN index. It is another weakness of JSONthat can be unexpected to those not in the fullness of Knowledge of the manual. Yes, this too would be good to explainin the doc where JSON indexes are explained. Is that a limitation of GIN indexes in general? Or could there be opclass improvements in the future that would enable suchcomparisons? Thanks, David
Attachment
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote: >> If so, I’d like to submit a patch to the docs talking about this, and >> suggesting the use of jsonb_path_query() to test paths to see if they return >> a boolean or not. > > +1 I’ve started work on this; there’s so much to learn! Here’s a new example that surprised me a bit. Using the GPS trackerexample from the docs [1] loaded into a `:json` psql variable, this output of this query makes perfect sense to me: david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)'); jsonb_path_query ------------------ 13.4034 13.2635 Because `[*]` selects all the values. This, however, I did not expect: david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)'); jsonb_path_query ------------------ 13.4034 13.2635 (2 rows) I had expected it to return two single-value arrays, instead: [13.4034] [13.2635] It appears that the filter expression is doing some sub-selection, too. Is that expected? Best, David [1]: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH
Attachment
On 16/09/2023 22:19 CEST David E. Wheeler <david@justatheory.com> wrote: > On Sep 15, 2023, at 23:59, Erik Rijkers <er@xs4all.nl> wrote: > > > movie @? '$ ?($.year >= 2023)' > > > > I believe it is indeed not possible to have such a unequality-search use > > the GIN index. It is another weakness of JSON that can be unexpected to > > those not in the fullness of Knowledge of the manual. Yes, this too would > > be good to explain in the doc where JSON indexes are explained. > > Is that a limitation of GIN indexes in general? Or could there be opclass > improvements in the future that would enable such comparisons? This detail is mentioned in docs [1]: "For these operators, a GIN index extracts clauses of the form **accessors_chain = constant** out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses." I don't know if this is a general limitation of GIN indexes or just how these operators are implemented right now. [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING -- Erik
On Sep 16, 2023, at 16:50, Erik Wienhold <ewie@ewie.name> wrote: > "For these operators, a GIN index extracts clauses of the form > **accessors_chain = constant** out of the jsonpath pattern, and does the > index search based on the keys and values mentioned in these clauses." > > I don't know if this is a general limitation of GIN indexes or just how these > operators are implemented right now. > > [1] https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING The detail that jumps out at me is this one on jsonb_path_ops: “Basically, each jsonb_path_ops index item is a hash of the value and the key(s) leading to it” Because jsonb_path_ops indexes hashes, I would assume it would only support path equality. But it’s not clear to me fromthese docs that jsonb_ops also indexes hashes. Does it? Best, D
Attachment
On 16/09/2023 22:26 CEST David E. Wheeler <david@justatheory.com> wrote: > I’ve started work on this; there’s so much to learn! Here’s a new example > that surprised me a bit. Using the GPS tracker example from the docs [1] > loaded into a `:json` psql variable, this output of this query makes perfect > sense to me: > > david=# select jsonb_path_query(:'json', '$.track.segments.location[*] ? (@ < 14)'); > jsonb_path_query > ------------------ > 13.4034 > 13.2635 > > Because `[*]` selects all the values. This, however, I did not expect: > > david=# select jsonb_path_query(:'json', '$.track.segments.location ? (@[*] < 14)'); > jsonb_path_query > ------------------ > 13.4034 > 13.2635 > (2 rows) > > I had expected it to return two single-value arrays, instead: > > [13.4034] > [13.2635] > > It appears that the filter expression is doing some sub-selection, too. > Is that expected? Looks like the effect of lax mode which may unwrap arrays when necessary [1]. The array unwrapping looks like the result of jsonb_array_elements(). It kinda works in strict mode: SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)'); jsonb_path_query ----------------------- [47.763, 13.4034] [47.706, 13.2635] (2 rows) But it does not remove elements from the matching arrays. Which I don't even expect here because the path specifies the location array as the object to be returned. The filter expression then only decides whether to return the location array or not. Nowhere in the docs does it say that the filter expression itself removes any elements from a matched array. Here's a query that filter's out individual array elements. It's quite a mouthful (especially to preserve the order of array elements): WITH location AS ( SELECT loc, row_number() OVER () AS array_num FROM jsonb_path_query(:'json', 'strict $.track.segments[*].location') loc ), element AS ( SELECT array_num, e.num AS elem_num, e.elem FROM location CROSS JOIN jsonb_array_elements(loc) WITH ORDINALITY AS e (elem, num) ) SELECT jsonb_agg(elem ORDER BY elem_num) FROM element WHERE jsonb_path_exists(elem, '$ ? (@ < 14)') GROUP BY array_num; jsonb_agg --------------- [13.2635] [13.4034] (2 rows) [1] https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES -- Erik
On Sep 16, 2023, at 18:13, Erik Wienhold <ewie@ewie.name> wrote: > Looks like the effect of lax mode which may unwrap arrays when necessary [1]. > The array unwrapping looks like the result of jsonb_array_elements(). > > It kinda works in strict mode: > > SELECT jsonb_path_query(:'json', 'strict $.track.segments[*].location ? (@[*] < 14)'); > > jsonb_path_query > ----------------------- > [47.763, 13.4034] > [47.706, 13.2635] > (2 rows) > > But it does not remove elements from the matching arrays. Which I don't even > expect here because the path specifies the location array as the object to be > returned. The filter expression then only decides whether to return the > location array or not. Nowhere in the docs does it say that the filter > expression itself removes any elements from a matched array. Yes, this is what I expected. It means “select the location array if any of its contents is less that 14.” I don’t understand why it’s different in lax mode, though, as `@[*]` is not a structural error; it confirms to the schema,as the docs say. The flattening in this case seems weird. Ah, here’s why:, from the docs: "Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSON arrays out-of-the-box.” There follow some discussion of the need to specify `[*]` on segments in strict mode, but since that’s exactly what my exampledoes (and the same for the locations array inside the filter), it doesn’t seem right to me that it would be unwrappedhere. > Here's a query that filter's out individual array elements. It's quite a > mouthful (especially to preserve the order of array elements): Wow fun, and yeah, it makes sense to take things apart in SQL for this sort of thing! Best, David
Attachment
"David E. Wheeler" <david@justatheory.com> writes: > On Sep 15, 2023, at 20:36, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think that that indicates that you're putting the info in the >> wrong place. Perhaps the right answer is to insert something >> more explicit in section 11.2, which is the first place where >> we really spend any effort discussing what can be indexed. > Fair enough. How ’bout this? After thinking about it for awhile, I think we need some more discursive explanation of what's allowed, perhaps along the lines of the attached. (I still can't shake the feeling that this is duplicative; but I can't find anything comparable until you get into the weeds in Section V.) I put the new text at the end of section 11.1, but perhaps it belongs a little further up in that section; it seems more important than some of the preceding paras. regards, tom lane diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index 55122129d5..1a0b003fb0 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -109,6 +109,39 @@ CREATE INDEX test1_id_index ON test1 (id); Therefore indexes that are seldom or never used in queries should be removed. </para> + + <para> + In general, <productname>PostgreSQL</productname> indexes can be used + to optimize queries that contain one or more <literal>WHERE</literal> + or <literal>JOIN</literal> clauses of the form + +<synopsis> +<replaceable>indexed-column</replaceable> <replaceable>indexable-operator</replaceable> <replaceable>comparison-value</replaceable> +</synopsis> + + Here, the <replaceable>indexed-column</replaceable> is whatever + column or expression the index has been defined on. + The <replaceable>indexable-operator</replaceable> is an operator that + is a member of the index's <firstterm>operator class</firstterm> for + the indexed column. (More details about that appear below.) + And the <replaceable>comparison-value</replaceable> can be any + expression that is not volatile and does not reference the index's + table. + </para> + + <para> + In some cases the query planner can extract an indexable clause of + this form from another SQL construct. A simple example is that if + the original clause was + +<synopsis> +<replaceable>comparison-value</replaceable> <replaceable>operator</replaceable> <replaceable>indexed-column</replaceable> +</synopsis> + + then it can be flipped around into indexable form if the + original <replaceable>operator</replaceable> has a commutator + operator that is a member of the index's operator class. + </para> </sect1> @@ -120,7 +153,7 @@ CREATE INDEX test1_id_index ON test1 (id); B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and the extension <link linkend="bloom">bloom</link>. Each index type uses a different - algorithm that is best suited to different types of queries. + algorithm that is best suited to different types of indexable clauses. By default, the <link linkend="sql-createindex"><command>CREATE INDEX</command></link> command creates B-tree indexes, which fit the most common situations.
On Sep 17, 2023, at 12:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After thinking about it for awhile, I think we need some more > discursive explanation of what's allowed, perhaps along the lines > of the attached. (I still can't shake the feeling that this is > duplicative; but I can't find anything comparable until you get > into the weeds in Section V.) > > I put the new text at the end of section 11.1, but perhaps it > belongs a little further up in that section; it seems more > important than some of the preceding paras. I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators.True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar fora function call, they are in some sense the same thing. A header might be useful, something like “What Counts as an indexable expression”. Best, David
Attachment
On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote: >> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), >> and from the suggestion I got there, it seems that @@ expects a boolean to be >> returned by the path query, while @? wraps it in an implicit exists(). Is that >> right? > > That's also my understanding. We had a discussion about the docs on @@, @?, and > jsonb_path_query on -general a while back [1]. Maybe it's useful also. Hi, finally getting back to this, still fiddling to figure out the differences. From the thread you reference [1], is thepoint that @@ and jsonb_path_match() can only be properly used with a JSON Path expression that’s a predicate check? If so, as far as I can tell, only exists() around the entire path query, or the deviation from the SQL standard that allowsan expression to be a predicate? This suggest to me that the "Only the first item of the result is taken into account” bit from the docs may not be quiteright. Consider this example: david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); jsonb_path_query ------------------ false false (2 rows) david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); ERROR: single boolean result is expected jsonb_path_match(), it turns out, only wants a single result. But furthermore perhaps the use of a filter predicate ratherthan a predicate expression for the entire path query is an error? Curiously, @@ seems okay with it: david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)'; ?column? ---------- t Not a predicate query, and somehow returns true even though the first item of the result is false? Is that how it shouldbe? Best, David [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
On 2023-10-09 01:13 +0200, David E. Wheeler write: > On Sep 12, 2023, at 21:00, Erik Wienhold <ewie@ewie.name> wrote: > > >> I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), > >> and from the suggestion I got there, it seems that @@ expects a boolean to be > >> returned by the path query, while @? wraps it in an implicit exists(). Is that > >> right? > > > > That's also my understanding. We had a discussion about the docs on @@, @?, and > > jsonb_path_query on -general a while back [1]. Maybe it's useful also. > > Hi, finally getting back to this, still fiddling to figure out the > differences. From the thread you reference [1], is the point that @@ > and jsonb_path_match() can only be properly used with a JSON Path > expression that’s a predicate check? I think so. That's also supported by the existing docs which only mention "JSON path predicate" for @@ and jsonb_path_match(). > If so, as far as I can tell, only exists() around the entire path > query, or the deviation from the SQL standard that allows an > expression to be a predicate? Looks like that. But note that exists() is also a filter expression. So wrapping the entire jsonpath in exists() is also a deviation from the SQL standard which only allows predicates in filter expressions, i.e. '<path> ? (<predicate>)'. > This suggest to me that the "Only the first item of the result is > taken into account” bit from the docs may not be quite right. Yes, this was also the issue in the referenced thread[1]. I think my suggesstion in [2] explains it (as far as I understand it). > Consider this example: > > david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); > jsonb_path_query > ------------------ > false > false > (2 rows) > > david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)'); > ERROR: single boolean result is expected > > jsonb_path_match(), it turns out, only wants a single result. But > furthermore perhaps the use of a filter predicate rather than a > predicate expression for the entire path query is an error? Yes, I think @@ and jsonb_path_match() should not be used with filter expressions because the jsonpath returns whatever the path expression yields (which may be an actual boolean value in the jsonb). The filter expression only filters (as the name suggests) what the path expression yields. > Curiously, @@ seems okay with it: > > david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)'; > ?column? > ---------- > t > > Not a predicate query, and somehow returns true even though the first > item of the result is false? Is that how it should be? Your example does a text search equivalent to: select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)') You forgot the cast to jsonb. jsonb @@ jsonpath actually returns null: test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)'; ?column? ---------- <null> (1 row) This matches the note right after the docs for @@: "The jsonpath operators @? and @@ suppress the following errors: missing object field or array element, unexpected JSON item type, datetime and numeric errors. The jsonpath-related functions described below can also be told to suppress these types of errors. This behavior might be helpful when searching JSON document collections of varying structure." That would be the silent argument of jsonb_path_match(): test=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ? (@[*] == false)', silent => true); jsonb_path_match ------------------ <null> (1 row) [1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com [2] https://www.postgresql.org/message-id/880194083.579916.1680598906819%40office.mailbox.org -- Erik
Thanks for the reply, Erik. Have appreciated collaborating with you on a few different things lately! > On Oct 13, 2023, at 22:50, Erik Wienhold <ewie@ewie.name> wrote: >> Hi, finally getting back to this, still fiddling to figure out the >> differences. From the thread you reference [1], is the point that @@ >> and jsonb_path_match() can only be properly used with a JSON Path >> expression that’s a predicate check? > > I think so. That's also supported by the existing docs which only > mention "JSON path predicate" for @@ and jsonb_path_match(). Okay, good. >> If so, as far as I can tell, only exists() around the entire path >> query, or the deviation from the SQL standard that allows an >> expression to be a predicate? > > Looks like that. But note that exists() is also a filter expression. > So wrapping the entire jsonpath in exists() is also a deviation from the > SQL standard which only allows predicates in filter expressions, i.e. > '<path> ? (<predicate>)'. Yeah. I’m starting to get the sense that the Postgres extension of the standard to allow predicates without filters is almosta different thing, like there are two Pg SQL/JSON Path languages: 1. SQL Standard path language for selecting values and includes predicates. Returns the selected value(s). Supported by `@?`and jsonb_path_exists(). 2. The Postgres predicate path language which returns a boolean, akin to a WHERE expression. Supported by `@@` and jsonb_path_match() Both are supported by jsonb_path_query(), but if you use a standard path you get the values and if you use a predicate pathyou get a boolean. This feels a big overloaded to me, TBH; I find myself wanting them to be separate types since thebehaviors vary quite a bit! >> This suggest to me that the "Only the first item of the result is >> taken into account” bit from the docs may not be quite right. > > Yes, this was also the issue in the referenced thread[1]. I think my > suggesstion in [2] explains it (as far as I understand it). Yeah, lax vs. strict mode stuff definitely creates some added complexity. I see now I missed the rest of that thread; seeingthe entire thread on one page[1] really helps. I’d like to take a stab at the doc improvements Tom suggests[2]. >> jsonb_path_match(), it turns out, only wants a single result. But >> furthermore perhaps the use of a filter predicate rather than a >> predicate expression for the entire path query is an error? > > Yes, I think @@ and jsonb_path_match() should not be used with filter > expressions because the jsonpath returns whatever the path expression > yields (which may be an actual boolean value in the jsonb). The filter > expression only filters (as the name suggests) what the path expression > yields. Agreed. It only gets worse with a filter expression that selects a single value: david=# select jsonb_path_match('{"a":[false,true]}', '$.a ?(@[*] == false)'); jsonb_path_match ------------------ f Presumably it returns false because the value selected is JSON `false`: david=# select jsonb_path_query('{"a":[false,true]}', '$.a ?(@[*] == false)'); jsonb_path_query ------------------ false Which seems misleading, frankly. Would it be possible to update jsonb_path_match and @@ to raise an error when the path expressionis not a predicate? >> Curiously, @@ seems okay with it: >> >> david=# select '{"a":[false,true,false]}'@@ '$.a ?(@[*] == false)'; >> ?column? >> ---------- >> t >> >> Not a predicate query, and somehow returns true even though the first >> item of the result is false? Is that how it should be? > > Your example does a text search equivalent to: > > select to_tsvector('{"a":[false,true,false]}') @@ plainto_tsquery('$.a ? (@[*] == true)') > > You forgot the cast to jsonb. Oh good grief 🤦🏻♂️ > jsonb @@ jsonpath actually returns null: > > test=# select '{"a":[false,true,false]}'::jsonb @@ '$.a ? (@[*] == false)'; > ?column? > ---------- > <null> > (1 row) Yes, much better, though see the result above that returns a single `false` and confuses things. > This matches the note right after the docs for @@: Yeah, that makes sense. But here’s a bit about lax mode[3] that confuses me: > The lax mode facilitates matching of a JSON document structure and path expression if the JSON data does not conform tothe expected schema. If an operand does not match the requirements of a particular operation, it can be automatically wrappedas an SQL/JSON array or unwrapped by converting its elements into an SQL/JSON sequence before performing this operation.Besides, comparison operators automatically unwrap their operands in the lax mode, so you can compare SQL/JSONarrays out-of-the-box. This automatic flattening in lax mode seems odd, because it means you get different results in strict and lax mode wherethere are no errors. In lax mode, you get a set: david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)'); jsonb_path_query ------------------ 3 4 5 (3 rows) But in strict mode, you get the array selected by `$.a`, which is more what I would expect: david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)'); jsonb_path_query ------------------ [1, 2, 3, 4, 5] This seems like an odd inconsistency in return values, but perhaps the standard calls for this? I don’t have access to it,but MSSQL docs[4], at least, say: > * In **lax** mode, the function returns empty values if the path expression contains an error. For example, if you requestthe value **$.name**, and the JSON text doesn't contain a **name** key, the function returns null, but does not raisean error. > > * In **strict** mode, the function raises an error if the path expression contains an error. No flattening, only error suppression. The Oracle docs[5] mention array flattening, but I don’t have it up and running tosee if that means query *results* are flattened. Best, David [1] https://www.postgresql.org/message-id/flat/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com [2] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us [3] https://www.postgresql.org/docs/current/functions-json.html#STRICT-AND-LAX-MODES [4] https://learn.microsoft.com/en-us/sql/relational-databases/json/json-path-expressions-sql-server?view=sql-server-ver16#PATHMODE [5] https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/json-path-expressions.html#GUID-8656CAB9-C293-4A99-BB62-F38F3CFC4C13
On Sep 17, 2023, at 18:09, David E. Wheeler <david@justatheory.com> wrote: > I think this is useful, but also that it’s worth calling out explicitly that functions do not count as indexable operators.True by definition, of course, but I at least had assumed that since an operator is, in a sense, syntax sugar fora function call, they are in some sense the same thing. > > A header might be useful, something like “What Counts as an indexable expression”. Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it there. Thanks, David
"David E. Wheeler" <david@justatheory.com> writes: > Hey Tom, are you still thinking about adding this bit to the docs? I took a quick look at master and didn’t see it there. I'd waited because the discussion was still active, and then it kind of slipped off the radar. I'll take another look and push some form of what I suggested. That doesn't really address the jsonpath oddities you were on about, though. regards, tom lane
On Dec 17, 2023, at 16:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'd waited because the discussion was still active, and then it > kind of slipped off the radar. I'll take another look and push > some form of what I suggested. Right on. > That doesn't really address the > jsonpath oddities you were on about, though. No, I attempted to address those in [a patch][1]. [1]: https://commitfest.postgresql.org/45/4624/ Best, David