Thread: [HACKERS] jsonb problematic operators
There is this problem with the jsonb operators "? text" "?| text[]" and "?& text[]" that the question mark is typically used for prepared statement parameters in the most used abstraction APIs in Java and PHP. This really needs an alternative. Something like "HAS text", "HAS ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It probably should be another word that has less chance of becoming a conflict with another operator in future SQL specifications, but that's for you to decide. It's not a good idea to expect everyone else to make for workarounds for problems you choose to create.
On 9 December 2016 at 11:50, Jordan Gigov <coladict@gmail.com> wrote: > There is this problem with the jsonb operators "? text" "?| text[]" > and "?& text[]" that the question mark is typically used for prepared > statement parameters in the most used abstraction APIs in Java and > PHP. > > This really needs an alternative. Something like "HAS text", "HAS > ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It > probably should be another word that has less chance of becoming a > conflict with another operator in future SQL specifications, but > that's for you to decide. You mean something like the jsonb_ functions ? \df jsonb* > It's not a good idea to expect everyone else to make for workarounds > for problems you choose to create. I'd say it's not a good idea to come asking questions of a mailing list with an attitude like that, but hey, it's nearly Holidays. Geoff
There is this problem with the jsonb operators "? text" "?| text[]"
and "?& text[]" that the question mark is typically used for prepared
statement parameters in the most used abstraction APIs in Java and
PHP.
Unfortunately true. These APIs made a poor decision in taking a very useful query operator character, the question mark, and turning it into a position-important placeholder for query parameters. Using "$1, $2, $3, etc..." is a much better design since you have fewer things to worry about if you modify the query and add (or want to reuse) a parameter.
Given that PostgreSQL already choose to go with the better designed API here the fact that the relatively new JSON feature allows "?" to be used as an operator character makes perfect sense.
This really needs an alternative. Something like "HAS text", "HAS
ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It
probably should be another word that has less chance of becoming a
conflict with another operator in future SQL specifications, but
that's for you to decide.
I don't think making it a keyword is a good idea, or possible via extension, but otherwise you are free to create custom operators (and related infrastructure) if this bothers you enough. Such a "JDBC compatability extension" would probably be welcomed by the community.
It's not a good idea to expect everyone else to make for workarounds
for problems you choose to create.
The choosing was for a superior, and internally consistent, design. While I see the problem you bring up I don't see introducing yet another set of alternative operators to core. But as mentioned one of the advantages of PostgreSQL is its extensability and hopefully someone will enjoy working with Open Source PostgreSQL in their affected language enough to consider doing something about it.
David J.
On Fri, Dec 9, 2016 at 6:50 AM, Jordan Gigov <coladict@gmail.com> wrote: > It's not a good idea to expect everyone else to make for workarounds > for problems you choose to create. True. I actually kinda agree that the use of ? wasn't a great choice here, precisely because a number of drivers do use it to indicate a placeholder. However, I also think that it was done without realizing that it was going to create problems. Your phrasing implies that we did that on purpose just to mess with users, which isn't true. As Geoff says, you don't have to use the operators; you could use the equivalent functions instead. Every operator just gets turned into a function call internally, so this is always possible. It would also be smart for driver authors who use ? to indicate a placeholder to also provide some way of escaping it. There are plenty of perfectly valid PostgreSQL queries that include a ? as something other than a driver-interpreted placeholder, and if driver authors have failed to foresee that, it's not entirely our fault. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Dec 9, 2016 at 5:50 AM, Jordan Gigov <coladict@gmail.com> wrote: > There is this problem with the jsonb operators "? text" "?| text[]" > and "?& text[]" that the question mark is typically used for prepared > statement parameters in the most used abstraction APIs in Java and > PHP. > > This really needs an alternative. Something like "HAS text", "HAS > ANY(text[])" and "HAS ALL(text[])" same as regular array usage. It > probably should be another word that has less chance of becoming a > conflict with another operator in future SQL specifications, but > that's for you to decide. > > It's not a good idea to expect everyone else to make for workarounds > for problems you choose to create. You are griping in the wrong place. "everyone else" has reserved characters for its own use that were not allowed to be reserved without a clean escaping mechanism -- hibernate does this, for example reserving ':' which is used in many places within SQL. Typically when you embed special characters in strings designed to be processed by something else you allow for that character to be directly. In the computer science world we generally call this escaping strings and it a very common and well understood practice. For some odd reason however the authors of java various frameworks seem to be impervious to the utility of the concept. merlin
As Geoff says, you don't have to use the operators; you could use the
equivalent functions instead. Every operator just gets turned into a
function call internally, so this is always possible.
In most cases - the decision to tie indexing to operators makes this imperfect. Whether there is an actual problem with these operators I am unsure.
David J.
On 2016-12-09 12:17:32 -0500, Robert Haas wrote: > As Geoff says, you don't have to use the operators; you could use the > equivalent functions instead. Every operator just gets turned into a > function call internally, so this is always possible. Well, except that only operators support indexing :( - Andres
On 9 Dec 2016 17:54, "Andres Freund" <andres@anarazel.de> wrote:
On 2016-12-09 12:17:32 -0500, Robert Haas wrote:Well, except that only operators support indexing :(
> As Geoff says, you don't have to use the operators; you could use the
> equivalent functions instead. Every operator just gets turned into a
> function call internally, so this is always possible.
Really? Seems like an odd design decision.
The only other simple suggestion then would be to use PDO named parameters instead of positional ones. Much nicer syntax anyway, IMO.
Geoff
On 11 December 2016 at 18:52, Geoff Winkless <gwinkless@gmail.com> wrote: > On 9 Dec 2016 17:54, "Andres Freund" <andres@anarazel.de> wrote: > > On 2016-12-09 12:17:32 -0500, Robert Haas wrote: >> As Geoff says, you don't have to use the operators; you could use the >> equivalent functions instead. Every operator just gets turned into a >> function call internally, so this is always possible. > > Well, except that only operators support indexing :( > > > Really? Seems like an odd design decision. > > The only other simple suggestion then would be to use PDO named parameters > instead of positional ones. Much nicer syntax anyway, IMO. You can also create alternate names for the operators, but it's a bit of a pain if you want indexing support. Though I thought we defined alternative names for exactly this reason, but I don't see them... It's definitely annoying, in both directions. ? wasn't a great choice for an operator character but it's logical and was grandfathered over from hstore. PDO not offering any way to escape parameter binding characters is at least as bad. What client interface provides no way to pass-through strings it would otherwise treat as special? Does PDO cope if you use the OPERATOR("pg_catalog".?) form? Or does it still try to bind the parameter? e.g. postgres=> SELECT '{}'::jsonb OPERATOR("pg_catalog".?) 'fred';?column? ----------f (1 row) Does PDO let you double question marks to escape them, writing ?? or \? instead of ? or anything like that? If not, I suggest that you (a) submit a postgres patch adding alternative operator names for ? and ?|, and (b) submit a PDO patch to allow ?? or \? as an escape for ? . -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes: > It's definitely annoying, in both directions. ? wasn't a great choice > for an operator character but it's logical and was grandfathered over > from hstore. It was grandfathered from a lot further back than that. A quick look into the system catalogs says that core Postgres currently has 21 operators that include "?" in their names. Three of those are the jsonb operators, and the other 18 have been there since circa 1997. (Most of them seem to date to Tom Lockhart's commit 3c2d74d2a, but "<?>" is present in Berkeley Postgres v4r2, released in 1994.) I do not have a lot of patience with client-side code that's unable to deal with operator names containing "?". It's not like this requirement has blindsided anybody in this century. regards, tom lane
On 12 December 2016 at 12:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> It's definitely annoying, in both directions. ? wasn't a great choice >> for an operator character but it's logical and was grandfathered over >> from hstore. > > It was grandfathered from a lot further back than that. A quick look > into the system catalogs says that core Postgres currently has 21 > operators that include "?" in their names. Three of those are the > jsonb operators, and the other 18 have been there since circa 1997. > (Most of them seem to date to Tom Lockhart's commit 3c2d74d2a, but > "<?>" is present in Berkeley Postgres v4r2, released in 1994.) > > I do not have a lot of patience with client-side code that's unable > to deal with operator names containing "?". It's not like this > requirement has blindsided anybody in this century. Pretty much. Nor is it the only oddity you have to deal with when working with different DBMSes. PgJDBC allows you to write ??, which is ugly, but tolerable, since the JDBC spec doesn't have an escape syntax for it. We could've extended the JDBC escape syntax with a new kind of {postgres } escape if we'd wanted instead, but it'd still be nonportable so the driver went for the less verbose option. PDO should do something similar. It's not like client code using our ? operators has to be portable. I didn't realise Pg's use of ? was that old, so thanks. That makes offering alternatives much less appealing. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 December 2016 at 04:59, Craig Ringer <craig@2ndquadrant.com> wrote: > I didn't realise Pg's use of ? was that old, so thanks. That makes > offering alternatives much less appealing. One option might be for Postgres to define duplicate operator names using ¿ or something else. I think ¿ is a good choice because it's a common punctuation mark in spanish so it's probably not hard to find on a lot of keyboards or hard to find instructions on how to type one. There is always a risk in allowing redundant syntaxes though. For example people running grep to find all uses of an operator will miss the alternate spelling. There may even be security implications for that though to be honest that seems unlikely in this case. -- greg
On Sun, Dec 11, 2016 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > PgJDBC allows you to write ??, which is ugly, but tolerable, since the > JDBC spec doesn't have an escape syntax for it. This is the core problem; *JDBC* is busted. SQL reserves words but not punctuation marks so any assumption by client side code that characters are not going to be interpreted by the server are going to cause problems. As noted earlier ":" is equally problematic as that is hibernate's parameter marker and hibernate is probably in even greater usage than naked JDBC in the java community. Imagine trying to embed, say, perl, in java and reserving the very same punctuation marks and then complaining to the perl community that their language is broken due to usage overlap...that's what's happening here. If we really wanted to fix this, maybe the right way to think about the problem is a highly reduced character set and a pre-processor or an extension. merlin
On 12 Dec. 2016 22:22, "Merlin Moncure" <mmoncure@gmail.com> wrote:
If we really wanted to fix this, maybe the right wayto think about the problem is a highly reduced character set and a
pre-processor or an extension.
I'm pretty OK with expecting client drivers not to be stupid and offer escape syntax for their placeholders. I was kind of astonished the jdbc spec doesn't given how hard it works to be compatible. I'd assumed the ?? was jdbc standard not a postgres extension until I looked into it.
Certainly PDO needs a patch for its postgres driver if not (preferably) to support escaping at a higher level.
Users can work around it by defining their own operator aliases + opclasses or using function forms.
Doesn't seem worth too much drama though it sucks for usability.
On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark <stark@mit.edu> wrote: > On 12 December 2016 at 04:59, Craig Ringer <craig@2ndquadrant.com> wrote: >> I didn't realise Pg's use of ? was that old, so thanks. That makes >> offering alternatives much less appealing. > > One option might be for Postgres to define duplicate operator names > using ¿ or something else. I think ¿ is a good choice because it's a > common punctuation mark in spanish so it's probably not hard to find > on a lot of keyboards or hard to find instructions on how to type one. > > There is always a risk in allowing redundant syntaxes though. For > example people running grep to find all uses of an operator will miss > the alternate spelling. There may even be security implications for > that though to be honest that seems unlikely in this case. Are you sure that using a non-ASCII character is a good idea for an in-core operator? I would think no. -- Michael
On Mon, Dec 12, 2016 at 8:26 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark <stark@mit.edu> wrote: >> On 12 December 2016 at 04:59, Craig Ringer <craig@2ndquadrant.com> wrote: >>> I didn't realise Pg's use of ? was that old, so thanks. That makes >>> offering alternatives much less appealing. >> >> One option might be for Postgres to define duplicate operator names >> using ¿ or something else. I think ¿ is a good choice because it's a >> common punctuation mark in spanish so it's probably not hard to find >> on a lot of keyboards or hard to find instructions on how to type one. >> >> There is always a risk in allowing redundant syntaxes though. For >> example people running grep to find all uses of an operator will miss >> the alternate spelling. There may even be security implications for >> that though to be honest that seems unlikely in this case. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. I would agree with your thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: > On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark <stark@mit.edu> wrote: > > One option might be for Postgres to define duplicate operator names > > using ¿ or something else. I think ¿ is a good choice because it's a > > common punctuation mark in spanish so it's probably not hard to find > > on a lot of keyboards or hard to find instructions on how to type one. > > Are you sure that using a non-ASCII character is a good idea for an > in-core operator? I would think no. Eventually language designers will cross that Rubicon in mainstream languages. And why not? It sure would be convenient... from the designer's p.o.v. Of course, _users_ would be annoyed, as most users in the English-speaking world will have no idea how to type such characters, most others also will not know how to, and there will be users still using non-Unicode locales who will be unable to type such characters at all. Cut-n-paste will save the day, not doubt, though mostly/only for users using Unicode locales. But it is tempting. Using non-ASCII Unicode characters for _alternatives_ seems like a possible starting point though, since that leaves users with a universally- available ASCII alternative. Still, now users would then have to recognize multiple equivalent forms... ugh. Nico --
Nico Williams <nico@cryptonector.com> writes: > On Tue, Dec 13, 2016 at 10:26:24AM +0900, Michael Paquier wrote: >> On Mon, Dec 12, 2016 at 10:22 PM, Greg Stark <stark@mit.edu> wrote: >>> One option might be for Postgres to define duplicate operator names >>> using ¿ or something else. >> Are you sure that using a non-ASCII character is a good idea for an >> in-core operator? I would think no. > Eventually language designers will cross that Rubicon in mainstream > languages. And why not? Because it will create all sorts of character-set-conversion hazards. In the current design of Postgres it flat out would not work, because template0 has to be encoding-agnostic. regards, tom lane
Hi, On 12/12/2016 05:09, Craig Ringer wrote: > Does PDO let you double question marks to escape them, writing ?? or > \? instead of ? or anything like that? > > If not, I suggest that you (a) submit a postgres patch adding > alternative operator names for ? and ?|, and (b) submit a PDO patch to > allow ?? or \? as an escape for ? . For reference, my plan would be to get "\?" into PDO_pgsql for PHP 7.2. I've tried to get it into 7.1, but I was a bit too late into the RC process to safely do that. Since PDO itself has no escaping yet, I'm open to suggestions wrt to the actual escape method to use. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On 16 December 2016 at 17:08, Matteo Beccati <php@beccati.com> wrote: > Hi, > > On 12/12/2016 05:09, Craig Ringer wrote: >> Does PDO let you double question marks to escape them, writing ?? or >> \? instead of ? or anything like that? >> >> If not, I suggest that you (a) submit a postgres patch adding >> alternative operator names for ? and ?|, and (b) submit a PDO patch to >> allow ?? or \? as an escape for ? . > > For reference, my plan would be to get "\?" into PDO_pgsql for PHP 7.2. > I've tried to get it into 7.1, but I was a bit too late into the RC > process to safely do that. > > Since PDO itself has no escaping yet, I'm open to suggestions wrt to the > actual escape method to use. SQL typically uses doubling, such that the literal I'm becomes 'I''m' and the identifier Bob "Kaboom" Jones becomes "Bob ""Kaboom"" Jones" so it would be consistent with that to use ?? as a literal ? in the output query. This is also what PgJDBC does, per https://jdbc.postgresql.org/documentation/head/statement.html . So it's consistent . PHP usually uses backslash escapes, C-style. But this is UGLY when you're escaping something in a string. Since \? already has a defined meaning in PHP, you have to write \\? so that the first \ is consumed by string parsing and the resulting \? is sent to PDO, which then turns it into ? in the output SQL. This will confuse a lot of users. Using ?? has no such issues. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 16 December 2016 at 09:35, Craig Ringer <craig@2ndquadrant.com> wrote: > so it would be consistent with that to use ?? as a literal ? in the > output query. > > This is also what PgJDBC does, per > https://jdbc.postgresql.org/documentation/head/statement.html . So > it's consistent . "Me too". To look at this from the other angle, is there a reason why the jsonb indexes don't work with the jsonb_ functions but only with the operators? Is this something that could be changed easily? It seems like that would workaround this issue without requiring effort or agreement from the PHP side. Geoff
Geoff Winkless <gwinkless@gmail.com> writes: > To look at this from the other angle, is there a reason why the jsonb > indexes don't work with the jsonb_ functions but only with the > operators? Is this something that could be changed easily? Yes. No. However, if you're desperate, you could make SQL wrapper functions: regression=# create function my_jsonb_exists(jsonb, text) returns bool regression-# language sql as 'select $1 ? $2'; CREATE FUNCTION regression=# create table foo(f1 jsonb); CREATE TABLE regression=# create index on foo using gin(f1); CREATE INDEX regression=# explain select * from foo where my_jsonb_exists(f1, 'bar'); QUERY PLAN -------------------------------------------------------------------------Bitmap Heap Scan on foo (cost=8.01..12.02 rows=1width=32) Recheck Cond: (f1 ? 'bar'::text) -> Bitmap Index Scan on foo_f1_idx (cost=0.00..8.01 rows=1 width=0) Index Cond: (f1 ? 'bar'::text) (4 rows) This works because a simple SQL function like this will get inlined before any interesting planning decisions are made. regards, tom lane