Re: Define jsonpath functions as stable - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Define jsonpath functions as stable |
Date | |
Msg-id | 17122.1568760023@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Define jsonpath functions as stable ("Jonathan S. Katz" <jkatz@postgresql.org>) |
Responses |
Re: Define jsonpath functions as stable
|
List | pgsql-hackers |
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > v2 attached. Thanks! I whacked this around some (well, quite a bit actually); notably, I thought we'd better describe things that are in our engine but not XQuery, as well as vice-versa. After a re-read of the XQuery spec, it seems to me that the character entry form that they have and we don't is actually "NNNN;" like HTML, rather than just "#NN". Can anyone double-check that? Does it work outside bracket expressions, or only inside? regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2b4fe0c..32e6610 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5970,6 +5970,134 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); <!-- end re_syntax.n man page --> + <sect3 id="posix-vs-xquery"> + <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title> + + <para> + Since SQL:2008, the SQL standard includes + a <literal>LIKE_REGEX</literal> operator that performs pattern + matching according to the XQuery regular expression + standard. <productname>PostgreSQL</productname> does not yet + implement this operator, but you can get very similar behavior using + the <function>regexp_match()</function> function. + </para> + + <para> + Notable differences between the existing POSIX-based + regular-expression feature and XQuery regular expressions include: + + <itemizedlist> + <listitem> + <para> + XQuery character class subtraction is not supported. An example of + this feature is using the following to match only English + consonants: <literal>[a-z-[aeiou]]</literal>. + </para> + </listitem> + <listitem> + <para> + XQuery allows a literal character in the pattern to be written as + an HTML-style Unicode character reference, for + instance <literal>&#<replaceable>NNNN</replaceable>;</literal>. + This is not supported by POSIX, but you can get the same effect by + writing <literal>\u<replaceable>NNNN</replaceable></literal>. (The + equivalence is only exact when the database encoding is UTF-8.) + </para> + </listitem> + <listitem> + <para> + The SQL standard (not XQuery itself) attempts to cater for more + variants of <quote>newline</quote> than POSIX does. The + newline-sensitive matching options described above consider only + ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have + us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>) + (a Windows-style newline), and some Unicode-only characters like + LINE SEPARATOR (U+2028) as newlines as well. + Notably, <literal>.</literal> and <literal>\s</literal> should + count <literal>\r\n</literal> as one character not two according to + SQL. + </para> + </listitem> + <listitem> + <para> + XQuery character class shorthands <literal>\c</literal>, + <literal>\C</literal>, <literal>\i</literal>, + and <literal>\I</literal> are not supported. + </para> + </listitem> + <listitem> + <para> + XQuery character class elements + using <literal>\p{UnicodeProperty}</literal> or the + inverse <literal>\P{UnicodeProperty}</literal> are not supported. + </para> + </listitem> + <listitem> + <para> + POSIX interprets character classes such as <literal>\w</literal> + (see <xref linkend="posix-class-shorthand-escapes-table"/>) + according to the prevailing locale (which you can control by + attaching a <literal>COLLATE</literal> clause to the operator or + function). XQuery specifies these classes by reference to Unicode + character properties, so equivalent behavior is obtained only with + a locale that follows the Unicode rules. + </para> + </listitem> + <listitem> + <para> + Of the character-entry escapes described in + <xref linkend="posix-character-entry-escapes-table"/>, + XQuery supports only <literal>\n</literal>, <literal>\r</literal>, + and <literal>\t</literal>. + </para> + </listitem> + <listitem> + <para> + XQuery does not support + the <literal>[:<replaceable>name</replaceable>:]</literal> syntax + for character classes within bracket expressions. + </para> + </listitem> + <listitem> + <para> + XQuery does not have lookahead or lookbehind constraints, + nor any of the constraint escapes described in + <xref linkend="posix-constraint-escapes-table"/>. + </para> + </listitem> + <listitem> + <para> + The metasyntax forms described in <xref linkend="posix-metasyntax"/> + do not exist in XQuery. + </para> + </listitem> + <listitem> + <para> + The regular expression flag letters defined by XQuery are + related to but not the same as the option letters for POSIX + (<xref linkend="posix-embedded-options-table"/>). While the + <literal>i</literal> and <literal>q</literal> options behave the + same, others do not. + XQuery's <literal>s</literal> (allow dot to match newline) + and <literal>m</literal> (allow <literal>^</literal> + and <literal>$</literal> to match at newlines) flags provide access + to the same behaviors as POSIX's <literal>n</literal>, + <literal>p</literal> and <literal>w</literal> flags, but + do <emphasis>not</emphasis> match the behavior of + POSIX's <literal>s</literal> and <literal>m</literal> flags. + Note in particular that dot-matches-newline is the default behavior + in POSIX but not XQuery. + Also, XQuery's <literal>x</literal> (ignore whitespace in pattern) + flag is noticeably different from POSIX's expanded-mode flag. + POSIX's <literal>x</literal> flag also allows <literal>#</literal> to + begin a comment in the pattern, and POSIX will not ignore a + whitespace character after a backslash. + </para> + </listitem> + </itemizedlist> + </para> + + </sect3> </sect2> </sect1> @@ -11793,6 +11921,14 @@ table2-mapping </programlisting> </para> </listitem> + + <listitem> + <para> + There are minor differences in the interpretation of regular + expression patterns used in <literal>like_regex</literal> filters, as + described in <xref linkend="jsonpath-regular-expressions"/>. + </para> + </listitem> </itemizedlist> <sect3 id="strict-and-lax-modes"> @@ -11872,6 +12008,36 @@ table2-mapping </sect3> + <sect3 id="jsonpath-regular-expressions"> + <title>Regular Expressions</title> + + <para> + SQL/JSON path expressions allow matching text to a regular expression + with the <literal>like_regex</literal> filter. For example, the + following SQL/JSON path query would case-insensitively match all + strings in an array that start with an English vowel: +<programlisting> +'$[*] ? (@ like_regex "^[aeiou]" flag "i")' +</programlisting> + </para> + + <para> + The SQL/JSON standard borrows its definition for regular expressions + from the <literal>LIKE_REGEX</literal> operator, which in turn uses the + XQuery standard. PostgreSQL does not currently support the + <literal>LIKE_REGEX</literal> operator. Therefore, + the <literal>like_regex</literal> filter is implemented using the + POSIX regular expression engine described in + <xref linkend="functions-posix-regexp"/>. This leads to various minor + discrepancies from standard SQL/JSON behavior, which are cataloged in + <xref linkend="posix-vs-xquery"/>. + Note, however, that the flag-letter incompatibilities described there + do not apply to SQL/JSON, as it translates the XQuery flag letters to + match what the POSIX engine expects. + </para> + + </sect3> + <sect3 id="functions-sqljson-path-operators"> <title>SQL/JSON Path Operators and Methods</title> @@ -12113,10 +12279,13 @@ table2-mapping <row> <entry><literal>like_regex</literal></entry> <entry> - Tests pattern matching with POSIX regular expressions - (see <xref linkend="functions-posix-regexp"/>). Supported flags - are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>, - <literal>x</literal>, and <literal>q</literal>.</entry> + Tests whether the first operand matches the regular expression + given by the second operand (see + <xref linkend="jsonpath-regular-expressions"/>). + An optional <literal>flag</literal> string can be given. + Supported flags are <literal>i</literal>, <literal>m</literal>, + <literal>s</literal>, and <literal>q</literal>. + </entry> <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry> <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry> <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
pgsql-hackers by date: