Re: Define jsonpath functions as stable - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Define jsonpath functions as stable
Date
Msg-id 17122.1568760023@sss.pgh.pa.us
Whole thread Raw
In response to Re: Define jsonpath functions as stable  ("Jonathan S. Katz" <jkatz@postgresql.org>)
Responses Re: Define jsonpath functions as stable
List pgsql-hackers
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
> v2 attached. Thanks!

I whacked this around some (well, quite a bit actually); notably,
I thought we'd better describe things that are in our engine but
not XQuery, as well as vice-versa.

After a re-read of the XQuery spec, it seems to me that the character
entry form that they have and we don't is actually "&#NNNN;" like
HTML, rather than just "#NN".  Can anyone double-check that?  Does
it work outside bracket expressions, or only inside?

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0c..32e6610 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5970,6 +5970,134 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');

 <!-- end re_syntax.n man page -->

+   <sect3 id="posix-vs-xquery">
+   <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title>
+
+    <para>
+     Since SQL:2008, the SQL standard includes
+     a <literal>LIKE_REGEX</literal> operator that performs pattern
+     matching according to the XQuery regular expression
+     standard.  <productname>PostgreSQL</productname> does not yet
+     implement this operator, but you can get very similar behavior using
+     the <function>regexp_match()</function> function.
+    </para>
+
+    <para>
+     Notable differences between the existing POSIX-based
+     regular-expression feature and XQuery regular expressions include:
+
+     <itemizedlist>
+      <listitem>
+       <para>
+        XQuery character class subtraction is not supported.  An example of
+        this feature is using the following to match only English
+        consonants: <literal>[a-z-[aeiou]]</literal>.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        XQuery allows a literal character in the pattern to be written as
+        an HTML-style Unicode character reference, for
+        instance <literal>&#<replaceable>NNNN</replaceable>;</literal>.
+        This is not supported by POSIX, but you can get the same effect by
+        writing <literal>\u<replaceable>NNNN</replaceable></literal>.  (The
+        equivalence is only exact when the database encoding is UTF-8.)
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        The SQL standard (not XQuery itself) attempts to cater for more
+        variants of <quote>newline</quote> than POSIX does.  The
+        newline-sensitive matching options described above consider only
+        ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
+        us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
+        (a Windows-style newline), and some Unicode-only characters like
+        LINE SEPARATOR (U+2028) as newlines as well.
+        Notably, <literal>.</literal> and <literal>\s</literal> should
+        count <literal>\r\n</literal> as one character not two according to
+        SQL.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        XQuery character class shorthands <literal>\c</literal>,
+        <literal>\C</literal>, <literal>\i</literal>,
+        and <literal>\I</literal> are not supported.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        XQuery character class elements
+        using <literal>\p{UnicodeProperty}</literal> or the
+        inverse <literal>\P{UnicodeProperty}</literal> are not supported.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        POSIX interprets character classes such as <literal>\w</literal>
+        (see <xref linkend="posix-class-shorthand-escapes-table"/>)
+        according to the prevailing locale (which you can control by
+        attaching a <literal>COLLATE</literal> clause to the operator or
+        function).  XQuery specifies these classes by reference to Unicode
+        character properties, so equivalent behavior is obtained only with
+        a locale that follows the Unicode rules.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        Of the character-entry escapes described in
+        <xref linkend="posix-character-entry-escapes-table"/>,
+        XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
+        and <literal>\t</literal>.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        XQuery does not support
+        the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
+        for character classes within bracket expressions.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        XQuery does not have lookahead or lookbehind constraints,
+        nor any of the constraint escapes described in
+        <xref linkend="posix-constraint-escapes-table"/>.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        The metasyntax forms described in <xref linkend="posix-metasyntax"/>
+        do not exist in XQuery.
+       </para>
+      </listitem>
+      <listitem>
+       <para>
+        The regular expression flag letters defined by XQuery are
+        related to but not the same as the option letters for POSIX
+        (<xref linkend="posix-embedded-options-table"/>).  While the
+        <literal>i</literal> and <literal>q</literal> options behave the
+        same, others do not.
+        XQuery's <literal>s</literal> (allow dot to match newline)
+        and <literal>m</literal> (allow <literal>^</literal>
+        and <literal>$</literal> to match at newlines) flags provide access
+        to the same behaviors as POSIX's <literal>n</literal>,
+        <literal>p</literal> and <literal>w</literal> flags, but
+        do <emphasis>not</emphasis> match the behavior of
+        POSIX's <literal>s</literal> and <literal>m</literal> flags.
+        Note in particular that dot-matches-newline is the default behavior
+        in POSIX but not XQuery.
+        Also, XQuery's <literal>x</literal> (ignore whitespace in pattern)
+        flag is noticeably different from POSIX's expanded-mode flag.
+        POSIX's <literal>x</literal> flag also allows <literal>#</literal> to
+        begin a comment in the pattern, and POSIX will not ignore a
+        whitespace character after a backslash.
+       </para>
+      </listitem>
+     </itemizedlist>
+    </para>
+
+   </sect3>
   </sect2>
  </sect1>

@@ -11793,6 +11921,14 @@ table2-mapping
 </programlisting>
     </para>
    </listitem>
+
+   <listitem>
+    <para>
+     There are minor differences in the interpretation of regular
+     expression patterns used in <literal>like_regex</literal> filters, as
+     described in <xref linkend="jsonpath-regular-expressions"/>.
+    </para>
+   </listitem>
   </itemizedlist>

    <sect3 id="strict-and-lax-modes">
@@ -11872,6 +12008,36 @@ table2-mapping

    </sect3>

+   <sect3 id="jsonpath-regular-expressions">
+    <title>Regular Expressions</title>
+
+    <para>
+     SQL/JSON path expressions allow matching text to a regular expression
+     with the <literal>like_regex</literal> filter.  For example, the
+     following SQL/JSON path query would case-insensitively match all
+     strings in an array that start with an English vowel:
+<programlisting>
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+</programlisting>
+    </para>
+
+    <para>
+     The SQL/JSON standard borrows its definition for regular expressions
+     from the <literal>LIKE_REGEX</literal> operator, which in turn uses the
+     XQuery standard.  PostgreSQL does not currently support the
+     <literal>LIKE_REGEX</literal> operator.  Therefore,
+     the <literal>like_regex</literal> filter is implemented using the
+     POSIX regular expression engine described in
+     <xref linkend="functions-posix-regexp"/>.  This leads to various minor
+     discrepancies from standard SQL/JSON behavior, which are cataloged in
+     <xref linkend="posix-vs-xquery"/>.
+     Note, however, that the flag-letter incompatibilities described there
+     do not apply to SQL/JSON, as it translates the XQuery flag letters to
+     match what the POSIX engine expects.
+    </para>
+
+   </sect3>
+
    <sect3 id="functions-sqljson-path-operators">
    <title>SQL/JSON Path Operators and Methods</title>

@@ -12113,10 +12279,13 @@ table2-mapping
        <row>
         <entry><literal>like_regex</literal></entry>
         <entry>
-          Tests pattern matching with POSIX regular expressions
-          (see <xref linkend="functions-posix-regexp"/>).  Supported flags
-          are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
-          <literal>x</literal>, and <literal>q</literal>.</entry>
+          Tests whether the first operand matches the regular expression
+          given by the second operand (see
+          <xref linkend="jsonpath-regular-expressions"/>).
+          An optional <literal>flag</literal> string can be given.
+          Supported flags are <literal>i</literal>, <literal>m</literal>,
+          <literal>s</literal>, and <literal>q</literal>.
+        </entry>
         <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry>
         <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry>
         <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Attempt to consolidate reading of XLOG page
Next
From: Thomas Munro
Date:
Subject: scorpionfly needs more semaphores