Re: Ltree syntax improvement - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: Ltree syntax improvement
Date
Msg-id 38c5bbeb-c3eb-a016-0b8f-907461bbd0fc@postgrespro.ru
Whole thread Raw
In response to Re: Ltree syntax improvement  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Ltree syntax improvement  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Attached new version of the patch. 

I did major refactoring of ltree label parsing, extracting common parsing 
code for ltree, lquery, and ltxtquery.  This greatly simplified state 
machines.

On the advice of Tomas Vondra, I also extracted a preliminary patch with
'if' to 'switch' conversion.

On 21.01.2020 22:13, Tom Lane wrote:

Dmitry Belyavsky <beldmit@gmail.com> writes:
If the C part will be reviewed and considered mergeable, I'll update the
plpython tests.
I haven't looked at any of the code involved in this, but I did examine
the "failing" plpython test, and I'm quite distressed about that change
of behavior.  Simply removing the test case is certainly not okay,
and I do not think that just changing it to accept this new behavior
is okay either.  As Nikita said upthread:

7. ltree_plpython test does not fail now because Python list is converted to a
text and then to a ltree, and the textual representation of a Python list has
become a valid ltree text:

SELECT $$['foo', 'bar', 'baz']$$::ltree;          ltree
------------------------- "['foo', 'bar', 'baz']"
(1 row)

So Python lists can be now successfully converted to ltrees without a transform,
but the result is not that is expected ('foo.bar.baz'::ltree).
If this case doesn't throw an error, then we're going to have a
compatibility problem whenever somebody finally gets around to
implementing the python-to-ltree transform properly, because it
would break any code that might be relying on this (wrong) behavior.

In general, I think it's a mistake to allow unquoted punctuation to be
taken as part of an ltree label, which is what this patch is evidently
doing.  By doing that, you'll make it impossible for anyone to ever
again extend the ltree syntax, because if they want to assign special
meaning to braces or whatever, they can't do so without breaking
existing applications.  For example, if the existing code allowed
double-quote or backslash as a label character, we'd already have
rejected this patch as being too big a compatibility break.  So it's
not very forward-thinking to close off future improvements like this.

Thus, what I think you should do is require non-alphanumeric label
characters to be quoted, either via double-quotes or backslashes
(although it's questionable whether we really need two independent
quoting mechanisms here).  That would preserve extensibility, and
it'd also preserve our freedom to fix ltree_plpython later, since
the case of interest would still be an error for now.  And it would
mean that you don't have subtly different rules for what's data in
ltree versus what's data in lquery or ltxtquery.
Now non-alphanumeric label characters should be escaped in ltree, 
lquery and ltxtquery.  Plpython tests does not require changes now.
BTW, the general rule in existing backend code that's doing string
parsing is to allow non-ASCII (high-bit-set) characters to be taken as
data without inquiring too closely as to what they are.  This avoids a
bunch of locale and encoding issues without much loss of flexibility.
(If we do ever extend the ltree syntax again, we'd certainly choose
ASCII punctuation characters for whatever special symbols we need,
else the feature might not be available in all encodings.)  So for
instance in your examples involving "Ñ", it's fine to take that as a
label character without concern for locale/encoding.
I'm not sure what I think about the whitespace business.  It looks
like what you propose is to strip unquoted leading and trailing
whitespace but allow embedded whitespace.  There's precedent for that,
certainly, but I wonder whether it isn't too confusing.  In any case
you didn't document that.
Embedded whitespace should also be escaped now.  I'm also not sure
about stripping unquoted leading and trailing whitespace.
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: pg_ls_tmpdir to show directories and shared filesets
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] Finally split StdRdOptions into HeapOptions andToastOptions