Re: Fix XML handling with DOCTYPE - Mailing list pgsql-hackers

From Ryan Lambert
Subject Re: Fix XML handling with DOCTYPE
Date
Msg-id CAN-V+g8NZHpvb+_kOaZTAq0azBBoQg1WaNU7+wDjvMTd2vejjg@mail.gmail.com
Whole thread Raw
In response to Fix XML handling with DOCTYPE  (Ryan Lambert <ryan@rustprooflabs.com>)
List pgsql-hackers
Thanks for putting up with a new reviewer! 

  --with-libxml is the PostgreSQL configure option to make it use libxml2.
 
  The very web page http://xmlsoft.org/index.html says "The XML C parser
  and toolkit of Gnome: libxml" and is all about libxml2.
 
So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?  

I think leaving it as libxml makes sense with all that.  Good point that --with-libxml is used to build so I think staying with that works and is consistent.  I agree that having this point included does clarify the how and why of the limitations of this implementation.

I also over-parenthesize so I'm used to looking for that in my own writing.  The full sentences were the ones that seemed excessive to me, I think the others are ok and I won't nit-pick either way on those (unless you want me to!). 

If you agree, I should go through and fix my nodesets to be node-sets. 

Yes, I like node-sets better, especially knowing it conforms to the spec's language.

Thanks,

Ryan Lambert


On Tue, Mar 26, 2019 at 11:05 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 03/26/19 21:39, Ryan Lambert wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            tested, passed

Thanks for the review!

> I have two recommendations for features.sgml.  You state:
>
>>  relies on the libxml library
>
> Should this be clarified as the libxml2 library?  That's what I installed
> to build postgres from source (Ubuntu 16/18).  If it is the libxml library
> and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

  Perl XML:LibXML: "This module is an interface to libxml2"
  Haskell libxml: "Binding to libxml2"
  libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
    for the GNOME Libxml2 ..."

  --with-libxml is the PostgreSQL configure option to make it use libxml2.

  The very web page http://xmlsoft.org/index.html says "The XML C parser
  and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

On 03/26/19 23:52, Tom Lane wrote:
> Do we need to mention that at all?  If you're not building from source,
> it doesn't seem very interesting ... but maybe I'm missing some reason
> why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

 - why are we stuck at XPath 1.0? It's what we get from the library we use.

 - in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
   it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
   a sequence type and you have order control. Observable behavior from
   libxml2 (and you could certainly want to know this) is you get things out
   in document order, whether that's what you wanted or not, even though
   this is undocumented, and even counter-documented[1], libxml2 behavior.
   So it's an example of something you would fundamentally like to know,
   where the only available answer depends precariously on the library
   we happen to be using.

 - which limits in our implementation are inherent to the library, and
   which are just current limits in our embedding of it? (Maybe this is
   right at the border of what a user would care to know, but I know it's
   a question that crosses my mind when I bonk into a limit I wasn't
   expecting.)

> There are a few places where the parenthesis around a block of text
> seem unnecessary.

)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.

I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ....). Shall I deparenthesize
them all? Did you have other instances in mind?

> It seems you are standardizing from "node set" to "nodeset", is that
> the preferred nomenclature or preference?

Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.

XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.

When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.

On 03/26/19 23:52, Tom Lane wrote:
> That seemed a bit jargon-y to me too.  If that's standard terminology
> in the XML world, maybe it's fine; but I'd have stuck with "node set".

It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.

If you agree, I should go through and fix my nodesets to be node-sets.

I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).

Let me know what you'd like best on these points and I'll revise the patch.

Regards,
-Chap


[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
    in no particular order"

[2] https://www.w3.org/TR/1999/REC-xpath-19991116/

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Usage of epoch in txid_current
Next
From: Amit Kapila
Date:
Subject: Re: Transaction commits VS Transaction commits (with parallel) VSquery mean time