Thread: New XML section for documentation

New XML section for documentation

From
Bruce Momjian
Date:
Here is a new XML section for our SGML documentation.  It explains the
various XML capabilities, whether we support them, and how to use them.

Comments?

---------------------------------------------------------------------------


XML Document Support
====================
XML support is not one capability, but a variety of features supported
by a database.  These capabilities include storage, import/export,
validation, indexing, efficiency of modification, searching,
transforming, and XML to SQL mapping.  PostgreSQL supports some but
not all of these XML capabilities.  Future releases of PostgreSQL will
continue to improve XML support.

Storage
-------
PostgreSQL stores XML documents as ordinary text documents.  It does not
split XML documents into their component parts and store each
element separately.  You can use middleware solutions to do that, but
once done, the data becomes relational and has to be processed
accordingly.

Import/Export
-------------
Because XML documents are stored as normal text documents, they can be
imported/exported with little complexity.  A simple TEXT field can hold
up to 1 gigabyte of text, and large objects are available for larger
documents.

Validation
----------
/contrib/xml2 has a function called xml_valid() that can be used in
a CHECK constraint to enforce that a field contains valid XML.  It
does not support validation against a specific XML schema.  A
server-side language with XML capabilities could be used to do
schema-specific XML checks.

Indexing
--------
Because XML documents are stored as text, the full-text indexing tool
/contrib/tsearch2 can be used to index XML documents.  Of course, the
searches are text searches, with no XML awareness, but tsearch2 can be
used with other XML capabilities to dramatically reduce the amount of
data processed at the XML level.

Modification
------------
If an UPDATE does not modify an XML field, the XML data is shared
between the old and new rows.  However, if the UPDATE modifies an XML
field, a full modified copy of the XML field must be created internally.

Searching
---------
XPath searches are implemented using /contrib/xml2.  It processes XML
text documents and returns results based on the requested query.

Transforming
------------
/contrib/xml2 supports XSL transformations.

XML to SQL Mapping
-------------------
This involves converting XML data to and from relational structures.
PostgreSQL has no internal support for such mapping, and relies on
external tools to do such conversions.

Missing Features
----------------
    o  XQuery
    o  SQL/XML syntax (ISO/IEC 9075-14)
    o  XML data type optimized for XML storage

See also http://www.rpbourret.com/xml/XMLAndDatabases.htm

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: New XML section for documentation

From
David Fetter
Date:
On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> Here is a new XML section for our SGML documentation.  It explains
> the various XML capabilities, whether we support them, and how to use
> them.
>
> Comments?

This looks hauntingly similar to Peter's presentation at the
conference. :)  I'd add a http://wiscorp.com/SQLStandards.html to the
reference section.

Speaking of other parts of the SQL:2003 standard, how about one
section each that mentions them?  There's

Part 4: SQL/PSM (Persistent Stored Modules)
Part 9: SQL/MED (Management of External Data) (my favorite)
Part 10: SQL/OLB (Object Language Binding)
Part 11: SQL/Schemata
Part 13: SQL/JRT (Java Routines and Types)

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
David Fetter wrote:
> On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> > Here is a new XML section for our SGML documentation.  It explains
> > the various XML capabilities, whether we support them, and how to use
> > them.
> >
> > Comments?
>
> This looks hauntingly similar to Peter's presentation at the

I used the XML/SQL and validation part from his talk, but the rest was
from earlier email discussions.

> conference. :)  I'd add a http://wiscorp.com/SQLStandards.html to the

This seems to be the best URL, but it seems too detailed:

    http://wiscorp.com/H2-2005-197-SC32N1293-WG3_Presentation_for_SC32_20050418.pdf

> reference section.
>
> Speaking of other parts of the SQL:2003 standard, how about one
> section each that mentions them?  There's
>
> Part 4: SQL/PSM (Persistent Stored Modules)
> Part 9: SQL/MED (Management of External Data) (my favorite)
> Part 10: SQL/OLB (Object Language Binding)
> Part 11: SQL/Schemata
> Part 13: SQL/JRT (Java Routines and Types)

I don't know anything about them.


Re: New XML section for documentation

From
Bruce Momjian
Date:
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
>
> > Here is a new XML section for our SGML documentation.  It explains the
> > various XML capabilities, whether we support them, and how to use them.
> >
> > Comments?
> >
> +1. Users often ask this in the mailing lists. Where do you want to
> put this? I'd suggest the FAQ. What do you all think?

Our main documentation.  Once it is there, people will find it more
easily than they would in the FAQ.

> > Missing Features
> > ----------------
> >     o  XQuery
> >     o  SQL/XML syntax (ISO/IEC 9075-14)
> >     o  XML data type optimized for XML storage
> >
> Another section in TODO?

Perhaps, yea.


Re: New XML section for documentation

From
Euler Taveira de Oliveira
Date:
Bruce Momjian wrote:

> Here is a new XML section for our SGML documentation.  It explains the
> various XML capabilities, whether we support them, and how to use them.
>
> Comments?
>
+1. Users often ask this in the mailing lists. Where do you want to
put this? I'd suggest the FAQ. What do you all think?

> Missing Features
> ----------------
>     o  XQuery
>     o  SQL/XML syntax (ISO/IEC 9075-14)
>     o  XML data type optimized for XML storage
>
Another section in TODO?


--
  Euler Taveira de Oliveira
  http://www.timbira.com/


Re: [DOCS] New XML section for documentation

From
"Magnus Hagander"
Date:
> Indexing
> --------
> Because XML documents are stored as text, full-text indexing tool
> /contrib/tsearch2 can be used to index XML documents.  Of
> course, the searches are text searches, with no XML
> awareness, but tsearch2 can be used with other XML
> capabilities to dramatically reduce the amount of data
> processed at the XML level.


You can also use a functional index and /contrib/xml2 to do limited
XPath indexing. (Can't make it "subtree-aware" for example, unless you
are willing to change your queries, but you can index specific xpath
nodes).


//Magnus

Re: New XML section for documentation

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> XML Document Support
> ====================
> XML support is not one capability, but a variety of features
> supported by a database.

database system

> Storage
> -------
> PostgreSQL stores XML documents as ordinary text documents.

It is "possible" to do that, but this sounds like it's done
automatically or implicitly.  Maybe:

"PostgreSQL does not have a specialized XML data type.  The recommended
way is to store XML documents as text."

> Import/Export
> -------------
> Because XML documents are stored as normal text documents, they can
> be imported/exported with little complexity.

Import/export refers to exporting schema data with XML decorations.  Of
course you can export column data trivially, but that's not what this
is about.

> Validation
> ----------
> /contrib/xml2 has a function called xml_valid() that can be used in
> a CHECK constraint to enforce that a field contains valid XML.  It
> does not support validation against a specific XML schema.

Then this is not validation but only checking for well-formedness.  The
xml2 README says so, in fact.

> Indexing
> --------

I think the expression index capability combined with contrib/xml2 is
more relevant here than the full-text search capability.

> Transforming
> ------------
> /contrib/xml2 supports XSL transformations.

That's XSLT.

> XML to SQL Mapping
> -------------------
> This involves converting XML data to and from relational structures.
> PostgreSQL has no internal support for such mapping, and relies on
> external tools to do such conversions.

Are there instances of such tools?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: New XML section for documentation

From
"Nikolay Samokhvalov"
Date:
On 8/26/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Bruce Momjian wrote:
> > Validation
> > ----------
> > /contrib/xml2 has a function called xml_valid() that can be used in
> > a CHECK constraint to enforce that a field contains valid XML.  It
> > does not support validation against a specific XML schema.
>
> Then this is not validation but only checking for well-formedness.  The
> xml2 README says so, in fact.

Exactly. contrib/xml2 misuses the term here. xml_valid() should be
a different function, one that takes two arguments (an XML value and
the corresponding XML schema) and validates the XML data. Actually, the
latest version of the SQL/XML standard includes such a function
(XMLVALIDATE).

If you decide to mention contrib/xml2 in the docs, I would suggest a
patch for this module. The patch renames that function to xml_check()
and adds xml_array() (an item from the current TODO). Or is it too late
for 8.2?

Also, I would add a little introduction to XML terms (from XML
standards) to this documentation section.

-- 
Best regards,
Nikolay


Re: New XML section for documentation

From
"Nikolay Samokhvalov"
Date:
On 8/26/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
[...]
> If you decide to mention contrib/xml2 in the docs, I would suggest a
> patch for this module. The patch renames that function to xml_check()
> and adds xml_array() (an item from the current TODO). Or is it too late
> for 8.2?
[...]

Typo :-( I meant "xpath_array()"



Re: [DOCS] New XML section for documentation

From
David Fetter
Date:
On Fri, Aug 25, 2006 at 08:37:19PM -0400, Bruce Momjian wrote:
> David Fetter wrote:
> > On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> > > Here is a new XML section for our SGML documentation.  It
> > > explains the various XML capabilities, whether we support them, and
> > > how to use them.
> > >
> > > Comments?
> >
> > This looks hauntingly similar to Peter's presentation at the
>
> I used the XML/SQL and validation part from his talk, but the rest
> was from earlier email discussions.

Reuse is good :)

> > conference. :)  I'd add a http://wiscorp.com/SQLStandards.html to the
>
> This seems to be the best URL, but it seems too detailed:
>
>     http://wiscorp.com/H2-2005-197-SC32N1293-WG3_Presentation_for_SC32_20050418.pdf

I'd just put the http://wiscorp.com/SQLStandards.html URL in, as it
contains several references in varying levels of detail.

> > reference section.
> >
> > Speaking of other parts of the SQL:2003 standard, how about one
> > section each that mentions them?  There's
> >
> > Part 4: SQL/PSM (Persistent Stored Modules)
> > Part 9: SQL/MED (Management of External Data) (my favorite)
> > Part 10: SQL/OLB (Object Language Binding)
> > Part 11: SQL/Schemata
> > Part 13: SQL/JRT (Java Routines and Types)
>
> I don't know anything about them.

We claim SQL standard compliance, so since those are part of SQL:2003,
we probably ought to mention them.  SQL/PSM is a programming language
that lives inside the database, and DB2 and MySQL have it.  SQL/MED
lets people talk to other data stores.  SQL/OLB appears to be derived
from equel, which we have as ecpg.  SQL/Schemata contains the
information schema.  SQL/JRT appears to bear some similarity to
PL/Java and PL/J.

Cheers,
D

Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> Bruce Momjian wrote:
> > XML Document Support
> > ====================
> > XML support is not one capability, but a variety of features
> > supported by a database.
>
> database system

Done.

> > Storage
> > -------
> > PostgreSQL stores XML documents as ordinary text documents.
>
> It is "possible" to do that, but this sounds like it's done
> automatically or implicitly.  Maybe:
>
> "PostgreSQL does not have a specialized XML data type.  The recommended
> way is to store XML documents as text."

Clarified.

> > Import/Export
> > -------------
> > Because XML documents are stored as normal text documents, they can
> > be imported/exported with little complexity.
>
> Import/export refers to exporting schema data with XML decorations.  Of
> course you can export column data trivially, but that's not what this
> is about.

OK, section redone.

> > Validation
> > ----------
> > /contrib/xml2 has a function called xml_valid() that can be used in
> > a CHECK constraint to enforce that a field contains valid XML.  It
> > does not support validation against a specific XML schema.
>
> Then this is not validation but only checking for well-formedness.  The
> xml2 README says so, in fact.

I made it clear in the section that the XML syntax is being checked,
not validated against a schema.  Do you want separate Check and
Validation sections?

> > Indexing
> > --------
>
> I think the expression index capability combined with contrib/xml2 is
> more relevant here than the full-text search capability.

Agreed, added.

> > Transforming
> > ------------
> > /contrib/xml2 supports XSL transformations.
>
> That's XSLT.

OK.

> > XML to SQL Mapping
> > -------------------
> > This involves converting XML data to and from relational structures.
> > PostgreSQL has no internal support for such mapping, and relies on
> > external tools to do such conversions.
>
> Are there instances of such tools?

Well, it seems EMS has a product that does it, and I assume other XML
tools have database interfaces.  Also, psql can do it if you want to
convert XHTML to XML, so I mentioned that too.


Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
Magnus Hagander wrote:
> > Indexing
> > --------
> > Because XML documents are stored as text, full-text indexing tool
> > /contrib/tsearch2 can be used to index XML documents.  Of
> > course, the searches are text searches, with no XML
> > awareness, but tsearch2 can be used with other XML
> > capabilities to dramatically reduce the amount of data
> > processed at the XML level.
>
>
> You can also use a functional index and /contrib/xml2 to do limited
> XPath indexing. (Can't make it "subtree-aware" for example, unless you
> are willing to change your queries, but you can index specific xpath
> nodes).

Good point, added.


Re: New XML section for documentation

From
Bruce Momjian
Date:
Nikolay Samokhvalov wrote:
> On 8/26/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> > Bruce Momjian wrote:
> > > Validation
> > > ----------
> > > /contrib/xml2 has a function called xml_valid() that can be used in
> > > a CHECK constraint to enforce that a field contains valid XML.  It
> > > does not support validation against a specific XML schema.
> >
> > Then this is not validation but only checking for well-formedness.  The
> > xml2 README says so, in fact.
> 
> Exactly. contrib/xml2 misuses the term here. xml_valid() should be
> a different function, one that takes two arguments (an XML value and
> the corresponding XML schema) and validates the XML data. Actually, the
> latest version of the SQL/XML standard includes such a function
> (XMLVALIDATE).

I understand, but do we want to break backward compatibility to rename
it?  We could create an xml_check function, keep xml_valid as a
single-argument function, and implement schema checks as a two-parameter
function, but that seems odd too.

> If you decide to mention contrib/xml2 in the docs, I would suggest a
> patch for this module. The patch renames that function to xml_check()
> and adds xml_array() (an item from the current TODO). Or is it too late
> for 8.2?

Hard to say.  What does xml_array do?  We are more lenient about
/contrib additions after feature freeze, but it is pretty late.  Aren't
you working on updating the new XML syntax support in the backend?  Are
you done with that patch?

> Also, I would add a little introduction to XML terms (from XML
> standards) to this documentation section.

OK, but which terms?  I only see XML and XSLT, and I documented those on
first mention in the newest version.



Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
David Fetter wrote:
> On Fri, Aug 25, 2006 at 08:37:19PM -0400, Bruce Momjian wrote:
> > David Fetter wrote:
> > > On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> > > > Here is a new XML section for our SGML documentation.  It
> > > > explains the various XML capabilities, whether we support them, and
> > > > how to use them.
> > > >
> > > > Comments?
> > >
> > > This looks hauntingly similar to Peter's presentation at the
> >
> > I used the XML/SQL and validation part from his talk, but the rest
> > was from earlier email discussions.
>
> Reuse is good :)
>
> > > conference. :)  I'd add a http://wiscorp.com/SQLStandards.html to the
> >
> > This seems to be the best URL, but it seems too detailed:
> >
> >     http://wiscorp.com/H2-2005-197-SC32N1293-WG3_Presentation_for_SC32_20050418.pdf
>
> I'd just put the http://wiscorp.com/SQLStandards.html URL in, as it
> contains several references in varying levels of detail.

OK, added.

> > > reference section.
> > >
> > > Speaking of other parts of the SQL:2003 standard, how about one
> > > section each that mentions them?  There's
> > >
> > > Part 4: SQL/PSM (Persistent Stored Modules)
> > > Part 9: SQL/MED (Management of External Data) (my favorite)
> > > Part 10: SQL/OLB (Object Language Binding)
> > > Part 11: SQL/Schemata
> > > Part 13: SQL/JRT (Java Routines and Types)
> >
> > I don't know anything about them.
>
> We claim SQL standard compliance, so since those are part of SQL:2003,
> we probably ought to mention them.  SQL/PSM is a programming language
> that lives inside the database, and DB2 and MySQL have it.  SQL/MED
> lets people talk to other data stores.  SQL/OLB appears to be derived
> from equel, which we have as ecpg.  SQL/Schemata contains the
> information schema.  SQL/JRT appears to bear some similarity to
> PL/Java and PL/J.

I think the big question is whether we are ever going to implement
these.  I think we need to decide that before I mention them.


Re: New XML section for documentation

From
Bruce Momjian
Date:
Updated XML documentation based on feedback.  Comments?

---------------------------------------------------------------------------

XML Document Support
====================
XML (eXtensible Markup Language) support is not one capability, but a
variety of features supported by a database system.  These capabilities
include storage, import/export, validation, indexing, efficiency of
modification, searching, transforming, and XML to SQL mapping.
PostgreSQL supports some but not all of these XML capabilities.  Future
releases of PostgreSQL will continue to improve XML support.

Storage
-------

PostgreSQL does not have a specialized XML data type.  Users should
store XML documents in ordinary TEXT fields.  If you need the document
split into its component parts so that each element is stored
separately, you must use a middleware solution to do that, but once
done, the data becomes relational and has to be processed accordingly.

Import/Export
-------------
There is no facility for mapping XML to relational tables.  An external
tool must be used for this.  One simple way to export XML is to use psql
in HTML mode ("\pset format html"), and convert the XHTML to XML using
an external tool.
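
As a rough illustration of that conversion step, here is a minimal Python
sketch.  It is not part of psql or /contrib.  It assumes the HTML table
parses as well-formed XML and that the column headers are usable as element
names.  The sample table, the function name html_table_to_xml, and the
<rows>/<row> element names are all hypothetical.  Real psql output may
contain HTML entities that a strict XML parser rejects, so treat this only as
a starting point.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample resembling a table produced with "\pset format html".
SAMPLE_HTML = """<table border="1">
  <tr><th align="center">id</th><th align="center">title</th></tr>
  <tr valign="top"><td align="right">1</td><td align="left">First</td></tr>
  <tr valign="top"><td align="right">2</td><td align="left">Second</td></tr>
</table>"""


def html_table_to_xml(html_text):
    """Convert a well-formed HTML table into <rows><row>...</row></rows>."""
    table = ET.fromstring(html_text)
    # Column names come from the <th> cells of the header row.
    headers = [(th.text or "").strip() for th in table.iter("th")]
    rows = ET.Element("rows")
    for tr in table.iter("tr"):
        cells = tr.findall("td")
        if not cells:  # the header row has no <td> cells
            continue
        row = ET.SubElement(rows, "row")
        for name, td in zip(headers, cells):
            ET.SubElement(row, name).text = (td.text or "").strip()
    return ET.tostring(rows, encoding="unicode")


print(html_table_to_xml(SAMPLE_HTML))
```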

Validation
----------
/contrib/xml2 has a function called xml_valid() that can be used in
a CHECK constraint to enforce that a field contains valid XML.  It
does not support validation against a specific XML schema.  A
server-side language with XML capabilities could be used to do
schema-specific XML checks.

Indexing
--------
/contrib/xml2 functions can be used in expression indexes to index
specific XML fields.  To index the full contents of XML documents, the
full-text indexing tool /contrib/tsearch2 can be used.  Of course,
tsearch2 indexes have no XML awareness so additional /contrib/xml2
checks should be added to queries.
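
The idea behind an expression index can be sketched outside the database:
compute one path-based value per document once, keep it in a lookup
structure, and answer later queries from that structure instead of reparsing
every document.  The Python below is only an analogy under that assumption.
It does not use /contrib/xml2, and the documents, the "author" path, and
build_index are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stored documents, keyed by a row id.
DOCS = {
    1: "<book><author>Smith</author><title>A</title></book>",
    2: "<book><author>Jones</author><title>B</title></book>",
    3: "<book><author>Smith</author><title>C</title></book>",
}


def build_index(docs, path):
    """Map the text found at `path` in each document to the ids containing it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        value = ET.fromstring(text).findtext(path)
        if value is not None:
            index[value].append(doc_id)
    return index


author_index = build_index(DOCS, "author")
print(author_index["Smith"])
```

As with an index on a specific path in the database, this structure only
helps queries that ask for exactly the indexed path.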

Modification
------------ 
If an UPDATE does not modify an XML field, the XML data is shared
between the old and new rows.  However, if the UPDATE modifies an XML
field, a full modified copy of the XML field must be created internally.

Searching
---------
XPath searches are implemented using /contrib/xml2.  It processes XML
text documents and returns results based on the requested query.
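
For readers unfamiliar with XPath, the kind of query involved can be sketched
in Python.  This is an illustration only: it uses the limited XPath subset in
Python's standard library rather than /contrib/xml2, and the sample document
and the function xpath_titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document stored as plain text.
SAMPLE = (
    "<library>"
    "<book year='2005'><title>A</title></book>"
    "<book year='2006'><title>B</title></book>"
    "</library>"
)


def xpath_titles(xml_text, year):
    """Return the titles of all <book> elements with the given year attribute."""
    root = ET.fromstring(xml_text)
    query = f".//book[@year='{year}']/title"
    return [title.text for title in root.findall(query)]


print(xpath_titles(SAMPLE, "2006"))
```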

Transforming
------------
/contrib/xml2 supports XSLT (Extensible Stylesheet Language Transformations).

XML to SQL Mapping
-------------------
This involves converting XML data to and from relational structures. 
PostgreSQL has no internal support for such mapping, and relies on
external tools to do such conversions.
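
What such an external tool does can be sketched in a few lines: walk the XML
and emit one tuple per repeating element, ready to be loaded into a table.
This is an illustrative Python sketch only.  The document layout, the column
choice, and the function xml_to_rows are assumptions, not a PostgreSQL
facility.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one <person> element per future table row.
SAMPLE_XML = """<people>
  <person id="1"><name>Alice</name><city>Paris</city></person>
  <person id="2"><name>Bob</name><city>Oslo</city></person>
</people>"""


def xml_to_rows(xml_text):
    """Flatten each <person> element into an (id, name, city) tuple."""
    root = ET.fromstring(xml_text)
    return [
        (int(p.get("id")), p.findtext("name"), p.findtext("city"))
        for p in root.findall("person")
    ]


for row in xml_to_rows(SAMPLE_XML):
    print(row)
```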

Missing Features
----------------
    o  XQuery
    o  SQL/XML syntax (ISO/IEC 9075-14)
    o  XML data type optimized for XML storage

See also http://www.rpbourret.com/xml/XMLAndDatabases.htm for an
overview of XML use in databases, and http://wiscorp.com/SQLStandards.html
for the XML standards.




Re: [DOCS] New XML section for documentation

From
David Fetter
Date:
On Sat, Aug 26, 2006 at 12:48:32PM -0400, Bruce Momjian wrote:
> David Fetter wrote:
> > On Fri, Aug 25, 2006 at 08:37:19PM -0400, Bruce Momjian wrote:

> > > > Speaking of other parts of the SQL:2003 standard, how about one
> > > > section each that mentions them?  There's
> > > >
> > > > Part 4: SQL/PSM (Persistent Stored Modules)
> > > > Part 9: SQL/MED (Management of External Data) (my favorite)
> > > > Part 10: SQL/OLB (Object Language Binding)
> > > > Part 11: SQL/Schemata
> > > > Part 13: SQL/JRT (Java Routines and Types)
> > >
> > > I don't know anything about them.
> >
> > We claim SQL standard compliance, so since those are part of
> > SQL:2003, we probably ought to mention them.  SQL/PSM is a
> > programming language that lives inside the database, and DB2 and
> > MySQL have it.  SQL/MED lets people talk to other data stores.
> > SQL/OLB appears to be derived from equel, which we have as ecpg.
> > SQL/Schemata contains the information schema.  SQL/JRT appears to
> > bear some similarity to PL/Java and PL/J.
>
> I think the big question is whether we are ever going to implement
> these?  I think we need to decide that before I mention them.

The SQL/Schemata thing is already in.  I think we should at least
mention which features that we already have are from what part of the
standard.  As far as the rest of the standard goes, we might want to
mention whether we've even considered any of each piece in the TODO
list, and what sub-pieces, if any, are already included/scheduled/too
silly to contemplate :)

Cheers,
D

Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
David Fetter wrote:
> On Sat, Aug 26, 2006 at 12:48:32PM -0400, Bruce Momjian wrote:
> > David Fetter wrote:
> > > On Fri, Aug 25, 2006 at 08:37:19PM -0400, Bruce Momjian wrote:
>
> > > > > Speaking of other parts of the SQL:2003 standard, how about one
> > > > > section each that mentions them?  There's
> > > > >
> > > > > Part 4: SQL/PSM (Persistent Stored Modules)
> > > > > Part 9: SQL/MED (Management of External Data) (my favorite)
> > > > > Part 10: SQL/OLB (Object Language Binding)
> > > > > Part 11: SQL/Schemata
> > > > > Part 13: SQL/JRT (Java Routines and Types)
> > > >
> > > > I don't know anything about them.
> > >
> > > We claim SQL standard compliance, so since those are part of
> > > SQL:2003, we probably ought to mention them.  SQL/PSM is a
> > > programming language that lives inside the database, and DB2 and
> > > MySQL have it.  SQL/MED lets people talk to other data stores.
> > > SQL/OLB appears to be derived from equel, which we have as ecpg.
> > > SQL/Schemata contains the information schema.  SQL/JRT appears to
> > > bear some similarity to PL/Java and PL/J.
> >
> > I think the big question is whether we are ever going to implement
> > these?  I think we need to decide that before I mention them.
>
> The SQL/Schemata thing is already in.  I think we should at least

Uh, what is the SQL/Schemata?  Are you sure it is in CVS?

> mention which features that we already have are from what part of the
> standard.  As far as the rest of the standard goes, we might want to
> mention whether we've even considered any of each piece in the TODO
> list, and what sub-pieces, if any, are already included/scheduled/too
> silly to contemplate :)

Well, this seems like something that belongs in our chapter on how we
support the SQL standard.


Re: [DOCS] New XML section for documentation

From
David Fetter
Date:
On Sat, Aug 26, 2006 at 01:16:06PM -0400, Bruce Momjian wrote:
> David Fetter wrote:
> > On Sat, Aug 26, 2006 at 12:48:32PM -0400, Bruce Momjian wrote:
> > > David Fetter wrote:
> > > > On Fri, Aug 25, 2006 at 08:37:19PM -0400, Bruce Momjian wrote:
> >
> > > > > > Speaking of other parts of the SQL:2003 standard, how about one
> > > > > > section each that mentions them?  There's
> > > > > >
> > > > > > Part 4: SQL/PSM (Persistent Stored Modules)
> > > > > > Part 9: SQL/MED (Management of External Data) (my favorite)
> > > > > > Part 10: SQL/OLB (Object Language Binding)
> > > > > > Part 11: SQL/Schemata
> > > > > > Part 13: SQL/JRT (Java Routines and Types)
> > > > >
> > > > > I don't know anything about them.
> > > >
> > > > We claim SQL standard compliance, so since those are part of
> > > > SQL:2003, we probably ought to mention them.  SQL/PSM is a
> > > > programming language that lives inside the database, and DB2 and
> > > > MySQL have it.  SQL/MED lets people talk to other data stores.
> > > > SQL/OLB appears to be derived from equel, which we have as ecpg.
> > > > SQL/Schemata contains the information schema.  SQL/JRT appears to
> > > > bear some similarity to PL/Java and PL/J.
> > >
> > > I think the big question is whether we are ever going to implement
> > > these?  I think we need to decide that before I mention them.
> >
> > The SQL/Schemata thing is already in.  I think we should at least
>
> Uh, what is the SQL/Schemata?  Are you sure it is in CVS?

It contains the information schema, among other things.  We've had the
information schema for a while. :)

> > mention which features that we already have are from what part of
> > the standard.  As far as the rest of the standard goes, we might
> > want to mention whether we've even considered any of each piece in
> > the TODO list, and what sub-pieces, if any, are already
> > included/scheduled/too silly to contemplate :)
>
> Well, this seems like something that belongs in our chapter on how
> we support the SQL standard.

I'm not too fussy about where it first goes in.  Just *that* it goes
in somewhere.  I'll be happy to start the needed patches. :)

Cheers,
D

Re: [DOCS] New XML section for documentation

From
Bruce Momjian
Date:
David Fetter wrote:
> > > mention which features that we already have are from what part of
> > > the standard.  As far as the rest of the standard goes, we might
> > > want to mention whether we've even considered any of each piece in
> > > the TODO list, and what sub-pieces, if any, are already
> > > included/scheduled/too silly to contemplate :)
> >
> > Well, this seems like something that belongs in our chapter on how
> > we support the SQL standard.
>
> I'm not too fussy about where it first goes in.  Just *that* it goes
> in somewhere.  I'll be happy to start the needed patches. :)

OK, I think the SGML docs are the proper place.


Re: [DOCS] New XML section for documentation

From
Peter Eisentraut
Date:
David Fetter wrote:
> We claim SQL standard compliance,

No, we don't.  And SQL conformance doesn't require you to implement all
parts anyway.

> so since those are part of
> SQL:2003, we probably ought to mention them.  SQL/PSM is a
> programming language that lives inside the database, and DB2 and
> MySQL have it.  SQL/MED lets people talk to other data stores.
> SQL/OLB appears to be derived from equel, which we have as ecpg.
> SQL/Schemata contains the information schema.  SQL/JRT appears to
> bear some similarity to PL/Java and PL/J.

It's pretty useless to talk about stuff that we don't have yet.  The
point of the XML section is that we have a number of things, and users
are having trouble (understandably) fitting them together.


Re: [DOCS] New XML section for documentation

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> I made it clear in the section that the XML syntax was being checked,
> not validation against a schema.  You want Check and Validation
> sections?

"Valid" and "well-formed" have very specific distinct meanings in XML.
(Note that "check" doesn't have any meaning there.)  We will eventually
want a method to verify both the validity and the well-formedness.
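
To make the distinction concrete, here is a small Python sketch (not
contrib/xml2 code).  A well-formedness check only asks whether the text
parses as XML.  Validity additionally requires conformance to a DTD or
schema, which the standard-library parser used here does not check.  The
function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>ok</b></a>"))   # properly nested tags
print(is_well_formed("<a><b>bad</a></b>"))  # mismatched closing tags
```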

I think that having a function called xml_valid check for well-formedness
is an outright bug and needs to be fixed.


Re: [DOCS] New XML section for documentation

From
Peter Eisentraut
Date:
David Fetter wrote:
> The SQL/Schemata thing is already in.  I think we should at least
> mention which features that we already have are from what part of the
> standard.

We do.  Read the documentation.


Re: [DOCS] New XML section for documentation

From
"Joshua D. Drake"
Date:
Peter Eisentraut wrote:
> David Fetter wrote:
>> We claim SQL standard compliance,
>
> No, we don't.  And SQL conformance doesn't require you to implement all
> parts anyway.
>
>> so since those are part of
>> SQL:2003, we probably ought to mention them.  SQL/PSM is a
>> programming language that lives inside the database, and DB2 and
>> MySQL have it.  SQL/MED lets people talk to other data stores.
>> SQL/OLB appears to be derived from equel, which we have as ecpg.
>> SQL/Schemata contains the information schema.  SQL/JRT appears to
>> bear some similarity to PL/Java and PL/J.
>
> It's pretty useless to talk about stuff that we don't have yet.  The
> point of the XML section is that we have a number of things, and users
> are having trouble (understandably) fitting them together.

As for separate sections, I agree with Peter. However, it would be a good
idea to have a section that talks about potential and/or upcoming
technologies.

All of the above could be covered under that.

Joshua D. Drake


--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: [DOCS] New XML section for documentation

From
David Fetter
Date:
On Sat, Aug 26, 2006 at 08:38:43PM +0200, Peter Eisentraut wrote:
> David Fetter wrote:
> > We claim SQL standard compliance,
>
> No, we don't.  And SQL conformance doesn't require you to implement
> all parts anyway.

Right.  It'd be nice to be able to tell what level of conformance we
have to which parts of the standard.

> > so since those are part of SQL:2003, we probably ought to mention
> > them.  SQL/PSM is a programming language that lives inside the
> > database, and DB2 and MySQL have it.  SQL/MED lets people talk to
> > other data stores.  SQL/OLB appears to be derived from equel,
> > which we have as ecpg.  SQL/Schemata contains the information
> > schema.  SQL/JRT appears to bear some similarity to PL/Java and
> > PL/J.
>
> It's pretty useless to talk about stuff that we don't have yet.

I think it's useful to mention what's arriving, what's being worked
on, and what's not even being contemplated in the long term.

> The point of the XML section is that we have a number of things, and
> users are having trouble (understandably) fitting them together.

Similar troubles apply--on a smaller scale--to the information schema,
SQL/OLB, SQL/JRT, etc.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: [DOCS] New XML section for documentation

From
"Joshua D. Drake"
Date:
>>> bear some similarity to PL/Java and PL/J.
>> I think the big question is whether we are ever going to implement
>> these?  I think we need to decide that before I mention them.
>
> The SQL/Schemata thing is already in.  I think we should at least
> mention which features that we already have are from what part of the
> standard.

I also see PSM and OLB as a target.

Joshua D. Drake


> As far as the rest of the standard goes, we might want to
> mention whether we've even considered any of each piece in the TODO
> list, and what sub-pieces, if any, are already included/scheduled/too
> silly to contemplate :)
>
> Cheers,
> D





Re: [DOCS] New XML section for documentation

From
Peter Eisentraut
Date:
David Fetter wrote:
> I think it's useful to mention what's arriving, what's being worked
> on, and what's not even being contemplated in the long term.

We don't even have a roadmap of any kind, so the last thing we can do is
put claims of that sort in the documentation.

> Similar troubles apply--on a smaller scale--to the information
> schema, SQL/OLB, SQL/JRT, etc.

The information schema is quite extensively documented.  If you have
something to add on OLB and JRT, then let's hear your suggestions.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: [DOCS] New XML section for documentation

From
"Nikolay Samokhvalov"
Date:
On 8/26/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Bruce Momjian wrote:
> > I made it clear in the section that the XML syntax was being checked,
> > not validation against a schema.  You want Check and Validation
> > sections?
>
> "Valid" and "well-formed" have very specific distinct meanings in XML.
> (Note that "check" doesn't have any meaning there.)  We will eventually
> want a method to verify both the validity and the well-formedness.
>
> I think that having a function called xml_valid check for well-formedness
> is an outright bug and needs to be fixed.

That's exactly what I'm talking about. xml_valid() is the wrong name and
it may confuse people.
I want to add that, with an XML section in the documentation, this bug
becomes more significant.

Bruce suggested using overloading to keep backward compatibility: in other
words, a 1-arg function for checking well-formedness and a 2-arg
function for validation. That's bad too:
 - two _different_ actions for one function name => more confusion
 - I (as a user) would think that the 1-arg function is designed for
   validation in cases where the XML document contains a reference to
   a DTD (as an example).

I am in favor of fixing it by renaming, even though that breaks backward
compatibility. Doing it later will be more painful.

BTW, what is the deadline for changes (additions) to the docs? I would add
general XML terms (such as what XML is, what a well-formed document is,
what validation is; a short overview of XML standards and of SQL/XML as
part of SQL:200n, etc.). Maybe also something about the contrib/xml2
installation process (actually, XSLT support requires an additional
library). Moreover, if the SQL/XML patch is accepted, it will need a few
words too.
--
Best regards,
Nikolay

Re: New XML section for documentation

From
Bruce Momjian
Date:
I have added a modified version of this to the SGML documentation,
under data types.

---------------------------------------------------------------------------

bruce wrote:
> Here is a new XML section for our SGML documentation.  It explains the
> various XML capabilities, whether we support them, and how to use them.
>
> Comments?
>
> ---------------------------------------------------------------------------
>
>
> XML Document Support
> ====================
> XML support is not one capability, but a variety of features supported
> by a database.  These capabilities include storage, import/export,
> validation, indexing, efficiency of modification, searching,
> transforming, and XML to SQL mapping.  PostgreSQL supports some but
> not all of these XML capabilities.  Future releases of PostgreSQL will
> continue to improve XML support.
>
> Storage
> -------
> PostgreSQL stores XML documents as ordinary text documents.  It does not
> split XML documents apart into their component parts and store each
> element separately.  You can use middle-ware solutions to do that, but
> once done, the data becomes relational and has to be processed
> accordingly.
>
> Import/Export
> -------------
> Because XML documents are stored as normal text documents, they can be
> imported/exported with little complexity.  A simple TEXT field can hold
> up to 1 gigabyte of text, and large objects are available for larger
> documents.
>
> Validation
> ----------
> /contrib/xml2 has a function called xml_valid() that can be used in
> a CHECK constraint to enforce that a field contains valid XML.  It
> does not support validation against a specific XML schema.  A
> server-side language with XML capabilities could be used to do
> schema-specific XML checks.
>
> Indexing
> --------
> Because XML documents are stored as text, the full-text indexing tool
> /contrib/tsearch2 can be used to index XML documents.  Of course, the
> searches are text searches, with no XML awareness, but tsearch2 can be
> used with other XML capabilities to dramatically reduce the amount of
> data processed at the XML level.
>
> Modification
> ------------
> If an UPDATE does not modify an XML field, the XML data is shared
> between the old and new rows.  However, if the UPDATE modifies an XML
> field, a full modified copy of the XML field must be created internally.
>
> Searching
> ---------
> XPath searches are implemented using /contrib/xml2.  It processes XML
> text documents and returns results based on the requested query.
>
> Transforming
> ------------
> /contrib/xml2 supports XSL transformations.
>
> XML to SQL Mapping
> -------------------
> This involves converting XML data to and from relational structures.
> PostgreSQL has no internal support for such mapping, and relies on
> external tools to do such conversions.
>
> Missing Features
> ----------------
>     o  XQuery
>     o  SQL/XML syntax (ISO/IEC 9075-14)
>     o  XML data type optimized for XML storage
>
> See also http://www.rpbourret.com/xml/XMLAndDatabases.htm
>
> --
>   Bruce Momjian   bruce@momjian.us
>   EnterpriseDB    http://www.enterprisedb.com
>
>   + If your life is a hard drive, Christ can be your backup. +

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [DOCS] New XML section for documentation

From
Tom Lane
Date:
"Nikolay Samokhvalov" <samokhvalov@gmail.com> writes:
> On 8/26/06, Peter Eisentraut <peter_e@gmx.net> wrote:
>> "Valid" and "well-formed" have very specific distinct meanings in XML.
>> (Note that "check" doesn't have any meaning there.)  We will eventually
>> want a method to verify both the validity and the well-formedness.
>>
>> I think that having a function called xml_valid check for well-formedness
>> is an outright bug and needs to be fixed.

> That's exactly what I'm talking about. xml_valid() is the wrong name and
> it may confuse people.

> Bruce suggested using overloading to keep backward compatibility: in other
> words, a 1-arg function for checking well-formedness and a 2-arg
> function for validation. That's bad too:

ISTM the right answer is to add xml_is_well_formed() in this release
and have xml_valid as an alias for it, with documentation explaining
that xml_valid is deprecated and will be removed in the next release.
Then we can add a proper validity-checking function too.
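
The alias-plus-deprecation pattern can be sketched as follows. This is
Python standing in for the SQL-level functions, purely as an illustration
and not the contrib/xml2 implementation; only the two function names come
from this thread, and the warning text is an assumption.

```python
import warnings
import xml.etree.ElementTree as ET


def xml_is_well_formed(document: str) -> bool:
    """Correctly named check: does the text parse as XML at all?"""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def xml_valid(document: str) -> bool:
    """Deprecated alias, kept for one release for backward compatibility.

    Despite its name it checks well-formedness only, not validity.
    """
    warnings.warn(
        "xml_valid is deprecated; use xml_is_well_formed instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return xml_is_well_formed(document)


# Existing callers keep working and get the same answer as before.
print(xml_valid("<doc/>") == xml_is_well_formed("<doc/>"))
```

Once the alias is removed, the name xml_valid becomes free for a function
that really does check validity.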

Nikolay submitted a patch later
http://archives.postgresql.org/pgsql-patches/2006-09/msg00123.php
that does part of this and can easily be adapted to add the alias.

His patch also adds an xpath_array() function --- what do people
think about that?  It's well past feature freeze ... now we've always
been laxer about contrib than the core code, but still I'm inclined
to say that that function should wait for 8.3.

Comments?  It's time to get a move on with resolving this.

            regards, tom lane

Re: [DOCS] New XML section for documentation

From
Tom Lane
Date:
I wrote:
> ISTM the right answer is to add xml_is_well_formed() in this release
> and have xml_valid as an alias for it, with documentation explaining
> that xml_valid is deprecated and will be removed in the next release.

Not hearing any objection, I've done this.

> His patch also adds an xpath_array() function --- what do people
> think about that?  It's well past feature freeze ... now we've always
> been laxer about contrib than the core code, but still I'm inclined
> to say that that function should wait for 8.3.

I didn't add xpath_array(), but am still open to doing it if there
is any consensus in favor of it.

            regards, tom lane