Thread: From TODO, XML?

From TODO, XML?

From

mlw

Date:

27 July 2001, 08:05:54

I was looking over the todo list and saw that someone wanted to support XML. I
have some quick and dirty stuff that could be used.

OK, what should the feature look like?

Should it be grafted onto pg_dump or should a new utility pg_xml be created?

How strict should it be? A stricter parser is easier to write, one can use a
library, unfortunately most xml is crap and for the utility to be useful, it
has to be real fuzzy.

Any input would be appreciated.


-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

Re: From TODO, XML?

From

jgray@beansindustry.co.uk

Date:

27 July 2001, 14:36:22

In article <3B615336.D654E7E1@mohawksoft.com>, markw@mohawksoft.com (mlw)
wrote:
> I was looking over the todo list and saw that someone wanted to support
> XML. I have some quick and dirty stuff that could be used.
> 

I'm not clear from the TODO what that "XML support" might involve. The
reference to pg_dump suggests an XML dump format for databases. That only
makes sense if we build an XML frontend that can load XML-based pg_dump
files.

I can't see any very useful application though, unless someone has a
standard for database dumps using XML -I'd have thought that our current
"list of SQL statements" dump is fine (and useful if you speak SQL)

> OK, what should the feature look like?
> 

What's the feature for? The things I've been working on are trying to make
an XML parser available in the backend, and to build some XML document
manipulation functions/operators. This is useful for what I'm doing (using
XML documents as short packets of human and machine-readable descriptive
data) and may be useful to other people. This work hasn't progressed very
far (I did only spend an afternoon or so writing it though....):
(available at http://www.cabbage.uklinux.net/pgxml.tar.gz)

One obvious (and current) topic is XQuery and we might ask whether PG
could/should implement it. I think some thinking would be needed on that
because  a) It involves having a second, non-SQL parser on the front-end
and that could be quite a large undertaking  and  b)  there's probably
(from my initial reading) some discrepancy  between the PG (and indeed
SQL) data model and the XQuery one. If  we could work round that, XQuery
*might* be an attraction to people. Certainly the ability to form one XML
document out of another via a query may be good for some projects.

Perhaps if people interested in XML "stuff" could add here, we might flesh
out a little more of what's desired.

> Should it be grafted onto pg_dump or should a new utility pg_xml be
> created?
> 
> How strict should it be? A stricter parser is easier to write, one can
> use a library, unfortunately most xml is crap and for the utility to be
> useful, it has to be real fuzzy.
> 

I don't think you really can write a non-strict XML parser. At least, not
if you want the resulting DOM to be useful - violations of well-formedness
probably result in logical difficulties wth the document structure. i.e. 

<a>
<b>text
<c>more text</c>
</a>

Is <c> within <b>? Are <b> and <c> siblings? These are answerable with
well-formed XML -And they're very relevant questions to ask for many XML
processing tasks. 

> Any input would be appreciated.
> 

Likewise -I'd be very insterested to know what sort of things people were
interested in -as I've found an area where I have a need which others
might share. I'd like to contribute some effort into it.

Regards

John

Re: From TODO, XML?

From

mlw

Date:

27 July 2001, 20:09:21

jgray@beansindustry.co.uk wrote:
> 
> In article <3B615336.D654E7E1@mohawksoft.com>, markw@mohawksoft.com (mlw)
> wrote:
> > I was looking over the todo list and saw that someone wanted to support
> > XML. I have some quick and dirty stuff that could be used.
> >
> 
> I'm not clear from the TODO what that "XML support" might involve. The
> reference to pg_dump suggests an XML dump format for databases. That only
> makes sense if we build an XML frontend that can load XML-based pg_dump
> files.
> 
> I can't see any very useful application though, unless someone has a
> standard for database dumps using XML -I'd have thought that our current
> "list of SQL statements" dump is fine (and useful if you speak SQL)

Actually I have been thinking about a couple projects I have done. Vendors like
to think XML is a way to distribute databases.

So a parser that can scan a DTD and make a usable create table (...) line would
be very helpful. One which could compare a DTD to an existing SQL table and map
XML data correctly. (Or error if conversion from data to SQL types yields an
error.)

During a database export, a SQL table could be used to create a DTD.

I was thinking along the line of being able to use XML as a fairly portable
import/export feature. Having this ability, as a generic solution, would have
made several tasks MUCH easier.

I would also like the XML parser to be fuzzy enough to take some bad XML
(because ALL XML is bad), because a lot of vendors like to distribute data in
bad XML.

-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

Re: From TODO, XML?

From

fche@redhat.com (Frank Ch. Eigler)

Date:

27 July 2001, 22:24:55

markw wrote:

: [...]  Actually I have been thinking about a couple projects I have
: done. Vendors like to think XML is a way to distribute databases.

I would find it very helpful to see a table of what sorts of XML
functionality each major vendor supports.


: So a parser that can scan a DTD and make a usable create table (...) 
: line would be very helpful. [...]

Hmm, but hierarchically structured documents such as XML don't map
well to a relational model.  The former tend to be recursive (e.g.,
have more levels of containment than the one or two that might be
mappable to tables and columns.)


: During a database export, a SQL table could be used to create a DTD.
: [...]

This mapping (relational model -> XML) is more straightforward.


- FChE

Re: From TODO, XML?

From

mlw

Date:

28 July 2001, 03:29:15

"Frank Ch. Eigler" wrote:
> 
> markw wrote:
> 
> : [...]  Actually I have been thinking about a couple projects I have
> : done. Vendors like to think XML is a way to distribute databases.
> 
> I would find it very helpful to see a table of what sorts of XML
> functionality each major vendor supports.

Actually I was thinking of databases of data, not database systems.

> 
> : So a parser that can scan a DTD and make a usable create table (...)
> : line would be very helpful. [...]
> 
> Hmm, but hierarchically structured documents such as XML don't map
> well to a relational model.  The former tend to be recursive (e.g.,
> have more levels of containment than the one or two that might be
> mappable to tables and columns.)

Yes!!! Exactly, being able to understand the recursive nature of XML and create
relations on the fly would be a very cool feature.

> 
> : During a database export, a SQL table could be used to create a DTD.
> : [...]
> 
> This mapping (relational model -> XML) is more straightforward.

Totally.

Re: Re: From TODO, XML?

From

"Ken Hirsch"

Date:

29 July 2001, 11:52:29

mlw <markw@mohawksoft.com> wrote:

> "Frank Ch. Eigler" wrote:
> > : So a parser that can scan a DTD and make a usable create table (...)
> > : line would be very helpful. [...]
> >
> > Hmm, but hierarchically structured documents such as XML don't map
> > well to a relational model.  The former tend to be recursive (e.g.,
> > have more levels of containment than the one or two that might be
> > mappable to tables and columns.)
>
> Yes!!! Exactly, being able to understand the recursive nature of XML and
create
> relations on the fly would be a very cool feature.

I think there is a pretty straight forward mapping, except for one possible
ambiguity.

If an element, say <address>, is contained within another element, say
<employee>, it could either be a column (or group of columns) in an Employee
table, or it could be a table Address which references Employee.

When you say "create relations on the fly", what exactly do you mean?  I can
see it would be handy to have CREATE TABLE statements written for you, but
it seems likely that a human would want to edit them before the tables are
actually created.  You cannot infer much type information from the DTD.  I
don't think there's a way to infer a primary key from a DTD, so you would
want to either specify one or add a serial column (or perhaps that would
always be done automatically).  An XML schema would have more information,
of course.

Re: Re: From TODO, XML?

From

mlw

Date:

29 July 2001, 12:52:05

Ken Hirsch wrote:
> 
> mlw <markw@mohawksoft.com> wrote:
> 
> > "Frank Ch. Eigler" wrote:
> > > : So a parser that can scan a DTD and make a usable create table (...)
> > > : line would be very helpful. [...]
> > >
> > > Hmm, but hierarchically structured documents such as XML don't map
> > > well to a relational model.  The former tend to be recursive (e.g.,
> > > have more levels of containment than the one or two that might be
> > > mappable to tables and columns.)
> >
> > Yes!!! Exactly, being able to understand the recursive nature of XML and
> create
> > relations on the fly would be a very cool feature.
> 
> I think there is a pretty straight forward mapping, except for one possible
> ambiguity.
> 
> If an element, say <address>, is contained within another element, say
> <employee>, it could either be a column (or group of columns) in an Employee
> table, or it could be a table Address which references Employee.
> 
> When you say "create relations on the fly", what exactly do you mean?  I can
> see it would be handy to have CREATE TABLE statements written for you, but
> it seems likely that a human would want to edit them before the tables are
> actually created.  You cannot infer much type information from the DTD.  I
> don't think there's a way to infer a primary key from a DTD, so you would
> want to either specify one or add a serial column (or perhaps that would
> always be done automatically).  An XML schema would have more information,
> of course.

I have been thinking about this. A lot of guessing would have to be done, of
course. But, unless some extra information is specified, when you have an XML
record, contained within another, the parser would have to generate its own
primary key and a sequence for each table. Obviously, the user should be able
to specify the primary key for each table, but lacking that input, the XML
parser/importer should do it automatically.

So this:

<employee>
<name>Bill</name>
<position>Programmer</position>
<address><number>1290</number><street>    <name>Canton Ave</name></street><town>    <name>Milton</name></town>
</address>
</emplyee>

The above is almost impossible to convert to a relational format without
additional information or a good set of rules. However, we can determine which
XML titles are "containers" and which are "data." "employee" is a container
because it has sub tags. "position" is "data" because it has no sub tags.

We can recursively scan this hierarchy, decide which are containers and which
are data. Data gets assigned an appropriate SQL type and containers get
separated from the parent container, and an integer index is put in its place.
For each container, either a primary key is specified or created on the fly. 

We insert sub containers first and pop back the primary key value, until we
have the whole record. The primary key could even be the OID.

A second strategy is to concatenate the hierarchy into the field name, as
street_name, town_name, and so on.

What do you think?

-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

Re: Re: From TODO, XML?

From

"Ross J. Reedstrom"

Date:

29 July 2001, 15:49:19

On Sun, Jul 29, 2001 at 12:19:48PM -0400, mlw wrote:
> 
> <employee>
> <name>Bill</name>
> <position>Programmer</position>
> <address>
>     <number>1290</number>
>     <street>
>         <name>Canton Ave</name>
>     </street>
>     
>     <town>
>         <name>Milton</name>
>     </town>
> </address>
> </emplyee>
> 
> The above is almost impossible to convert to a relational format without
> additional information or a good set of rules. However, we can determine which
> XML titles are "containers" and which are "data." "employee" is a container
> because it has sub tags. "position" is "data" because it has no sub tags.
> 
> We can recursively scan this hierarchy, decide which are containers and which
> are data. Data gets assigned an appropriate SQL type and containers get
> separated from the parent container, and an integer index is put in its place.
> For each container, either a primary key is specified or created on the fly. 
> 
> We insert sub containers first and pop back the primary key value, until we
> have the whole record. The primary key could even be the OID.
> 
> A second strategy is to concatenate the hierarchy into the field name, as
> street_name, town_name, and so on.
> 
> 
> What do you think?

What about attributes on tags. They're data, certainly. Do they then
promote the tag they're in to a container?

Ross

Re: From TODO, XML?

From

mlw

Date:

29 July 2001, 16:02:48

"Ross J. Reedstrom" wrote:
> 
> On Sun, Jul 29, 2001 at 12:19:48PM -0400, mlw wrote:
> >
> > <employee>
> > <name>Bill</name>
> > <position>Programmer</position>
> > <address>
> >       <number>1290</number>
> >       <street>
> >               <name>Canton Ave</name>
> >       </street>
> >
> >       <town>
> >               <name>Milton</name>
> >       </town>
> > </address>
> > </emplyee>
> >
> > The above is almost impossible to convert to a relational format without
> > additional information or a good set of rules. However, we can determine which
> > XML titles are "containers" and which are "data." "employee" is a container
> > because it has sub tags. "position" is "data" because it has no sub tags.
> >
> > We can recursively scan this hierarchy, decide which are containers and which
> > are data. Data gets assigned an appropriate SQL type and containers get
> > separated from the parent container, and an integer index is put in its place.
> > For each container, either a primary key is specified or created on the fly.
> >
> > We insert sub containers first and pop back the primary key value, until we
> > have the whole record. The primary key could even be the OID.
> >
> > A second strategy is to concatenate the hierarchy into the field name, as
> > street_name, town_name, and so on.
> >
> >
> > What do you think?
> 
> What about attributes on tags. They're data, certainly. Do they then
> promote the tag they're in to a container?

Attribute tags are normally something you should know about before hand. There
has to be a number of tags which do not force a container. 

This whole thing depends on a good DTD.

-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

Re: From TODO, XML?

From

Bruce Momjian

Date:

29 July 2001, 23:19:44

> I was looking over the todo list and saw that someone wanted to support XML. I
> have some quick and dirty stuff that could be used.
> 
> OK, what should the feature look like?
> 
> Should it be grafted onto pg_dump or should a new utility pg_xml be created?
> 
> How strict should it be? A stricter parser is easier to write, one can use a
> library, unfortunately most xml is crap and for the utility to be useful, it
> has to be real fuzzy.
> 
> Any input would be appreciated.

The updated TODO item is:
* Add XML interface:  psql, pg_dump, COPY, separate server (?)

I am unsure where we want it.  We could do COPY and hence a flag in
pg_dump, or psql like we do HTML from psql.  A separate server would
accept XML queries and return results.

What do people want?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Re: From TODO, XML?

From

Bruce Momjian

Date:

30 July 2001, 00:12:06

> > I would find it very helpful to see a table of what sorts of XML
> > functionality each major vendor supports.
> 
> Actually I was thinking of databases of data, not database systems.

I think we can go two ways.  Allow COPY/pg_dump to read/write XML, or
write some perl scripts to convert XML to/from our pg_dump format.  The
latter seems quite easy and fast.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Re: From TODO, XML?

From

Bruce Momjian

Date:

30 July 2001, 00:37:21

> I have been fighting, for a while now, with idiot data vendors that think XML
> is a cure all. The problem is that XML is a hierarchical format where as SQL is
> a relational format.
> 
> It would be good to get pg_dump to write an XML file and DTD, but getting
> external sources of XML into PostgreSQL is WAY more complicated. If an XML
> import is to be useful beyond just a different format for pg_dump, there has to
> be some intelligent database construction based on the XML information.
> 
> Go to mp3.com, and download some of their XML format data, first, it is bad
> XML, second, it is hierarchical.
> 
> I have managed to get several XML files into PostgreSQL by writing a parser,
> and it is a huge hassle, the public parsers are too picky. I am thinking that a
> fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> a very cool utility, one which I have not been able to find.
> 
> Perhaps it is a two stage process? First pass creates a schema which can be
> modified/corrected, the second pass loads the data.

Can we accept only relational XML.  Does that buy us anything?  Are the
other database vendors outputting heirchical XML?  Are they using
foreign/primary keys to do it?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Re: From TODO, XML?

From

mlw

Date:

30 July 2001, 00:54:09

Bruce Momjian wrote:
> 
> > > I would find it very helpful to see a table of what sorts of XML
> > > functionality each major vendor supports.
> >
> > Actually I was thinking of databases of data, not database systems.
> 
> I think we can go two ways.  Allow COPY/pg_dump to read/write XML, or
> write some perl scripts to convert XML to/from our pg_dump format.  The
> latter seems quite easy and fast.

I have been fighting, for a while now, with idiot data vendors that think XML
is a cure all. The problem is that XML is a hierarchical format where as SQL is
a relational format.

It would be good to get pg_dump to write an XML file and DTD, but getting
external sources of XML into PostgreSQL is WAY more complicated. If an XML
import is to be useful beyond just a different format for pg_dump, there has to
be some intelligent database construction based on the XML information.

Go to mp3.com, and download some of their XML format data, first, it is bad
XML, second, it is hierarchical.

I have managed to get several XML files into PostgreSQL by writing a parser,
and it is a huge hassle, the public parsers are too picky. I am thinking that a
fuzzy parser, combined with some intelligence and an XML DTD reader, could make
a very cool utility, one which I have not been able to find.

Perhaps it is a two stage process? First pass creates a schema which can be
modified/corrected, the second pass loads the data.

Re: Re: From TODO, XML?

From

Gavin Sherry

Date:

30 July 2001, 01:32:38

On Mon, 30 Jul 2001, mlw wrote:

> Bruce Momjian wrote:
> > 
> > > > I would find it very helpful to see a table of what sorts of XML
> > > > functionality each major vendor supports.
> > >
> > > Actually I was thinking of databases of data, not database systems.
> > 
> > I think we can go two ways.  Allow COPY/pg_dump to read/write XML, or
> > write some perl scripts to convert XML to/from our pg_dump format.  The
> > latter seems quite easy and fast.
> 

> I have managed to get several XML files into PostgreSQL by writing a parser,
> and it is a huge hassle, the public parsers are too picky. I am thinking that a
> fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> a very cool utility, one which I have not been able to find.

I have had the same problem. The best XML parser I could find was the
gnome-xml library at xmlsoft.org (libxml). I am currently using this in C
to replicate a client's legacy Notes system on to Postgres. In this case I
was lucky in as much as I had some input on the XML namespace etc. XML was
used because they had already designed an XML based dump utility.

However, the way XML is being used is very basic. Only creation of tables,
insert and delete are handled. Libxml works fine with this however,
handling DTD/XML parsing, UTF-8, UTF-16 and iso-8859-1, validation
etc.

The main problem then is that every vendor has a different xml name
space. If people really want to pursue this, the best thing to do would be
to try to work with other open source database developers and design a
suitable XML namespace for open source databases. Naturally, there will be
much contention here about he most suitable this and that. It will be
difficult to get a real spec going and will probably be much more
trouble than it is worth. As such, if this fails, then we cannot expect
Oracle, IBM, Sybase, MS and the rest to ever do it.

Perhaps then it would be sufficient for pg_dump/restore to identify the
name space of a given database dump and parse it according to that name
space. Based on command-line arguments, pg_restore/dump could either
die/ignore/transmogrify instructions in the XML which PG does not support 
or recognise. It would also be useful if pg_dump could dump data from
postgres in the supported XML namespaces.

So it essentially comes down to how useful it will be and who has time to
code it up =) (as always).

**Creative Solution**

For those who have too much time on their hands and have managed to
untangle some of the syntax in the W3C XSLT 1.0 specification, how about
an XSL stylesheet to transform an XML based database dump from some third
party into (postgres) SQL. Erk! There would have to be an award for such a
thing ;-).

Gavin

Re: Re: From TODO, XML?

From

Karel Zak

Date:

30 July 2001, 04:40:40

On Mon, Jul 30, 2001 at 03:43:26PM +1000, Gavin Sherry wrote:
> On Mon, 30 Jul 2001, mlw wrote:
> 
> I have had the same problem. The best XML parser I could find was the
> gnome-xml library at xmlsoft.org (libxml). I am currently using this in C
What happen if you use DOM type of XML parser for large file? A dump from
SQL DB can be realy large. IMHO is for this (data dump from SQL DB) is
better SAX type of XML parser.

> an XSL stylesheet to transform an XML based database dump from some third
Yes, it's right way how use XML.
        Karel

-- Karel Zak  <zakkr@zf.jcu.cz>http://home.zf.jcu.cz/~zakkr/C, PostgreSQL, PHP, WWW, http://docs.linux.cz,
http://mape.jcu.cz

Re: From TODO, XML?

From

mlw

Date:

30 July 2001, 04:40:44

Bruce Momjian wrote:
> 
> > I have been fighting, for a while now, with idiot data vendors that think XML
> > is a cure all. The problem is that XML is a hierarchical format where as SQL is
> > a relational format.
> >
> > It would be good to get pg_dump to write an XML file and DTD, but getting
> > external sources of XML into PostgreSQL is WAY more complicated. If an XML
> > import is to be useful beyond just a different format for pg_dump, there has to
> > be some intelligent database construction based on the XML information.
> >
> > Go to mp3.com, and download some of their XML format data, first, it is bad
> > XML, second, it is hierarchical.
> >
> > I have managed to get several XML files into PostgreSQL by writing a parser,
> > and it is a huge hassle, the public parsers are too picky. I am thinking that a
> > fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> > a very cool utility, one which I have not been able to find.
> >
> > Perhaps it is a two stage process? First pass creates a schema which can be
> > modified/corrected, the second pass loads the data.
> 
> Can we accept only relational XML.  Does that buy us anything?  Are the
> other database vendors outputting heirchical XML?  Are they using
> foreign/primary keys to do it?

Then what's the point? Almost no one creates a non-hierarchical XML. For the
utility to be usefull, beyond just a different format for pg_dump, it has to
deal with these issues and do the right thing.


> 
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

Re: Re: From TODO, XML?

From

Bruce Momjian

Date:

30 July 2001, 11:12:49

> > > I have managed to get several XML files into PostgreSQL by writing a parser,
> > > and it is a huge hassle, the public parsers are too picky. I am thinking that a
> > > fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> > > a very cool utility, one which I have not been able to find.
> > >
> > > Perhaps it is a two stage process? First pass creates a schema which can be
> > > modified/corrected, the second pass loads the data.
> > 
> > Can we accept only relational XML.  Does that buy us anything?  Are the
> > other database vendors outputting heirchical XML?  Are they using
> > foreign/primary keys to do it?
> 
> Then what's the point? Almost no one creates a non-hierarchical XML. For the
> utility to be usefull, beyond just a different format for pg_dump, it has to
> deal with these issues and do the right thing.

Oh, seems XML will be much more complicated than I thought.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: From TODO, XML?

From

jgray@beansindustry.co.uk

Date:

30 July 2001, 11:28:29

In article <200107300319.f6U3JGY24953@candle.pha.pa.us>,
pgman@candle.pha.pa.us (Bruce Momjian) wrote:

> The updated TODO item is:
> 
>     * Add XML interface:  psql, pg_dump, COPY, separate server (?)
> 
> I am unsure where we want it.  We could do COPY and hence a flag in
> pg_dump, or psql like we do HTML from psql.  A separate server would
> accept XML queries and return results.
> 
> What do people want?
> 

I am interested in the side suggested by "separate server" -namely the
extent that pg can provide a high-performance, transactional document
store with good query capabilities. 

On the other hand, "high-performance" and "transactional" are not
especially necessary for my current work, so I'm looking at query
capabilities at present (thus the parser interface code I've done to
date). It's also worth pointing out that we (Azuli) are not working for a
very big market, so maybe best not to launch a huge project on account of
my peccadilloes with XML :)

--

John Gray
Azuli IT

Re: From TODO, XML?

From

mlw

Date:

31 July 2001, 09:55:25

Bruce Momjian wrote:
>
> > > > I have managed to get several XML files into PostgreSQL by writing a parser,
> > > > and it is a huge hassle, the public parsers are too picky. I am thinking that a
> > > > fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> > > > a very cool utility, one which I have not been able to find.
> > > >
> > > > Perhaps it is a two stage process? First pass creates a schema which can be
> > > > modified/corrected, the second pass loads the data.
> > >
> > > Can we accept only relational XML.  Does that buy us anything?  Are the
> > > other database vendors outputting heirchical XML?  Are they using
> > > foreign/primary keys to do it?
> >
> > Then what's the point? Almost no one creates a non-hierarchical XML. For the
> > utility to be usefull, beyond just a different format for pg_dump, it has to
> > deal with these issues and do the right thing.
>
> Oh, seems XML will be much more complicated than I thought.

I think an XML "output" for pg_dump would be a pretty good/easy feature. It is
easy to create XML.

<record>
<field1>bla bla</field1>
<field2>foo bar</field2>
</record>

Is very easy to create. Of course a little work would be needed to take
information of the field (column) types into a DTD, but the actual XML is not
much more complicated than a printf, i.e. printf("<%s>%s</%s>", name, data,
name);

The real issues is reading XML. Postgres can make a DTD and XML file which can
be read by a strict parser, but that does not imply the inverse.

Attached is an XML file from MP3.com. For an XML import to be anything but
market/feature list candy, it should be able to import this sort of data,
because when people say XML, this is the sort of data they are thinking about.

If you take the time to examine the file, you will see it represents four or
five distinct tables in a relational database. These tables are Song, Artist,
[Cdlist,] Cd, and Genre.

Song has a number of fields, plus foreign keys: Artist, Cdlist, and Genre.
Cdlist would have to have a synthetic primary key (OID, sequence?), which
tables Cd and Song would reference. Cdlist would probably never be used in a
query.

I think it is doable as a project (which everyone will be glad to have and
complain about), but I think it is far more complicated than a one or two day
modification of pg_dump.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com<Song>
  <id>1004001</id>
  <Artist>
    <id>121693</id>
    <xmlUrl>http://contentfeeds.mp3.com/standard_2.0/data_files/artists/121/121693.xml</xmlUrl>
  </Artist>
  <status>AVAILABLE</status>
  <name>conscious</name>
  <url>http://artists.mp3s.com/artist_song/1004/1004001.html</url>
  <description>A hard techno tune in the European style.  Deep sleepers regain consciousness.  Are they awake, are they
aliveor is it all a dream..</description> 
  <creditsCopyrightInfo>e n c r y p t i o n </creditsCopyrightInfo>
  <sourceCd>portal</sourceCd>
  <recordLabel>white label</recordLabel>
  <Cdlist>
    <Cd>
      <id>66944</id>
      <xmlUrl>http://contentfeeds.mp3.com/standard_2.0/data_files/dam_cds/66/66944.xml</xmlUrl>
    </Cd>
  </Cdlist>
  <parentalAdvisory>N</parentalAdvisory>
  <pictureUrl>http://images.mp3.com/mp3s/71/e_n_c_r_y_p_t_i_o_n/artificial1.jpg</pictureUrl>
  <Genre>
    <name>Intelligent Techno</name>
    <url>http://genres.mp3.com/music/electronic/techno/intelligent_techno</url>
  </Genre>
  <filesize>7.8</filesize>

<addToMyMp3>http://my.mp3.com/music?Action=sl-1004001,cm-ad,cs-pc&origin=song&origin_id=1004001&location=generic&wqx=AddToMyMP3</addToMyMp3>

<greetingsUrl>http://musicgreetings.mp3.com/email_song?song_id=1004001&origin=song&origin_id=1004001&location=generic&band_id=121693</greetingsUrl>
</Song>

Re: Re: From TODO, XML?

From

Gilles DAROLD

Date:

31 July 2001, 11:32:36

Hi,

Why don't use the excellent DBIx-XML_RDB perl module ? Give it the query
it will return XML output as you sample. With some hack you can do what you
want...

Regards

Gilles DAROLD

mlw wrote:

> Bruce Momjian wrote:
> >
> > > > > I have managed to get several XML files into PostgreSQL by writing a parser,
> > > > > and it is a huge hassle, the public parsers are too picky. I am thinking that a
> > > > > fuzzy parser, combined with some intelligence and an XML DTD reader, could make
> > > > > a very cool utility, one which I have not been able to find.
> > > > >
> > > > > Perhaps it is a two stage process? First pass creates a schema which can be
> > > > > modified/corrected, the second pass loads the data.
> > > >
> > > > Can we accept only relational XML.  Does that buy us anything?  Are the
> > > > other database vendors outputting heirchical XML?  Are they using
> > > > foreign/primary keys to do it?
> > >
> > > Then what's the point? Almost no one creates a non-hierarchical XML. For the
> > > utility to be usefull, beyond just a different format for pg_dump, it has to
> > > deal with these issues and do the right thing.
> >
> > Oh, seems XML will be much more complicated than I thought.
>
> I think an XML "output" for pg_dump would be a pretty good/easy feature. It is
> easy to create XML.
>
> <record>
> <field1>bla bla</field1>
> <field2>foo bar</field2>
> </record>
>

Re: From TODO, XML?

From

mlw

Date:

31 July 2001, 12:19:04

Gilles DAROLD wrote:
> 
> Hi,
> 
> Why don't use the excellent DBIx-XML_RDB perl module ? Give it the query
> it will return XML output as you sample. With some hack you can do what you
> want...
> 
The point I was trying to make is that XML is trivial to create. It is much
more difficult to read. I think pg_dump is a "better" place for the export, in
that all the logic to find the fields, types, and data are already there. Were
a simple option presented, --xml, it would probably be easier to add xml export
to pg_dump than create a new utility, but then again, I didn't write pg_dump
and therefore do not know for sure.



-- 
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com