Thread: From TODO, XML?
I was looking over the todo list and saw that someone wanted to support XML. I have some quick and dirty stuff that could be used.

OK, what should the feature look like?

Should it be grafted onto pg_dump, or should a new utility, pg_xml, be created?

How strict should it be? A stricter parser is easier to write, since one can use a library; unfortunately, most XML is crap, and for the utility to be useful, it has to be real fuzzy.

Any input would be appreciated.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com
In article <3B615336.D654E7E1@mohawksoft.com>, markw@mohawksoft.com (mlw) wrote:

> I was looking over the todo list and saw that someone wanted to support
> XML. I have some quick and dirty stuff that could be used.

I'm not clear from the TODO what that "XML support" might involve. The reference to pg_dump suggests an XML dump format for databases. That only makes sense if we build an XML frontend that can load XML-based pg_dump files.

I can't see any very useful application, though, unless someone has a standard for database dumps using XML - I'd have thought that our current "list of SQL statements" dump is fine (and useful if you speak SQL).

> OK, what should the feature look like?

What's the feature for? The things I've been working on are trying to make an XML parser available in the backend, and to build some XML document manipulation functions/operators. This is useful for what I'm doing (using XML documents as short packets of human- and machine-readable descriptive data) and may be useful to other people. This work hasn't progressed very far (I did only spend an afternoon or so writing it, though); it is available at http://www.cabbage.uklinux.net/pgxml.tar.gz

One obvious (and current) topic is XQuery, and we might ask whether PG could/should implement it. I think some thinking would be needed on that, because a) it involves having a second, non-SQL parser on the frontend, which could be quite a large undertaking, and b) there's probably (from my initial reading) some discrepancy between the PG (and indeed SQL) data model and the XQuery one. If we could work around that, XQuery *might* be an attraction to people. Certainly the ability to form one XML document out of another via a query may be good for some projects.

Perhaps if people interested in XML "stuff" could add here, we might flesh out a little more of what's desired.

> Should it be grafted onto pg_dump, or should a new utility, pg_xml, be
> created?
>
> How strict should it be? A stricter parser is easier to write, since one
> can use a library; unfortunately, most XML is crap, and for the utility
> to be useful, it has to be real fuzzy.

I don't think you really can write a non-strict XML parser. At least, not if you want the resulting DOM to be useful: violations of well-formedness probably result in logical difficulties with the document structure. E.g.

<a>
  <b>text
  <c>more text</c>
</a>

Is <c> within <b>? Are <b> and <c> siblings? These are answerable with well-formed XML, and they're very relevant questions to ask for many XML processing tasks.

> Any input would be appreciated.

Likewise - I'd be very interested to know what sort of things people are interested in, as I've found an area where I have a need which others might share. I'd like to contribute some effort to it.

Regards

John
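As a concrete illustration of the strict-versus-fuzzy question: here is a minimal sketch against the libxml2 API (the library that comes up later in this thread). A strict parse of the ill-formed fragment above is rejected outright, while the XML_PARSE_RECOVER option asks the library for a best-effort tree, at which point it has to guess the answers to exactly the questions posed above.

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

int main(void)
{
    /* The ill-formed fragment from above: <b> is never closed. */
    const char *frag = "<a> <b>text <c>more text</c> </a>";
    int len = (int) strlen(frag);

    /* Strict parse: returns NULL because the fragment is not well-formed. */
    xmlDocPtr strict = xmlReadMemory(frag, len, "frag.xml", NULL,
                                     XML_PARSE_NOERROR);
    printf("strict parse:     %s\n", strict ? "accepted" : "rejected");

    /* "Fuzzy" parse: RECOVER builds a best-effort tree anyway; libxml
       must silently decide whether <c> is inside <b> or a sibling. */
    xmlDocPtr fuzzy = xmlReadMemory(frag, len, "frag.xml", NULL,
                                    XML_PARSE_RECOVER | XML_PARSE_NOERROR);
    printf("recovering parse: %s\n", fuzzy ? "accepted" : "rejected");

    if (strict)
        xmlFreeDoc(strict);
    if (fuzzy)
        xmlFreeDoc(fuzzy);
    xmlCleanupParser();
    return 0;
}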
jgray@beansindustry.co.uk wrote:
>
> In article <3B615336.D654E7E1@mohawksoft.com>, markw@mohawksoft.com (mlw)
> wrote:
> > I was looking over the todo list and saw that someone wanted to support
> > XML. I have some quick and dirty stuff that could be used.
>
> I'm not clear from the TODO what that "XML support" might involve. The
> reference to pg_dump suggests an XML dump format for databases. That only
> makes sense if we build an XML frontend that can load XML-based pg_dump
> files.
>
> I can't see any very useful application, though, unless someone has a
> standard for database dumps using XML - I'd have thought that our current
> "list of SQL statements" dump is fine (and useful if you speak SQL).

Actually, I have been thinking about a couple of projects I have done. Vendors like to think XML is a way to distribute databases. So a parser that can scan a DTD and make a usable "create table (...)" statement would be very helpful; one which could compare a DTD to an existing SQL table and map the XML data correctly (or error if conversion from the data to SQL types fails). During a database export, a SQL table could be used to create a DTD.

I was thinking along the lines of being able to use XML as a fairly portable import/export feature. Having this ability, as a generic solution, would have made several tasks MUCH easier.

I would also like the XML parser to be fuzzy enough to take some bad XML (because ALL XML is bad), because a lot of vendors like to distribute data in bad XML.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com
markw wrote:

: [...] Actually I have been thinking about a couple of projects I have
: done. Vendors like to think XML is a way to distribute databases.

I would find it very helpful to see a table of what sorts of XML functionality each major vendor supports.

: So a parser that can scan a DTD and make a usable "create table (...)"
: statement would be very helpful. [...]

Hmm, but hierarchically structured documents such as XML don't map well to a relational model. The former tend to be recursive (e.g., have more levels of containment than the one or two that might be mappable to tables and columns).

: During a database export, a SQL table could be used to create a DTD.
: [...]

This mapping (relational model -> XML) is more straightforward.

- FChE
"Frank Ch. Eigler" wrote: > > markw wrote: > > : [...] Actually I have been thinking about a couple projects I have > : done. Vendors like to think XML is a way to distribute databases. > > I would find it very helpful to see a table of what sorts of XML > functionality each major vendor supports. Actually I was thinking of databases of data, not database systems. > > : So a parser that can scan a DTD and make a usable create table (...) > : line would be very helpful. [...] > > Hmm, but hierarchically structured documents such as XML don't map > well to a relational model. The former tend to be recursive (e.g., > have more levels of containment than the one or two that might be > mappable to tables and columns.) Yes!!! Exactly, being able to understand the recursive nature of XML and create relations on the fly would be a very cool feature. > > : During a database export, a SQL table could be used to create a DTD. > : [...] > > This mapping (relational model -> XML) is more straightforward. Totally.
mlw <markw@mohawksoft.com> wrote:

> "Frank Ch. Eigler" wrote:
> > : So a parser that can scan a DTD and make a usable "create table (...)"
> > : statement would be very helpful. [...]
> >
> > Hmm, but hierarchically structured documents such as XML don't map
> > well to a relational model. The former tend to be recursive (e.g.,
> > have more levels of containment than the one or two that might be
> > mappable to tables and columns).
>
> Yes!!! Exactly. Being able to understand the recursive nature of XML
> and create relations on the fly would be a very cool feature.

I think there is a pretty straightforward mapping, except for one possible ambiguity.

If an element, say <address>, is contained within another element, say <employee>, it could either be a column (or group of columns) in an Employee table, or it could be a table Address which references Employee.

When you say "create relations on the fly", what exactly do you mean? I can see it would be handy to have CREATE TABLE statements written for you, but it seems likely that a human would want to edit them before the tables are actually created. You cannot infer much type information from the DTD. I don't think there's a way to infer a primary key from a DTD, so you would want to either specify one or add a serial column (or perhaps that would always be done automatically). An XML schema would have more information, of course.
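To make the ambiguity concrete, here are the two candidate mappings for an <address> contained in an <employee>, written out as the DDL such a generator might emit. This is only a sketch: the table, column, and key names are hypothetical, every data column falls back to text (since, as noted above, a DTD carries almost no type information), and the serial keys are synthesized because a DTD names no primary key.

#include <stdio.h>

/* Candidate 1: flatten <address> into columns of the employee table. */
static const char *flattened_ddl =
    "CREATE TABLE employee (\n"
    "    id             serial PRIMARY KEY, -- synthesized; not in the DTD\n"
    "    name           text,\n"
    "    address_number text,\n"
    "    address_street text\n"
    ");";

/* Candidate 2: make <address> its own table referencing employee. */
static const char *child_table_ddl =
    "CREATE TABLE employee (\n"
    "    id   serial PRIMARY KEY,\n"
    "    name text\n"
    ");\n"
    "CREATE TABLE address (\n"
    "    id          serial PRIMARY KEY,\n"
    "    employee_id integer REFERENCES employee(id),\n"
    "    number      text,\n"
    "    street      text\n"
    ");";

int main(void)
{
    /* A real generator would pick one mapping (or ask the user). */
    printf("%s\n\n%s\n", flattened_ddl, child_table_ddl);
    return 0;
}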
Ken Hirsch wrote:
>
> mlw <markw@mohawksoft.com> wrote:
> > Yes!!! Exactly. Being able to understand the recursive nature of XML
> > and create relations on the fly would be a very cool feature.
>
> I think there is a pretty straightforward mapping, except for one possible
> ambiguity.
>
> If an element, say <address>, is contained within another element, say
> <employee>, it could either be a column (or group of columns) in an
> Employee table, or it could be a table Address which references Employee.
>
> When you say "create relations on the fly", what exactly do you mean? I
> can see it would be handy to have CREATE TABLE statements written for you,
> but it seems likely that a human would want to edit them before the tables
> are actually created. You cannot infer much type information from the DTD.
> I don't think there's a way to infer a primary key from a DTD, so you
> would want to either specify one or add a serial column (or perhaps that
> would always be done automatically). An XML schema would have more
> information, of course.

I have been thinking about this. A lot of guessing would have to be done, of course. But unless some extra information is specified, when you have an XML record contained within another, the parser would have to generate its own primary key and a sequence for each table. Obviously, the user should be able to specify the primary key for each table, but lacking that input, the XML parser/importer should do it automatically. So this:

<employee>
  <name>Bill</name>
  <position>Programmer</position>
  <address>
    <number>1290</number>
    <street><name>Canton Ave</name></street>
    <town><name>Milton</name></town>
  </address>
</employee>

The above is almost impossible to convert to a relational format without additional information or a good set of rules. However, we can determine which XML tags are "containers" and which are "data": "employee" is a container because it has sub-tags; "position" is "data" because it has no sub-tags.

We can recursively scan this hierarchy and decide which are containers and which are data. Data gets assigned an appropriate SQL type, and containers get separated from the parent container, with an integer index put in their place. For each container, a primary key is either specified or created on the fly.

We insert sub-containers first and pop back the primary key value, until we have the whole record. The primary key could even be the OID.

A second strategy is to concatenate the hierarchy into the field name, as street_name, town_name, and so on.

What do you think?

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com
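The container/data split described above is a short recursive walk over the parse tree. Here is a minimal sketch of that first pass using libxml2 (the library mentioned later in this thread); it only prints the classification a schema generator would act on, and the rule is exactly the one stated: an element with element children is a container (a candidate table), anything else is data (a candidate column).

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* A node is a "container" if it has at least one element child. */
static int is_container(xmlNodePtr node)
{
    xmlNodePtr c;

    for (c = node->children; c != NULL; c = c->next)
        if (c->type == XML_ELEMENT_NODE)
            return 1;
    return 0;
}

/* Recursively classify every element as container (table) or data
   (column), indenting by depth to show the hierarchy. */
static void classify(xmlNodePtr node, int depth)
{
    xmlNodePtr c;

    if (node == NULL || node->type != XML_ELEMENT_NODE)
        return;
    printf("%*s%s: %s\n", depth * 2, "", (const char *) node->name,
           is_container(node) ? "container -> table" : "data -> column");
    for (c = node->children; c != NULL; c = c->next)
        classify(c, depth + 1);
}

int main(int argc, char **argv)
{
    xmlDocPtr doc;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.xml\n", argv[0]);
        return 1;
    }
    doc = xmlReadFile(argv[1], NULL, 0);
    if (doc == NULL) {
        fprintf(stderr, "parse failed\n");
        return 1;
    }
    classify(xmlDocGetRootElement(doc), 0);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}

On the <employee> example above, this marks employee, address, street, and town as containers and name, position, and number as data; a second pass would then emit the tables and inserts (or, under the second strategy, the flattened street_name/town_name columns).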
On Sun, Jul 29, 2001 at 12:19:48PM -0400, mlw wrote:
>
> <employee>
>   <name>Bill</name>
>   <position>Programmer</position>
>   <address>
>     <number>1290</number>
>     <street><name>Canton Ave</name></street>
>     <town><name>Milton</name></town>
>   </address>
> </employee>
>
> The above is almost impossible to convert to a relational format without
> additional information or a good set of rules. However, we can determine
> which XML tags are "containers" and which are "data": "employee" is a
> container because it has sub-tags; "position" is "data" because it has no
> sub-tags.
>
> We can recursively scan this hierarchy and decide which are containers and
> which are data. Data gets assigned an appropriate SQL type, and containers
> get separated from the parent container, with an integer index put in
> their place. For each container, a primary key is either specified or
> created on the fly.
>
> We insert sub-containers first and pop back the primary key value, until
> we have the whole record. The primary key could even be the OID.
>
> A second strategy is to concatenate the hierarchy into the field name, as
> street_name, town_name, and so on.
>
> What do you think?

What about attributes on tags? They're data, certainly. Do they then promote the tag they're in to a container?

Ross
"Ross J. Reedstrom" wrote: > > On Sun, Jul 29, 2001 at 12:19:48PM -0400, mlw wrote: > > > > <employee> > > <name>Bill</name> > > <position>Programmer</position> > > <address> > > <number>1290</number> > > <street> > > <name>Canton Ave</name> > > </street> > > > > <town> > > <name>Milton</name> > > </town> > > </address> > > </emplyee> > > > > The above is almost impossible to convert to a relational format without > > additional information or a good set of rules. However, we can determine which > > XML titles are "containers" and which are "data." "employee" is a container > > because it has sub tags. "position" is "data" because it has no sub tags. > > > > We can recursively scan this hierarchy, decide which are containers and which > > are data. Data gets assigned an appropriate SQL type and containers get > > separated from the parent container, and an integer index is put in its place. > > For each container, either a primary key is specified or created on the fly. > > > > We insert sub containers first and pop back the primary key value, until we > > have the whole record. The primary key could even be the OID. > > > > A second strategy is to concatenate the hierarchy into the field name, as > > street_name, town_name, and so on. > > > > > > What do you think? > > What about attributes on tags. They're data, certainly. Do they then > promote the tag they're in to a container? Attribute tags are normally something you should know about before hand. There has to be a number of tags which do not force a container. This whole thing depends on a good DTD. -- 5-4-3-2-1 Thunderbirds are GO! ------------------------ http://www.mohawksoft.com
> I was looking over the todo list and saw that someone wanted to support
> XML. I have some quick and dirty stuff that could be used.
>
> OK, what should the feature look like?
>
> Should it be grafted onto pg_dump, or should a new utility, pg_xml, be
> created?
>
> How strict should it be? A stricter parser is easier to write, since one
> can use a library; unfortunately, most XML is crap, and for the utility
> to be useful, it has to be real fuzzy.
>
> Any input would be appreciated.

The updated TODO item is:

* Add XML interface: psql, pg_dump, COPY, separate server (?)

I am unsure where we want it. We could do COPY, and hence a flag in pg_dump, or psql, like we do HTML from psql. A separate server would accept XML queries and return results.

What do people want?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > I would find it very helpful to see a table of what sorts of XML
> > functionality each major vendor supports.
>
> Actually, I was thinking of databases of data, not database systems.

I think we can go two ways: allow COPY/pg_dump to read/write XML, or write some perl scripts to convert XML to/from our pg_dump format. The latter seems quite easy and fast.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> I have been fighting, for a while now, with idiot data vendors that think
> XML is a cure-all. The problem is that XML is a hierarchical format,
> whereas SQL is a relational format.
>
> It would be good to get pg_dump to write an XML file and DTD, but getting
> external sources of XML into PostgreSQL is WAY more complicated. If an XML
> import is to be useful beyond just a different format for pg_dump, there
> has to be some intelligent database construction based on the XML
> information.
>
> Go to mp3.com and download some of their XML-format data: first, it is
> bad XML; second, it is hierarchical.
>
> I have managed to get several XML files into PostgreSQL by writing a
> parser, and it is a huge hassle; the public parsers are too picky. I am
> thinking that a fuzzy parser, combined with some intelligence and an XML
> DTD reader, could make a very cool utility, one which I have not been
> able to find.
>
> Perhaps it is a two-stage process? The first pass creates a schema which
> can be modified/corrected; the second pass loads the data.

Can we accept only relational XML? Does that buy us anything? Are the other database vendors outputting hierarchical XML? Are they using foreign/primary keys to do it?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
>
> > > I would find it very helpful to see a table of what sorts of XML
> > > functionality each major vendor supports.
> >
> > Actually, I was thinking of databases of data, not database systems.
>
> I think we can go two ways: allow COPY/pg_dump to read/write XML, or
> write some perl scripts to convert XML to/from our pg_dump format. The
> latter seems quite easy and fast.

I have been fighting, for a while now, with idiot data vendors that think XML is a cure-all. The problem is that XML is a hierarchical format, whereas SQL is a relational format.

It would be good to get pg_dump to write an XML file and DTD, but getting external sources of XML into PostgreSQL is WAY more complicated. If an XML import is to be useful beyond just a different format for pg_dump, there has to be some intelligent database construction based on the XML information.

Go to mp3.com and download some of their XML-format data: first, it is bad XML; second, it is hierarchical.

I have managed to get several XML files into PostgreSQL by writing a parser, and it is a huge hassle; the public parsers are too picky. I am thinking that a fuzzy parser, combined with some intelligence and an XML DTD reader, could make a very cool utility, one which I have not been able to find.

Perhaps it is a two-stage process? The first pass creates a schema which can be modified/corrected; the second pass loads the data.
On Mon, 30 Jul 2001, mlw wrote:

> Bruce Momjian wrote:
> > I think we can go two ways: allow COPY/pg_dump to read/write XML, or
> > write some perl scripts to convert XML to/from our pg_dump format. The
> > latter seems quite easy and fast.
>
> I have managed to get several XML files into PostgreSQL by writing a
> parser, and it is a huge hassle; the public parsers are too picky. I am
> thinking that a fuzzy parser, combined with some intelligence and an XML
> DTD reader, could make a very cool utility, one which I have not been
> able to find.

I have had the same problem. The best XML parser I could find was the gnome-xml library at xmlsoft.org (libxml). I am currently using this in C to replicate a client's legacy Notes system onto Postgres. In this case I was lucky, inasmuch as I had some input on the XML namespace etc. XML was used because they had already designed an XML-based dump utility. However, the way XML is being used is very basic: only creation of tables, insert, and delete are handled. Libxml works fine with this, though, handling DTD/XML parsing, UTF-8, UTF-16 and iso-8859-1, validation, etc.

The main problem then is that every vendor has a different XML namespace. If people really want to pursue this, the best thing to do would be to try to work with other open source database developers and design a suitable XML namespace for open source databases. Naturally, there will be much contention here about the most suitable this and that. It will be difficult to get a real spec going, and it will probably be much more trouble than it is worth. As such, if this fails, then we cannot expect Oracle, IBM, Sybase, MS and the rest to ever do it.

Perhaps then it would be sufficient for pg_dump/restore to identify the namespace of a given database dump and parse it according to that namespace. Based on command-line arguments, pg_restore/dump could either die/ignore/transmogrify instructions in the XML which PG does not support or recognise. It would also be useful if pg_dump could dump data from postgres in the supported XML namespaces.

So it essentially comes down to how useful it will be and who has time to code it up =) (as always).

**Creative Solution**

For those who have too much time on their hands and have managed to untangle some of the syntax in the W3C XSLT 1.0 specification, how about an XSL stylesheet to transform an XML-based database dump from some third party into (postgres) SQL? Erk! There would have to be an award for such a thing ;-).

Gavin
On Mon, Jul 30, 2001 at 03:43:26PM +1000, Gavin Sherry wrote:

> On Mon, 30 Jul 2001, mlw wrote:
>
> I have had the same problem. The best XML parser I could find was the
> gnome-xml library at xmlsoft.org (libxml). I am currently using this in C

What happens if you use a DOM-type XML parser on a large file? A dump from a SQL DB can be really large. IMHO, a SAX-type XML parser is better for this (a data dump from a SQL DB).

> an XSL stylesheet to transform an XML based database dump from some third

Yes, that's the right way to use XML.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
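The DOM-versus-SAX point matters because a DOM parser materialises the whole dump in memory, while SAX streams it in roughly constant memory. A minimal SAX sketch along those lines, using the same libxml library Gavin mentions; it only counts top-level row elements (the <record> element name is a placeholder, since there is no agreed dump namespace), where a real loader would buffer one row's fields and issue an INSERT at each end tag:

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

static long rows = 0;

/* Called for every start tag as the file streams past; no tree is
   ever built, so memory use stays flat however big the dump is. */
static void on_start(void *ctx, const xmlChar *name, const xmlChar **atts)
{
    (void) ctx;
    (void) atts;
    if (strcmp((const char *) name, "record") == 0)
        rows++;
}

int main(int argc, char **argv)
{
    xmlSAXHandler sax;

    if (argc < 2) {
        fprintf(stderr, "usage: %s dump.xml\n", argv[0]);
        return 1;
    }
    memset(&sax, 0, sizeof(sax));
    sax.startElement = on_start;
    if (xmlSAXUserParseFile(&sax, NULL, argv[1]) < 0) {
        fprintf(stderr, "parse failed\n");
        return 1;
    }
    printf("%ld <record> elements\n", rows);
    return 0;
}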
Bruce Momjian wrote:
>
> > I have managed to get several XML files into PostgreSQL by writing a
> > parser, and it is a huge hassle; the public parsers are too picky. I am
> > thinking that a fuzzy parser, combined with some intelligence and an XML
> > DTD reader, could make a very cool utility, one which I have not been
> > able to find.
> >
> > Perhaps it is a two-stage process? The first pass creates a schema which
> > can be modified/corrected; the second pass loads the data.
>
> Can we accept only relational XML? Does that buy us anything? Are the
> other database vendors outputting hierarchical XML? Are they using
> foreign/primary keys to do it?

Then what's the point? Almost no one creates non-hierarchical XML. For the utility to be useful beyond just a different format for pg_dump, it has to deal with these issues and do the right thing.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com
> > > I have managed to get several XML files into PostgreSQL by writing a
> > > parser, and it is a huge hassle; the public parsers are too picky.
> > > [...]
> >
> > Can we accept only relational XML? Does that buy us anything? Are the
> > other database vendors outputting hierarchical XML? Are they using
> > foreign/primary keys to do it?
>
> Then what's the point? Almost no one creates non-hierarchical XML. For
> the utility to be useful beyond just a different format for pg_dump, it
> has to deal with these issues and do the right thing.

Oh, it seems XML will be much more complicated than I thought.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
In article <200107300319.f6U3JGY24953@candle.pha.pa.us>, pgman@candle.pha.pa.us (Bruce Momjian) wrote:

> The updated TODO item is:
>
> * Add XML interface: psql, pg_dump, COPY, separate server (?)
>
> I am unsure where we want it. We could do COPY, and hence a flag in
> pg_dump, or psql, like we do HTML from psql. A separate server would
> accept XML queries and return results.
>
> What do people want?

I am interested in the side suggested by "separate server" - namely, the extent to which PG can provide a high-performance, transactional document store with good query capabilities. On the other hand, "high-performance" and "transactional" are not especially necessary for my current work, so I'm looking at query capabilities at present (thus the parser interface code I've done to date).

It's also worth pointing out that we (Azuli) are not working for a very big market, so maybe best not to launch a huge project on account of my peccadilloes with XML :)

--
John Gray
Azuli IT
Bruce Momjian wrote:
>
> > Then what's the point? Almost no one creates non-hierarchical XML. For
> > the utility to be useful beyond just a different format for pg_dump, it
> > has to deal with these issues and do the right thing.
>
> Oh, it seems XML will be much more complicated than I thought.

I think an XML "output" for pg_dump would be a pretty good/easy feature. It is easy to create XML:

<record>
  <field1>bla bla</field1>
  <field2>foo bar</field2>
</record>

is very easy to produce. Of course, a little work would be needed to turn information about the field (column) types into a DTD, but the actual XML is not much more complicated than a printf, i.e.

printf("<%s>%s</%s>", name, data, name);

The real issue is reading XML. Postgres can make a DTD and XML file which can be read by a strict parser, but that does not imply the inverse.

Attached is an XML file from MP3.com. For an XML import to be anything but marketing/feature-list candy, it should be able to import this sort of data, because when people say XML, this is the sort of data they are thinking about.

If you take the time to examine the file, you will see it represents four or five distinct tables in a relational database. These tables are Song, Artist, [Cdlist,] Cd, and Genre. Song has a number of fields, plus foreign keys: Artist, Cdlist, and Genre. Cdlist would have to have a synthetic primary key (OID, sequence?), which tables Cd and Song would reference. Cdlist would probably never be used in a query.

I think it is doable as a project (which everyone will be glad to have and complain about), but I think it is far more complicated than a one- or two-day modification of pg_dump.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com

<Song>
  <id>1004001</id>
  <Artist>
    <id>121693</id>
    <xmlUrl>http://contentfeeds.mp3.com/standard_2.0/data_files/artists/121/121693.xml</xmlUrl>
  </Artist>
  <status>AVAILABLE</status>
  <name>conscious</name>
  <url>http://artists.mp3s.com/artist_song/1004/1004001.html</url>
  <description>A hard techno tune in the European style. Deep sleepers regain consciousness.&#13;Are they awake, are they alive&#13;or is it all a dream..</description>
  <creditsCopyrightInfo>e n c r y p t i o n </creditsCopyrightInfo>
  <sourceCd>portal</sourceCd>
  <recordLabel>white label</recordLabel>
  <Cdlist>
    <Cd>
      <id>66944</id>
      <xmlUrl>http://contentfeeds.mp3.com/standard_2.0/data_files/dam_cds/66/66944.xml</xmlUrl>
    </Cd>
  </Cdlist>
  <parentalAdvisory>N</parentalAdvisory>
  <pictureUrl>http://images.mp3.com/mp3s/71/e_n_c_r_y_p_t_i_o_n/artificial1.jpg</pictureUrl>
  <Genre>
    <name>Intelligent Techno</name>
    <url>http://genres.mp3.com/music/electronic/techno/intelligent_techno</url>
  </Genre>
  <filesize>7.8</filesize>
  <addToMyMp3>http://my.mp3.com/music?Action=sl-1004001,cm-ad,cs-pc&origin=song&origin_id=1004001&location=generic&wqx=AddToMyMP3</addToMyMp3>
  <greetingsUrl>http://musicgreetings.mp3.com/email_song?song_id=1004001&origin=song&origin_id=1004001&location=generic&band_id=121693</greetingsUrl>
</Song>
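One wrinkle with the printf() one-liner above: column data containing &, <, or > produces exactly the kind of broken XML this thread complains about (the URLs in the attached file, with their bare & characters, are a live example). A minimal sketch of the escaping emitter a pg_dump XML mode would need; the field names and sample values are hypothetical:

#include <stdio.h>

/* Write character data with the three markup characters escaped, so
   the emitted document stays well-formed whatever the column holds. */
static void xml_escape(FILE *out, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '&':
            fputs("&amp;", out);
            break;
        case '<':
            fputs("&lt;", out);
            break;
        case '>':
            fputs("&gt;", out);
            break;
        default:
            fputc(*s, out);
            break;
        }
    }
}

/* The printf("<%s>%s</%s>", ...) idea, with escaping on the data. */
static void emit_field(FILE *out, const char *name, const char *value)
{
    fprintf(out, "  <%s>", name);
    xml_escape(out, value);
    fprintf(out, "</%s>\n", name);
}

int main(void)
{
    /* Hypothetical row standing in for one pg_dump tuple. */
    puts("<record>");
    emit_field(stdout, "field1", "bla & bla");
    emit_field(stdout, "field2", "foo <bar>");
    puts("</record>");
    return 0;
}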
Hi,

Why not use the excellent DBIx-XML_RDB perl module? Give it the query and it will return XML output like your sample. With some hacking you can do what you want...

Regards,

Gilles DAROLD

mlw wrote:

> I think an XML "output" for pg_dump would be a pretty good/easy feature.
> It is easy to create XML:
>
> <record>
>   <field1>bla bla</field1>
>   <field2>foo bar</field2>
> </record>
Gilles DAROLD wrote:
>
> Why not use the excellent DBIx-XML_RDB perl module? Give it the query and
> it will return XML output like your sample. With some hacking you can do
> what you want...

The point I was trying to make is that XML is trivial to create. It is much more difficult to read.

I think pg_dump is a "better" place for the export, in that all the logic to find the fields, types, and data is already there. Were a simple option presented, --xml, it would probably be easier to add XML export to pg_dump than to create a new utility. But then again, I didn't write pg_dump and therefore do not know for sure.

--
5-4-3-2-1 Thunderbirds are GO!
------------------------
http://www.mohawksoft.com