Thread: XML and Postgres
Hello,

I am fairly new to Postgres, but have been coding for over 20 years, and would like to know what, if anything, is available to post data in XML format to Postgres. I have read (seemingly) conflicting reports that Oracle will allow users to post an XML-formatted document directly to the database, and my sysadmin "seems to recall" reading about it. I am in the process of dissecting a fairly complex XML file (health care claims data in EDI X12 format) and would much rather use an existing technology to post it if it exists.

So, does such a thing exist? Or perhaps I have misread the literature and am off base?

Thanks for your help.
--greg

Greg Lindstrom                                        501 975.4859
Computer Programmer                  greg.lindstrom@novasyshealth.com
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams."  W.W.
On Wed, 16 Mar 2005, Greg Lindstrom wrote:

> I am fairly new to postgres, but have been coding for over 20 years and
> would like to know what, if anything, is available to post data in XML
> format to postgres.

Hopefully someone can tell you of a direct solution, but I can tell you my experiences using Perl.

I had to synchronize an on-site office database (FileMaker Pro) with a webserver database. FileMaker Pro could output in XML, and that's the way the DBA wanted to do it. He set it up so that it FTPed the data in XML to the web server, and I use cron to start a Perl script that imports the data into the webserver DB.

There are several XML Perl modules available, but all the ones I tried had the same flaw: they grabbed the entire XML file and loaded it into a huge variable in memory. The XML file they wanted to upload was so big it used up all the available memory and the Perl script failed.

I had to write a parsing routine that churns through the XML file, loads the data for one record, then dumps it into the database before going back for the next record. It's much more memory efficient, albeit slow - no grouped transactions, but that's all right, it runs in the middle of the night.

I hesitate to post it since it may not work with all XML data files; I wrote and tested it for MY particular data file. In particular, it may not work for empty elements (like <data />). I'll be glad to share it and explain how it works if you need it.

I confess the webserver DB in this case is MySQL (it pre-existed me on this project), but since I used the Perl DBI (database interface) it will work with PostgreSQL just as easily.

Hope this helps if nobody steps forward with a better solution.

brew

==========================================================================
Strange Brew   (brew@theMode.com)
Check out my Stock Option Covered Call website  http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================================================================
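A minimal sketch of the record-at-a-time approach described above -- not brew's actual script -- using XML::Parser's stream handlers together with the Perl DBI. The <claim> element, the claim table, and the connection details are invented for illustration; a real X12-derived file would have far more fields.

    #!/usr/bin/perl
    # Sketch only: hypothetical element names, table layout, and DSN.
    use strict;
    use warnings;
    use XML::Parser;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=claims', 'user', 'password',
                           { AutoCommit => 1, RaiseError => 1 });
    my $sth = $dbh->prepare(
        'INSERT INTO claim (claim_id, member_id, amount) VALUES (?, ?, ?)');

    my (%record, $field);

    my $parser = XML::Parser->new(Handlers => {
        Start => sub {                   # note which element we are inside
            my ($expat, $elem) = @_;
            %record = () if $elem eq 'claim';
            $field  = $elem;
        },
        Char  => sub {                   # accumulate that element's text
            my ($expat, $text) = @_;
            $record{$field} .= $text if defined $field;
        },
        End   => sub {                   # one record done: insert it now
            my ($expat, $elem) = @_;
            $sth->execute(@record{qw(claim_id member_id amount)})
                if $elem eq 'claim';
            undef $field;
        },
    });

    $parser->parsefile('claims.xml');    # streams the file, never slurps it
    $dbh->disconnect;

Only one record's worth of data is ever held in memory, which is what keeps a script like this alive on files too large to slurp whole.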
On 3/17/05 2:53, "brew@theMode.com" <brew@theMode.com> wrote:

> On Wed, 16 Mar 2005, Greg Lindstrom wrote:
>
>> I am fairly new to postgres, but have been coding for over 20 years and
>> would like to know what, if anything, is available to post data in XML
>> format to postgres.
>
> Hopefully someone can tell you of a direct solution, but I can tell you my
> experiences using Perl.
>
> I had to synchronize an on-site office database (FileMaker Pro) with a
> webserver database. FileMaker Pro could output in XML, and that's the way
> the DBA wanted to do it. He set it up so that it FTPed the data in XML to
> the web server, and I use cron to start a Perl script that imports the
> data into the webserver DB.
>
> There are several XML Perl modules available, but all the ones I tried had
> the same flaw: they grabbed the entire XML file and loaded it into a huge
> variable in memory. The XML file they wanted to upload was so big it used
> up all the available memory and the Perl script failed.
>
> I had to write a parsing routine that churns through the XML file, loads
> the data for one record, then dumps it into the database before going back
> for the next record. It's much more memory efficient, albeit slow - no
> grouped transactions, but that's all right, it runs in the middle of the
> night.

In these cases, you can use XML::Twig to break up the file into smaller parts -- it is quite easy and VERY fast.

Sean
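For reference, a sketch of the XML::Twig pattern Sean is pointing at: a handler fires as each record element finishes parsing, and purge() throws away the part of the tree that has already been handled, so memory use stays flat no matter how big the file is. The <claim> element and the table columns are placeholders, not anything taken from the files discussed in this thread.

    #!/usr/bin/perl
    # Sketch only: hypothetical element names, table layout, and DSN.
    use strict;
    use warnings;
    use XML::Twig;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=claims', 'user', 'password',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare(
        'INSERT INTO claim (claim_id, member_id, amount) VALUES (?, ?, ?)');

    my $twig = XML::Twig->new(
        twig_handlers => {
            claim => sub {               # called once per parsed <claim>
                my ($twig, $claim) = @_;
                $sth->execute(
                    $claim->first_child_text('claim_id'),
                    $claim->first_child_text('member_id'),
                    $claim->first_child_text('amount'),
                );
                $twig->purge;            # drop everything parsed so far
            },
        },
    );

    $twig->parsefile('claims.xml');
    $dbh->disconnect;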
brew@theMode.com wrote:

> Hopefully someone can tell you of a direct solution, but I can tell you
> my experiences using perl.

Thanks for the reply. Can I ask some more basic questions before I jump into this?

1. Do I understand correctly that using one of these tools I can hand my
   xml-formatted data file to postgres and it will be inserted into the
   database?

2. How does the database "know" the layout? I am dealing with health care
   claim data using the EDI X12 standard (such as it is). It is well
   defined, but large and complex (well beyond the "mailing list" examples
   I've seen). Do I need to post a DTD or Schema to Postgres? Or does it
   simply (!) create tables and links to accommodate whatever relationships
   are defined in the data file? Or is the XML dumped into one large field?

3. When I add additional data to the database (another claim file from
   another vendor or day), will it be added to the existing data, or -- I
   assume -- entered into the base under a new root node?

4. How do I extract data from the database? Do I use XPath-type queries?

5. Is it possible to insert, update, change, or delete data from the data
   file on the base? If so, can I issue a command to extract the entire
   document with the modified data?

6. Can you point me to documentation on any of this?

7. Is it worth the trouble? Or should I dissect the file into traditionally
   normalized tables?

Thank you so much for your help and advice. This is a new world to me and I'd like to understand what I might be getting in to.

--greg

Greg Lindstrom                                        501 975.4859
Computer Programmer                  greg.lindstrom@novasyshealth.com
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams."  W.W.
Sean....

> > I had to write a parsing routine that churns through the XML file and
> > loads the data for one record, then dumps it into the database before
> > going back for the next record - it's much more memory efficient, albeit
> > slow - no grouped transactions, but that's all right - it runs in the
> > middle of the night.
>
> In these cases, you can use XML::Twig to break up the file into smaller
> parts -- it is quite easy and VERY fast.

Thanks. It was a couple of years ago that I wrote my parsing code, so I'm not going to change it at this time, but I'll keep that in mind for the future.

I experimented with Twig (but don't remember the exact results - it was a year and a half ago); somehow I missed that you could break the data up into smaller chunks. That's really ironic, because now I see that the name Twig is because it's a branch off the tree!!! Darn.

Thanks....

brew

==========================================================================
Strange Brew   (brew@theMode.com)
Check out my Stock Option Covered Call website  http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================================================================
Greg.....

> 1. Do I understand correctly that using one of these tools I can hand my
>    xml-formatted data file to postgres and it will be inserted into the
>    database?

Hopefully for you, somebody else might know of something like that!

Sorry to tell you, but I'm handling it manually. The Perl program just serves as the bridge between the two databases. I had to manually normalize the database that I was going to pump the data into, then I had to write a bridging program to parse the data out of the XML file, loading each piece of data into a variable, then inserting or updating the proper table in the database with it.

It's not the automatic solution you sound like you are hoping for. That's what I meant by writing "hopefully somebody else can give you a direct solution".

Good luck finding what you need. Maybe somebody else knows how to make PostgreSQL handle XML data directly, or knows of a database that will (and still be fast, etc.).

brew

==========================================================================
Strange Brew   (brew@theMode.com)
Check out my Stock Option Covered Call website  http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================================================================
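As a rough illustration of the "inserting or updating the proper table" step of such a bridging program (again, not brew's code): PostgreSQL of this vintage has no single upsert statement, so a simple batch loader can check for an existing row and then insert or update accordingly. The table and columns are invented, and the check-then-write pattern assumes only one loader runs at a time, as in a nightly cron job.

    # Sketch only: hypothetical table, columns, and DSN.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=claims', 'user', 'password',
                           { RaiseError => 1 });

    my $check  = $dbh->prepare('SELECT 1 FROM claim WHERE claim_id = ?');
    my $insert = $dbh->prepare(
        'INSERT INTO claim (claim_id, member_id, amount) VALUES (?, ?, ?)');
    my $update = $dbh->prepare(
        'UPDATE claim SET member_id = ?, amount = ? WHERE claim_id = ?');

    # Called once per record as the parser hands the data over.
    sub save_record {
        my ($claim_id, $member_id, $amount) = @_;

        $check->execute($claim_id);
        my ($exists) = $check->fetchrow_array;
        $check->finish;

        if ($exists) {
            $update->execute($member_id, $amount, $claim_id);
        }
        else {
            $insert->execute($claim_id, $member_id, $amount);
        }
    }

    # e.g. save_record('CLM0001', 'MBR42', 125.00);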
On Mar 17, 2005, at 12:36 PM, brew@theMode.com wrote:

> Greg.....
>
>> 1. Do I understand correctly that using one of these tools I can hand my
>>    xml-formatted data file to postgres and it will be inserted into the
>>    database?
>
> Hopefully for you, somebody else might know of something like that!
>
> Sorry to tell you, but I'm handling it manually. The Perl program just
> serves as the bridge between the two databases. I had to manually
> normalize the database that I was going to pump the data into, then I had
> to write a bridging program to parse the data out of the XML file, loading
> each piece of data into a variable, then inserting or updating the proper
> table in the database with it.
>
> It's not the automatic solution you sound like you are hoping for.
> That's what I meant by writing "hopefully somebody else can give you a
> direct solution".
>
> Good luck finding what you need. Maybe somebody else knows how to make
> PostgreSQL handle XML data directly, or knows of a database that will (and
> still be fast, etc.).

Unfortunately, I, like Brew, do it "by hand". There are modules for XML/DBI connection, but I haven't used them.

In practice, if you know the database structure from which your data is coming, you can recreate it easily enough. Then it is just a question of parsing the XML (also relatively trivial) and then making a loader using DBI (again, fairly straightforward). It sounds like your data is coming out fairly normalized already, so the process shouldn't require too much data munging. If you need to munge, you can always load into a "loader" schema and then use triggers to change the data into another format in the "working" schema.

As for querying your database, you could look at contrib/xml, but that is only if you load straight XML. You will probably load into regular tables for a more "standard" solution, so you would do your queries in SQL.

In case you want to look at a more automated solution, you might check out:

http://www.xml.com/pub/a/2000/12/13/perlxmldb.html

For a totally different approach, there are a number of XML databases and drivers:

http://xml.apache.org/xindice/
http://search.cpan.org/author/GSEAMAN/XML-DB/lib/XML/DB/Database/Xindice.pm
http://search.cpan.org/author/GSEAMAN/XML-DB/lib/XML/DB/Database.pm
http://search.cpan.org/author/GSEAMAN/XML-DB/lib/XML/DB/Database/Exist.pm

I haven't used them at all and have no idea how well they scale, etc.

Sean
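A sketch of the "loader schema plus trigger" idea, issued from Perl with DBI to match the rest of the thread. Every schema, table, and column name here is invented for illustration, and it assumes the loader and working schemas (and a working.claim table) already exist and that PL/pgSQL is installed in the database.

    # One-time setup, sketch only: invented schemas, tables, and columns.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=claims', 'user', 'password',
                           { RaiseError => 1 });

    # Raw landing table: everything arrives as text, exactly as parsed.
    $dbh->do(q{
        CREATE TABLE loader.claim_raw (
            claim_id  text,
            member_id text,
            amount    text
        )
    });

    # The trigger copies each raw row into the working schema, doing the
    # type munging (text to numeric) on the way through.
    $dbh->do(q{
        CREATE OR REPLACE FUNCTION loader.claim_to_working() RETURNS trigger AS '
        BEGIN
            INSERT INTO working.claim (claim_id, member_id, amount)
            VALUES (NEW.claim_id, NEW.member_id, NEW.amount::numeric);
            RETURN NEW;
        END;
        ' LANGUAGE plpgsql
    });

    $dbh->do(q{
        CREATE TRIGGER claim_loader_trg
            AFTER INSERT ON loader.claim_raw
            FOR EACH ROW EXECUTE PROCEDURE loader.claim_to_working()
    });

    $dbh->disconnect;

After this, the bridging script only ever inserts into loader.claim_raw; the trigger takes care of populating working.claim.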
Greg,

I would suggest you check out "The xpsql Project -- XpSQL, an XML database environment using PostgreSQL":

http://gborg.postgresql.org/project/xpsql/projdisplay.php

Regards,
George

----- Original Message -----
From: <brew@theMode.com>
To: "Greg Lindstrom" <greg.lindstrom@novasyshealth.com>
Cc: <pgsql-novice@postgresql.org>
Sent: Thursday, March 17, 2005 11:36 AM
Subject: Re: [NOVICE] XML and Postgres

> Greg.....
>
>> 1. Do I understand correctly that using one of these tools I can hand my
>>    xml-formatted data file to postgres and it will be inserted into the
>>    database?
>
> Hopefully for you, somebody else might know of something like that!
>
> Sorry to tell you, but I'm handling it manually. The Perl program just
> serves as the bridge between the two databases. I had to manually
> normalize the database that I was going to pump the data into, then I had
> to write a bridging program to parse the data out of the XML file, loading
> each piece of data into a variable, then inserting or updating the proper
> table in the database with it.
>
> It's not the automatic solution you sound like you are hoping for.
> That's what I meant by writing "hopefully somebody else can give you a
> direct solution".
>
> Good luck finding what you need. Maybe somebody else knows how to make
> PostgreSQL handle XML data directly, or knows of a database that will (and
> still be fast, etc.).
>
> brew