Thread: Storing XML in PostgreSQL

Storing XML in PostgreSQL

From
Jean-Michel POURE
Date:
Hello friends,

What is the best way to parse and store an XML document in PostgreSQL?
I would like to store fwbuilder (http://www.fwbuilder.org) objects in 
PostgreSQL.

Any information is welcome.

Regards, Jean-Michel POURE
pgAdmin Development Team



Re: Storing XML in PostgreSQL

From
"John Gray"
Date:
In article <4.2.2.20010724150449.00a9ea90@192.168.0.67>,
jm.poure@freesurf.fr (Jean-Michel POURE) wrote:
> Hello friends,
> 
> What is the best way to parse and store an XML document in PostgreSQL? I
> would like to store fwbuilder (http://www.fwbuilder.org) objects in 
> PostgreSQL.
> 

I think the best way depends on what you're trying to achieve with the
document once you have it in the database. One approach is to have tables
for  elements, attributes and cdata and use an XML parser to insert
appropriate database records.

I have used a schema such as the following- in the cdata table "element" is
the  ID of the containing element, and itempos is just an integer used to
order the  entries. I used this with a bit of java which hooks up to the
Lark parser (using SAX) to do the parsing and fires off INSERT queries 
through the jdbc driver. 

CREATE SEQUENCE cdata_seq; 
CREATE SEQUENCE attribute_seq; 
CREATE SEQUENCE element_seq; 

CREATE TABLE element (      document integer,        element integer not null PRIMARY KEY      default
nextval('element_seq'),name text, parent integer, itempos      integer      );
 

CREATE TABLE attribute (      document integer,        attribute integer not null default
nextval('attribute_seq'),name text, value text, element integer,      itempos integer      );
 


CREATE TABLE cdata (      document integer,        cdata integer not null default      nextval('cdata_seq'), value
text,element integer, itempos integer      );
 

In my example, I was interested in selecting all the cdata content 
of a <type> tag immediately contained within a <feature> tag path.

The easiest solution is to create a view, which can then be queried to 
find  all cases where, for example, feature type = 'Ditch'.

CREATE VIEW featuretype AS featuretype
SELECT c.document,c.value
FROM cdata c, element e, element e1
WHERE c.element = e.element 
AND   e.parent = e1.element
AND   e.name = 'type'
AND  e1.name = 'feature'
AND   c.document = e.document
AND   e.document = e1.document;

if you are interested I can provide the very basic (java) code I used for 
this.

OR, depending on what these fwbuilder objects involve,  you can of 
course just store XML documents in fields of type text (especially if 
you use 7.1 which has TOAST, so you can store long documents). IT's 
not difficult to hook up a parser (I'm using expat) to a PostgreSQL 
function written in C and parse on the fly.

I haven't really finished that code, but after I've commented it, I can 
certainly post it if anyone is interested. It does work, but probably
needs some tidying. It really wasn't difficult to write the functions 
though. In fact, I've been surprised by how easy it is to write 
PostgreSQL C functions...

Please contact me if you have any questions -I've been away for a bit
so haven't worked on that code for a couple of weeks -I'm hoping to 
get back into it soon.

Regards

John




Re: Re: Storing XML in PostgreSQL

From
Gunnar Rønning
Date:
* "John Gray" <jgray@beansindustry.co.uk> wrote:
|

| OR, depending on what these fwbuilder objects involve,  you can of 
| course just store XML documents in fields of type text (especially if 
| you use 7.1 which has TOAST, so you can store long documents). IT's 
| not difficult to hook up a parser (I'm using expat) to a PostgreSQL 
| function written in C and parse on the fly.
| 

Do you have any documentation on your C functions ? I'm just interested
in knowing what functions they provide.

regards, 
       Gunnar

-- 
Gunnar Rønning - gunnar@polygnosis.com
Senior Consultant, Polygnosis AS, http://www.polygnosis.com/


Re: Re: Storing XML in PostgreSQL

From
"John Gray"
Date:
In article <m2elr5b2cx.fsf@smaug.polygnosis.com>, gunnar@polygnosis.com
(Gunnar Rønning) wrote:

> Do you have any documentation on your C functions ? I'm just interested
> in knowing what functions they provide.
> 

There are only two (so far). They're very basic. I have:

pgxml_parse(text) returns bool -parses the provided text and returns true or false if it is 
well-formed or not.

pgxml_xpath(text doc, text xpath, int n) returns text -parses doc and returns the cdata of the nth occurence of
the "XPath" listed. This does handle relative and absolute paths 
but nothing else at present. I have a few variants of this. 

So, given a  table docstore:
Attribute |  Type   | Modifier 
-----------+---------+----------docid     | integer | document  | text    | 

containing documents such as:

<?XML version="1.0"?>
<site provider="Foundations" sitecode="ak97" version="1">  <name>Church Farm, Ashton Keynes</name>  <invtype>watching
brief</invtype> <location scheme="osgb">SU04209424</location>
 
</site>

I can type:
select docid, 
pgxml_xpath(document,'/site/name',1) as sitename,
pgxml_xpath(document,'/site/location',1) as locationfrom docstore;
and I get:
docid |          sitename           |  location  
-------+-----------------------------+------------    1 | Church Farm, Ashton Keynes  | SU04209424    2 | Glebe Farm,
LongItchington | SP41506500
 
(2 rows)

The next thing is to use the "function as tuple source" support which is
underway in order to allow the return of a list (in the DTD I'm using
-and doubtless many others- certain elements might be repeated, and
I think it would be good to be able to join against all the data from a 
particular element.

I hope this helps give a flavour. I'll try and tidy up the functions in the
next couple of days and then I can post what I've got so far. I'm keen to
build on this, as it's part of an (unfunded, unfortunately) project we're 
doing. Expat is MIT-licensed so I don't imagine there's a problem linking
it into PostgreSQL.

One aim is to allow people to set pg functions as the handlers "direct"
from the parser -the catch is that the expat API has lots of handlers
(OK, so most of them are less commonly used), so it's a matter of 
working out  a) an efficient API for setting handlers on a particular 
parser and b) how persistent a parser instance should be (each expat
instance can only do one document). Of course, expat may not be the 
best one to use -it would be great to be parser-agnostic and use SAX
with a java parser, but I don't think we have java as a language for 
user functions yet :-)
Incidentally, I'll be changing my email address over the next couple 
of daysto jgray@azuli.co.uk -just so you can follow this thread after
I've done that....

Regards

John
Azuli IT




Re: Re: Storing XML in PostgreSQL

From
"John Gray"
Date:
I've packaged up what I've done so far and you can find it at
http://www.cabbage.uklinux.net/pgxml.tar.gz

The TODO file included indicates what still remains to be done (a lot!).

In particular, it would be good to implement more of the XPath grammar.
However, once we get into the realm of more complex paths there becomes a
question about optimisation of XPath selection. If the documents are
pre-parsed, then XPath query elements can be rewritten as SQL queries and
you get the optimisation of the planner on your side.

I'd like to stick with the current solution if possible, because I think
it delivers a very simple interface to the user and is (code-wise) also
very straightforward. Maybe less efficient queries are a penalty worth paying?

Any thoughts?

Regards

John



Re: Re: Re: Storing XML in PostgreSQL

From
Bruce Momjian
Date:
Should we add this to /contrib?

> I've packaged up what I've done so far and you can find it at
> http://www.cabbage.uklinux.net/pgxml.tar.gz
> 
> The TODO file included indicates what still remains to be done (a lot!).
> 
> In particular, it would be good to implement more of the XPath grammar.
> However, once we get into the realm of more complex paths there becomes a
> question about optimisation of XPath selection. If the documents are
> pre-parsed, then XPath query elements can be rewritten as SQL queries and
> you get the optimisation of the planner on your side.
> 
> I'd like to stick with the current solution if possible, because I think
> it delivers a very simple interface to the user and is (code-wise) also
> very straightforward. Maybe less efficient queries are a penalty worth paying?
> 
> Any thoughts?
> 
> Regards
> 
> John
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Re: Re: Storing XML in PostgreSQL

From
"Colin 't Hart"
Date:
> Should we add this to /contrib?

I think so, at least until we get something better.

Cheers,

Colin




Re: Re: Re: Storing XML in PostgreSQL

From
"John Gray"
Date:
In article <9jrn78$pbv$1@news.tht.net>, "Colin 't Hart" <cthart@yahoo.com>
wrote:
>> Should we add this to /contrib?
> 
> I think so, at least until we get something better.
> 

I'm happy for you to add it, if you're willing to have it (It is meant to
be under the PostgreSQL license). I agree that there's still much to be
done... note that another thread (From TODO, XML?) has started up on this
subject as well.

No threads on XML for months, and then along come two at once :)

Regards

John



Re: Re: Re: Re: Storing XML in PostgreSQL

From
Bruce Momjian
Date:
> In article <9jrn78$pbv$1@news.tht.net>, "Colin 't Hart" <cthart@yahoo.com>
> wrote:
> >> Should we add this to /contrib?
> > 
> > I think so, at least until we get something better.
> > 
> 
> I'm happy for you to add it, if you're willing to have it (It is meant to
> be under the PostgreSQL license). I agree that there's still much to be
> done... note that another thread (From TODO, XML?) has started up on this
> subject as well.

I figured we could add it to /contrib and use it as a starting point.

> No threads on XML for months, and then along come two at once :)

Yep.  Seems it is getting hot.  I like the use of XML to transfer data
and schema between databases from different vendors.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Re: Re: Storing XML in PostgreSQL

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

> I've packaged up what I've done so far and you can find it at
> http://www.cabbage.uklinux.net/pgxml.tar.gz
> 
> The TODO file included indicates what still remains to be done (a lot!).
> 
> In particular, it would be good to implement more of the XPath grammar.
> However, once we get into the realm of more complex paths there becomes a
> question about optimisation of XPath selection. If the documents are
> pre-parsed, then XPath query elements can be rewritten as SQL queries and
> you get the optimisation of the planner on your side.
> 
> I'd like to stick with the current solution if possible, because I think
> it delivers a very simple interface to the user and is (code-wise) also
> very straightforward. Maybe less efficient queries are a penalty worth paying?
> 
> Any thoughts?
> 
> Regards
> 
> John
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Re: Re: Storing XML in PostgreSQL

From
Bruce Momjian
Date:
Added to /contrib, with small Makefile changes.  Requires expat library.
Does not compile by default.

> I've packaged up what I've done so far and you can find it at
> http://www.cabbage.uklinux.net/pgxml.tar.gz
> 
> The TODO file included indicates what still remains to be done (a lot!).
> 
> In particular, it would be good to implement more of the XPath grammar.
> However, once we get into the realm of more complex paths there becomes a
> question about optimisation of XPath selection. If the documents are
> pre-parsed, then XPath query elements can be rewritten as SQL queries and
> you get the optimisation of the planner on your side.
> 
> I'd like to stick with the current solution if possible, because I think
> it delivers a very simple interface to the user and is (code-wise) also
> very straightforward. Maybe less efficient queries are a penalty worth paying?
> 
> Any thoughts?
> 
> Regards
> 
> John
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026