Thread: XPath question - big trouble
Hello everyone, First of all, thank you for integrating XPath in Postgresql. But, as you will see, this is a desperate cry for help: 1. the actual context: Postgresql 8.1.4, Fedora Core 5 DATABASE ========= CREATE TABLE xmldocuments ( id int8 NOT NULL, rawdata text, title varchar(255), CONSTRAINT pk_xmldocs PRIMARY KEY (id) ) WITHOUT OIDS; RAWDATA contents model ===================== <mydocument> <title></title> <body> <paragraph id="87" style="para21"></paragraph> <chapter> <title></title> <contents> <paragraph id="01" style="para01"></paragraph> <paragraph id="02" style="para01"></paragraph> <paragraph id="03" style="para01"></paragraph> <paragraph id="04" style="para01"></paragraph> <paragraph id="05" style="para01"></paragraph> </contents> </chapter> <chapter> <title></title> <contents> <paragraph id="654" style="para01"></paragraph> <paragraph id="54" style="para02"></paragraph> <paragraph id="64" style="para01"></paragraph> <paragraph id="98" style="para02"></paragraph> <paragraph id="65" style="para02"></paragraph> <paragraph id="655" style="para01"></paragraph> </contents> </chapter> </body> </mydocument> I have 4 or 5 lines in the table xmldocuments; on every record, rawdata has data similar to the above model. 2. the problem: how can I select and return only this: a. a single paragraph <paragraph id="02"></paragraph> b. a collection of paragraphs that have in common a specific criteria (let's say style="para02") <paragraph id="54" style="para02"></paragraph> <paragraph id="98" style="para02"></paragraph> <paragraph id="65" style="para02"></paragraph> For now, the following query SELECT xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph') FROM public.xmldocuments WHERE id=4 will return all paragraphs inside document body. If I add the following clause AND xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]'); the result set will be empty !!! I even tried: SELECT t.idxml, t.rawxml, t.xmlid FROM xpath_table('id', 'rawdata','xmldocuments', '/mydocument/body/paragraph|/mydocument/body/chapter/content/paragraph|/mydocument/body/chapter/content/paragraph/@objid', --'xpath_string(''rawdata'',''@objid'') = ''2_1'' ' --'xpath_bool(''rawdata'',''/mydocument/body/chapter/content/paragraph[@objid="2_1"]'')' 'true' ) AS t(idxml integer, rawxml text, xmlid text), xmldocuments as x WHERE t.idxml = x.id AND x.id = 4 Please, help! Thank you for your time, Marian -- ~~~~~~~~~~~~~~~~~~~~~~~ - S o f t E x p e r t - ~~~~~~~~~~~~~~~~~~~~~~~
[snip] > <paragraph id="01" style="para01"></paragraph> ^^^^ [snip] > xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]'); ^^^^^ To me it looks like attribute name mismatch, not to mention you forgot the '@' (i.e. I guess you wanted the xpath: '/mydocument/body/chapter/contents/paragraph[@id="2_1"]' If you were just sloppy writing the email, maybe you should also check your actual code for such sloppiness ;-) Cheers, Csaba.
Thanks for replying. It was a typo ... Also id fom example should be objid ... Anyway, without the typo, the result set is empty. There is also a weird thing: SELECT xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph') FROM xmldocuments will output xpath_nodeset ---------------- (empty line) (empty line) (empty line) (empty line) 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms] [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms] - (empty line) is what I wrote to say that there was nothing on output - which makes me think there is something wrong with XPath implementation. Anyway, is there a solution to my problems? Marian Csaba Nagy wrote: > [snip] >> <paragraph id="01" >> style="para01"></paragraph> > ^^^^ > [snip] >> xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]'); > ^^^^^ > To me it looks like attribute name mismatch, not to mention you forgot > the '@' (i.e. I guess you wanted the xpath: > '/mydocument/body/chapter/contents/paragraph[@id="2_1"]' > > If you were just sloppy writing the email, maybe you should also check > your actual code for such sloppiness ;-) > > Cheers, > Csaba.
Marian POPESCU wrote: > There is also a weird thing: > > > SELECT > xpath_nodeset(rawdata, > '/mydocument/body/chapter/contents/paragraph') FROM xmldocuments > > will output > > xpath_nodeset > ---------------- > (empty line) > (empty line) > (empty line) > (empty line) > > 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms] > > [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms] > > - (empty line) is what I wrote to say that there was nothing on > output - which makes me think there is something wrong with XPath > implementation. Makes me think you have null values in the resultset. -- Guy Rouillier
Marian, On Mon, 2006-08-07 at 17:47, Marian POPESCU wrote: > SELECT > xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph') > FROM xmldocuments > > will output > > xpath_nodeset > ---------------- > (empty line) > (empty line) > (empty line) > (empty line) > > 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms] > > [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms] > > - (empty line) is what I wrote to say that there was nothing on output - > which makes me think there is something wrong with XPath implementation. In the first email you said this works for you... the only difference I find in the query from there and here is that you schema-qualified the xmldocuments table in your last mail. Are you sure there's no weirdness in your schemas, with some other table with the same name in another schema coming in your way ? Cheers, Csaba.
It looks that I shoot myself in the foot :( CREATE TABLE xmltable ( id int8 NOT NULL, xml_data text, CONSTRAINT pk_xmltable PRIMARY KEY (id) ) WITHOUT OIDS; The id field contains values from 1 to 3. The field xml_data contains something like this: <mydocument> <title objid="4654">My document</title> <body objid="6987"> <paragraph objid="87" style="para21"></paragraph> <chapter objid="5764"> <title objid="646">Chapter 1</title> <contents> <paragraph objid="01" style="para01">aaaaaaaaaaa</paragraph> <paragraph objid="02" style="para01">bbbbbbbbbb</paragraph> <paragraph objid="03" style="para01">cccccccccccc</paragraph> <paragraph objid="04" style="para01">dddddddddddd</paragraph> <paragraph objid="05" style="para01">eeeeeeeeeeee</paragraph> </contents> </chapter> <chapter objid="681"> <title objid="68746">Chapter 2</title> <contents> <paragraph objid="654" style="para01">gggggggggg</paragraph> <paragraph objid="54" style="para02">hhhhhhhhhh</paragraph> <paragraph objid="64" style="para01">iiiiiiiiii</paragraph> <paragraph objid="98" style="para02">ttttttttttt</paragraph> <paragraph objid="65" style="para02">eeeeeeeeeee</paragraph> <paragraph objid="655" style="para01">kkkkkkkkkk</paragraph> </contents> </chapter> </body> </mydocument> My questions: 1.What query should I write to get only <paragraph objid="02" style="para01">bbbbbbbbbb</paragraph> knowing that I can pass as a parameter the objid attribute value; 2.What query should I write to get <paragraph objid="54" style="para02">hhhhhhhhhh</paragraph> <paragraph objid="98" style="para02">ttttttttttt</paragraph> <paragraph objid="65" style="para02">eeeeeeeeeee</paragraph> knowing that I can pass as a parameter the style attribute value. My new found answers: 1. SELECT xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@objid = "02"]|/mydocument/body/paragraph[@objid="87"]') FROM xmltable WHERE id = 3 will give me the desired paragraph 2. SELECT xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@style = "para02"]|/mydocument/body/paragraph[@style="para02"]') FROM xmltable WHERE id = 3 will give me the collection of paragraphs that correspond to my criteria. Apparently I am more carefull once I make a fool of myself ;) Thank you for the replies and sorry for wasting your time! Csaba Nagy wrote: > Marian, > > On Mon, 2006-08-07 at 17:47, Marian POPESCU wrote: > >> SELECT >> xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph') >> FROM xmldocuments >> >> will output >> >> xpath_nodeset >> ---------------- >> (empty line) >> (empty line) >> (empty line) >> (empty line) >> >> 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms] >> >> [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms] >> >> - (empty line) is what I wrote to say that there was nothing on output - >> which makes me think there is something wrong with XPath implementation. > > In the first email you said this works for you... the only difference I > find in the query from there and here is that you schema-qualified the > xmldocuments table in your last mail. Are you sure there's no weirdness > in your schemas, with some other table with the same name in another > schema coming in your way ? > > Cheers, > Csaba. >
You are right; I had a typo : contents is content Thanks for replying ... Guy Rouillier wrote: > Marian POPESCU wrote: >> There is also a weird thing: >> >> >> SELECT >> xpath_nodeset(rawdata, >> '/mydocument/body/chapter/contents/paragraph') FROM xmldocuments >> >> will output >> >> xpath_nodeset >> ---------------- >> (empty line) >> (empty line) >> (empty line) >> (empty line) >> >> 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms] >> >> [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms] >> >> - (empty line) is what I wrote to say that there was nothing on >> output - which makes me think there is something wrong with XPath >> implementation. > > Makes me think you have null values in the resultset. >
Marian, > 1. > SELECT > xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@objid > = "02"]|/mydocument/body/paragraph[@objid="87"]') > FROM xmltable > WHERE id = 3 You could maybe use the xpath: '//paragraph[@objid = "02"]' if you want all paragraphs at all document levels. A nice xslt reference is: file:///home/cnagy/offline/XSLTreference/Output/index.html There you can also exercise your xpath skills... Cheers, Csaba.
Thank you! Since you brought up the subject, will this //paragraph[@objid = "02"] be slower than /mydocument/body/chapter/contents/paragraph[@objid >> = "02"] when handling a big document? I mean from the query time point of view ... Csaba Nagy wrote: > Marian, > >> 1. >> SELECT >> xpath_nodeset(xml_data, >> '/mydocument/body/chapter/contents/paragraph[@objid >> = "02"]|/mydocument/body/paragraph[@objid="87"]') >> FROM xmltable >> WHERE id = 3 > > You could maybe use the xpath: '//paragraph[@objid = "02"]' if you want > all paragraphs at all document levels. > > A nice xslt reference is: > file:///home/cnagy/offline/XSLTreference/Output/index.html > > There you can also exercise your xpath skills... > > Cheers, > Csaba. > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- ~~~~~~~~~~~~~~~~~~~~~~~ - S o f t E x p e r t - ~~~~~~~~~~~~~~~~~~~~~~~
> Since you brought up the subject, will this > //paragraph[@objid = "02"] > be slower than > /mydocument/body/chapter/contents/paragraph[@objid > >> = "02"] > when handling a big document? > I mean from the query time point of view ... I have no idea about the postgres XML implementation internals, I never used it, but for other XML processors I know the speed would be the same, and I would expect the same for postgres too. Cheers, Csaba.