Thread: XPath question - big trouble

XPath question - big trouble

From
Marian POPESCU
Date:
Hello everyone,

First of all, thank you for integrating XPath in Postgresql.

But, as you will see, this is a desperate cry for help:

1. the actual context:
Postgresql 8.1.4, Fedora Core 5

DATABASE
=========
CREATE TABLE xmldocuments
(
  id int8 NOT NULL,
  rawdata text,
  title varchar(255),
  CONSTRAINT pk_xmldocs PRIMARY KEY (id)
)
WITHOUT OIDS;

RAWDATA contents model
=====================
<mydocument>
        <title></title>
        <body>
                <paragraph id="87" style="para21"></paragraph>
                <chapter>
                        <title></title>
                        <contents>
                                <paragraph id="01" style="para01"></paragraph>
                                <paragraph id="02" style="para01"></paragraph>
                                <paragraph id="03" style="para01"></paragraph>
                                <paragraph id="04" style="para01"></paragraph>
                                <paragraph id="05" style="para01"></paragraph>
                        </contents>
                </chapter>
                <chapter>
                        <title></title>
                        <contents>
                                <paragraph id="654" style="para01"></paragraph>
                                <paragraph id="54" style="para02"></paragraph>
                                <paragraph id="64" style="para01"></paragraph>
                                <paragraph id="98" style="para02"></paragraph>
                                <paragraph id="65" style="para02"></paragraph>
                                <paragraph id="655" style="para01"></paragraph>
                        </contents>
                </chapter>
        </body>
</mydocument>

I have 4 or 5 lines in the table xmldocuments; on every record, rawdata has
data similar to the above model.

2. the problem:
how can I select and return only this:
a. a single paragraph
                        <paragraph id="02"></paragraph>
b. a collection of paragraphs that have in common a specific criteria (let's
say style="para02")

                        <paragraph id="54" style="para02"></paragraph>
                        <paragraph id="98" style="para02"></paragraph>
                        <paragraph id="65" style="para02"></paragraph>

For now, the following query
SELECT
    xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph')
FROM public.xmldocuments
WHERE
    id=4
will return all paragraphs inside document body.

If I add the following clause
AND
xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]');
the result set will be empty !!!

I even tried:

SELECT t.idxml, t.rawxml, t.xmlid
FROM
    xpath_table('id', 'rawdata','xmldocuments',

'/mydocument/body/paragraph|/mydocument/body/chapter/content/paragraph|/mydocument/body/chapter/content/paragraph/@objid',
        --'xpath_string(''rawdata'',''@objid'') = ''2_1'' '
        --'xpath_bool(''rawdata'',''/mydocument/body/chapter/content/paragraph[@objid="2_1"]'')'
            'true'
        )
        AS t(idxml integer, rawxml text, xmlid text),
    xmldocuments as x
WHERE
    t.idxml = x.id
AND
    x.id = 4


Please, help!

Thank you for your time,

Marian
--
~~~~~~~~~~~~~~~~~~~~~~~
- S o f t E x p e r t -
~~~~~~~~~~~~~~~~~~~~~~~

Re: XPath question - big trouble

From
Csaba Nagy
Date:
[snip]
>                                 <paragraph id="01" style="para01"></paragraph>
                                            ^^^^
[snip]
> xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]');
                                                                  ^^^^^
To me it looks like attribute name mismatch, not to mention you forgot
the '@' (i.e. I guess you wanted the xpath:
'/mydocument/body/chapter/contents/paragraph[@id="2_1"]'

If you were just sloppy writing the email, maybe you should also check
your actual code for such sloppiness ;-)

Cheers,
Csaba.



Re: XPath question - big trouble

From
Marian POPESCU
Date:
Thanks for replying.

It was a typo ...
Also id fom example should be objid ...

Anyway, without the typo, the result set is empty.

There is also a weird thing:


SELECT
  xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph')
FROM xmldocuments

will output

 xpath_nodeset
 ----------------
 (empty line)
 (empty line)
 (empty line)
 (empty line)

 4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms]

 [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms]

 - (empty line) is what I wrote to say that there was nothing on output -
which makes me think there is something wrong with XPath implementation.

Anyway, is there a solution to my problems?

Marian

Csaba Nagy wrote:

> [snip]
>>                                 <paragraph id="01"
>>                                 style="para01"></paragraph>
>                                             ^^^^
> [snip]
>>
xpath_bool(rawdata,'/mydocument/body/chapter/contents/paragraph[objid="2_1"]');
>                                                                   ^^^^^
> To me it looks like attribute name mismatch, not to mention you forgot
> the '@' (i.e. I guess you wanted the xpath:
> '/mydocument/body/chapter/contents/paragraph[@id="2_1"]'
>
> If you were just sloppy writing the email, maybe you should also check
> your actual code for such sloppiness ;-)
>
> Cheers,
> Csaba.


Re: XPath question - big trouble

From
"Guy Rouillier"
Date:
Marian POPESCU wrote:
> There is also a weird thing:
>
>
> SELECT
>   xpath_nodeset(rawdata,
> '/mydocument/body/chapter/contents/paragraph') FROM xmldocuments
>
> will output
>
>  xpath_nodeset
>  ----------------
>  (empty line)
>  (empty line)
>  (empty line)
>  (empty line)
>
>  4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms]
>
>  [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms]
>
>  - (empty line) is what I wrote to say that there was nothing on
> output - which makes me think there is something wrong with XPath
> implementation.

Makes me think you have null values in the resultset.

--
Guy Rouillier


Re: XPath question - big trouble

From
Csaba Nagy
Date:
Marian,

On Mon, 2006-08-07 at 17:47, Marian POPESCU wrote:

> SELECT
>   xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph')
> FROM xmldocuments
>
> will output
>
>  xpath_nodeset
>  ----------------
>  (empty line)
>  (empty line)
>  (empty line)
>  (empty line)
>
>  4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms]
>
>  [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms]
>
>  - (empty line) is what I wrote to say that there was nothing on output -
> which makes me think there is something wrong with XPath implementation.

In the first email you said this works for you... the only difference I
find in the query from there and here is that you schema-qualified the
xmldocuments table in your last mail. Are you sure there's no weirdness
in your schemas, with some other table with the same name in another
schema coming in your way ?

Cheers,
Csaba.



Re: XPath question - big trouble

From
Marian POPESCU
Date:
It looks that I shoot myself in the foot :(

CREATE TABLE xmltable
(
  id int8 NOT NULL,
  xml_data text,
  CONSTRAINT pk_xmltable PRIMARY KEY (id)
)
WITHOUT OIDS;

The id field contains values from 1 to 3.
The field xml_data contains something like this:
<mydocument>
        <title objid="4654">My document</title>
                <body objid="6987">
                        <paragraph objid="87" style="para21"></paragraph>
                        <chapter objid="5764">
                                <title objid="646">Chapter 1</title>
                                <contents>
                                        <paragraph objid="01" style="para01">aaaaaaaaaaa</paragraph>
                                        <paragraph objid="02" style="para01">bbbbbbbbbb</paragraph>
                                        <paragraph objid="03" style="para01">cccccccccccc</paragraph>
                                        <paragraph objid="04" style="para01">dddddddddddd</paragraph>
                                        <paragraph objid="05" style="para01">eeeeeeeeeeee</paragraph>
                                </contents>
                        </chapter>
                        <chapter objid="681">
                                <title objid="68746">Chapter 2</title>
                                <contents>
                                        <paragraph objid="654" style="para01">gggggggggg</paragraph>
                                        <paragraph objid="54" style="para02">hhhhhhhhhh</paragraph>
                                        <paragraph objid="64" style="para01">iiiiiiiiii</paragraph>
                                        <paragraph objid="98" style="para02">ttttttttttt</paragraph>
                                        <paragraph objid="65" style="para02">eeeeeeeeeee</paragraph>
                                        <paragraph objid="655" style="para01">kkkkkkkkkk</paragraph>
                                </contents>
                        </chapter>
                </body>
</mydocument>

My questions:
1.What query should I write to get only
    <paragraph objid="02" style="para01">bbbbbbbbbb</paragraph>
knowing that I can pass as a parameter the objid attribute value;

2.What query should I write to get
    <paragraph objid="54" style="para02">hhhhhhhhhh</paragraph>
    <paragraph objid="98" style="para02">ttttttttttt</paragraph>
    <paragraph objid="65" style="para02">eeeeeeeeeee</paragraph>
knowing that I can pass as a parameter the style attribute value.

My new found answers:
1.
SELECT
        xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@objid
= "02"]|/mydocument/body/paragraph[@objid="87"]')
FROM xmltable
WHERE id = 3

will give me the desired paragraph

2.
SELECT
        xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@style
= "para02"]|/mydocument/body/paragraph[@style="para02"]')
FROM xmltable
WHERE id = 3

will give me the collection of paragraphs that correspond to my criteria.

Apparently I am more carefull once I make a fool of myself ;)

Thank you for the replies and sorry for wasting your time!

Csaba Nagy wrote:

> Marian,
>
> On Mon, 2006-08-07 at 17:47, Marian POPESCU wrote:
>
>> SELECT
>>   xpath_nodeset(rawdata, '/mydocument/body/chapter/contents/paragraph')
>> FROM xmldocuments
>>
>> will output
>>
>>  xpath_nodeset
>>  ----------------
>>  (empty line)
>>  (empty line)
>>  (empty line)
>>  (empty line)
>>
>>  4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms]
>>
>>  [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms]
>>
>>  - (empty line) is what I wrote to say that there was nothing on output -
>> which makes me think there is something wrong with XPath implementation.
>
> In the first email you said this works for you... the only difference I
> find in the query from there and here is that you schema-qualified the
> xmldocuments table in your last mail. Are you sure there's no weirdness
> in your schemas, with some other table with the same name in another
> schema coming in your way ?
>
> Cheers,
> Csaba.
>


Re: XPath question - big trouble

From
Marian POPESCU
Date:
You are right; I had a typo : contents is content
Thanks for replying ...

Guy Rouillier wrote:

> Marian POPESCU wrote:
>> There is also a weird thing:
>>
>>
>> SELECT
>>   xpath_nodeset(rawdata,
>> '/mydocument/body/chapter/contents/paragraph') FROM xmldocuments
>>
>> will output
>>
>>  xpath_nodeset
>>  ----------------
>>  (empty line)
>>  (empty line)
>>  (empty line)
>>  (empty line)
>>
>>  4 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 1/ms]
>>
>>  [Executed: 8/7/06 5:42:54 PM CEST ] [Execution: 3/ms]
>>
>>  - (empty line) is what I wrote to say that there was nothing on
>> output - which makes me think there is something wrong with XPath
>> implementation.
>
> Makes me think you have null values in the resultset.
>


Re: XPath question - big trouble

From
Csaba Nagy
Date:
Marian,

> 1.
> SELECT
>         xpath_nodeset(xml_data, '/mydocument/body/chapter/contents/paragraph[@objid
> = "02"]|/mydocument/body/paragraph[@objid="87"]')
> FROM xmltable
> WHERE id = 3

You could maybe use the xpath: '//paragraph[@objid = "02"]' if you want
all paragraphs at all document levels.

A nice xslt reference is:
file:///home/cnagy/offline/XSLTreference/Output/index.html

There you can also exercise your xpath skills...

Cheers,
Csaba.





Re: XPath question - big trouble

From
Marian POPESCU
Date:
Thank you!

Since you brought up the subject, will this
 //paragraph[@objid = "02"]
be slower than
/mydocument/body/chapter/contents/paragraph[@objid
>> = "02"]
when handling a big document?
I mean from the query time point of view ...


Csaba Nagy wrote:

> Marian,
>
>> 1.
>> SELECT
>>         xpath_nodeset(xml_data,
>>         '/mydocument/body/chapter/contents/paragraph[@objid
>> = "02"]|/mydocument/body/paragraph[@objid="87"]')
>> FROM xmltable
>> WHERE id = 3
>
> You could maybe use the xpath: '//paragraph[@objid = "02"]' if you want
> all paragraphs at all document levels.
>
> A nice xslt reference is:
> file:///home/cnagy/offline/XSLTreference/Output/index.html
>
> There you can also exercise your xpath skills...
>
> Cheers,
> Csaba.
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
~~~~~~~~~~~~~~~~~~~~~~~
- S o f t E x p e r t -
~~~~~~~~~~~~~~~~~~~~~~~

Re: XPath question - big trouble

From
Csaba Nagy
Date:
> Since you brought up the subject, will this
>  //paragraph[@objid = "02"]
> be slower than
> /mydocument/body/chapter/contents/paragraph[@objid
> >> = "02"]
> when handling a big document?
> I mean from the query time point of view ...

I have no idea about the postgres XML implementation internals, I never
used it, but for other XML processors I know the speed would be the
same, and I would expect the same for postgres too.

Cheers,
Csaba.