Thread: How to parse xml containing optional elements

How to parse xml containing optional elements

From
"Andrus"
Date:
SEPA ISO XML transactions file needs to be parsed into flat table  in
Postgres 9.1+ in ASP:NET 4.6 MVC controller.

I tried code below but this produces wrong result:

tasusumma  orderinr
    150.00  PV04131
      0.38  PV04131

Since there is no EndToEnd in second row there should be null in second row
orderinr column. Correct result is:

tasusumma  orderinr
    150.00  PV04131
      0.38  null


How to fix this ?

    create temp table t(x xml, nsa text[][]) on commit drop;
    insert into t values(
    '<?xml version=''1.0'' encoding=''UTF-8''?>
    <Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
      <BkToCstmrStmt>
        <Stmt>
          <Ntry>
            <Amt Ccy="EUR">150.00</Amt>
            <NtryDtls>
              <TxDtls>
                <Refs>
                  <EndToEndId>PV04131</EndToEndId>
                </Refs>
              </TxDtls>
            </NtryDtls>
          </Ntry>
          <Ntry>
            <Amt Ccy="EUR">0.38</Amt>
            <NtryDtls>
              <TxDtls>
                <Refs>
                  <AcctSvcrRef>2016080100178214-2</AcctSvcrRef>
                </Refs>
              </TxDtls>
            </NtryDtls>
          </Ntry>
        </Stmt>
      </BkToCstmrStmt>
    </Document>
    ', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]);

    SELECT
    unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
     ,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
    FROM t;

Parsing can done in other ways e.q. using xslt stylesheet for tranformation
or in client side ASP.NET 4.6 MVC if this is more reasonable.

Posted also in
http://stackoverflow.com/questions/38888739/how-to-parse-xml-with-optional-elements

Andrus.



Re: How to parse xml containing optional elements

From
Hannes Erven
Date:
Hi Andrus,


> SELECT
> unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
> x,nsa))::text::numeric AS tasusumma
>     ,
> unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
> x,nsa))::text AS orderinr
>    FROM t;


You need to extract all ns:Ntry elements first, and then get the amount
and EndToEndId for each of them:

SELECT
    (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
    (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1]
AS orderinr

FROM (
    SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
    FROM t
) Ntry



Best regards,

    -hannes


Re: How to parse xml containing optional elements

From
"Andrus"
Date:
Hi!

Thank you.
In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit"

it returns two empty rows. How to make it work in this version ?

In "PostgreSQL 9.5.2, compiled by Visual C++ build 1800, 32-bit" it works.

Andrus.


-----Algsõnum-----
From: Hannes Erven
Sent: Thursday, August 11, 2016 11:51 AM
To: Andrus
Cc: pgsql-general
Subject: Re: [GENERAL] How to parse xml containing optional elements

Hi Andrus,


> SELECT
> unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
> x,nsa))::text::numeric AS tasusumma
>     ,
> unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
> x,nsa))::text AS orderinr
>    FROM t;


You need to extract all ns:Ntry elements first, and then get the amount
and EndToEndId for each of them:

SELECT
(xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
(xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1]
AS orderinr

FROM (
SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
FROM t
) Ntry




Re: How to parse xml containing optional elements

From
Hannes Erven
Date:
Hi,


> Thank you.
> In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by
> gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit"
> it returns two empty rows. How to make it work in this version ?

I couldn't really believe this so I just installed a VM and a 9.1
postgresql just to test this for you.

It seems you hit a bug in PostgreSQL prior to 9.1.15:
https://www.postgresql.org/docs/9.1/static/release-9-1-15.html

"Fix namespace handling in xpath() (Ali Akbar)

Previously, the xml value resulting from an xpath() call would not have
namespace declarations if the namespace declarations were attached to an
ancestor element in the input xml value, rather than to the specific
element being returned. Propagate the ancestral declaration so that the
result is correct when considered in isolation.
"


Given your current PG version, the queries will probably work if you
remove the "ns:" parts of the first two xpaths like this:

SELECT
  (xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
  (xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr
FROM (
  SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
  x,nsa)) as x, nsa
  FROM t
) Ntry


But that is not a good solution:
- when you eventually do upgrade, the query *will* break
- it depends on the exact location of the namespace declaration in the
source document. Your bank might change that in a way that will be still
perfectly valid, but break the assumptions made in that workaround.


So I suggest to upgrade to a supported version of the 9.1 branch from
your more than 5 years old build (which should be easy to do).


Best regards,

    -hannes



Re: How to parse xml containing optional elements

From
"Andrus"
Date:
Hi!

>I couldn't really believe this so I just installed a VM and a 9.1
postgresql just to test this for you.
>It seems you hit a bug in PostgreSQL prior to 9.1.15:
>https://www.postgresql.org/docs/9.1/static/release-9-1-15.html
>"Fix namespace handling in xpath() (Ali Akbar)
>Previously, the xml value resulting from an xpath() call would not have
>namespace declarations if the namespace declarations were attached to an
>ancestor element in the input xml value, rather than to the specific
>element being returned. Propagate the ancestral declaration so that the
>result is correct when considered in isolation.
>"
>Given your current PG version, the queries will probably work if you remove
>the "ns:" parts of the first two xpaths like this:

>SELECT
>  (xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
>  (xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr
>FROM (
>  SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
>  x,nsa)) as x, nsa
>  FROM t
>) Ntry
>But that is not a good solution:
>- when you eventually do upgrade, the query *will* break
>- it depends on the exact location of the namespace declaration in the
>source document. Your bank might change that in a way that will be still
>perfectly valid, but break the assumptions made in that workaround.
>So I suggest to upgrade to a supported version of the 9.1 branch from your
>more than 5 years old build (which should be easy to do).

Psotgres 9.1 run isn Debian Squeeze which is unsupported.
How to upgrade in Debian Squeeze ?

How to add IBAN column to result table? This column has same value for all
rows.

create temp table t(x xml, nsa text[][]) on commit drop;
insert into t values(
'<?xml version=''1.0'' encoding=''UTF-8''?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt>
    <Stmt>
      <Acct>
        <Id>
          <IBAN>XX00221059842412</IBAN>
        </Id>
      </Acct>
      <Ntry>
        <Amt Ccy="EUR">150.00</Amt>
        <NtryDtls>
          <TxDtls>
            <Refs>
              <EndToEndId>PV04131</EndToEndId>
            </Refs>
          </TxDtls>
        </NtryDtls>
      </Ntry>
      <Ntry>
        <Amt Ccy="EUR">0.38</Amt>
        <NtryDtls>
          <TxDtls>
            <Refs>
              <AcctSvcrRef>2016080100178214-2</AcctSvcrRef>
            </Refs>
          </TxDtls>
        </NtryDtls>
      </Ntry>
    </Stmt>
  </BkToCstmrStmt>
</Document>
', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]);

SELECT
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))::text AS endaaa,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()',
x,nsa))::text::numeric AS tasusumma
,
unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))::text AS orderinr
FROM t;

should produce

endaaa            tasusumma  orderinr
XX00221059842412     150.00  PV04131
XX00221059842412       0.38  null

Andrus.



Re: How to parse xml containing optional elements

From
Hannes Erven
Date:
Andrus,


> Psotgres 9.1 run isn Debian Squeeze which is unsupported.
> How to upgrade in Debian Squeeze ?

A plain "apt-get upgrade postgresql-9.1" does not work?
It might help to enable the postgresql.org APT repository. For
instructions, see here:
https://www.postgresql.org/download/linux/debian/


> How to add IBAN column to result table? This column has same value for
> all rows.

SELECT
     endaaa,
     (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
     (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
     SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))[1] as endaaa,
     unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
     FROM t
) Ntry

Be careful, this will only work when there is exactly one ns:Stmt
element in the document.
Else you will have to build a third query level, first selecting the
ns:Stmt entries, second the IBAN and Ntry from them and third amount and
EndToEndId.


-hannes



Re: How to parse xml containing optional elements

From
"Andrus"
Date:
Hi!

Thank you.

> How to upgrade in Debian Squeeze ?
>A plain "apt-get upgrade postgresql-9.1" does not work?
>It might help to enable the postgresql.org APT repository. For
>instructions, see here:
>https://www.postgresql.org/download/linux/debian/

apt-get upgrade postgresql-9.1 returns

Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not
installable
wkhtmltox : Depends: libssl1.0.0 but it is not installable
E: Unmet dependencies. Try using -f.

so it looks like repository is found but not usable ?


>> How to add IBAN column to result table? This column has same value for
>> all rows.

>SELECT
     endaaa,
     (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
     (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
     SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
x,nsa))[1] as endaaa,
     unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa
     FROM t
) Ntry

>Be careful, this will only work when there is exactly one ns:Stmt element
>in the document.
>Else you will have to build a third query level, first selecting the
>ns:Stmt entries, second the IBAN and Ntry from them and third amount and
>EndToEndId.

Hopefully there is only one Stmt element in single file.

I solved it by moving xpath to select IBAN to main select:

SELECT
(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
xo,nsa))[1]::text AS endaaa,
     (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
     (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
x,nsa))[1] AS orderinr

FROM (
     SELECT
     unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
x,nsa)) as x, nsa, x as xo
     FROM t
) Ntry

This references endaaa from single select only. Changing code requires
changing only one line.

Is this OK ?

Andrus.



Re: How to parse xml containing optional elements

From
Hannes Erven
Date:
Hi,


> apt-get upgrade postgresql-9.1 returns
>
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> You might want to run 'apt-get -f install' to correct these.
> The following packages have unmet dependencies:
> openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not
> installable
> wkhtmltox : Depends: libssl1.0.0 but it is not installable
> E: Unmet dependencies. Try using -f.

I'm afraid I cannot really help you with this. Is this with the
postgresql.org repository already enabled?
Did you run "apt-get update"?
What about "apt-get upgrade" (a system-wide upgrade)?


>> Be careful, this will only work when there is exactly one ns:Stmt
>> element in the document.
>> Else you will have to build a third query level, first selecting the
>> ns:Stmt entries, second the IBAN and Ntry from them and third amount
>> and EndToEndId.
>
> Hopefully there is only one Stmt element in single file.
> I solved it by moving xpath to select IBAN to main select:
>
> SELECT
> (xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()',
> xo,nsa))[1]::text AS endaaa,
>     (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma,
>     (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()',
> x,nsa))[1] AS orderinr
>
> FROM (
>     SELECT
>     unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry',
> x,nsa)) as x, nsa, x as xo
>     FROM t
> ) Ntry
>
> This references endaaa from single select only. Changing code requires
> changing only one line.
>
> Is this OK ?

This will still select only the IBAN from the first statement in the file.


-hannes