Thread: How to parse xml containing optional elements
SEPA ISO XML transactions file needs to be parsed into flat table in Postgres 9.1+ in ASP:NET 4.6 MVC controller. I tried code below but this produces wrong result: tasusumma orderinr 150.00 PV04131 0.38 PV04131 Since there is no EndToEnd in second row there should be null in second row orderinr column. Correct result is: tasusumma orderinr 150.00 PV04131 0.38 null How to fix this ? create temp table t(x xml, nsa text[][]) on commit drop; insert into t values( '<?xml version=''1.0'' encoding=''UTF-8''?> <Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"> <BkToCstmrStmt> <Stmt> <Ntry> <Amt Ccy="EUR">150.00</Amt> <NtryDtls> <TxDtls> <Refs> <EndToEndId>PV04131</EndToEndId> </Refs> </TxDtls> </NtryDtls> </Ntry> <Ntry> <Amt Ccy="EUR">0.38</Amt> <NtryDtls> <TxDtls> <Refs> <AcctSvcrRef>2016080100178214-2</AcctSvcrRef> </Refs> </TxDtls> </NtryDtls> </Ntry> </Stmt> </BkToCstmrStmt> </Document> ', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]); SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()', x,nsa))::text::numeric AS tasusumma , unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))::text AS orderinr FROM t; Parsing can done in other ways e.q. using xslt stylesheet for tranformation or in client side ASP.NET 4.6 MVC if this is more reasonable. Posted also in http://stackoverflow.com/questions/38888739/how-to-parse-xml-with-optional-elements Andrus.
Hi Andrus, > SELECT > unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()', > x,nsa))::text::numeric AS tasusumma > , > unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', > x,nsa))::text AS orderinr > FROM t; You need to extract all ns:Ntry elements first, and then get the amount and EndToEndId for each of them: SELECT (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa FROM t ) Ntry Best regards, -hannes
Hi! Thank you. In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit" it returns two empty rows. How to make it work in this version ? In "PostgreSQL 9.5.2, compiled by Visual C++ build 1800, 32-bit" it works. Andrus. -----Algsõnum----- From: Hannes Erven Sent: Thursday, August 11, 2016 11:51 AM To: Andrus Cc: pgsql-general Subject: Re: [GENERAL] How to parse xml containing optional elements Hi Andrus, > SELECT > unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()', > x,nsa))::text::numeric AS tasusumma > , > unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', > x,nsa))::text AS orderinr > FROM t; You need to extract all ns:Ntry elements first, and then get the amount and EndToEndId for each of them: SELECT (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa FROM t ) Ntry
Hi, > Thank you. > In "PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by > gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit" > it returns two empty rows. How to make it work in this version ? I couldn't really believe this so I just installed a VM and a 9.1 postgresql just to test this for you. It seems you hit a bug in PostgreSQL prior to 9.1.15: https://www.postgresql.org/docs/9.1/static/release-9-1-15.html "Fix namespace handling in xpath() (Ali Akbar) Previously, the xml value resulting from an xpath() call would not have namespace declarations if the namespace declarations were attached to an ancestor element in the input xml value, rather than to the specific element being returned. Propagate the ancestral declaration so that the result is correct when considered in isolation. " Given your current PG version, the queries will probably work if you remove the "ns:" parts of the first two xpaths like this: SELECT (xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa FROM t ) Ntry But that is not a good solution: - when you eventually do upgrade, the query *will* break - it depends on the exact location of the namespace declaration in the source document. Your bank might change that in a way that will be still perfectly valid, but break the assumptions made in that workaround. So I suggest to upgrade to a supported version of the 9.1 branch from your more than 5 years old build (which should be easy to do). Best regards, -hannes
Hi! >I couldn't really believe this so I just installed a VM and a 9.1 postgresql just to test this for you. >It seems you hit a bug in PostgreSQL prior to 9.1.15: >https://www.postgresql.org/docs/9.1/static/release-9-1-15.html >"Fix namespace handling in xpath() (Ali Akbar) >Previously, the xml value resulting from an xpath() call would not have >namespace declarations if the namespace declarations were attached to an >ancestor element in the input xml value, rather than to the specific >element being returned. Propagate the ancestral declaration so that the >result is correct when considered in isolation. >" >Given your current PG version, the queries will probably work if you remove >the "ns:" parts of the first two xpaths like this: >SELECT > (xpath('Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, > (xpath('NtryDtls/TxDtls/Refs/EndToEndId/text()', x,nsa))[1] AS orderinr >FROM ( > SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', > x,nsa)) as x, nsa > FROM t >) Ntry >But that is not a good solution: >- when you eventually do upgrade, the query *will* break >- it depends on the exact location of the namespace declaration in the >source document. Your bank might change that in a way that will be still >perfectly valid, but break the assumptions made in that workaround. >So I suggest to upgrade to a supported version of the 9.1 branch from your >more than 5 years old build (which should be easy to do). Psotgres 9.1 run isn Debian Squeeze which is unsupported. How to upgrade in Debian Squeeze ? How to add IBAN column to result table? This column has same value for all rows. create temp table t(x xml, nsa text[][]) on commit drop; insert into t values( '<?xml version=''1.0'' encoding=''UTF-8''?> <Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"> <BkToCstmrStmt> <Stmt> <Acct> <Id> <IBAN>XX00221059842412</IBAN> </Id> </Acct> <Ntry> <Amt Ccy="EUR">150.00</Amt> <NtryDtls> <TxDtls> <Refs> <EndToEndId>PV04131</EndToEndId> </Refs> </TxDtls> </NtryDtls> </Ntry> <Ntry> <Amt Ccy="EUR">0.38</Amt> <NtryDtls> <TxDtls> <Refs> <AcctSvcrRef>2016080100178214-2</AcctSvcrRef> </Refs> </TxDtls> </NtryDtls> </Ntry> </Stmt> </BkToCstmrStmt> </Document> ', ARRAY[ARRAY['ns','urn:iso:std:iso:20022:tech:xsd:camt.053.001.02']]); SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()', x,nsa))::text AS endaaa, unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:Amt/text()', x,nsa))::text::numeric AS tasusumma , unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry/ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))::text AS orderinr FROM t; should produce endaaa tasusumma orderinr XX00221059842412 150.00 PV04131 XX00221059842412 0.38 null Andrus.
Andrus, > Psotgres 9.1 run isn Debian Squeeze which is unsupported. > How to upgrade in Debian Squeeze ? A plain "apt-get upgrade postgresql-9.1" does not work? It might help to enable the postgresql.org APT repository. For instructions, see here: https://www.postgresql.org/download/linux/debian/ > How to add IBAN column to result table? This column has same value for > all rows. SELECT endaaa, (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT (xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()', x,nsa))[1] as endaaa, unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa FROM t ) Ntry Be careful, this will only work when there is exactly one ns:Stmt element in the document. Else you will have to build a third query level, first selecting the ns:Stmt entries, second the IBAN and Ntry from them and third amount and EndToEndId. -hannes
Hi! Thank you. > How to upgrade in Debian Squeeze ? >A plain "apt-get upgrade postgresql-9.1" does not work? >It might help to enable the postgresql.org APT repository. For >instructions, see here: >https://www.postgresql.org/download/linux/debian/ apt-get upgrade postgresql-9.1 returns Reading package lists... Done Building dependency tree Reading state information... Done You might want to run 'apt-get -f install' to correct these. The following packages have unmet dependencies: openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not installable wkhtmltox : Depends: libssl1.0.0 but it is not installable E: Unmet dependencies. Try using -f. so it looks like repository is found but not usable ? >> How to add IBAN column to result table? This column has same value for >> all rows. >SELECT endaaa, (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT (xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()', x,nsa))[1] as endaaa, unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa FROM t ) Ntry >Be careful, this will only work when there is exactly one ns:Stmt element >in the document. >Else you will have to build a third query level, first selecting the >ns:Stmt entries, second the IBAN and Ntry from them and third amount and >EndToEndId. Hopefully there is only one Stmt element in single file. I solved it by moving xpath to select IBAN to main select: SELECT (xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()', xo,nsa))[1]::text AS endaaa, (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', x,nsa))[1] AS orderinr FROM ( SELECT unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', x,nsa)) as x, nsa, x as xo FROM t ) Ntry This references endaaa from single select only. Changing code requires changing only one line. Is this OK ? Andrus.
Hi, > apt-get upgrade postgresql-9.1 returns > > Reading package lists... Done > Building dependency tree > Reading state information... Done > You might want to run 'apt-get -f install' to correct these. > The following packages have unmet dependencies: > openssl : Depends: libssl1.0.0 (>= 1.0.1e-2+deb7u5) but it is not > installable > wkhtmltox : Depends: libssl1.0.0 but it is not installable > E: Unmet dependencies. Try using -f. I'm afraid I cannot really help you with this. Is this with the postgresql.org repository already enabled? Did you run "apt-get update"? What about "apt-get upgrade" (a system-wide upgrade)? >> Be careful, this will only work when there is exactly one ns:Stmt >> element in the document. >> Else you will have to build a third query level, first selecting the >> ns:Stmt entries, second the IBAN and Ntry from them and third amount >> and EndToEndId. > > Hopefully there is only one Stmt element in single file. > I solved it by moving xpath to select IBAN to main select: > > SELECT > (xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Acct/ns:Id/ns:IBAN/text()', > xo,nsa))[1]::text AS endaaa, > (xpath('ns:Amt/text()', x,nsa))[1]::text::numeric AS tasusumma, > (xpath('ns:NtryDtls/ns:TxDtls/ns:Refs/ns:EndToEndId/text()', > x,nsa))[1] AS orderinr > > FROM ( > SELECT > unnest(xpath('/ns:Document/ns:BkToCstmrStmt/ns:Stmt/ns:Ntry', > x,nsa)) as x, nsa, x as xo > FROM t > ) Ntry > > This references endaaa from single select only. Changing code requires > changing only one line. > > Is this OK ? This will still select only the IBAN from the first statement in the file. -hannes