to high memory expensive xpath_table from xml2 contrib extension - Mailing list pgsql-bugs

From Pavel Stehule
Subject to high memory expensive xpath_table from xml2 contrib extension
Date
Msg-id CAFj8pRD75ZmgdKNmrVFQhkxxsoNt7NdoLRwmgO3huyeUGcruyg@mail.gmail.com
Whole thread Raw
List pgsql-bugs
Hi

I found some unexpected behave when I tested some possibilities for my
customer.

This test needs installed extension xml2 with xslt buildin support

create or replace function xml_generator(int) returns xml as $$
  select xslt_process(xmlelement(name data, xmlagg(xmlelement(name row,
        xmlforest(right('0000000' || i, 8) as a, current_date as b,
current_date + 100 as c, 'XXX' as d,
                       'short' as e, 'short' as f, 'some longer
description' as g,
                       'some other longer description english' as
h))))::text,
'<xsl:stylesheet version="1.0" xmlns:xsl="
http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*" /><xsl:output method="xml" indent="yes" />
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>')::xml from generate_series(1,$1) g(i)
$$ language sql;

postgres=# select xml_generator(1);
                  xml_generator
--------------------------------------------------
 <data>                                          +
   <row>                                         +
     <a>00000001</a>                             +
     <b>2015-04-05</b>                           +
     <c>2015-07-14</c>                           +
     <d>XXX</d>                                  +
     <e>short</e>                                +
     <f>short</f>                                +
     <g>some longer description</g>              +
     <h>some other longer description english</h>+
   </row>                                        +
 </data>                                         +

(1 row)

create table test(id int, a xml);
insert into test values(1, xml_generator(10));
insert into test values(2, xml_generator(100));
insert into test values(3, xml_generator(1000));
insert into test values(4, xml_generator(2000));
insert into test values(5, xml_generator(4000));
insert into test values(6, xml_generator(8000));

The size of xml document with 8K fields is about 1.7MB

postgres=# select length(xml_generator(8000)::text);
 length
---------
 1712037
(1 row)

processing via buildin function xpath is fast:

postgres=# select a[1]::text a, a[2]::text::date b, a[3]::text::date c,
a[4]::text, a[5]::text d, a[6]::text e, a[7]::text f, a[8]::text g
from (select

xpath('/row/a/text()|/row/b/text()|/row/c/text()|/row/d/text()|/row/e/text()|/row/f/text()|/row/g/text()|/row/h/text()',
r) a
from (select unnest(xpath('/data/row', a)) r from test where id = 1) s1) s2;
    a     |     b      |     c      |  a  |   d   |   e   |
f            |                   g

----------+------------+------------+-----+-------+-------+-------------------------+---------------------------------------
 00000001 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000002 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000003 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000004 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000005 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000006 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000007 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000008 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000009 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
 00000010 | 2015-04-05 | 2015-07-14 | XXX | short | short | some longer
description | some other longer description english
(10 rows)

postgres=# \timing
Timing is on.
postgres=# \o /dev/null
postgres=# select a[1]::text a, a[2]::text::date b, a[3]::text::date c,
a[4]::text, a[5]::text d, a[6]::text e, a[7]::text f, a[8]::text g
from (select

xpath('/row/a/text()|/row/b/text()|/row/c/text()|/row/d/text()|/row/e/text()|/row/f/text()|/row/g/text()|/row/h/text()',
r) a
from (select unnest(xpath('/data/row', a)) r from test where id = 6) s1) s2;
Time: 2620.013 ms

postgres=# select * from xpath_table('id', 'a', 'test',

'/data/row/a/text()|/data/row/b/text()|/data/row/c/text()|/data/row/d/text()|/data/row/e/text()|/data/row/f/text()|/data/row/g/text()|/data/row/h/text()',
'id = 1') as (id int, a text, b date, c date, d text, e text, f text, g
text, h text);
 id |    a     |     b      |     c      |  d  |   e   |   f   |
g            |                   h

----+----------+------------+------------+-----+-------+-------+-------------------------+---------------------------------------
  1 | 00000001 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000002 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000003 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000004 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000005 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000006 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000007 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000008 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000009 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
  1 | 00000010 | 2015-04-05 | 2015-07-14 | XXX | short | short | some
longer description | some other longer description english
(10 rows)

Time: 4.965 ms

But for processing xml(5) .. 4K fields it needs 1.5GB RAM and 89sec.

Using xpath_table is relative comfortable, but dangerous for little bit
larger data than small.

Regards

Pavel Stehule

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: src/port/getopt_long.c lossy with arguments having no option characters
Next
From: Michael Paquier
Date:
Subject: Re: BUG #12917: C program created by ecpg core dumped due to "varcharsize * offset"