Thread: generating a large XML document

generating a large XML document

From

Julius Tuskenis

Date:

16 June 2011, 16:40:49

Hello, postgresql guru's

When developing ticket export mechanism for our ticketing system I
thought the wise thing would be to generate the result (XML file) in
stored function. This worked fine for small datasets, but for the larger
ones I see it takes much longer to generate the document than to select
the data needed for it.

Here are several "explain alalyze"s :

explain analyze
SELECT * from tex.fnk_access_control_tickets(8560, 0);

"Function Scan on fnk_access_control_tickets  (cost=0.25..10.25
rows=1000 width=246) (actual time=479.276..484.429 rows=16292 loops=1)"
"Total runtime: 486.774 ms"

explain analyze
SELECT
     --xmlagg(
       XMLELEMENT ( NAME "bar",
         XMLELEMENT ( NAME "code", tick_barcode),
         XMLELEMENT ( NAME "stat", status),
         CASE WHEN type IS NOT NULL THEN XMLELEMENT ( NAME "tip", type)
ELSE NULL END,
         CASE WHEN sec_name IS NOT NULL THEN XMLELEMENT ( NAME "sec",
sec_name) ELSE NULL END,
         CASE WHEN row_name IS NOT NULL THEN XMLELEMENT ( NAME "row",
row_name) ELSE NULL END,
         CASE WHEN seat_name IS NOT NULL THEN XMLELEMENT ( NAME "plc",
seat_name) ELSE NULL END,
         CASE WHEN substr(tick_barcode,length(tick_barcode),1)= '1' THEN
         XMLELEMENT ( NAME "groups",
             XMLELEMENT ( NAME "group", 1)
            )
         ELSE NULL END
        )
     -- )
     FROM tex.fnk_access_control_tickets(8560, 0);

"Function Scan on fnk_access_control_tickets  (cost=0.25..17.75
rows=1000 width=238) (actual time=476.446..924.785 rows=16292 loops=1)"
"Total runtime: 928.768 ms"


explain analyze
SELECT
     xmlagg(
       XMLELEMENT ( NAME "bar",
         XMLELEMENT ( NAME "code", tick_barcode),
         XMLELEMENT ( NAME "stat", status),
         CASE WHEN type IS NOT NULL THEN XMLELEMENT ( NAME "tip", type)
ELSE NULL END,
         CASE WHEN sec_name IS NOT NULL THEN XMLELEMENT ( NAME "sec",
sec_name) ELSE NULL END,
         CASE WHEN row_name IS NOT NULL THEN XMLELEMENT ( NAME "row",
row_name) ELSE NULL END,
         CASE WHEN seat_name IS NOT NULL THEN XMLELEMENT ( NAME "plc",
seat_name) ELSE NULL END,
         CASE WHEN substr(tick_barcode,length(tick_barcode),1)= '1' THEN
         XMLELEMENT ( NAME "groups",
             XMLELEMENT ( NAME "group", 1)
            )
         ELSE NULL END
        )
      )
     FROM tex.fnk_access_control_tickets(8560, 0);

"Aggregate  (cost=12.75..12.77 rows=1 width=238) (actual
time=16110.847..16110.848 rows=1 loops=1)"
"  ->  Function Scan on fnk_access_control_tickets  (cost=0.25..10.25
rows=1000 width=238) (actual time=500.029..520.974 rows=16292 loops=1)"
"Total runtime: 16111.264 ms"

It seems the aggregate combining the elements is to blame... What I did
next was rewriting it in stored function using FOR loop and combining it
in a text variable. Sadly the result was the same (approximately 16
seconds).

 From that I had to conclude that text := text + some_text operation is
an expensive one, but now I have no ideas left how to solve the problem.
Any notices, recommendations, advices are very welcome.

I've also tried google'ing on XML creation in Postgresql, but found no
warnings or even mentioning xmlagg could cause a headache. I have
nowhere to turn for help now, so please advice...

Not sure if that will help, but anyway:
Server:
PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Debian 4.4.5-10) 4.4.5, 32-bit
see the result of select * from pg_settings  in attachment if needed.
Client:
Windows XP, pgAdmin 1.12.3


Thank you in advance.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Attachment

pg_settings.txt

Re: generating a large XML document

From

Julius Tuskenis

Date:

20 June 2011, 03:36:41

Hello,

I'm sorry to write again, but as I received no answer I wonder if there
is a better mailing list to address concerning this question? Or is
there nothing to be done about the speed of xmlagg ?. Please let me as
no answer is the worst answer to get....

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Re: generating a large XML document

From

Pavel Stehule

Date:

20 June 2011, 03:52:33

Hello

2011/6/20 Julius Tuskenis <julius@nsoft.lt>:
> Hello,
>
> I'm sorry to write again, but as I received no answer I wonder if there is a
> better mailing list to address concerning this question? Or is there nothing
> to be done about the speed of xmlagg ?. Please let me as no answer is the
> worst answer to get....
>

It's hard to say where is problem - PostgreSQL wraps libxml2 library
for xml functionality, so problem can be

a) inside libxml2
b) on interface between libxml2 and PostgreSQL
c) on PostgreSQL memory management

can you send a profile?

Regards

Pavel Stehule

> --
> Julius Tuskenis
> Programavimo skyriaus vadovas
> UAB nSoft
> mob. +37068233050
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: generating a large XML document

From

Samuel Gendler

Date:

20 June 2011, 04:21:18

On Sun, Jun 19, 2011 at 11:36 PM, Julius Tuskenis <julius@nsoft.lt> wrote:

Hello,

I'm sorry to write again, but as I received no answer I wonder if there is a better mailing list to address concerning this question? Or is there nothing to be done about the speed of xmlagg ?. Please let me as no answer is the worst answer to get....

I played around a little after your initial post, but I'm not a postgresql developer and I have no familiarity with the internals, so everything I'm about to write is pure conjecture.

Programming languages that treat strings as immutable often suffer from this kind of problem. WIth each string concatenation, both strings have to be copied to a new location in memory. I suspect that this is happening in this case. The only viable fix would be to use a buffer that is mutable and append into it rather than doing raw string concatenation. If this is truly the problem, I don't see that you have much choice but to re-implement xmlagg in one of the available languages such that it uses a buffer instead of immutable string concatenation. It is probably that the xmlelement function doesn't need to be rewritten, since it is only concatenating relatively short strings. It is less efficient than appending to a buffer, but shouldn't get catastrophically slow. But xmlagg is concatenating many rows. If each concatenation results in a full copy of the already concatenated rows, you can see where performance would drop catastrophically.

Here's the first google result for 'mutable string python' that I found, which addresses this problem in python. http://www.skymind.com/~ocrow/python_string/ You could rewrite the aggregate function in plpython using one of the techniques in that file. I just attempted to find the source to xm_agg in the postgresql source code and it is pretty well obfuscated, so I don't see that being much help. I wasn't even able to determine if the problem actually is immutable string concatenation. So we don't know if xmlagg is building a DOM tree and then serializing it once (which would imply that XMLELEMENT returns a single DOM node, or if it is all workign with strings. Barring answers from someone who actually knows, I can only suggest that you read through the documentation on writing an aggregate function and then do some experimentation to see what you get when you use your own aggregate function instead of xml_agg. Hopefully, such experimentation won't take long to determine if re-implementing xml_agg with a mutable buffer is a viable option.

--sam

Re: generating a large XML document

From

Julius Tuskenis

Date:

20 June 2011, 04:29:54

Thank you, Pavel for your answer

2011.06.20 09:51, Pavel Stehule rašė:
> can you send a profile?
Excuse me, but what do you mean by saying "profile"? I've sent content
of pg_settings in the first post. Please be more specific as I am more
of a programmer than an server administrator.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Re: generating a large XML document

From

Julius Tuskenis

Date:

20 June 2011, 04:38:55

Thank You, Samuel for the time it took to investigate the issue.

I'll try to use buffer to see what the results are... I'll post results
to the list if I succeed.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050

Re: generating a large XML document

From

Pavel Stehule

Date:

20 June 2011, 04:47:57

2011/6/20 Julius Tuskenis <julius@nsoft.lt>:
> Thank you, Pavel for your answer
>
> 2011.06.20 09:51, Pavel Stehule rašė:
>>
>> can you send a profile?
>
> Excuse me, but what do you mean by saying "profile"? I've sent content of
> pg_settings in the first post. Please be more specific as I am more of a
> programmer than an server administrator.
>

a result from oprofile profiler

Regards

Pavel

> --
> Julius Tuskenis
> Programavimo skyriaus vadovas
> UAB nSoft
> mob. +37068233050
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

Re: generating a large XML document

From

Pavel Stehule

Date:

20 June 2011, 04:59:02

2011/6/20 Pavel Stehule <pavel.stehule@gmail.com>:
> 2011/6/20 Julius Tuskenis <julius@nsoft.lt>:
>> Thank you, Pavel for your answer
>>
>> 2011.06.20 09:51, Pavel Stehule rašė:
>>>
>>> can you send a profile?
>>
>> Excuse me, but what do you mean by saying "profile"? I've sent content of
>> pg_settings in the first post. Please be more specific as I am more of a
>> programmer than an server administrator.
>>
>
> a result from oprofile profiler
>

I looked into code - probably a implementation of xmlagg is too silly

xmlagg use a xmlconcat functions - that means repeated xml parsing and
xmlserialization. So it is not effective on larger trees :(

string_agg is more effective now. The solution is only radical
refactoring of xmlagg function.

Pavel




> Regards
>
> Pavel
>
>> --
>> Julius Tuskenis
>> Programavimo skyriaus vadovas
>> UAB nSoft
>> mob. +37068233050
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>

Re: generating a large XML document

From

Julius Tuskenis

Date:

20 June 2011, 06:03:34

2011.06.20 10:58, Pavel Stehule rašė:
> string_agg is more effective now. The solution is only radical
> refactoring of xmlagg function.
Thank you, Pavel for letting me know of string_agg.

explain analyze
SELECT
     string_agg(
       XMLELEMENT ( NAME "bar",
         XMLELEMENT ( NAME "code", tick_barcode),
         XMLELEMENT ( NAME "stat", status),
         CASE WHEN type IS NOT NULL THEN XMLELEMENT ( NAME "tip", type)
ELSE NULL END,
         CASE WHEN sec_name IS NOT NULL THEN XMLELEMENT ( NAME "sec",
sec_name) ELSE NULL END,
         CASE WHEN row_name IS NOT NULL THEN XMLELEMENT ( NAME "row",
row_name) ELSE NULL END,
         CASE WHEN seat_name IS NOT NULL THEN XMLELEMENT ( NAME "plc",
seat_name) ELSE NULL END,
         CASE WHEN substr(tick_barcode,length(tick_barcode),1)= '1' THEN
         XMLELEMENT ( NAME "groups",
             XMLELEMENT ( NAME "group", 1)
            )
         ELSE NULL END
        )::text, NULL
     )::xml
     FROM tex.fnk_access_control_tickets(8560, 0);

"Aggregate  (cost=12.75..12.77 rows=1 width=238) (actual
time=1025.502..1025.502 rows=1 loops=1)"
"  ->  Function Scan on fnk_access_control_tickets  (cost=0.25..10.25
rows=1000 width=238) (actual time=495.703..503.999 rows=16292 loops=1)"
"Total runtime: 1036.775 ms"

Its over 10 times faster than using xmlagg.

--
Julius Tuskenis
Programavimo skyriaus vadovas
UAB nSoft
mob. +37068233050