BUG #17896: xmlagg exponentially slower than string_agg equivalent - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17896: xmlagg exponentially slower than string_agg equivalent
Date
Msg-id 17896-f4bf54f99b6a3089@postgresql.org
Whole thread Raw
Responses Re: BUG #17896: xmlagg exponentially slower than string_agg equivalent
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17896
Logged by:          Paul Wehr
Email address:      sf_postgresql@informationsoftworks.com
PostgreSQL version: 14.6
Operating system:   Debian GNU/Linux 11 (bullseye)
Description:

Not a bug in terms of functionality, but an interesting difference in
performance.  If nothing else hopefully documenting this work-around might
help anyone else that may be aggregating xmlelements on this scale.

Using xmlagg(xml) on 60,000 rows of xml takes 16.3s on my hardware.  If I
replace "xmlagg(xml)" with "string_agg(xml::text)::xml", I can get the same
result in 0.051s.  (see script below). I have confirmed this behavior on
other hardware, and in versions 11.19, 13.7 and 14.6.

Also, while string_agg() seems to scale linearly (O(n)), xmlagg() seems to
scale somewhat steeper (O(n^2)?).  Running the same script on various second
parameter values in the generate_series() function produced these results:

       |                |   string_agg()
 Rows  |  xmlagg(), ms  | work-around, ms
-------+----------------+------------------
 10000 |      314.44    |      14.207
 20000 |    1111.104    |      23.754
 30000 |    2558.706    |      37.565
 40000 |    5306.631    |      51.021
 50000 |   10178.496    |      61.928
 60000 |   16306.651    |      75.146
 70000 |   23853.331    |      93.529
 80000 |   32243.908    |     105.318


-------- psql test script output ----------------

select version();
                                                           version
                                                

-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

create temporary table xmlsample as
select xmlelement(name test, 'test') as xml
from generate_series(1,60000) as i (i)
;
SELECT 60000
explain analyze select xmlagg(xml) as xml from xmlsample;
                                                     QUERY PLAN
                                        
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1034.10..1034.11 rows=1 width=32) (actual
time=16304.009..16304.010 rows=1 loops=1)
   ->  Seq Scan on xmlsample  (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.021..12.754 rows=60000 loops=1)
 Planning Time: 0.351 ms
 Execution Time: 16306.651 ms
(4 rows)

explain analyze select string_agg(xml::text, '')::xml as xml from
xmlsample;
                                                     QUERY PLAN
                                       
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1034.10..1034.12 rows=1 width=32) (actual
time=63.076..63.078 rows=1 loops=1)
   ->  Seq Scan on xmlsample  (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.010..5.591 rows=60000 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 75.146 ms
(4 rows)


pgsql-bugs by date:

Previous
From: Eric Wilfried Ettien
Date:
Subject: Re: BUG #17895: Bug concerning UTF-8
Next
From: PG Bug reporting form
Date:
Subject: BUG #17897: Crash on assignment to array of constraint-less domain over composite type