The following bug has been logged on the website:
Bug reference: 17896
Logged by: Paul Wehr
Email address: sf_postgresql@informationsoftworks.com
PostgreSQL version: 14.6
Operating system: Debian GNU/Linux 11 (bullseye)
Description:
Not a bug in terms of functionality, but an interesting difference in
performance. If nothing else hopefully documenting this work-around might
help anyone else that may be aggregating xmlelements on this scale.
Using xmlagg(xml) on 60,000 rows of xml takes 16.3s on my hardware. If I
replace "xmlagg(xml)" with "string_agg(xml::text)::xml", I can get the same
result in 0.051s. (see script below). I have confirmed this behavior on
other hardware, and in versions 11.19, 13.7 and 14.6.
Also, while string_agg() seems to scale linearly (O(n)), xmlagg() seems to
scale somewhat steeper (O(n^2)?). Running the same script on various second
parameter values in the generate_series() function produced these results:
| | string_agg()
Rows | xmlagg(), ms | work-around, ms
-------+----------------+------------------
10000 | 314.44 | 14.207
20000 | 1111.104 | 23.754
30000 | 2558.706 | 37.565
40000 | 5306.631 | 51.021
50000 | 10178.496 | 61.928
60000 | 16306.651 | 75.146
70000 | 23853.331 | 93.529
80000 | 32243.908 | 105.318
-------- psql test script output ----------------
select version();
version
-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
create temporary table xmlsample as
select xmlelement(name test, 'test') as xml
from generate_series(1,60000) as i (i)
;
SELECT 60000
explain analyze select xmlagg(xml) as xml from xmlsample;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1034.10..1034.11 rows=1 width=32) (actual
time=16304.009..16304.010 rows=1 loops=1)
-> Seq Scan on xmlsample (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.021..12.754 rows=60000 loops=1)
Planning Time: 0.351 ms
Execution Time: 16306.651 ms
(4 rows)
explain analyze select string_agg(xml::text, '')::xml as xml from
xmlsample;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1034.10..1034.12 rows=1 width=32) (actual
time=63.076..63.078 rows=1 loops=1)
-> Seq Scan on xmlsample (cost=0.00..903.88 rows=52088 width=32)
(actual time=0.010..5.591 rows=60000 loops=1)
Planning Time: 0.098 ms
Execution Time: 75.146 ms
(4 rows)