Re: xmlconcat performance - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: xmlconcat performance
Date
Msg-id CAHyXU0zo9UGStdVWazyX1ASFixFpuc25n4-k28qBsdW8YpvmPQ@mail.gmail.com
Whole thread Raw
In response to xmlconcat performance  (Davide Berra <d.berra@esitelsrl.it>)
List pgsql-performance
On Fri, Mar 1, 2013 at 2:18 AM, Davide Berra <d.berra@esitelsrl.it> wrote:
> Il 28/02/2013 18:48, Merlin Moncure ha scritto:
>
>> On Fri, Feb 22, 2013 at 3:21 AM, Davide Berra <d.berra@esitelsrl.it>
>> wrote:
>>>
>>> I got a problem with the performance of a PL/PGsql stored procedure
>>> outputting an xml.
>>>
>>> Server version: PostgreSQL 8.3.6 on i686-pc-linux-gnu, compiled by GCC
>>> gcc
>>> (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
>>> CPU: Intel(R) Core(TM) i3 CPU 540  @ 3.07GHz
>>> RAM installed: 4GB
>>> Hard Disk: Seagate 500Gb SATA 2
>>>
>>> This is a simplified content of the function showing the xmlconcat
>>> behaviour.
>>>
>>> CREATE OR REPLACE FUNCTION test_function (v_limit int)
>>>      RETURNS xml AS
>>> $BODY$
>>> DECLARE
>>>      v_xml xml;
>>> BEGIN
>>>
>>>      FOR i IN 1..v_limit LOOP
>>>          v_xml := xmlconcat(v_xml, xmlelement(name content, 'aaaaaaa'));
>>>      END LOOP;
>>>
>>>      RETURN v_xml ;
>>> END
>>> $BODY$
>>>      LANGUAGE 'plpgsql' SECURITY DEFINER ;
>>>
>>>
>>> As long as the v_limit parameter grows (and then the size of the output
>>> xml,
>>> the time needed increase exponentially.
>>> Look at this examples:
>>>
>>> pang=# explain analyze select test_function(1000);
>>>                                        QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------
>>>   Result  (cost=0.00..0.26 rows=1 width=0) (actual time=65.430..65.431
>>> rows=1
>>> loops=1)
>>>   Total runtime: 65.457 ms
>>> (2 rows)
>>>
>>> pang=# explain analyze select test_function(5000);
>>>                                         QUERY PLAN
>>>
>>> ----------------------------------------------------------------------------------------
>>>   Result  (cost=0.00..0.26 rows=1 width=0) (actual time=473.318..473.318
>>> rows=1 loops=1)
>>>   Total runtime: 473.340 ms
>>> (2 rows)
>>>
>>> pang=# explain analyze select test_function(15000);
>>>                                          QUERY PLAN
>>>
>>> ------------------------------------------------------------------------------------------
>>>   Result  (cost=0.00..0.26 rows=1 width=0) (actual
>>> time=4044.903..4044.904
>>> rows=1 loops=1)
>>>   Total runtime: 4044.928 ms
>>> (2 rows)
>>>
>>> pang=# explain analyze select test_function(50000);
>>>                                           QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------
>>>   Result  (cost=0.00..0.26 rows=1 width=0) (actual
>>> time=94994.337..94994.369
>>> rows=1 loops=1)
>>>   Total runtime: 94994.396 ms
>>> (2 rows)
>>>
>>> I already tried to update to 8.3.23 service version but i didn't see any
>>> improvement.
>>>
>>> Do you have any suggestion about how to increase the performance of
>>> xmlconcat?
>>>
>>> My need is to use stored procedures that calls xmlconcat more than 50000
>>> times, but it is unacceptable 94 seconds to complete the job.
>>>
>>> Thanks in advance
>>
>> typically for high performance string manipulation you have to do
>> things on more purely textual level and manipulate through arrays to
>> get really good performance.  iterative string concatenation is
>> typically wrong approach -- you have to think in set terms.
>>
>> also your database version is obsolete -- time to start thinking about
>> upgrade.
>>
>> merlin
>>
> Thank you for the reply Merlin but i don't fully get what you mean. (sorry,
> i'm not a PostgreSQL expert)
> How would you change the above example function in order to improve
> performance?
> What do you mean with "manipulate through arrays"?

well arrays, or simple aggregation.  for example:
select string_agg(v, '') from (select 'aaaaaaa'::text as v from
generate_series(1,50000)) q;

runs in ~ 30 ms.

merlin


pgsql-performance by date:

Previous
From: Niels Kristian Schjødt
Date:
Subject: New server setup
Next
From: Craig James
Date:
Subject: Re: New server setup