Thread: Concatenate performance question

Concatenate performance question

From
"Michael Guyver"
Date:
Hi there,

I've got a rather large PL/pgSQL function which returns a varchar
(though it could be text, char or blob, I'm not fussy) containing JSON
information (where JSON is Javascript Object Notation). The middle
tier of the app does pretty much sweet FA except pass this straight
back to the client. I'm interested in seeing how much faster I can get
the app to process a request this way as opposed to retrieving the
data over three or four calls to the DB before constructing the JSON
response in the middle tier.

I've got to the point where I suspect the concatenation could do with
some attention. What is the fastest way of returning this to the
client?

I thought that storing the individual segments of text in an array and
stitiching it all together at the end of the function may be a fast
way of doing things, using an

array_to_string(resultArray,'');

call. However I have my doubts whether the

resultArray := array_append(resultArray,'next token');

is performant as it looks as though it's constructing a new array from
the argument each time its called. Can someone confirm or rebut this?

How would a simple

result := result || 'next token';

perform? The result size is in the 20-25 Kb range.

A mate mentioned that the way Oracle's OWS does it is to stream the
response back as a blob. I presume he means that the function could
iterate over the different queries' result-sets and simply write the
results to the blob before returning. Can anyone shed any light on
this approach and its applicabilty to PostgreSQL?

Cheers

Michael

Looking for pgsql sysadmin

From
Marc Evans
Date:
Hello -

A company that I am working with is looking for a pgsql system
administrator. The host OSes are FreeBSD 6 and RedHat Enterprise Linux,
currently. Postgresql is currently version 8.1. Tasks would include
devising a world-wide replication and backup strategy, system performance
tuning, routine maintenance tasks, etc. The work location is San Jose, CA,
though tele-commuting may be arranged for the right person. If you or
someone that you know would be interested, I would be happy to make
introductions.

- Marc

Re: Looking for pgsql sysadmin

From
Richard Huxton
Date:
Marc Evans wrote:
> Hello -
>
> A company that I am working with is looking for a pgsql system
> administrator.

Couple of points Marc
1. Don't reply to a message to start a new thread - it can end up with
your message being displayed in the wrong place.
2. There's a pgsql-jobs list that you will probably have more luck on.

--
   Richard Huxton
   Archonet Ltd

Re: Concatenate performance question

From
"Michael Guyver"
Date:
Gents,

At risk of answering my own question to spur someone actually to share
their thoughts on this topic, I thought I'd provide a quick look at
the performance of the alternatives: either using the || operator, or
the array_append method.

-- SELECT * FROM test_v_01();
-- SELECT * FROM test_v_02();

CREATE OR REPLACE FUNCTION test_v_01() RETURNS VARCHAR AS $$
DECLARE
    buffer varchar;
    i int4;
BEGIN
    buffer := 'the quick brown fox jumps over the lazy dog';
    FOR i IN 1..1000 LOOP
        buffer := buffer || 'the quick brown fox jumps over the lazy dog';
    END LOOP;
    RETURN buffer;
END;
$$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION test_v_02() RETURNS VARCHAR AS $$
DECLARE
    buffer varchar[] := '{}';
BEGIN
    buffer := array_append(buffer,'the quick brown fox jumps over the lazy dog');
    FOR i IN 1..1000 LOOP
        buffer := array_append(buffer, 'the quick brown fox jumps over the lazy dog');
    END LOOP;
    RETURN array_to_string(buffer,'');
END;
$$
LANGUAGE plpgsql;

Running the array_append version is faster by at least one order of
magnitude in these examples. However, where you can in-line the ||
operator with multiple operands, ie

buffer := buffer || 'token 1' || results.user_id::text || 'token 2' ||
results.event_id::text || 'token3';

it is faster than calling

buffer := array_append(buffer, 'token 1');
buffer := array_append(buffer, results.user_id::text);
buffer := array_append(buffer, 'token 2');
buffer := array_append(buffer, results.event_id::text);
buffer := array_append(buffer, 'token 3');

This seems entirely reasonable, as the latter requires the evaluation
of five calls, whereas the former can do it in one go.

However, my original question still stands - is there another way of
doing this? Is it possible to write to a bytea or blob or stream and
avoid having to do any concatenation at all?

Cheers

Michael


On 29/11/06, Michael Guyver <kenevel@googlemail.com> wrote:
> Hi there,
>
> I've got a rather large PL/pgSQL function which returns a varchar
> (though it could be text, char or blob, I'm not fussy) containing JSON
> information (where JSON is Javascript Object Notation). The middle
> tier of the app does pretty much sweet FA except pass this straight
> back to the client. I'm interested in seeing how much faster I can get
> the app to process a request this way as opposed to retrieving the
> data over three or four calls to the DB before constructing the JSON
> response in the middle tier.
>
> I've got to the point where I suspect the concatenation could do with
> some attention. What is the fastest way of returning this to the
> client?
>
> I thought that storing the individual segments of text in an array and
> stitiching it all together at the end of the function may be a fast
> way of doing things, using an
>
> array_to_string(resultArray,'');
>
> call. However I have my doubts whether the
>
> resultArray := array_append(resultArray,'next token');
>
> is performant as it looks as though it's constructing a new array from
> the argument each time its called. Can someone confirm or rebut this?
>
> How would a simple
>
> result := result || 'next token';
>
> perform? The result size is in the 20-25 Kb range.
>
> A mate mentioned that the way Oracle's OWS does it is to stream the
> response back as a blob. I presume he means that the function could
> iterate over the different queries' result-sets and simply write the
> results to the blob before returning. Can anyone shed any light on
> this approach and its applicabilty to PostgreSQL?
>
> Cheers
>
> Michael
>

Re: Concatenate performance question

From
Tom Lane
Date:
"Michael Guyver" <kenevel@googlemail.com> writes:
> ... Running the array_append version is faster by at least one order of
> magnitude in these examples.

Really?  I see only about a 50% advantage (155 msec vs 105 msec) in both
8.1 and 8.2.  What PG version are you running?  What database encoding
are you using?

To a zeroth-order approximation, the two methods should have about the
same amount of data copying involved, and so the concatenation method
ought to win out based on avoiding array data structure manipulation
overhead (not to mention avoiding the array_to_string step, which makes
an extra pass over all the data).  So I find my own results surprising,
and yours very much more so ...

            regards, tom lane

Re: Concatenate performance question

From
Tom Lane
Date:
"Michael Guyver" <kenevel@googlemail.com> writes:
> CREATE OR REPLACE FUNCTION test_v_01() RETURNS VARCHAR AS $$
> DECLARE
>     buffer varchar;
>     i int4;
> BEGIN
>     buffer := 'the quick brown fox jumps over the lazy dog';
>     FOR i IN 1..1000 LOOP
>         buffer := buffer || 'the quick brown fox jumps over the lazy dog';
>     END LOOP;
>     RETURN buffer;
> END;
> $$
> LANGUAGE plpgsql;

I looked into this a bit with gprof/oprofile, and found that there are
two issues skewing the results of this test:

* the function ought to be declared STABLE or even IMMUTABLE.  The fact
that it is not creates significant snapshot/command-counter overhead.
(This cost is about the same for both variants, though.)

* the buffer variable, and probably the function result too, ought to be
declared TEXT not VARCHAR.  That's because the result of the concat
operator is always TEXT, and plpgsql is not bright about the fact that
the conversion from text to varchar could be optimized away.  There
isn't any comparable cost for the array_append approach because of the
way that the result type of array_append() is determined --- it comes
out varchar[].

After correcting these issues I find that the concat approach is about
20% faster than the array_append approach, which is more in line with
what I expected.

            regards, tom lane

Re: Concatenate performance question

From
"Michael Guyver"
Date:
On 03/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Michael Guyver" <kenevel@googlemail.com> writes:
> > ... Running the array_append version is faster by at least one order of
> > magnitude in these examples.
>
> Really?  I see only about a 50% advantage (155 msec vs 105 msec) in both
> 8.1 and 8.2.  What PG version are you running?  What database encoding
> are you using?

Hi Tom,

Thanks for taking the time to have a look at this. Perhaps I
overstated the case when I said an order of magnitude :¬O  That said,
however, I'm fairly sure that the difference was more than you've
seen.

The hardware may indeed have been a factor - I was running psql 8.1 on
a laptop (AMD 64 3200, 1GB RAM, 5400rpm HDD) so the difference in
performance may have been more pronounced. I will check the figures
when I next get to my laptop at home, but I'm currently doing silly
hours at work for the release we have this week.

From some fairly unscientific bodging I've found that a mixture of the
two is the most performant, as I mentioned in a previous post.
Incidentally using a RETURN NEXT statement where I previously employed
an array_append performed as well if not slightly better than my
previous approach. Any thoughts?

Cheers

Michael