Thread: Concatenate performance question
Hi there, I've got a rather large PL/pgSQL function which returns a varchar (though it could be text, char or blob, I'm not fussy) containing JSON information (where JSON is Javascript Object Notation). The middle tier of the app does pretty much sweet FA except pass this straight back to the client. I'm interested in seeing how much faster I can get the app to process a request this way as opposed to retrieving the data over three or four calls to the DB before constructing the JSON response in the middle tier. I've got to the point where I suspect the concatenation could do with some attention. What is the fastest way of returning this to the client? I thought that storing the individual segments of text in an array and stitiching it all together at the end of the function may be a fast way of doing things, using an array_to_string(resultArray,''); call. However I have my doubts whether the resultArray := array_append(resultArray,'next token'); is performant as it looks as though it's constructing a new array from the argument each time its called. Can someone confirm or rebut this? How would a simple result := result || 'next token'; perform? The result size is in the 20-25 Kb range. A mate mentioned that the way Oracle's OWS does it is to stream the response back as a blob. I presume he means that the function could iterate over the different queries' result-sets and simply write the results to the blob before returning. Can anyone shed any light on this approach and its applicabilty to PostgreSQL? Cheers Michael
Hello - A company that I am working with is looking for a pgsql system administrator. The host OSes are FreeBSD 6 and RedHat Enterprise Linux, currently. Postgresql is currently version 8.1. Tasks would include devising a world-wide replication and backup strategy, system performance tuning, routine maintenance tasks, etc. The work location is San Jose, CA, though tele-commuting may be arranged for the right person. If you or someone that you know would be interested, I would be happy to make introductions. - Marc
Marc Evans wrote: > Hello - > > A company that I am working with is looking for a pgsql system > administrator. Couple of points Marc 1. Don't reply to a message to start a new thread - it can end up with your message being displayed in the wrong place. 2. There's a pgsql-jobs list that you will probably have more luck on. -- Richard Huxton Archonet Ltd
Gents, At risk of answering my own question to spur someone actually to share their thoughts on this topic, I thought I'd provide a quick look at the performance of the alternatives: either using the || operator, or the array_append method. -- SELECT * FROM test_v_01(); -- SELECT * FROM test_v_02(); CREATE OR REPLACE FUNCTION test_v_01() RETURNS VARCHAR AS $$ DECLARE buffer varchar; i int4; BEGIN buffer := 'the quick brown fox jumps over the lazy dog'; FOR i IN 1..1000 LOOP buffer := buffer || 'the quick brown fox jumps over the lazy dog'; END LOOP; RETURN buffer; END; $$ LANGUAGE plpgsql; CREATE OR REPLACE FUNCTION test_v_02() RETURNS VARCHAR AS $$ DECLARE buffer varchar[] := '{}'; BEGIN buffer := array_append(buffer,'the quick brown fox jumps over the lazy dog'); FOR i IN 1..1000 LOOP buffer := array_append(buffer, 'the quick brown fox jumps over the lazy dog'); END LOOP; RETURN array_to_string(buffer,''); END; $$ LANGUAGE plpgsql; Running the array_append version is faster by at least one order of magnitude in these examples. However, where you can in-line the || operator with multiple operands, ie buffer := buffer || 'token 1' || results.user_id::text || 'token 2' || results.event_id::text || 'token3'; it is faster than calling buffer := array_append(buffer, 'token 1'); buffer := array_append(buffer, results.user_id::text); buffer := array_append(buffer, 'token 2'); buffer := array_append(buffer, results.event_id::text); buffer := array_append(buffer, 'token 3'); This seems entirely reasonable, as the latter requires the evaluation of five calls, whereas the former can do it in one go. However, my original question still stands - is there another way of doing this? Is it possible to write to a bytea or blob or stream and avoid having to do any concatenation at all? Cheers Michael On 29/11/06, Michael Guyver <kenevel@googlemail.com> wrote: > Hi there, > > I've got a rather large PL/pgSQL function which returns a varchar > (though it could be text, char or blob, I'm not fussy) containing JSON > information (where JSON is Javascript Object Notation). The middle > tier of the app does pretty much sweet FA except pass this straight > back to the client. I'm interested in seeing how much faster I can get > the app to process a request this way as opposed to retrieving the > data over three or four calls to the DB before constructing the JSON > response in the middle tier. > > I've got to the point where I suspect the concatenation could do with > some attention. What is the fastest way of returning this to the > client? > > I thought that storing the individual segments of text in an array and > stitiching it all together at the end of the function may be a fast > way of doing things, using an > > array_to_string(resultArray,''); > > call. However I have my doubts whether the > > resultArray := array_append(resultArray,'next token'); > > is performant as it looks as though it's constructing a new array from > the argument each time its called. Can someone confirm or rebut this? > > How would a simple > > result := result || 'next token'; > > perform? The result size is in the 20-25 Kb range. > > A mate mentioned that the way Oracle's OWS does it is to stream the > response back as a blob. I presume he means that the function could > iterate over the different queries' result-sets and simply write the > results to the blob before returning. Can anyone shed any light on > this approach and its applicabilty to PostgreSQL? > > Cheers > > Michael >
"Michael Guyver" <kenevel@googlemail.com> writes: > ... Running the array_append version is faster by at least one order of > magnitude in these examples. Really? I see only about a 50% advantage (155 msec vs 105 msec) in both 8.1 and 8.2. What PG version are you running? What database encoding are you using? To a zeroth-order approximation, the two methods should have about the same amount of data copying involved, and so the concatenation method ought to win out based on avoiding array data structure manipulation overhead (not to mention avoiding the array_to_string step, which makes an extra pass over all the data). So I find my own results surprising, and yours very much more so ... regards, tom lane
"Michael Guyver" <kenevel@googlemail.com> writes: > CREATE OR REPLACE FUNCTION test_v_01() RETURNS VARCHAR AS $$ > DECLARE > buffer varchar; > i int4; > BEGIN > buffer := 'the quick brown fox jumps over the lazy dog'; > FOR i IN 1..1000 LOOP > buffer := buffer || 'the quick brown fox jumps over the lazy dog'; > END LOOP; > RETURN buffer; > END; > $$ > LANGUAGE plpgsql; I looked into this a bit with gprof/oprofile, and found that there are two issues skewing the results of this test: * the function ought to be declared STABLE or even IMMUTABLE. The fact that it is not creates significant snapshot/command-counter overhead. (This cost is about the same for both variants, though.) * the buffer variable, and probably the function result too, ought to be declared TEXT not VARCHAR. That's because the result of the concat operator is always TEXT, and plpgsql is not bright about the fact that the conversion from text to varchar could be optimized away. There isn't any comparable cost for the array_append approach because of the way that the result type of array_append() is determined --- it comes out varchar[]. After correcting these issues I find that the concat approach is about 20% faster than the array_append approach, which is more in line with what I expected. regards, tom lane
On 03/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Michael Guyver" <kenevel@googlemail.com> writes: > > ... Running the array_append version is faster by at least one order of > > magnitude in these examples. > > Really? I see only about a 50% advantage (155 msec vs 105 msec) in both > 8.1 and 8.2. What PG version are you running? What database encoding > are you using? Hi Tom, Thanks for taking the time to have a look at this. Perhaps I overstated the case when I said an order of magnitude :¬O That said, however, I'm fairly sure that the difference was more than you've seen. The hardware may indeed have been a factor - I was running psql 8.1 on a laptop (AMD 64 3200, 1GB RAM, 5400rpm HDD) so the difference in performance may have been more pronounced. I will check the figures when I next get to my laptop at home, but I'm currently doing silly hours at work for the release we have this week. From some fairly unscientific bodging I've found that a mixture of the two is the most performant, as I mentioned in a previous post. Incidentally using a RETURN NEXT statement where I previously employed an array_append performed as well if not slightly better than my previous approach. Any thoughts? Cheers Michael