Thread: concat_ws
Hello all, Currently I'm working on porting a mysql-app to postgresql and they're using concat_ws[1], but I'm currently can't find anything like it in postgresql. Or did I overlooked something in the documentation? Kind regards, Hans [1] http://www.mysql.com/doc/en/String_functions.html -- Hans
I think that you should write your own stored procedure. Regards Gaetano Mendola
Hans Spaans <pgsql-admin@lists.hansspaans.nl> writes: > Currently I'm working on porting a mysql-app to postgresql and they're > using concat_ws[1], but I'm currently can't find anything like it in > postgresql. Or did I overlooked something in the documentation? There's nothing like that built-in, but you could duplicate the functionality in five minutes with a user-defined function. plperl or pltcl would probably be the best suited to whacking strings around, but for a task as simple as this, plpgsql would do fine too. I think you'd need to generate a separate function definition for each number of arguments you wanted to deal with, which is a bit of a pain in the neck, but it still beats writing an extension function in C ... regards, tom lane
Tom Lane wrote: > I think you'd need to generate a separate function definition for > each number of arguments you wanted to deal with, which is a bit > of a pain in the neck, but it still beats writing an extension > function in C ... I thought I'd whack out this example similar to the GREATEST/LEAST functions a month or so ago. It works fine in 7.3, but has a problem on 7.4devel. First the function: create or replace function make_concat_ws() returns text as ' declare v_args int := 32; v_first text := ''create or replace function concat_ws(text,text,text) returns text as ''''select case when $1 is null then null when $3 is null then $2 else $2 || $1 || $3 end'''' language sql IMMUTABLE''; v_part1 text := ''create or replace function concat_ws(text,text''; v_part2 text := '') returns text as ''''select concat_ws($1,concat_ws($1,$2''; v_part3 text := '')'''' language sql IMMUTABLE''; v_sql text; begin execute v_first; for i in 4 .. v_args loop v_sql := v_part1; for j in 3 .. i loop v_sql := v_sql || '',text''; end loop; v_sql := v_sql || v_part2; for j in 3 .. i - 1 loop v_sql := v_sql || '',$'' || j::text; end loop; v_sql := v_sql || ''),$'' || i::text; v_sql := v_sql || v_part3; execute v_sql; end loop; return ''OK''; end; ' language 'plpgsql'; select make_concat_ws(); After creating and executing make_concat_ws(), you'll have 30 concat_ws() functions accepting from 3 to 32 arguments. On 7.3 it works well: test=# select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31'); concat_ws ---------------------------------------------------------------------------------------------- 01~02~03~04~05~06~07~08~09~10~11~12~13~14~15~16~17~18~19~20~21~22~23~24~25~26~27~28~29~30~31 (1 row) test=# explain analyze select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31'); QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1) Total runtime: 0.02 msec (2 rows) But on 7.4devel it works OK with smaller numbers of arguments, and seems to take exponentially longer as arguments are added. The odd thing is that explain analyze does not seem to reflect this. I noticed that on 7.4devel: regression=# explain analyze select concat_ws('~','01','02','03','04','05','06','07','08'); QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1) Total runtime: 0.05 msec (2 rows) regression=# explain analyze select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16'); QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1) Total runtime: 0.07 msec (2 rows) But the "clock" time to run the commands is noticeably longer for the 17 argument case (~2 seconds versus instant). At 25 arguments (possibly sooner, I didn't test cases in between) it fails with: regression=# select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24'); ERROR: ERRORDATA_STACK_SIZE exceeded I don't have time at the moment to dig into this, but I'll try to later today or tomorrow. Joe
Joe Conway <mail@joeconway.com> writes: > I thought I'd whack out this example similar to the GREATEST/LEAST > functions a month or so ago. It works fine in 7.3, but has a problem on > 7.4devel. First the function: It seems to be running out of memory. I'm not sure what's eating so much RAM --- the expression isn't that complex. Also it seems that elog.c's ability to handle a shortage of memory has regressed ... 'ERROR: ERRORDATA_STACK_SIZE exceeded' is better than a core dump, I suppose, but it's not supposed to fail at all ... Anyway, seems we have a couple different bugs to chase in CVS tip. regards, tom lane
Tom Lane wrote: > It seems to be running out of memory. I'm not sure what's eating so > much RAM --- the expression isn't that complex. Also it seems that > elog.c's ability to handle a shortage of memory has regressed ... > 'ERROR: ERRORDATA_STACK_SIZE exceeded' is better than a core dump, > I suppose, but it's not supposed to fail at all ... > > Anyway, seems we have a couple different bugs to chase in CVS tip. > Another piece of the puzzle -- if the functions are defined: ...language sql IMMUTABLE STRICT; instead of just ...language sql IMMUTABLE; the function returns immediately even with 32 arguments (on 7.4devel): regression=# explain analyze select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31'); QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1) Total runtime: 0.02 msec (2 rows) Also worth noting, without STRICT, even a plain EXPLAIN *without* ANALYZE causes the problem to show itself. Joe
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> It seems to be running out of memory. > Also worth noting, without STRICT, even a plain EXPLAIN *without* > ANALYZE causes the problem to show itself. Yeah. The problem is that the SQL function inliner generates an enormous expression tree from this function definition. 7.3 had no inliner so no problem. I am not sure what to do about it --- the only idea that comes to mind is to put an arbitrary limit (of, say, 5 or 10 function calls) on the depth of inlining expansion. That seems like a pretty ugly answer ... anyone have a better one? Your definition of concat_ws bears some passing resemblance to the infamous Ackermann's function, btw. regards, tom lane
Joe Conway <mail@joeconway.com> writes: > regression=# select > concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24'); > ERROR: ERRORDATA_STACK_SIZE exceeded With the removal of some overoptimistic code in elog.c, I get test=# select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24'); ERROR: out of memory DETAIL: Failed on request of size 12. CONTEXT: SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during inlining SQL function "concat_ws" during startup test=# which at least gives some clue what's going wrong. There is some code in the function inliner to abandon inlining if an input expression is "too expensive", but its notion of "too expensive" is currently just "contains subselects". I'm going to see if it helps to reject inlining when the input grows "too large", for some value of "too large". regards, tom lane
Tom Lane wrote: > Yeah. The problem is that the SQL function inliner generates an > enormous expression tree from this function definition. 7.3 had no > inliner so no problem. But I wonder why it isn't at all a problem when the function is also defined STRICT? I also looked back at the greatest() example -- similar behavior. If defined ...language sql; or ...language sql IMMUTABLE STRICT; it works great. But when defined ...language sql IMMUTABLE; it dies a horrible recursive death. In case 1 above, the function doesn't get inlined at all, right? But in both case 2 and 3, it should get inlined -- why does 2 work fine when 3 doesn't? > I am not sure what to do about it --- the only idea that comes to mind > is to put an arbitrary limit (of, say, 5 or 10 function calls) on the > depth of inlining expansion. That seems like a pretty ugly answer > ... anyone have a better one? But as above, case 2 is inlined (I think) and works fine -- why restrict it. > Your definition of concat_ws bears some passing resemblance to the > infamous Ackermann's function, btw. I always knew I was destined to be infamous ;-) Joe
"Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Joe Conway <mail@joeconway.com> writes: > > Tom Lane wrote: > >> It seems to be running out of memory. > > > Also worth noting, without STRICT, even a plain EXPLAIN *without* > > ANALYZE causes the problem to show itself. > > Yeah. The problem is that the SQL function inliner generates an > enormous expression tree from this function definition. 7.3 had no > inliner so no problem. > > I am not sure what to do about it --- the only idea that comes to mind > is to put an arbitrary limit (of, say, 5 or 10 function calls) on the > depth of inlining expansion. That seems like a pretty ugly answer > ... anyone have a better one? Please no one tell to add another ugly parameter on postgresql.conf My cent. Regards Gaetano Mendola
Joe Conway <mail@joeconway.com> writes: > But I wonder why it isn't at all a problem when the function is also > defined STRICT? Because the inliner doesn't think it can safely inline in that case; the substituted expression isn't strict and so inlining would potentially change the semantics. regards, tom lane
I said: > There is some code in the function inliner to abandon inlining if an > input expression is "too expensive", but its notion of "too expensive" > is currently just "contains subselects". I'm going to see if it helps > to reject inlining when the input grows "too large", for some value of > "too large". Okay, after further study, I see that the real problem here is that when expanding nested concat_ws calls, the expansions of the lower-level calls would get substituted in more than one place in the outer functions. Specifically, since $2 occurs twice in the bottom-level function, you'd end up with an expanded tree of O(2^n) nodes for n levels of function call. I added code to inline_function to stop inlining if a parameter expression to be substituted multiple times has cost greater than 10*cpu_operator_cost (which roughly means that it contains more than 10 operators or functions). This seems to cut off the problem nicely, at least for this example. The factor of 10 is a bit of a magic number but it seems reasonable. regards, tom lane
Tom Lane wrote: > I added code to inline_function to stop inlining if a parameter > expression to be substituted multiple times has cost greater than > 10*cpu_operator_cost (which roughly means that it contains more than > 10 operators or functions). This seems to cut off the problem nicely, > at least for this example. The factor of 10 is a bit of a magic number > but it seems reasonable. > That did it (on fresh copy of cvs): regression=# explain analyze select concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31'); QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.00..0.11 rows=1 width=0) (actual time=2.37..2.37 rows=1 loops=1) Total runtime: 2.66 msec (2 rows) Thanks Tom! Joe
On Sun, 3 Aug 2003, Tom Lane wrote: > I added code to inline_function to stop inlining if a parameter > expression to be substituted multiple times has cost greater than > 10*cpu_operator_cost (which roughly means that it contains more than > 10 operators or functions). When is this inlining taking place and what is the logic? I just want to make sure that there is no code in pg that will unfold forever, say for example for a recursive fac() function. From the above it sounds like that might be a problem. -- /Dennis
=?ISO-8859-1?Q?Dennis_Bj=F6rklund?= <db@zigo.dhs.org> writes: > On Sun, 3 Aug 2003, Tom Lane wrote: >> I added code to inline_function to stop inlining if a parameter >> expression to be substituted multiple times has cost greater than >> 10*cpu_operator_cost (which roughly means that it contains more than >> 10 operators or functions). > When is this inlining taking place and what is the logic? I just want to > make sure that there is no code in pg that will unfold forever, say for > example for a recursive fac() function. From the above it sounds like that > might be a problem. That's already dealt with. See inline_function() in src/backend/optimizer/util/clauses.c. regards, tom lane