Thread: concat_ws

concat_ws

From
Hans Spaans
Date:
Hello all,

Currently I'm working on porting a mysql-app to postgresql and they're
using concat_ws[1], but I'm currently can't find anything like it in
postgresql. Or did I overlooked something in the documentation?

Kind regards,
Hans

[1] http://www.mysql.com/doc/en/String_functions.html

--
Hans

Re: concat_ws

From
Gaetano Mendola
Date:
I think that you should write your own stored procedure.

Regards
Gaetano Mendola


Re: concat_ws

From
Tom Lane
Date:
Hans Spaans <pgsql-admin@lists.hansspaans.nl> writes:
> Currently I'm working on porting a mysql-app to postgresql and they're
> using concat_ws[1], but I'm currently can't find anything like it in
> postgresql. Or did I overlooked something in the documentation?

There's nothing like that built-in, but you could duplicate the
functionality in five minutes with a user-defined function.

plperl or pltcl would probably be the best suited to whacking
strings around, but for a task as simple as this, plpgsql would
do fine too.

I think you'd need to generate a separate function definition for
each number of arguments you wanted to deal with, which is a bit
of a pain in the neck, but it still beats writing an extension
function in C ...

            regards, tom lane

Re: concat_ws

From
Joe Conway
Date:
Tom Lane wrote:
> I think you'd need to generate a separate function definition for
> each number of arguments you wanted to deal with, which is a bit
> of a pain in the neck, but it still beats writing an extension
> function in C ...

I thought I'd whack out this example similar to the GREATEST/LEAST
functions a month or so ago. It works fine in 7.3, but has a problem on
7.4devel. First the function:

create or replace function make_concat_ws() returns text as '
declare
   v_args int := 32;
   v_first text := ''create or replace function
concat_ws(text,text,text) returns text as ''''select case when $1 is
null then null when $3 is null then $2 else $2 || $1 || $3 end''''
language sql IMMUTABLE'';
   v_part1 text := ''create or replace function concat_ws(text,text'';
   v_part2 text := '') returns text as ''''select
concat_ws($1,concat_ws($1,$2'';
   v_part3 text := '')'''' language sql IMMUTABLE'';
   v_sql text;
begin
   execute v_first;
   for i in 4 .. v_args loop
     v_sql := v_part1;
     for j in 3 .. i loop
       v_sql := v_sql || '',text'';
     end loop;

     v_sql := v_sql || v_part2;

     for j in 3 .. i - 1 loop
       v_sql := v_sql || '',$'' || j::text;
     end loop;
     v_sql := v_sql || ''),$'' || i::text;

     v_sql := v_sql || v_part3;
     execute v_sql;
   end loop;
   return ''OK'';
end;
' language 'plpgsql';

select make_concat_ws();

After creating and executing make_concat_ws(), you'll have 30
concat_ws() functions accepting from 3 to 32 arguments. On 7.3 it works
well:

test=# select

concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31');
                                           concat_ws
----------------------------------------------------------------------------------------------

01~02~03~04~05~06~07~08~09~10~11~12~13~14~15~16~17~18~19~20~21~22~23~24~25~26~27~28~29~30~31
(1 row)

test=# explain analyze select

concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31');
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00
rows=1 loops=1)
  Total runtime: 0.02 msec
(2 rows)

But on 7.4devel it works OK with smaller numbers of arguments, and seems
to take exponentially longer as arguments are added. The odd thing is
that explain analyze does not seem to reflect this. I noticed that on
7.4devel:

regression=# explain analyze select
concat_ws('~','01','02','03','04','05','06','07','08');
                                                      QUERY PLAN
----------------------------------------------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00
rows=1 loops=1)
  Total runtime: 0.05 msec
(2 rows)

regression=# explain analyze select
concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16');
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01
rows=1 loops=1)
  Total runtime: 0.07 msec
(2 rows)

But the "clock" time to run the commands is noticeably longer for the 17
argument case (~2 seconds versus instant). At 25 arguments (possibly
sooner, I didn't test cases in between) it fails with:

regression=# select

concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24');
ERROR:  ERRORDATA_STACK_SIZE exceeded

I don't have time at the moment to dig into this, but I'll try to later
today or tomorrow.

Joe


Re: concat_ws

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> I thought I'd whack out this example similar to the GREATEST/LEAST
> functions a month or so ago. It works fine in 7.3, but has a problem on
> 7.4devel. First the function:

It seems to be running out of memory.  I'm not sure what's eating so
much RAM --- the expression isn't that complex.  Also it seems that
elog.c's ability to handle a shortage of memory has regressed ...
'ERROR:  ERRORDATA_STACK_SIZE exceeded' is better than a core dump,
I suppose, but it's not supposed to fail at all ...

Anyway, seems we have a couple different bugs to chase in CVS tip.

            regards, tom lane

Re: concat_ws

From
Joe Conway
Date:
Tom Lane wrote:
> It seems to be running out of memory.  I'm not sure what's eating so
> much RAM --- the expression isn't that complex.  Also it seems that
> elog.c's ability to handle a shortage of memory has regressed ...
> 'ERROR:  ERRORDATA_STACK_SIZE exceeded' is better than a core dump,
> I suppose, but it's not supposed to fail at all ...
>
> Anyway, seems we have a couple different bugs to chase in CVS tip.
>

Another piece of the puzzle -- if the functions are defined:
    ...language sql IMMUTABLE STRICT;
instead of just
    ...language sql IMMUTABLE;
the function returns immediately even with 32 arguments (on 7.4devel):

regression=# explain analyze select

concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31');
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.00..0.00
rows=1 loops=1)
  Total runtime: 0.02 msec
(2 rows)

Also worth noting, without STRICT, even a plain EXPLAIN *without*
ANALYZE causes the problem to show itself.

Joe


Re: concat_ws

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> It seems to be running out of memory.

> Also worth noting, without STRICT, even a plain EXPLAIN *without*
> ANALYZE causes the problem to show itself.

Yeah.  The problem is that the SQL function inliner generates an
enormous expression tree from this function definition.  7.3 had no
inliner so no problem.

I am not sure what to do about it --- the only idea that comes to mind
is to put an arbitrary limit (of, say, 5 or 10 function calls) on the
depth of inlining expansion.  That seems like a pretty ugly answer
... anyone have a better one?

Your definition of concat_ws bears some passing resemblance to the
infamous Ackermann's function, btw.

            regards, tom lane

Re: concat_ws

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> regression=# select
>
concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24');
> ERROR:  ERRORDATA_STACK_SIZE exceeded

With the removal of some overoptimistic code in elog.c, I get

test=#  select
concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24');
ERROR:  out of memory
DETAIL:  Failed on request of size 12.
CONTEXT:  SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during inlining
SQL function "concat_ws" during startup
test=#

which at least gives some clue what's going wrong.

There is some code in the function inliner to abandon inlining if an
input expression is "too expensive", but its notion of "too expensive"
is currently just "contains subselects".  I'm going to see if it helps
to reject inlining when the input grows "too large", for some value of
"too large".

            regards, tom lane

Re: [HACKERS] concat_ws

From
Joe Conway
Date:
Tom Lane wrote:
> Yeah.  The problem is that the SQL function inliner generates an
> enormous expression tree from this function definition.  7.3 had no
> inliner so no problem.

But I wonder why it isn't at all a problem when the function is also
defined STRICT?

I also looked back at the greatest() example -- similar behavior. If
defined
   ...language sql;
or
   ...language sql IMMUTABLE STRICT;
it works great.

But when defined
   ...language sql IMMUTABLE;
it dies a horrible recursive death.

In case 1 above, the function doesn't get inlined at all, right? But in
both case 2 and 3, it should get inlined -- why does 2 work fine when 3
doesn't?

> I am not sure what to do about it --- the only idea that comes to mind
> is to put an arbitrary limit (of, say, 5 or 10 function calls) on the
> depth of inlining expansion.  That seems like a pretty ugly answer
> ... anyone have a better one?

But as above, case 2 is inlined (I think) and works fine -- why restrict it.

> Your definition of concat_ws bears some passing resemblance to the
> infamous Ackermann's function, btw.

I always knew I was destined to be infamous ;-)

Joe


Re: concat_ws

From
"Mendola Gaetano"
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Joe Conway <mail@joeconway.com> writes:
> > Tom Lane wrote:
> >> It seems to be running out of memory.
>
> > Also worth noting, without STRICT, even a plain EXPLAIN *without*
> > ANALYZE causes the problem to show itself.
>
> Yeah.  The problem is that the SQL function inliner generates an
> enormous expression tree from this function definition.  7.3 had no
> inliner so no problem.
>
> I am not sure what to do about it --- the only idea that comes to mind
> is to put an arbitrary limit (of, say, 5 or 10 function calls) on the
> depth of inlining expansion.  That seems like a pretty ugly answer
> ... anyone have a better one?

Please no one tell to add another ugly parameter on postgresql.conf

My cent.

Regards
Gaetano Mendola





Re: [HACKERS] concat_ws

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> But I wonder why it isn't at all a problem when the function is also
> defined STRICT?

Because the inliner doesn't think it can safely inline in that case; the
substituted expression isn't strict and so inlining would potentially
change the semantics.

            regards, tom lane

Re: concat_ws

From
Tom Lane
Date:
I said:
> There is some code in the function inliner to abandon inlining if an
> input expression is "too expensive", but its notion of "too expensive"
> is currently just "contains subselects".  I'm going to see if it helps
> to reject inlining when the input grows "too large", for some value of
> "too large".

Okay, after further study, I see that the real problem here is that when
expanding nested concat_ws calls, the expansions of the lower-level
calls would get substituted in more than one place in the outer
functions.  Specifically, since $2 occurs twice in the bottom-level
function, you'd end up with an expanded tree of O(2^n) nodes for n
levels of function call.

I added code to inline_function to stop inlining if a parameter
expression to be substituted multiple times has cost greater than
10*cpu_operator_cost (which roughly means that it contains more than
10 operators or functions).  This seems to cut off the problem nicely,
at least for this example.  The factor of 10 is a bit of a magic number
but it seems reasonable.

            regards, tom lane

Re: concat_ws

From
Joe Conway
Date:
Tom Lane wrote:
> I added code to inline_function to stop inlining if a parameter
> expression to be substituted multiple times has cost greater than
> 10*cpu_operator_cost (which roughly means that it contains more than
> 10 operators or functions).  This seems to cut off the problem nicely,
> at least for this example.  The factor of 10 is a bit of a magic number
> but it seems reasonable.
>

That did it (on fresh copy of cvs):

regression=# explain analyze select

concat_ws('~','01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31');
                                     QUERY PLAN
----------------------------------------------------------------------------------
  Result  (cost=0.00..0.11 rows=1 width=0) (actual time=2.37..2.37
rows=1 loops=1)
  Total runtime: 2.66 msec
(2 rows)

Thanks Tom!

Joe


Re: concat_ws

From
Dennis Björklund
Date:
On Sun, 3 Aug 2003, Tom Lane wrote:

> I added code to inline_function to stop inlining if a parameter
> expression to be substituted multiple times has cost greater than
> 10*cpu_operator_cost (which roughly means that it contains more than
> 10 operators or functions).

When is this inlining taking place and what is the logic? I just want to
make sure that there is no code in pg that will unfold forever, say for
example for a recursive fac() function. From the above it sounds like that
might be a problem.

--
/Dennis


Re: concat_ws

From
Tom Lane
Date:
=?ISO-8859-1?Q?Dennis_Bj=F6rklund?= <db@zigo.dhs.org> writes:
> On Sun, 3 Aug 2003, Tom Lane wrote:
>> I added code to inline_function to stop inlining if a parameter
>> expression to be substituted multiple times has cost greater than
>> 10*cpu_operator_cost (which roughly means that it contains more than
>> 10 operators or functions).

> When is this inlining taking place and what is the logic? I just want to
> make sure that there is no code in pg that will unfold forever, say for
> example for a recursive fac() function. From the above it sounds like that
> might be a problem.

That's already dealt with.  See inline_function() in
src/backend/optimizer/util/clauses.c.

            regards, tom lane