Thread: BUG #8441: Recursive function in plpgsql does not seem to handle null values correctly

BUG #8441: Recursive function in plpgsql does not seem to handle null values correctly

From
tvees@davincigroep.nl
Date:
The following bug has been logged on the website:

Bug reference:      8441
Logged by:          Tom van Ees
Email address:      tvees@davincigroep.nl
PostgreSQL version: 9.0.4
Operating system:   Windows Server 2008 R2
Description:

The Levenshtein function can only handle strings with length 255 or less.
I needed a Levenshtein function that could handle longer strings.
Therefore I wrote the following udf:


CREATE OR REPLACE FUNCTION longlevenshtein (string1 character varying
(1000000), string2 character varying (1000000)) RETURNS integer AS $$
    BEGIN
    IF  (length(coalesce($1, '')) = 0 AND length(coalesce($2, '')) = 0) THEN
        RETURN 0;
    ELSEIF ($1 IS NULL and $2 IS NOT NULL and length($2) > 0) THEN
        RETURN length($2);
    ELSEIF ($2 IS NULL and $1 IS NOT NULL and length($1)> 0) THEN
        RETURN length($1);
    ELSEIF length($1) = 0  AND length(coalesce($2, '')) > 0 THEN
            RETURN length(coalesce($2, ''));
        ELSEIF length($1) > 0 AND (length($2) = 0 or $2 is null) THEN
            RETURN length(coalesce($1, ''));
        ELSE
        RETURN (Levenshtein(SUBSTRING($1 FROM 1 FOR 254), SUBSTRING($2 FROM 1
for 254)) + longlevenshtein(coalesce(SUBSTRING($1 FROM 255), ''),
coalesce(SUBSTRING($2 FROM 255), '')));
    END IF;
    END;
$$ LANGUAGE plpgsql;


When I invoke this function with
SELECT longlevenshtein(null, 'foobar')
I get a ERROR:  stack depth limit exceeded
while I expected the return value 6
Hello

it works on 9.1.9

postgres=# SELECT longlevenshtein(null, 'foobar');
 longlevenshtein
-----------------
               6


Regards

Pavel

P.S. unlimitted varchar is "text" type in Postgres


2013/9/9 <tvees@davincigroep.nl>

> The following bug has been logged on the website:
>
> Bug reference:      8441
> Logged by:          Tom van Ees
> Email address:      tvees@davincigroep.nl
> PostgreSQL version: 9.0.4
> Operating system:   Windows Server 2008 R2
> Description:
>
> The Levenshtein function can only handle strings with length 255 or less.
> I needed a Levenshtein function that could handle longer strings.
> Therefore I wrote the following udf:
>
>
> CREATE OR REPLACE FUNCTION longlevenshtein (string1 character varying
> (1000000), string2 character varying (1000000)) RETURNS integer AS $$
>     BEGIN
>         IF  (length(coalesce($1, '')) = 0 AND length(coalesce($2, '')) =
> 0) THEN
>             RETURN 0;
>         ELSEIF ($1 IS NULL and $2 IS NOT NULL and length($2) > 0) THEN
>             RETURN length($2);
>         ELSEIF ($2 IS NULL and $1 IS NOT NULL and length($1)> 0) THEN
>             RETURN length($1);
>         ELSEIF length($1) = 0  AND length(coalesce($2, '')) > 0 THEN
>             RETURN length(coalesce($2, ''));
>         ELSEIF length($1) > 0 AND (length($2) = 0 or $2 is null) THEN
>             RETURN length(coalesce($1, ''));
>         ELSE
>             RETURN (Levenshtein(SUBSTRING($1 FROM 1 FOR 254), SUBSTRING($2
> FROM 1
> for 254)) + longlevenshtein(coalesce(SUBSTRING($1 FROM 255), ''),
> coalesce(SUBSTRING($2 FROM 255), '')));
>         END IF;
>     END;
> $$ LANGUAGE plpgsql;
>
>
> When I invoke this function with
> SELECT longlevenshtein(null, 'foobar')
> I get a ERROR:  stack depth limit exceeded
> while I expected the return value 6
>
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>
Hi,

On 2013-09-09 17:26:48 +0200, Pavel Stehule wrote:
> it works on 9.1.9

That's probably a question of the configured/detected max_stack_depth.

I have quite some doubts such a naive implementation implemented in
plgsql will work for strings > 255 chars in sensible manner though.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
tvees@davincigroep.nl writes:
> CREATE OR REPLACE FUNCTION longlevenshtein (string1 character varying
> (1000000), string2 character varying (1000000)) RETURNS integer AS $$
>     BEGIN
>     IF  (length(coalesce($1, '')) = 0 AND length(coalesce($2, '')) = 0) THEN
>         RETURN 0;
>     ELSEIF ($1 IS NULL and $2 IS NOT NULL and length($2) > 0) THEN
>         RETURN length($2);
>     ELSEIF ($2 IS NULL and $1 IS NOT NULL and length($1)> 0) THEN
>         RETURN length($1);
>     ELSEIF length($1) = 0  AND length(coalesce($2, '')) > 0 THEN
>             RETURN length(coalesce($2, ''));
>         ELSEIF length($1) > 0 AND (length($2) = 0 or $2 is null) THEN
>             RETURN length(coalesce($1, ''));
>         ELSE
>         RETURN (Levenshtein(SUBSTRING($1 FROM 1 FOR 254), SUBSTRING($2 FROM 1
> for 254)) + longlevenshtein(coalesce(SUBSTRING($1 FROM 255), ''),
> coalesce(SUBSTRING($2 FROM 255), '')));
>     END IF;
>     END;
> $$ LANGUAGE plpgsql;

> When I invoke this function with
> SELECT longlevenshtein(null, 'foobar')
> I get a ERROR:  stack depth limit exceeded

Worksforme.  You sure you transcribed the function accurately?

Note however that sufficiently long input strings *will* drive this
function to stack overrun, if you don't run out of heap memory first
(I think the heap consumption will be O(N^2) ...).  Consider rewriting
it with a loop rather than recursion.

            regards, tom lane