Thread: BUG #15971: Behaviour of SUBSTR function depending on its arguments

BUG #15971: Behaviour of SUBSTR function depending on its arguments

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15971
Logged by:          VIMAL VICTOR B
Email address:      bvimalvictor@gmail.com
PostgreSQL version: 10.4
Operating system:   Linux
Description:

Hi,

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2. When the start position
is -ve, then the start position will be considered from end of input string.
The start position can be considered either as 0 or 1 and substring from
that position can be returned by considering the max length of the input
string, which is logically correct. But when -ve value given for start
position and returning empty string '', which does not seem logically
correct. Kindly consider this case and let me know your thoughts.

Regards,
Vimal


Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From
Ashutosh Sharma
Date:
I don't think it's a bug. I think it's just that the SUBSTR in
PostgreSQL is more SQL compliant than any other databases that you
mentioned.  As per the SQL standard, if the start position is zero or
negative number, it should be adjusted to the start of the string and
not to the end of the string and that way I feel the behaviour of
SUBSTR in PostgreSQL is more SQL compliant than other databases.
That's my opinion though, let's see what others have to say on it.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Wed, Aug 21, 2019 at 12:33 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      15971
> Logged by:          VIMAL VICTOR B
> Email address:      bvimalvictor@gmail.com
> PostgreSQL version: 10.4
> Operating system:   Linux
> Description:
>
> Hi,
>
> select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
> substr('abcdef',-1,2) val3;
>
> The above query in Postgres returns expected result for val1 ('ab') but for
> val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
> RDBMS systems would return 'ab' for val1 and val2. When the start position
> is -ve, then the start position will be considered from end of input string.
> The start position can be considered either as 0 or 1 and substring from
> that position can be returned by considering the max length of the input
> string, which is logically correct. But when -ve value given for start
> position and returning empty string '', which does not seem logically
> correct. Kindly consider this case and let me know your thoughts.
>
> Regards,
> Vimal
>



Re: BUG #15971: Behaviour of SUBSTR function depending on itsarguments

From
Thomas Kellerer
Date:
PG Bug reporting form schrieb am 21.08.2019 um 09:02:

> select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
> substr('abcdef',-1,2) val3;
> 
> The above query in Postgres returns expected result for val1 ('ab') but for
> val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
> RDBMS systems would return 'ab' for val1 and val2. 

The only "other" DBMS that behaves the way you describe it, is Oracle. 

When I ran it on different systems, this is the result:

           | val1 |  val2   | val3    |
-----------|------|---------|---------|
Postgres   |  ab  |    a    | <empty> |
SQL Server |  ab  |    a    | <empty> |
SQLite     |  ab  |    a    |    f    |
MySQL      |  ab  | <empty> |    f    |
Oracle     |  ab  |   ab    |    f    |

DB2 and Firebird do not allow a starting position smaller than 1 

So SQL Server works the same as Postgres and Oracle is the only one that returns 'ab' for the second case

So clearly the claim "most other" is wrong here. 




Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From
Pavel Stehule
Date:


st 21. 8. 2019 v 9:03 odesílatel PG Bug reporting form <noreply@postgresql.org> napsal:
The following bug has been logged on the website:

Bug reference:      15971
Logged by:          VIMAL VICTOR B
Email address:      bvimalvictor@gmail.com
PostgreSQL version: 10.4
Operating system:   Linux
Description:       

Hi,

select substr('abcdef',1,2) val1, substr('abcdef',0,2) val2,
substr('abcdef',-1,2) val3;

The above query in Postgres returns expected result for val1 ('ab') but for
val2 and val3, it returns 'a' and '' respectively. Oracle and most of other
RDBMS systems would return 'ab' for val1 and val2. When the start position
is -ve, then the start position will be considered from end of input string.
The start position can be considered either as 0 or 1 and substring from
that position can be returned by considering the max length of the input
string, which is logically correct. But when -ve value given for start
position and returning empty string '', which does not seem logically
correct. Kindly consider this case and let me know your thoughts.

Orafce support substr function with Oracle behave


Regards

Pavel

Regards,
Vimal

Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From
Tom Lane
Date:
Ashutosh Sharma <ashu.coek88@gmail.com> writes:
> I don't think it's a bug. I think it's just that the SUBSTR in
> PostgreSQL is more SQL compliant than any other databases that you
> mentioned.  As per the SQL standard, if the start position is zero or
> negative number, it should be adjusted to the start of the string and
> not to the end of the string and that way I feel the behaviour of
> SUBSTR in PostgreSQL is more SQL compliant than other databases.

Yes, the standard provides no wiggle room here.  The behavior of
substring() with integer parameters, as specified in SQL:2008
6.29 <string value function> general rule 3, is

3) If <character substring function> is specified, then:

  a) If the character encoding form of <character value expression> is
  UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
  term “character” shall be taken to mean “unit specified by <char length
  units>”.

  b) Let C be the value of the <character value expression>, let LC be the
  length in characters of C, and let S be the value of the <start
  position>.

  c) If <string length> is specified, then let L be the value of <string
  length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
  S.

  d) If either C, S, or L is the null value, then the result of the
  <character substring function> is the null value.

  e) If E is less than S, then an exception condition is raised: data
  exception — substring error.

  f) Case:

    i) If S is greater than LC or if E is less than 1 (one), then the
    result of the <character substring function> is a zero-length string.

    ii) Otherwise,

      1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
      and LC+1. Let L1 be E1–S1.

      2) The result of the <character substring function> is a character
      string containing the L1 characters of C starting at character
      number S1 in the same order that the characters appear in C.

I believe our implementation does this exactly.

Even if it were true that Oracle's behavior is more common than the
spec's definition, it's quite unlikely that we could be talked into
abandoning spec-compliant behavior to match Oracle.

            regards, tom lane



Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From
Ashutosh Sharma
Date:
On Wed, Aug 21, 2019 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Ashutosh Sharma <ashu.coek88@gmail.com> writes:
> > I don't think it's a bug. I think it's just that the SUBSTR in
> > PostgreSQL is more SQL compliant than any other databases that you
> > mentioned.  As per the SQL standard, if the start position is zero or
> > negative number, it should be adjusted to the start of the string and
> > not to the end of the string and that way I feel the behaviour of
> > SUBSTR in PostgreSQL is more SQL compliant than other databases.
>
> Yes, the standard provides no wiggle room here.  The behavior of
> substring() with integer parameters, as specified in SQL:2008
> 6.29 <string value function> general rule 3, is
>
> 3) If <character substring function> is specified, then:
>
>   a) If the character encoding form of <character value expression> is
>   UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
>   term “character” shall be taken to mean “unit specified by <char length
>   units>”.
>
>   b) Let C be the value of the <character value expression>, let LC be the
>   length in characters of C, and let S be the value of the <start
>   position>.
>
>   c) If <string length> is specified, then let L be the value of <string
>   length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
>   S.
>
>   d) If either C, S, or L is the null value, then the result of the
>   <character substring function> is the null value.
>
>   e) If E is less than S, then an exception condition is raised: data
>   exception — substring error.
>
>   f) Case:
>
>     i) If S is greater than LC or if E is less than 1 (one), then the
>     result of the <character substring function> is a zero-length string.
>
>     ii) Otherwise,
>
>       1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
>       and LC+1. Let L1 be E1–S1.
>
>       2) The result of the <character substring function> is a character
>       string containing the L1 characters of C starting at character
>       number S1 in the same order that the characters appear in C.
>
> I believe our implementation does this exactly.
>
> Even if it were true that Oracle's behavior is more common than the
> spec's definition, it's quite unlikely that we could be talked into
> abandoning spec-compliant behavior to match Oracle.
>

Absolutely. We shouldn't do that. We are more accurate than Oracle as
per the SQL standard.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com