Thread: BUG #17450: SUBSTRING function extracting lesser characters than specified

BUG #17450: SUBSTRING function extracting lesser characters than specified

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17450
Logged by:          Suman Ganguly
Email address:      ganguly.04@gmail.com
PostgreSQL version: 10.17
Operating system:   x86_64-pc-linux-gnu
Description:

select substring('123456', 0 , 5)
On running this, Postgres returns '1234'
Expecting '12345' to be returned as per the documentation


Re: BUG #17450: SUBSTRING function extracting lesser characters than specified

From
hubert depesz lubaczewski
Date:
On Mon, Mar 28, 2022 at 10:30:07AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      17450
> Logged by:          Suman Ganguly
> Email address:      ganguly.04@gmail.com
> PostgreSQL version: 10.17
> Operating system:   x86_64-pc-linux-gnu
> Description:        
> 
> select substring('123456', 0 , 5)
> On running this, Postgres returns '1234'
> Expecting '12345' to be returned as per the documentation

Well, the problem is that you're trying to provide character number 0,
which is leading to bad results.

substring works on base-1 numbering.

depesz



Re: BUG #17450: SUBSTRING function extracting lesser characters than specified

From
Pavel Borisov
Date:
пн, 28 мар. 2022 г. в 15:01, hubert depesz lubaczewski <depesz@depesz.com>:
On Mon, Mar 28, 2022 at 10:30:07AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17450
> Logged by:          Suman Ganguly
> Email address:      ganguly.04@gmail.com
> PostgreSQL version: 10.17
> Operating system:   x86_64-pc-linux-gnu
> Description:       
>
> select substring('123456', 0 , 5)
> On running this, Postgres returns '1234'
> Expecting '12345' to be returned as per the documentation

Well, the problem is that you're trying to provide character number 0,
which is leading to bad results.

substring works on base-1 numbering.
Yes, the behavior expected in a report seems to be right. I've attached a very small fix for that issue. Probably it should be backpatched into all versions having 4bd3fad80e5c i.e. since v11.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com
Attachment
Pavel Borisov <pashkin.elfe@gmail.com> writes:
> Yes, the behavior expected in a report seems to be right. I've attached a
> very small fix for that issue. Probably it should be backpatched into all
> versions having 4bd3fad80e5c i.e. since v11.

The current calculation matches the SQL standard; this makes
it not match the standard.  From SQL99:

         3) If <character substring function> is specified, then:

            a) Let C be the value of the <character value expression>, let
              LC be the length of C, and let S be the value of the <start
              position>.

            b) If <string length> is specified, then let L be the value of
              <string length> and let E be S+L. Otherwise, let E be the
              larger of LC + 1 and S.

            c) If either C, S, or L is the null value, then the result of
              the <character substring function> is the null value.

            d) If E is less than S, then an exception condition is raised:
              data exception - substring error.

            e) Case:

              i) If S is greater than LC or if E is less than 1 (one), then
                 the result of the <character substring function> is a zero-
                 length string.

             ii) Otherwise,

                 1) Let S1 be the larger of S and 1 (one). Let E1 be the
                   smaller of E and LC+1. Let L1 be E1-S1.

                 2) The result of the <character substring function> is
                   a character string containing the L1 characters of C
                   starting at character number S1 in the same order that
                   the characters appear in C.

Perhaps there's something to do here documentation-wise,
but there is no bug.

            regards, tom lane



On Mon, 28 Mar 2022 at 20:35, Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> пн, 28 мар. 2022 г. в 15:01, hubert depesz lubaczewski <depesz@depesz.com>:
>
>> On Mon, Mar 28, 2022 at 10:30:07AM +0000, PG Bug reporting form wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference:      17450
>> > Logged by:          Suman Ganguly
>> > Email address:      ganguly.04@gmail.com
>> > PostgreSQL version: 10.17
>> > Operating system:   x86_64-pc-linux-gnu
>> > Description:
>> >
>> > select substring('123456', 0 , 5)
>> > On running this, Postgres returns '1234'
>> > Expecting '12345' to be returned as per the documentation
>>
>> Well, the problem is that you're trying to provide character number 0,
>> which is leading to bad results.
>>
>> substring works on base-1 numbering.
>>
> Yes, the behavior expected in a report seems to be right. I've attached a
> very small fix for that issue. Probably it should be backpatched into all
> versions having 4bd3fad80e5c i.e. since v11.

It seems bytea_substring also has the problem.

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 22ab5a4329..a48f2fa2a7 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3335,7 +3335,7 @@ bytea_substring(Datum str,
                  errmsg("negative substring length not allowed")));
         L1 = -1;                /* silence stupider compilers */
     }
-    else if (pg_add_s32_overflow(S, L, &E))
+    else if (pg_add_s32_overflow(S1, L, &E))
     {
         /*
          * L could be large enough for S + L to overflow, in which case the


--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: BUG #17450: SUBSTRING function extracting lesser characters than specified

From
Pavel Borisov
Date:
пн, 28 мар. 2022 г. в 17:55, Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Borisov <pashkin.elfe@gmail.com> writes:
> Yes, the behavior expected in a report seems to be right. I've attached a
> very small fix for that issue. Probably it should be backpatched into all
> versions having 4bd3fad80e5c i.e. since v11.

The current calculation matches the SQL standard; this makes
it not match the standard.  From SQL99:

         3) If <character substring function> is specified, then:

            a) Let C be the value of the <character value expression>, let
              LC be the length of C, and let S be the value of the <start
              position>.

            b) If <string length> is specified, then let L be the value of
              <string length> and let E be S+L. Otherwise, let E be the
              larger of LC + 1 and S.

            c) If either C, S, or L is the null value, then the result of
              the <character substring function> is the null value.

            d) If E is less than S, then an exception condition is raised:
              data exception - substring error.

            e) Case:

              i) If S is greater than LC or if E is less than 1 (one), then
                 the result of the <character substring function> is a zero-
                 length string.

             ii) Otherwise,

                 1) Let S1 be the larger of S and 1 (one). Let E1 be the
                   smaller of E and LC+1. Let L1 be E1-S1.

                 2) The result of the <character substring function> is
                   a character string containing the L1 characters of C
                   starting at character number S1 in the same order that
                   the characters appear in C.

Perhaps there's something to do here documentation-wise,
but there is no bug.

Ok, thanks for clarification!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: BUG #17450: SUBSTRING function extracting lesser characters than specified

From
"David G. Johnston"
Date:
On Mon, Mar 28, 2022 at 5:35 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
пн, 28 мар. 2022 г. в 15:01, hubert depesz lubaczewski <depesz@depesz.com>:
On Mon, Mar 28, 2022 at 10:30:07AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17450
> Logged by:          Suman Ganguly
> Email address:      ganguly.04@gmail.com
> PostgreSQL version: 10.17
> Operating system:   x86_64-pc-linux-gnu
> Description:       
>
> select substring('123456', 0 , 5)
> On running this, Postgres returns '1234'
> Expecting '12345' to be returned as per the documentation

You should reference the documentation you are basing your conclusion off of for this kind of report.

This example in the documentation clearly demonstrates the 1-based nature of the numbering:

substring('Thomas' from 2 for 3) → hom

as does this one:

substr('alphabet', 3, 2) → ph


Oddly, I don't actually see a non-standard form of substring spelled that way though indeed the example works.


Probably it should be backpatched into all versions having 4bd3fad80e5c i.e. since v11.

The behavior of the example command is identical both before and since v11 so I don't see how that commit has anything to do with this.  Nor, as shown above, does this contradict the documentation.  The bug report is simply wrong and you seem to have attempted to supply a fix without confirming it.

David J.