Thread: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Hi,

Per Clang UBSan
Clang 10 (64 bits)
Postgres 14 (latest)

2020-08-27 01:02:14.930 -03 client backend[42432] pg_regress/create_table STATEMENT:  create table defcheck_0 partition of defcheck for values in (0);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior indexcmds.c:1162:22 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in 

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

clog.c (299):
memcmp(subxids, MyProc->subxids.xids,
  nsubxids * sizeof(TransactionId)) == 0)

xact.c (5285)
memcpy(&workspace[i], s->childXids,
  s->nChildXids * sizeof(TransactionId));

snapmgr.c (590)
memcpy(CurrentSnapshot->xip, sourcesnap->xip,
  sourcesnap->xcnt * sizeof(TransactionId));
snapmgr.c (594)
memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
  sourcesnap->subxcnt * sizeof(TransactionId));

copyfuncs.c:1190
COPY_POINTER_FIELD(uniqColIdx, from->uniqNumCols * sizeof(AttrNumber));

1.STATEMENT:  CREATE TABLESPACE regress_tblspacewith LOCATION '/usr/src/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
2.STATEMENT:  CREATE TABLE testschema.dflt (a int PRIMARY KEY USING INDEX TABLESPACE regress_tblspace) PARTITION BY LIST (a);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
3.STATEMENT:  SELECT bool 'nay' AS error;
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
4.STATEMENT:  SELECT U&'wrong: +0061' UESCAPE '+';
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
5. STATEMENT:  ALTER TABLE circles ADD EXCLUDE USING gist
 (c1 WITH &&, (c2::circle) WITH &&);
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
6.STATEMENT:  COMMENT ON CONSTRAINT the_constraint ON DOMAIN no_comments_dom IS 'another bad comment';
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
7.STATEMENT:  create trigger my_table_col_update_trig
 after update of b on my_table referencing new table as new_table
 for each statement execute procedure dump_insert();
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior xact.c:5285:25 in
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:590:31 in
snapmgr.c:594:34: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:594:34 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in 8.
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
8.STATEMENT:  select array_fill(1, array[[1,2],[3,4]]);
copyfuncs.c:1190:2: runtime error: null pointer passed as argument 2, which is declared to never be null

I stopped counting clog.c (299).
If anyone wants, the full report, it has 2mb.

Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Alvaro Herrera
Date:
On 2020-Aug-27, Ranier Vilela wrote:

> indexcmds.c (1162):
> memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-27, Ranier Vilela wrote:

> indexcmds.c (1162):
> memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.
Hi Álvaro,
If we are passing a null pointer in these places and it should not be done,
it is a sign that perhaps these calls should not or should not be made, and they can be avoided.
This would eliminate undefined behavior and save some cycles?

regards,
Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-27, Ranier Vilela wrote:

> indexcmds.c (1162):
> memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.
See at:

regards,
Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Alvaro Herrera
Date:
On 2020-Aug-27, Ranier Vilela wrote:

> If we are passing a null pointer in these places and it should not be done,
> it is a sign that perhaps these calls should not or should not be made, and
> they can be avoided.

Feel free to send a patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
> On 2020-Aug-27, Ranier Vilela wrote:
> > indexcmds.c (1162):
> > memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
> 
> Looks legit, and at least per commit 13bba02271dc we do fix such things,
> even if it's useless in practice.
> 
> Given that no buildfarm member has ever complained, this exercise seems
> pretty pointless.

Later decision to stop changing such code:
https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com



Noah Misch <noah@leadboat.com> writes:
> On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
>> Looks legit, and at least per commit 13bba02271dc we do fix such things,
>> even if it's useless in practice.
>> Given that no buildfarm member has ever complained, this exercise seems
>> pretty pointless.

> Later decision to stop changing such code:
> https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

I agree that this seems academic for any sane implementation of memcmp
and friends.  If the function is not allowed to fetch or store any bytes
when the length parameter is zero, which it certainly is not, then how
could it matter whether the pointer parameter is NULL?  It would be
interesting to know the rationale behind the C standard's claim that
this case should be undefined.

Having said that, I think that the actual risk here has to do not with
what memcmp() might do, but with what gcc might do in code surrounding
the call, once it's armed with the assumption that any pointer we pass
to memcmp() could not be null.  See

https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl

It's surely not hard to visualize cases where necessary code could
be optimized away if the compiler thinks it's entitled to assume
such things.

In other words, the C standard made a damfool decision and now we need
to deal with the consequences of that as perpetrated by other fools.
Still, it's all hypothetical so far --- does anyone have examples of
actual rather than theoretical issues?

            regards, tom lane



On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
> >> Looks legit, and at least per commit 13bba02271dc we do fix such things,
> >> even if it's useless in practice.
> >> Given that no buildfarm member has ever complained, this exercise seems
> >> pretty pointless.
> 
> > Later decision to stop changing such code:
> > https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

> I think that the actual risk here has to do not with
> what memcmp() might do, but with what gcc might do in code surrounding
> the call, once it's armed with the assumption that any pointer we pass
> to memcmp() could not be null.  See
> 
> https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl
> 
> It's surely not hard to visualize cases where necessary code could
> be optimized away if the compiler thinks it's entitled to assume
> such things.

Good point.  We could pick from a few levels of concern:

- No concern: reject changes serving only to remove this class of deviation.
  This is today's policy.
- Medium concern: accept fixes, but the buildfarm continues not to break in
  the face of new deviations.  This will make some code uglier, but we'll be
  ready against some compiler growing the optimization you describe.
- High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm member
  thorntail.  In addition to the drawback of the previous level, this will
  create urgent work for committers introducing new deviations (or introducing
  test coverage that unearths old deviations).  This is our current response
  to Valgrind complaints, for example.



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em sex., 28 de ago. de 2020 às 00:11, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

In other words, the C standard made a damfool decision and now we need
to deal with the consequences of that as perpetrated by other fools.
Still, it's all hypothetical so far --- does anyone have examples of
actual rather than theoretical issues?
I still think the value of this alert would be to avoid the call.
Why do memcmp have to deal with a NULL value?
clog.c: 299, it is a case outside the curve, there are hundreds of calls in the report.
It must be very difficult to correct, but if TransactionIdSetPageStatus was not called in these cases, memcmp, it would not have to deal with the NULL pointer.

regards,
Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Peter Geoghegan
Date:
On Thu, Aug 27, 2020 at 11:04 PM Noah Misch <noah@leadboat.com> wrote:
> On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> > It's surely not hard to visualize cases where necessary code could
> > be optimized away if the compiler thinks it's entitled to assume
> > such things.
>
> Good point.

I wonder if we should start using -fno-delete-null-pointer-checks:

https://lkml.org/lkml/2018/4/4/601

This may not be strictly relevant to the discussion, but I was
reminded of it just now and thought I'd mention it.

-- 
Peter Geoghegan



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em sex., 28 de ago. de 2020 às 03:04, Noah Misch <noah@leadboat.com> escreveu:
On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
> >> Looks legit, and at least per commit 13bba02271dc we do fix such things,
> >> even if it's useless in practice.
> >> Given that no buildfarm member has ever complained, this exercise seems
> >> pretty pointless.
>
> > Later decision to stop changing such code:
> > https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

> I think that the actual risk here has to do not with
> what memcmp() might do, but with what gcc might do in code surrounding
> the call, once it's armed with the assumption that any pointer we pass
> to memcmp() could not be null.  See
>
> https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl
>
> It's surely not hard to visualize cases where necessary code could
> be optimized away if the compiler thinks it's entitled to assume
> such things.

Good point.  We could pick from a few levels of concern:

- No concern: reject changes serving only to remove this class of deviation.
  This is today's policy.
- Medium concern: accept fixes, but the buildfarm continues not to break in
  the face of new deviations.  This will make some code uglier, but we'll be
  ready against some compiler growing the optimization you describe.
- High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm member
  thorntail.  In addition to the drawback of the previous level, this will
  create urgent work for committers introducing new deviations (or introducing
  test coverage that unearths old deviations).  This is our current response
  to Valgrind complaints, for example.
Maybe in this specific case, the policy could be ignored, this change does not hurt.

--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -293,7 +293,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
  * sub-XIDs and all of the XIDs for which we're adjusting clog should be
  * on the same page.  Check those conditions, too.
  */
- if (all_xact_same_page && xid == MyProc->xid &&
+ if (all_xact_same_page && subxids && xid == MyProc->xid &&
  nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
  nsubxids == MyProc->subxidStatus.count &&
  memcmp(subxids, MyProc->subxids.xids,

regards,
Ranier Vilela
 
Peter Geoghegan <pg@bowt.ie> writes:
> I wonder if we should start using -fno-delete-null-pointer-checks:
> https://lkml.org/lkml/2018/4/4/601
> This may not be strictly relevant to the discussion, but I was
> reminded of it just now and thought I'd mention it.

Hmm.  gcc 8.3 defines this as:

     Assume that programs cannot safely dereference null pointers, and
     that no code or data element resides at address zero.  This option
     enables simple constant folding optimizations at all optimization
     levels.  In addition, other optimization passes in GCC use this
     flag to control global dataflow analyses that eliminate useless
     checks for null pointers; these assume that a memory access to
     address zero always results in a trap, so that if a pointer is
     checked after it has already been dereferenced, it cannot be null.

AFAICS, that's a perfectly valid assumption for our usage.  I can see why
the kernel might not want it, but we set things up whenever possible to
ensure that dereferencing NULL would crash.

However, while grepping the manual for that I also found

'-Wnull-dereference'
     Warn if the compiler detects paths that trigger erroneous or
     undefined behavior due to dereferencing a null pointer.  This
     option is only active when '-fdelete-null-pointer-checks' is
     active, which is enabled by optimizations in most targets.  The
     precision of the warnings depends on the optimization options used.

I wonder whether turning that on would find anything interesting.

            regards, tom lane



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
    diff64 =  ((long int) d1 - (long int) d2);
    diff64=-4294901760

#include <stdio.h>
#include <stdint.h>

int main()
{
    unsigned int d1 = 3;
    unsigned int d2 = 4294901763;
    unsigned int diffu32 = 0;
    unsigned long int diffu64 = 0;
    unsigned long int diff64 = 0;
    int32_t diff = 0;

    diff = (int32_t) (d1 - d2);
    diff64 =  ((long int) d1 - (long int) d2);
    diffu32 = (unsigned int) (d1 - d2);
    diffu64 = (unsigned long int) (d1 - d2);
printf("d1=%u\n", d1);
printf("d2=%u\n", d2);
printf("diff=%d\n", diff);
printf("diffu32=%u\n", diffu32);
printf("diff64=%ld\n", diff64);
printf("diffu64=%lu\n", diffu64);

    return 0;
}

output:
d1=3
d2=4294901763
diff=65536
diffu32=65536
diff64=-4294901760
diffu64=65536

(With Ubuntu 64 bits + clang 10)
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison.  If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;

if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);

diff = (int32) (id1 - id2);
return (diff < 0);
}

This works, all time or really with bad numbers can break?
I would like to know, why doesn't it work?

With Windows 10 (64 bits) + msvc 2019 (64 bits)
bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison.  If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
int64       diff64;

if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);

diff = (int32) (id1 - id2);
diff64 = ((int64) id1 - (int64) id2);
    fprintf(stderr, "id1=%lu\n", id1);
    fprintf(stderr, "id2=%lu\n", id1);
    fprintf(stderr, "diff32=%ld\n", diff);
    fprintf(stderr, "diff64=%lld\n", diff64);
return (diff64 < 0);
}

id1=498
id2=498
diff32=200000000
diff64=-4094967296
2020-08-31 12:46:30.422 -03 [8908] WARNING:  oldest xmin is far in the past
2020-08-31 12:46:30.422 -03 [8908] HINT:  Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.

id1=4
id2=4
diff32=-494
diff64=-494

id1=4
id2=4
diff32=50000000
diff64=-4244967296
2020-08-31 12:46:30.423 -03 [8908] FATAL:  found xmin 4 from before relfrozenxid 4244967300
2020-08-31 12:46:30.423 -03 [8908] CONTEXT:  while scanning block 0 and offset 1 of relation "pg_catalog.pg_depend"
2020-08-31 12:46:30.423 -03 [8908] STATEMENT:  VACUUM FREEZE;

Most of the time:
id1=498
id2=498
diff32=0
diff64=0
id1=498
id2=498
diff32=0
diff64=0

regards,
Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Alvaro Herrera
Date:
On 2020-Aug-31, Ranier Vilela wrote:

> More troubles with undefined-behavior.
> 
> This type of code can leaves overflow:
> var = (cast) (expression);
> diff = (int32) (id1 - id2);
> 
> See:
>     diff64 =  ((long int) d1 - (long int) d2);
>     diff64=-4294901760

Did you compile this with gcc -fwrapv?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-31, Ranier Vilela wrote:

> More troubles with undefined-behavior.
>
> This type of code can leaves overflow:
> var = (cast) (expression);
> diff = (int32) (id1 - id2);
>
> See:
>     diff64 =  ((long int) d1 - (long int) d2);
>     diff64=-4294901760

Did you compile this with gcc -fwrapv?
gcc 10.2 -O2  -fwrapv
bool test1()
{
    unsigned int d1 = 3;
    unsigned int d2 = 4294901763;
    long int diff64 = 0;

    diff64 =  ((long int) d1 - (long int) d2);

    return (diff64 < 0);
}
 
output:
mov     eax1
        ret

What is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'

Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-31, Ranier Vilela wrote:

> More troubles with undefined-behavior.
>
> This type of code can leaves overflow:
> var = (cast) (expression);
> diff = (int32) (id1 - id2);
>
> See:
>     diff64 =  ((long int) d1 - (long int) d2);
>     diff64=-4294901760

Did you compile this with gcc -fwrapv?
gcc 10.2 -O2  -fwrapv
bool test1()
{
    unsigned int d1 = 3;
    unsigned int d2 = 4294901763;
    long int diff64 = 0;

    diff64 =  ((long int) d1 - (long int) d2);

    return (diff64 < 0);
}
 
output:
mov     eax1
        ret

What is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'

with Debug:
#include <stdio.h>
#include <stdint.h>

bool test1(void)
{
    unsigned int d1 = 3;
    unsigned int d2 = 4294901763;
    int32_t diff;

    diff = (int32_t) (d1 - d2);

    return (diff < 0);
}
 
gcc 10.2 -g
output:
       push    rbp
        mov     rbprsp
        mov     DWORD PTR [rbp-4]3
        mov     DWORD PTR [rbp-8], -65533
        mov     eaxDWORD PTR [rbp-4]
        sub     eaxDWORD PTR [rbp-8]
        mov     DWORD PTR [rbp-12]eax
        mov     eaxDWORD PTR [rbp-12]
        shr     eax31
        pop     rbp
        ret
 
it is possible to conclude that:
1. TransactionIdPrecedes works in release mode, because the compiler treats undefined-behavior and corrects it,
treating a possible overflow.
2. TransactionIdPrecedes does not work in debug mode, and overflow occurs.
3. TransactionID cannot contain the largest possible ID or an invalid ID (4294901763) has been generated and passed to TransactionIdPrecedes.

Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Andres Freund
Date:
Hi,

On August 31, 2020 11:08:49 AM PDT, Ranier Vilela <ranier.vf@gmail.com> wrote:
>Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela
><ranier.vf@gmail.com>
>escreveu:
>
>> Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <
>> alvherre@2ndquadrant.com> escreveu:
>>
>>> On 2020-Aug-31, Ranier Vilela wrote:
>>>
>>> > More troubles with undefined-behavior.
>>> >
>>> > This type of code can leaves overflow:
>>> > var = (cast) (expression);
>>> > diff = (int32) (id1 - id2);
>>> >
>>> > See:
>>> >     diff64 =  ((long int) d1 - (long int) d2);
>>> >     diff64=-4294901760
>>>
>>> Did you compile this with gcc -fwrapv?
>>>
>> gcc 10.2 -O2  -fwrapv
>> bool test1()
>> {
>>     unsigned int d1 = 3;
>>     unsigned int d2 = 4294901763;
>>     long int diff64 = 0;
>>
>>     diff64 =  ((long int) d1 - (long int) d2);
>>
>>     return (diff64 < 0);
>> }
>>
>> output:
>> mov     eax, 1
>>         ret
>>
>> What is a workaround for msvc 2019 (64 bits) and clang 64 bits
>(linux)?
>> transam.c:311:22: runtime error: unsigned integer overflow: 3 -
>4294901763
>> cannot be represented in type 'unsigned int'

Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.

Andres

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Peter Geoghegan
Date:
On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn
about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.

-- 
Peter Geoghegan



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Andres Freund
Date:
Hi,

On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote:
> On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> > Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn
about.
> 
> Presumably it's simply warning that the value -4294901760 (i.e. the
> result of 3 - 4294901763) cannot be faithfully represented as an
> unsigned int. This is true, of course. It's just not relevant.
> 
> I'm pretty sure that UBSan does not actually state that this is
> undefined behavior. At least Ranier's sample output didn't seem to
> indicate it.

Well, my point is that there's no point in discussing unsigned integer
overflow, since it's precisely specified. And hence I don't understand
what we're discussing in this sub-thread.

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says:

> -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
> the result of an unsigned integer computation cannot be represented in
> its type. Unlike signed integer overflow, this is not undefined
> behavior, but it is often unintentional. This sanitizer does not check
> for lossy implicit conversions performed before such a computation
> (see -fsanitize=implicit-conversion).

So it seems Rainier needs to turn this test off, because it actually is
intentional.

Greetings,

Andres Freund



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 16:39, Peter Geoghegan <pg@bowt.ie> escreveu:
On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.
4294901763 can not store at unsigned int (TransactionID is uint32_t).
TransactionId id2 at TransactionIdPrecedes already has an overflow, before anything is done.

Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 17:05, Andres Freund <andres@anarazel.de> escreveu:
Hi,

On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote:
> On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> > Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.
>
> Presumably it's simply warning that the value -4294901760 (i.e. the
> result of 3 - 4294901763) cannot be faithfully represented as an
> unsigned int. This is true, of course. It's just not relevant.
>
> I'm pretty sure that UBSan does not actually state that this is
> undefined behavior. At least Ranier's sample output didn't seem to
> indicate it.

Well, my point is that there's no point in discussing unsigned integer
overflow, since it's precisely specified. And hence I don't understand
what we're discussing in this sub-thread.

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says:

> -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
> the result of an unsigned integer computation cannot be represented in
> its type. Unlike signed integer overflow, this is not undefined
> behavior, but it is often unintentional. This sanitizer does not check
> for lossy implicit conversions performed before such a computation
> (see -fsanitize=implicit-conversion).

So it seems Rainier needs to turn this test off, because it actually is
intentional.
No problem.
If intentional, the code at TransactionIdPrecedes, already knows that overflow can occur
and trusts that the compiler will save it.

Ranier Vilela

Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Andres Freund
Date:
Hi,

On 2020-08-31 17:35:14 -0300, Ranier Vilela wrote:
> Em seg., 31 de ago. de 2020 às 17:05, Andres Freund <andres@anarazel.de>
> escreveu:
> > So it seems Rainier needs to turn this test off, because it actually is
> > intentional.
> >
> No problem.
> If intentional, the code at TransactionIdPrecedes, already knows that
> overflow can occur
> and trusts that the compiler will save it.

I don't know what you mean with "saving" it. Again, unsigned integer
overflow is well specified in C. All that's needed is for the compiler
to implement normal C.



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Noah Misch
Date:
On Sat, Aug 29, 2020 at 12:36:42PM -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I wonder if we should start using -fno-delete-null-pointer-checks:
> > https://lkml.org/lkml/2018/4/4/601
> > This may not be strictly relevant to the discussion, but I was
> > reminded of it just now and thought I'd mention it.
> 
> Hmm.  gcc 8.3 defines this as:
> 
>      Assume that programs cannot safely dereference null pointers, and
>      that no code or data element resides at address zero.  This option
>      enables simple constant folding optimizations at all optimization
>      levels.  In addition, other optimization passes in GCC use this
>      flag to control global dataflow analyses that eliminate useless
>      checks for null pointers; these assume that a memory access to
>      address zero always results in a trap, so that if a pointer is
>      checked after it has already been dereferenced, it cannot be null.
> 
> AFAICS, that's a perfectly valid assumption for our usage.  I can see why
> the kernel might not want it, but we set things up whenever possible to
> ensure that dereferencing NULL would crash.

We do assume dereferencing NULL would crash, but we also assume this
optimization doesn't happen:

=== opt-null.c
#include <string.h>
#include <unistd.h>

int my_memcpy(void *dest, const void *src, size_t n)
{
#ifndef REMOVE_MEMCPY
    memcpy(dest, src, n);
#endif
    if (src)
    pause();
    return 0;
}
===

$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ diff -sU1 <(gcc -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -O2 -S -o- opt-null.c)
--- /dev/fd/63  2020-09-03 19:23:53.206864378 -0700
+++ /dev/fd/62  2020-09-03 19:23:53.206864378 -0700
@@ -8,13 +8,8 @@
        .cfi_startproc
-       pushq   %rbx
+       subq    $8, %rsp
        .cfi_def_cfa_offset 16
-       .cfi_offset 3, -16
-       movq    %rsi, %rbx
        call    memcpy@PLT
-       testq   %rbx, %rbx
-       je      .L2
        call    pause@PLT
-.L2:
        xorl    %eax, %eax
-       popq    %rbx
+       addq    $8, %rsp
        .cfi_def_cfa_offset 8
$ diff -sU1 <(gcc -DREMOVE_MEMCPY -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -DREMOVE_MEMCPY -O2 -S
-o-opt-null.c)
 
Files /dev/fd/63 and /dev/fd/62 are identical


So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or
remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail.

> However, while grepping the manual for that I also found
> 
> '-Wnull-dereference'
>      Warn if the compiler detects paths that trigger erroneous or
>      undefined behavior due to dereferencing a null pointer.  This
>      option is only active when '-fdelete-null-pointer-checks' is
>      active, which is enabled by optimizations in most targets.  The
>      precision of the warnings depends on the optimization options used.
> 
> I wonder whether turning that on would find anything interesting.

Promising.  Sadly, it doesn't warn for the above test case.



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> We do assume dereferencing NULL would crash, but we also assume this
> optimization doesn't happen:

> #ifndef REMOVE_MEMCPY
>     memcpy(dest, src, n);
> #endif
>     if (src)
>     pause();

> [ gcc believes the if-test is unnecessary ]

Hm.  I would not blame that on -fdelete-null-pointer-checks per se.
Rather the problem is what we were touching on before: the dubious
but standard-approved assumption that memcpy's arguments cannot be
null.

If there actually are places where this is a problem, I think we
need to fix it by doing

    if (n > 0)
        memcpy(dest, src, n);

so that the compiler can no longer assume that src!=NULL even
when n is zero.  I'd still leave -fdelete-null-pointer-checks
enabled, because it can make valid and useful optimizations in
other cases.  (Besides that, it's far from clear that disabling
that flag would suppress all bad consequences of the assumption
that memcpy's arguments aren't null.)

            regards, tom lane



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Peter Geoghegan
Date:
On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm.  I would not blame that on -fdelete-null-pointer-checks per se.
> Rather the problem is what we were touching on before: the dubious
> but standard-approved assumption that memcpy's arguments cannot be
> null.

Isn't it both, together? That is, it's the combination of that
assumption alongside -fdelete-null-pointer-checks's actual willingness
to propagate the assumption.

> I'd still leave -fdelete-null-pointer-checks
> enabled, because it can make valid and useful optimizations in
> other cases.

Is there any evidence that that's true? I wouldn't assume that the gcc
people exercised good judgement here.

-- 
Peter Geoghegan



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd still leave -fdelete-null-pointer-checks
>> enabled, because it can make valid and useful optimizations in
>> other cases.

> Is there any evidence that that's true? I wouldn't assume that the gcc
> people exercised good judgement here.

I have not actually dug for examples, but the sort of situation where
I think it would help us is that macros or static inlines could contain
null tests that can be proven useless at particular call sites due to
surrounding code.

            regards, tom lane



Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

From
Noah Misch
Date:
On Thu, Sep 03, 2020 at 10:53:37PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > We do assume dereferencing NULL would crash, but we also assume this
> > optimization doesn't happen:
> 
> > #ifndef REMOVE_MEMCPY
> >     memcpy(dest, src, n);
> > #endif
> >     if (src)
> >     pause();
> 
> > [ gcc believes the if-test is unnecessary ]
> 
> > So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or
> > remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail.

> If there actually are places where this is a problem, I think we
> need to fix it by doing
> 
>     if (n > 0)
>         memcpy(dest, src, n);
> 
> so that the compiler can no longer assume that src!=NULL even
> when n is zero.  I'd still leave -fdelete-null-pointer-checks
> enabled, because it can make valid and useful optimizations in
> other cases.  (Besides that, it's far from clear that disabling
> that flag would suppress all bad consequences of the assumption
> that memcpy's arguments aren't null.)

Your proposal is what I had in mind when I wrote "remove
-fno-sanitize=nonnull-attribute from buildfarm member thorntail", and I agree
it's attractive.  In particular, gcc is not likely to be the last compiler to
attempt such an optimization, and other compilers may not offer
-fno-delete-null-pointer-checks or equivalent.

One might argue for -fno-delete-null-pointer-checks in addition, because it
would defend against cases that sanitizers miss.  I tend to think that's
overkill, but maybe not.  I suppose one could do an audit, diffing the
generated code with and without the option.