Thread: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
Hi,
Per Clang UBSan
Clang 10 (64 bits)
Postgres 14 (latest)
2020-08-27 01:02:14.930 -03 client backend[42432] pg_regress/create_table STATEMENT: create table defcheck_0 partition of defcheck for values in (0);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior indexcmds.c:1162:22 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior indexcmds.c:1162:22 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
clog.c (299):
memcmp(subxids, MyProc->subxids.xids,
nsubxids * sizeof(TransactionId)) == 0)
nsubxids * sizeof(TransactionId)) == 0)
xact.c (5285)
memcpy(&workspace[i], s->childXids,
s->nChildXids * sizeof(TransactionId));
s->nChildXids * sizeof(TransactionId));
snapmgr.c (590)
memcpy(CurrentSnapshot->xip, sourcesnap->xip,
sourcesnap->xcnt * sizeof(TransactionId));
sourcesnap->xcnt * sizeof(TransactionId));
snapmgr.c (594)
memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
sourcesnap->subxcnt * sizeof(TransactionId));
sourcesnap->subxcnt * sizeof(TransactionId));
copyfuncs.c:1190
COPY_POINTER_FIELD(uniqColIdx, from->uniqNumCols * sizeof(AttrNumber));
1.STATEMENT: CREATE TABLESPACE regress_tblspacewith LOCATION '/usr/src/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
2.STATEMENT: CREATE TABLE testschema.dflt (a int PRIMARY KEY USING INDEX TABLESPACE regress_tblspace) PARTITION BY LIST (a);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2, which is declared to never be null
3.STATEMENT: SELECT bool 'nay' AS error;
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
4.STATEMENT: SELECT U&'wrong: +0061' UESCAPE '+';
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
5. STATEMENT: ALTER TABLE circles ADD EXCLUDE USING gist
(c1 WITH &&, (c2::circle) WITH &&);
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
(c1 WITH &&, (c2::circle) WITH &&);
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
6.STATEMENT: COMMENT ON CONSTRAINT the_constraint ON DOMAIN no_comments_dom IS 'another bad comment';
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
7.STATEMENT: create trigger my_table_col_update_trig
after update of b on my_table referencing new table as new_table
for each statement execute procedure dump_insert();
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior xact.c:5285:25 in
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:590:31 in
snapmgr.c:594:34: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:594:34 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in 8.
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
after update of b on my_table referencing new table as new_table
for each statement execute procedure dump_insert();
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior xact.c:5285:25 in
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:590:31 in
snapmgr.c:594:34: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:594:34 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in 8.
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
8.STATEMENT: select array_fill(1, array[[1,2],[3,4]]);
copyfuncs.c:1190:2: runtime error: null pointer passed as argument 2, which is declared to never be null
copyfuncs.c:1190:2: runtime error: null pointer passed as argument 2, which is declared to never be null
I stopped counting clog.c (299).
If anyone wants, the full report, it has 2mb.
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Alvaro Herrera
Date:
On 2020-Aug-27, Ranier Vilela wrote: > indexcmds.c (1162): > memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts); Looks legit, and at least per commit 13bba02271dc we do fix such things, even if it's useless in practice. Given that no buildfarm member has ever complained, this exercise seems pretty pointless. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-27, Ranier Vilela wrote:
> indexcmds.c (1162):
> memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.
Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.
Hi Álvaro,
If we are passing a null pointer in these places and it should not be done,
it is a sign that perhaps these calls should not or should not be made, and they can be avoided.
This would eliminate undefined behavior and save some cycles?
This would eliminate undefined behavior and save some cycles?
regards,
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-27, Ranier Vilela wrote:
> indexcmds.c (1162):
> memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.
Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.
See at:
regards,
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Alvaro Herrera
Date:
On 2020-Aug-27, Ranier Vilela wrote: > If we are passing a null pointer in these places and it should not be done, > it is a sign that perhaps these calls should not or should not be made, and > they can be avoided. Feel free to send a patch. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote: > On 2020-Aug-27, Ranier Vilela wrote: > > indexcmds.c (1162): > > memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts); > > Looks legit, and at least per commit 13bba02271dc we do fix such things, > even if it's useless in practice. > > Given that no buildfarm member has ever complained, this exercise seems > pretty pointless. Later decision to stop changing such code: https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com
Noah Misch <noah@leadboat.com> writes: > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote: >> Looks legit, and at least per commit 13bba02271dc we do fix such things, >> even if it's useless in practice. >> Given that no buildfarm member has ever complained, this exercise seems >> pretty pointless. > Later decision to stop changing such code: > https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com I agree that this seems academic for any sane implementation of memcmp and friends. If the function is not allowed to fetch or store any bytes when the length parameter is zero, which it certainly is not, then how could it matter whether the pointer parameter is NULL? It would be interesting to know the rationale behind the C standard's claim that this case should be undefined. Having said that, I think that the actual risk here has to do not with what memcmp() might do, but with what gcc might do in code surrounding the call, once it's armed with the assumption that any pointer we pass to memcmp() could not be null. See https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl It's surely not hard to visualize cases where necessary code could be optimized away if the compiler thinks it's entitled to assume such things. In other words, the C standard made a damfool decision and now we need to deal with the consequences of that as perpetrated by other fools. Still, it's all hypothetical so far --- does anyone have examples of actual rather than theoretical issues? regards, tom lane
On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote: > >> Looks legit, and at least per commit 13bba02271dc we do fix such things, > >> even if it's useless in practice. > >> Given that no buildfarm member has ever complained, this exercise seems > >> pretty pointless. > > > Later decision to stop changing such code: > > https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com > I think that the actual risk here has to do not with > what memcmp() might do, but with what gcc might do in code surrounding > the call, once it's armed with the assumption that any pointer we pass > to memcmp() could not be null. See > > https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl > > It's surely not hard to visualize cases where necessary code could > be optimized away if the compiler thinks it's entitled to assume > such things. Good point. We could pick from a few levels of concern: - No concern: reject changes serving only to remove this class of deviation. This is today's policy. - Medium concern: accept fixes, but the buildfarm continues not to break in the face of new deviations. This will make some code uglier, but we'll be ready against some compiler growing the optimization you describe. - High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail. In addition to the drawback of the previous level, this will create urgent work for committers introducing new deviations (or introducing test coverage that unearths old deviations). This is our current response to Valgrind complaints, for example.
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em sex., 28 de ago. de 2020 às 00:11, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
In other words, the C standard made a damfool decision and now we need
to deal with the consequences of that as perpetrated by other fools.
Still, it's all hypothetical so far --- does anyone have examples of
actual rather than theoretical issues?
I still think the value of this alert would be to avoid the call.
Why do memcmp have to deal with a NULL value?
clog.c: 299, it is a case outside the curve, there are hundreds of calls in the report.
It must be very difficult to correct, but if TransactionIdSetPageStatus was not called in these cases, memcmp, it would not have to deal with the NULL pointer.
Why do memcmp have to deal with a NULL value?
clog.c: 299, it is a case outside the curve, there are hundreds of calls in the report.
It must be very difficult to correct, but if TransactionIdSetPageStatus was not called in these cases, memcmp, it would not have to deal with the NULL pointer.
regards,
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Peter Geoghegan
Date:
On Thu, Aug 27, 2020 at 11:04 PM Noah Misch <noah@leadboat.com> wrote: > On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote: > > It's surely not hard to visualize cases where necessary code could > > be optimized away if the compiler thinks it's entitled to assume > > such things. > > Good point. I wonder if we should start using -fno-delete-null-pointer-checks: https://lkml.org/lkml/2018/4/4/601 This may not be strictly relevant to the discussion, but I was reminded of it just now and thought I'd mention it. -- Peter Geoghegan
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em sex., 28 de ago. de 2020 às 03:04, Noah Misch <noah@leadboat.com> escreveu:
On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:
> >> Looks legit, and at least per commit 13bba02271dc we do fix such things,
> >> even if it's useless in practice.
> >> Given that no buildfarm member has ever complained, this exercise seems
> >> pretty pointless.
>
> > Later decision to stop changing such code:
> > https://postgr.es/m/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com
> I think that the actual risk here has to do not with
> what memcmp() might do, but with what gcc might do in code surrounding
> the call, once it's armed with the assumption that any pointer we pass
> to memcmp() could not be null. See
>
> https://www.postgresql.org/message-id/flat/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10%40phx.gbl
>
> It's surely not hard to visualize cases where necessary code could
> be optimized away if the compiler thinks it's entitled to assume
> such things.
Good point. We could pick from a few levels of concern:
- No concern: reject changes serving only to remove this class of deviation.
This is today's policy.
- Medium concern: accept fixes, but the buildfarm continues not to break in
the face of new deviations. This will make some code uglier, but we'll be
ready against some compiler growing the optimization you describe.
- High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm member
thorntail. In addition to the drawback of the previous level, this will
create urgent work for committers introducing new deviations (or introducing
test coverage that unearths old deviations). This is our current response
to Valgrind complaints, for example.
Maybe in this specific case, the policy could be ignored, this change does not hurt.
+++ b/src/backend/access/transam/clog.c
@@ -293,7 +293,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
* sub-XIDs and all of the XIDs for which we're adjusting clog should be
* on the same page. Check those conditions, too.
*/
- if (all_xact_same_page && xid == MyProc->xid &&
+ if (all_xact_same_page && subxids && xid == MyProc->xid &&
nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
nsubxids == MyProc->subxidStatus.count &&
memcmp(subxids, MyProc->subxids.xids,
regards,
Ranier Vilela
Peter Geoghegan <pg@bowt.ie> writes: > I wonder if we should start using -fno-delete-null-pointer-checks: > https://lkml.org/lkml/2018/4/4/601 > This may not be strictly relevant to the discussion, but I was > reminded of it just now and thought I'd mention it. Hmm. gcc 8.3 defines this as: Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option enables simple constant folding optimizations at all optimization levels. In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that a memory access to address zero always results in a trap, so that if a pointer is checked after it has already been dereferenced, it cannot be null. AFAICS, that's a perfectly valid assumption for our usage. I can see why the kernel might not want it, but we set things up whenever possible to ensure that dereferencing NULL would crash. However, while grepping the manual for that I also found '-Wnull-dereference' Warn if the compiler detects paths that trigger erroneous or undefined behavior due to dereferencing a null pointer. This option is only active when '-fdelete-null-pointer-checks' is active, which is enabled by optimizations in most targets. The precision of the warnings depends on the optimization options used. I wonder whether turning that on would find anything interesting. regards, tom lane
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
More troubles with undefined-behavior.
This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);
See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760
#include <stdio.h>
#include <stdint.h>
int main()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
unsigned int diffu32 = 0;
unsigned long int diffu64 = 0;
unsigned long int diff64 = 0;
int32_t diff = 0;
diff = (int32_t) (d1 - d2);
diff64 = ((long int) d1 - (long int) d2);
diffu32 = (unsigned int) (d1 - d2);
diffu64 = (unsigned long int) (d1 - d2);
printf("d1=%u\n", d1);
printf("d2=%u\n", d2);
printf("diff=%d\n", diff);
printf("diffu32=%u\n", diffu32);
printf("diff64=%ld\n", diff64);
printf("diffu64=%lu\n", diffu64);
return 0;
}
#include <stdint.h>
int main()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
unsigned int diffu32 = 0;
unsigned long int diffu64 = 0;
unsigned long int diff64 = 0;
int32_t diff = 0;
diff = (int32_t) (d1 - d2);
diff64 = ((long int) d1 - (long int) d2);
diffu32 = (unsigned int) (d1 - d2);
diffu64 = (unsigned long int) (d1 - d2);
printf("d1=%u\n", d1);
printf("d2=%u\n", d2);
printf("diff=%d\n", diff);
printf("diffu32=%u\n", diffu32);
printf("diff64=%ld\n", diff64);
printf("diffu64=%lu\n", diffu64);
return 0;
}
output:
d1=3
d2=4294901763
diff=65536
diffu32=65536
diff64=-4294901760
diffu64=65536
(With Ubuntu 64 bits + clang 10)
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);
diff = (int32) (id1 - id2);
return (diff < 0);
}
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);
diff = (int32) (id1 - id2);
return (diff < 0);
}
This works, all time or really with bad numbers can break?
I would like to know, why doesn't it work?
With Windows 10 (64 bits) + msvc 2019 (64 bits)
bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
int64 diff64;
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);
diff = (int32) (id1 - id2);
diff64 = ((int64) id1 - (int64) id2);
fprintf(stderr, "id1=%lu\n", id1);
fprintf(stderr, "id2=%lu\n", id1);
fprintf(stderr, "diff32=%ld\n", diff);
fprintf(stderr, "diff64=%lld\n", diff64);
return (diff64 < 0);
}
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
int64 diff64;
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);
diff = (int32) (id1 - id2);
diff64 = ((int64) id1 - (int64) id2);
fprintf(stderr, "id1=%lu\n", id1);
fprintf(stderr, "id2=%lu\n", id1);
fprintf(stderr, "diff32=%ld\n", diff);
fprintf(stderr, "diff64=%lld\n", diff64);
return (diff64 < 0);
}
id1=498
id2=498
diff32=200000000
diff64=-4094967296
2020-08-31 12:46:30.422 -03 [8908] WARNING: oldest xmin is far in the past
2020-08-31 12:46:30.422 -03 [8908] HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
id2=498
diff32=200000000
diff64=-4094967296
2020-08-31 12:46:30.422 -03 [8908] WARNING: oldest xmin is far in the past
2020-08-31 12:46:30.422 -03 [8908] HINT: Close open transactions soon to avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
id1=4
id2=4
diff32=-494
diff64=-494
id2=4
diff32=-494
diff64=-494
id1=4
id2=4
diff32=50000000
diff64=-4244967296
2020-08-31 12:46:30.423 -03 [8908] FATAL: found xmin 4 from before relfrozenxid 4244967300
2020-08-31 12:46:30.423 -03 [8908] CONTEXT: while scanning block 0 and offset 1 of relation "pg_catalog.pg_depend"
2020-08-31 12:46:30.423 -03 [8908] STATEMENT: VACUUM FREEZE;
id2=4
diff32=50000000
diff64=-4244967296
2020-08-31 12:46:30.423 -03 [8908] FATAL: found xmin 4 from before relfrozenxid 4244967300
2020-08-31 12:46:30.423 -03 [8908] CONTEXT: while scanning block 0 and offset 1 of relation "pg_catalog.pg_depend"
2020-08-31 12:46:30.423 -03 [8908] STATEMENT: VACUUM FREEZE;
Most of the time:
id1=498
id2=498
diff32=0
diff64=0
id2=498
diff32=0
diff64=0
id1=498
id2=498
diff32=0
diff64=0
id2=498
diff32=0
diff64=0
regards,
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Alvaro Herrera
Date:
On 2020-Aug-31, Ranier Vilela wrote: > More troubles with undefined-behavior. > > This type of code can leaves overflow: > var = (cast) (expression); > diff = (int32) (id1 - id2); > > See: > diff64 = ((long int) d1 - (long int) d2); > diff64=-4294901760 Did you compile this with gcc -fwrapv? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:
On 2020-Aug-31, Ranier Vilela wrote:
> More troubles with undefined-behavior.
>
> This type of code can leaves overflow:
> var = (cast) (expression);
> diff = (int32) (id1 - id2);
>
> See:
> diff64 = ((long int) d1 - (long int) d2);
> diff64=-4294901760
Did you compile this with gcc -fwrapv?
gcc 10.2 -O2 -fwrapv
bool test1()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
long int diff64 = 0;
diff64 = ((long int) d1 - (long int) d2);
return (diff64 < 0);
}
output:
mov eax, 1
ret
What is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <alvherre@2ndquadrant.com> escreveu:On 2020-Aug-31, Ranier Vilela wrote:
> More troubles with undefined-behavior.
>
> This type of code can leaves overflow:
> var = (cast) (expression);
> diff = (int32) (id1 - id2);
>
> See:
> diff64 = ((long int) d1 - (long int) d2);
> diff64=-4294901760
Did you compile this with gcc -fwrapv?gcc 10.2 -O2 -fwrapvbool test1(){unsigned int d1 = 3;unsigned int d2 = 4294901763;long int diff64 = 0;diff64 = ((long int) d1 - (long int) d2);return (diff64 < 0);}output:mov eax, 1retWhat is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763 cannot be represented in type 'unsigned int'
with Debug:
#include <stdio.h>
#include <stdint.h>
bool test1(void)
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
int32_t diff;
diff = (int32_t) (d1 - d2);
return (diff < 0);
}
gcc 10.2 -g
output:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 3
mov DWORD PTR [rbp-8], -65533
mov eax, DWORD PTR [rbp-4]
sub eax, DWORD PTR [rbp-8]
mov DWORD PTR [rbp-12], eax
mov eax, DWORD PTR [rbp-12]
shr eax, 31
pop rbp
ret
it is possible to conclude that:
1. TransactionIdPrecedes works in release mode, because the compiler treats undefined-behavior and corrects it,
treating a possible overflow.
2. TransactionIdPrecedes does not work in debug mode, and overflow occurs.
3. TransactionID cannot contain the largest possible ID or an invalid ID (4294901763) has been generated and passed to TransactionIdPrecedes.
1. TransactionIdPrecedes works in release mode, because the compiler treats undefined-behavior and corrects it,
treating a possible overflow.
2. TransactionIdPrecedes does not work in debug mode, and overflow occurs.
3. TransactionID cannot contain the largest possible ID or an invalid ID (4294901763) has been generated and passed to TransactionIdPrecedes.
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Andres Freund
Date:
Hi, On August 31, 2020 11:08:49 AM PDT, Ranier Vilela <ranier.vf@gmail.com> wrote: >Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela ><ranier.vf@gmail.com> >escreveu: > >> Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera < >> alvherre@2ndquadrant.com> escreveu: >> >>> On 2020-Aug-31, Ranier Vilela wrote: >>> >>> > More troubles with undefined-behavior. >>> > >>> > This type of code can leaves overflow: >>> > var = (cast) (expression); >>> > diff = (int32) (id1 - id2); >>> > >>> > See: >>> > diff64 = ((long int) d1 - (long int) d2); >>> > diff64=-4294901760 >>> >>> Did you compile this with gcc -fwrapv? >>> >> gcc 10.2 -O2 -fwrapv >> bool test1() >> { >> unsigned int d1 = 3; >> unsigned int d2 = 4294901763; >> long int diff64 = 0; >> >> diff64 = ((long int) d1 - (long int) d2); >> >> return (diff64 < 0); >> } >> >> output: >> mov eax, 1 >> ret >> >> What is a workaround for msvc 2019 (64 bits) and clang 64 bits >(linux)? >> transam.c:311:22: runtime error: unsigned integer overflow: 3 - >4294901763 >> cannot be represented in type 'unsigned int' Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about. Andres Regards, Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Peter Geoghegan
Date:
On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote: > Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about. Presumably it's simply warning that the value -4294901760 (i.e. the result of 3 - 4294901763) cannot be faithfully represented as an unsigned int. This is true, of course. It's just not relevant. I'm pretty sure that UBSan does not actually state that this is undefined behavior. At least Ranier's sample output didn't seem to indicate it. -- Peter Geoghegan
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Andres Freund
Date:
Hi, On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote: > On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote: > > Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about. > > Presumably it's simply warning that the value -4294901760 (i.e. the > result of 3 - 4294901763) cannot be faithfully represented as an > unsigned int. This is true, of course. It's just not relevant. > > I'm pretty sure that UBSan does not actually state that this is > undefined behavior. At least Ranier's sample output didn't seem to > indicate it. Well, my point is that there's no point in discussing unsigned integer overflow, since it's precisely specified. And hence I don't understand what we're discussing in this sub-thread. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says: > -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where > the result of an unsigned integer computation cannot be represented in > its type. Unlike signed integer overflow, this is not undefined > behavior, but it is often unintentional. This sanitizer does not check > for lossy implicit conversions performed before such a computation > (see -fsanitize=implicit-conversion). So it seems Rainier needs to turn this test off, because it actually is intentional. Greetings, Andres Freund
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 16:39, Peter Geoghegan <pg@bowt.ie> escreveu:
On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.
Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.
I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.
4294901763 can not store at unsigned int (TransactionID is uint32_t).
TransactionId id2 at TransactionIdPrecedes already has an overflow, before anything is done.
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Ranier Vilela
Date:
Em seg., 31 de ago. de 2020 às 17:05, Andres Freund <andres@anarazel.de> escreveu:
Hi,
On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote:
> On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:
> > Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.
>
> Presumably it's simply warning that the value -4294901760 (i.e. the
> result of 3 - 4294901763) cannot be faithfully represented as an
> unsigned int. This is true, of course. It's just not relevant.
>
> I'm pretty sure that UBSan does not actually state that this is
> undefined behavior. At least Ranier's sample output didn't seem to
> indicate it.
Well, my point is that there's no point in discussing unsigned integer
overflow, since it's precisely specified. And hence I don't understand
what we're discussing in this sub-thread.
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says:
> -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
> the result of an unsigned integer computation cannot be represented in
> its type. Unlike signed integer overflow, this is not undefined
> behavior, but it is often unintentional. This sanitizer does not check
> for lossy implicit conversions performed before such a computation
> (see -fsanitize=implicit-conversion).
So it seems Rainier needs to turn this test off, because it actually is
intentional.
No problem.
If intentional, the code at TransactionIdPrecedes, already knows that overflow can occur
and trusts that the compiler will save it.
Ranier Vilela
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Andres Freund
Date:
Hi, On 2020-08-31 17:35:14 -0300, Ranier Vilela wrote: > Em seg., 31 de ago. de 2020 às 17:05, Andres Freund <andres@anarazel.de> > escreveu: > > So it seems Rainier needs to turn this test off, because it actually is > > intentional. > > > No problem. > If intentional, the code at TransactionIdPrecedes, already knows that > overflow can occur > and trusts that the compiler will save it. I don't know what you mean with "saving" it. Again, unsigned integer overflow is well specified in C. All that's needed is for the compiler to implement normal C.
On Sat, Aug 29, 2020 at 12:36:42PM -0400, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > I wonder if we should start using -fno-delete-null-pointer-checks: > > https://lkml.org/lkml/2018/4/4/601 > > This may not be strictly relevant to the discussion, but I was > > reminded of it just now and thought I'd mention it. > > Hmm. gcc 8.3 defines this as: > > Assume that programs cannot safely dereference null pointers, and > that no code or data element resides at address zero. This option > enables simple constant folding optimizations at all optimization > levels. In addition, other optimization passes in GCC use this > flag to control global dataflow analyses that eliminate useless > checks for null pointers; these assume that a memory access to > address zero always results in a trap, so that if a pointer is > checked after it has already been dereferenced, it cannot be null. > > AFAICS, that's a perfectly valid assumption for our usage. I can see why > the kernel might not want it, but we set things up whenever possible to > ensure that dereferencing NULL would crash. We do assume dereferencing NULL would crash, but we also assume this optimization doesn't happen: === opt-null.c #include <string.h> #include <unistd.h> int my_memcpy(void *dest, const void *src, size_t n) { #ifndef REMOVE_MEMCPY memcpy(dest, src, n); #endif if (src) pause(); return 0; } === $ gcc --version gcc (Debian 8.3.0-6) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ diff -sU1 <(gcc -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -O2 -S -o- opt-null.c) --- /dev/fd/63 2020-09-03 19:23:53.206864378 -0700 +++ /dev/fd/62 2020-09-03 19:23:53.206864378 -0700 @@ -8,13 +8,8 @@ .cfi_startproc - pushq %rbx + subq $8, %rsp .cfi_def_cfa_offset 16 - .cfi_offset 3, -16 - movq %rsi, %rbx call memcpy@PLT - testq %rbx, %rbx - je .L2 call pause@PLT -.L2: xorl %eax, %eax - popq %rbx + addq $8, %rsp .cfi_def_cfa_offset 8 $ diff -sU1 <(gcc -DREMOVE_MEMCPY -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -DREMOVE_MEMCPY -O2 -S -o-opt-null.c) Files /dev/fd/63 and /dev/fd/62 are identical So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail. > However, while grepping the manual for that I also found > > '-Wnull-dereference' > Warn if the compiler detects paths that trigger erroneous or > undefined behavior due to dereferencing a null pointer. This > option is only active when '-fdelete-null-pointer-checks' is > active, which is enabled by optimizations in most targets. The > precision of the warnings depends on the optimization options used. > > I wonder whether turning that on would find anything interesting. Promising. Sadly, it doesn't warn for the above test case.
Noah Misch <noah@leadboat.com> writes: > We do assume dereferencing NULL would crash, but we also assume this > optimization doesn't happen: > #ifndef REMOVE_MEMCPY > memcpy(dest, src, n); > #endif > if (src) > pause(); > [ gcc believes the if-test is unnecessary ] Hm. I would not blame that on -fdelete-null-pointer-checks per se. Rather the problem is what we were touching on before: the dubious but standard-approved assumption that memcpy's arguments cannot be null. If there actually are places where this is a problem, I think we need to fix it by doing if (n > 0) memcpy(dest, src, n); so that the compiler can no longer assume that src!=NULL even when n is zero. I'd still leave -fdelete-null-pointer-checks enabled, because it can make valid and useful optimizations in other cases. (Besides that, it's far from clear that disabling that flag would suppress all bad consequences of the assumption that memcpy's arguments aren't null.) regards, tom lane
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior
From
Peter Geoghegan
Date:
On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm. I would not blame that on -fdelete-null-pointer-checks per se. > Rather the problem is what we were touching on before: the dubious > but standard-approved assumption that memcpy's arguments cannot be > null. Isn't it both, together? That is, it's the combination of that assumption alongside -fdelete-null-pointer-checks's actual willingness to propagate the assumption. > I'd still leave -fdelete-null-pointer-checks > enabled, because it can make valid and useful optimizations in > other cases. Is there any evidence that that's true? I wouldn't assume that the gcc people exercised good judgement here. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd still leave -fdelete-null-pointer-checks >> enabled, because it can make valid and useful optimizations in >> other cases. > Is there any evidence that that's true? I wouldn't assume that the gcc > people exercised good judgement here. I have not actually dug for examples, but the sort of situation where I think it would help us is that macros or static inlines could contain null tests that can be proven useless at particular call sites due to surrounding code. regards, tom lane
On Thu, Sep 03, 2020 at 10:53:37PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > We do assume dereferencing NULL would crash, but we also assume this > > optimization doesn't happen: > > > #ifndef REMOVE_MEMCPY > > memcpy(dest, src, n); > > #endif > > if (src) > > pause(); > > > [ gcc believes the if-test is unnecessary ] > > > So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or > > remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail. > If there actually are places where this is a problem, I think we > need to fix it by doing > > if (n > 0) > memcpy(dest, src, n); > > so that the compiler can no longer assume that src!=NULL even > when n is zero. I'd still leave -fdelete-null-pointer-checks > enabled, because it can make valid and useful optimizations in > other cases. (Besides that, it's far from clear that disabling > that flag would suppress all bad consequences of the assumption > that memcpy's arguments aren't null.) Your proposal is what I had in mind when I wrote "remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail", and I agree it's attractive. In particular, gcc is not likely to be the last compiler to attempt such an optimization, and other compilers may not offer -fno-delete-null-pointer-checks or equivalent. One might argue for -fno-delete-null-pointer-checks in addition, because it would defend against cases that sanitizers miss. I tend to think that's overkill, but maybe not. I suppose one could do an audit, diffing the generated code with and without the option.