Thread: Ubsan complaint on kestrel

Ubsan complaint on kestrel

From
Andres Freund
Date:
Hi,

I just upgraded buildfarm animal kestrel (a buildfarm animal running with
ubsan) to a newer version of clang.  Unfortunately this causes it to fail.

The buildfarm output doesn't contain enough information to debug that
unfortunately. But I triggered it manually and meson-logs/testlog.txt did
contain this:

----------------------------------- stdout -----------------------------------
Running in no-clean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "bf".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  locale provider:   libc
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: C
  LC_MONETARY: en_US.UTF-8
  LC_NUMERIC:  en_US.UTF-8
  LC_TIME:     en_US.UTF-8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

creating directory /home/bf/bf-build/kestrel/HEAD/pgsql.build/tmp_install/initdb-template ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 25
selecting default "autovacuum_worker_slots" ... 4
selecting default "shared_buffers" ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ...
----------------------------------- stderr -----------------------------------
../pgsql/src/backend/utils/hash/dynahash.c:1021:4: runtime error: call to function string_compare through pointer to
incorrectfunction type 'int (*)(const void *, const void *, unsigned long)'
 
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/utils/hash/dynahash.c:308: note: string_compare defined
here
    #0 0x55f30c0291dd in hash_search_with_hash_value
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/utils/hash/dynahash.c:1021:4
    #1 0x55f30c07f21d in pg_tzset /home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/timezone/pgtz.c:260:24
    #2 0x55f30baf5e44 in check_timezone
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/commands/variable.c:341:13
    #3 0x55f30c03d610 in call_string_check_hook
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/utils/misc/guc.c:6922:8
    #4 0x55f30c038490 in InitializeOneGUCOption
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/utils/misc/guc.c:1725:10
    #5 0x55f30c037bf2 in InitializeGUCOptions
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/utils/misc/guc.c:1556:3
    #6 0x55f30b9c46d4 in BootstrapModeMain
/home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/bootstrap/bootstrap.c:212:2
    #7 0x55f30bbc5c44 in main /home/bf/bf-build/kestrel/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:213:4
    #8 0x7f4653f6bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7f4653f6bd64 in __libc_start_main csu/../csu/libc-start.c:360:3
    #10 0x55f30b84e3e0 in _start
(/home/bf/bf-build/kestrel/HEAD/pgsql.build/tmp_install/home/bf/bf-build/kestrel/HEAD/inst/bin/postgres+0x5f53e0)
(BuildId:63499d2fda05cc488eff7507a72a5be9ad156139)
 

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../pgsql/src/backend/utils/hash/dynahash.c:1021:4
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory "/home/bf/bf-build/kestrel/HEAD/pgsql.build/tmp_install/initdb-template" not removed at user's
request
==============================================================================


And while that complaint is perhaps a bit nit-picky, it's not wrong.


/*
 * Key comparison functions must have this signature.  Comparison functions
 * return zero for match, nonzero for no match.  (The comparison function
 * definition is designed to allow memcmp() and strncmp() to be used directly
 * as key comparison functions.)
 */
typedef int (*HashCompareFunc) (const void *key1, const void *key2,
                                Size keysize);

vs

/*
 * HashCompareFunc for string keys
 *
 * Because we copy keys with strlcpy(), they will be truncated at keysize-1
 * bytes, so we can only compare that many ... hence strncmp is almost but
 * not quite the right thing.
 */
static int
string_compare(const char *key1, const char *key2, Size keysize)
{
    return strncmp(key1, key2, keysize - 1);
}

The only reason we don't get a compiler warning is that we cast the function
type:
    else if (hashp->hash == string_hash)
        hashp->match = (HashCompareFunc) string_compare;


I don't really understand why we define string_compare() with the wrong
signature just to then cast it to the right one?

<dig>

It's been this way for a long time. I suspect that this actually might have
been a holdover from when we directly used strncmp as the comparator function,
in

commit c92f7e258ee
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   2006-09-27 18:40:10 +0000

    Replace strncpy with strlcpy in selected places that seem possibly relevant
    to performance.  (A wholesale effort to get rid of strncpy should be
    undertaken sometime, but not during beta.)  This commit also fixes dynahash.c
    to correctly truncate overlength string keys for hashtables, so that its
    callers don't have to anymore.


Tom, do you see any reason to not instead do the typecase inside
string_compare()?


Greetings,

Andres Freund



Re: Ubsan complaint on kestrel

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I just upgraded buildfarm animal kestrel (a buildfarm animal running with
> ubsan) to a newer version of clang.  Unfortunately this causes it to fail.
> ...
> Tom, do you see any reason to not instead do the typecase inside
> string_compare()?

No.  Have at it.

            regards, tom lane



Re: Ubsan complaint on kestrel

From
Andres Freund
Date:
Hi,

On 2025-03-03 15:00:43 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I just upgraded buildfarm animal kestrel (a buildfarm animal running with
> > ubsan) to a newer version of clang.  Unfortunately this causes it to fail.
> > ...
> > Tom, do you see any reason to not instead do the typecase inside
> > string_compare()?
> 
> No.  Have at it.

Ugh. That change is obviously easy enough.


But after fixing that the next complaint is:

----------------------------------- stderr -----------------------------------
../../../../../home/andres/src/postgresql/src/include/lib/sort_template.h:316:28: runtime error: call to function
list_oid_cmpthrough pointer to incorrect function type 'int (*)(const void *, const void *)'
 

/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/nodes/list.c:1704:
note:list_oid_cmp defined here
 
    #0 0x5646f45c753f in pg_qsort
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/include/lib/sort_template.h:316:28
    #1 0x5646f32df4e0 in list_sort
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/nodes/list.c:1684:3
    #2 0x5646f43c6ce9 in RelationGetIndexList
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:4857:2
    #3 0x5646f2fe0159 in ExecOpenIndices
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/executor/execIndexing.c:179:17
    #4 0x5646f2b51c96 in CatalogOpenIndexes
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/catalog/indexing.c:52:2
    #5 0x5646f2b530c8 in CatalogTupleUpdate
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/catalog/indexing.c:320:13
    #6 0x5646f2af4e6e in ExecGrant_Relation
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/catalog/aclchk.c:1990:4
    #7 0x5646f2ae7adf in ExecGrantStmt_oids
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/catalog/aclchk.c:608:4
    #8 0x5646f2ae5aab in ExecuteGrantStmt
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/catalog/aclchk.c:593:2
    #9 0x5646f3c19883 in ProcessUtilitySlow
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:1812:5
    #10 0x5646f3c1133c in standard_ProcessUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:969:6
    #11 0x5646f3c0efc6 in ProcessUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:523:3
    #12 0x5646f3c0ce77 in PortalRunUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1184:2
    #13 0x5646f3c0a250 in PortalRunMulti
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1348:5
    #14 0x5646f3c079fc in PortalRun
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:819:5
    #15 0x5646f3bf8f5d in exec_simple_query
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:1272:10
    #16 0x5646f3bf7276 in PostgresMain
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4693:7
    #17 0x5646f3bf627b in PostgresSingleUserMain
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4132:2
    #18 0x5646f32cac35 in main
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/main/main.c:226:4
    #19 0x7f5d3c56bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #20 0x7f5d3c56bd64 in __libc_start_main csu/../csu/libc-start.c:360:3
    #21 0x5646f25c7580 in _start
(/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/tmp_install/srv/dev/install/postgres/m-dev-assert-clang-sanitizer/bin/postgres+0x21c7580)
(BuildId:3af3be5da960a221ffc2e2805234af46b9ef1b5a)
 

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../../../../home/andres/src/postgresql/src/include/lib/sort_template.h:316:28
 
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory "/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/tmp_install/initdb-template" not removed
atuser's request
 


Fixing all list comparators would be rather painful. So I fixed it by using
qsort_arg() and a wrapper function.

Which is followed by:

../../../../../home/andres/src/postgresql/src/backend/nodes/nodeFuncs.c:2712:6: runtime error: call to function
assign_query_collations_walkerthrough pointer to incorrect function type 'bool (*)(struct Node *, void *)'
 

/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/parser/parse_collate.c:127:
note:assign_query_collations_walker defined here
 
    #0 0x55c8ee0ffbd8 in query_tree_walker_impl
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/nodes/nodeFuncs.c:2712:6
    #1 0x55c8ee4d5df1 in assign_query_collations
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/parser/parse_collate.c:109:9
    #2 0x55c8ee47aa58 in transformReturnStmt
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/parser/analyze.c:2499:2
    #3 0x55c8ee4683f4 in transformStmt
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/parser/analyze.c:465:13
    #4 0x55c8edb2b54f in interpret_AS_clause
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/commands/functioncmds.c:969:8
    #5 0x55c8edb26977 in CreateFunction
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/commands/functioncmds.c:1237:2
    #6 0x55c8eea1900f in ProcessUtilitySlow
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:1658:15
    #7 0x55c8eea11b21 in standard_ProcessUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:1070:4
    #8 0x55c8eea0f046 in ProcessUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/utility.c:523:3
    #9 0x55c8eea0cef7 in PortalRunUtility
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1184:2
    #10 0x55c8eea0a2d0 in PortalRunMulti
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:1348:5
    #11 0x55c8eea07a7c in PortalRun
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:819:5
    #12 0x55c8ee9f8fdd in exec_simple_query
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:1272:10
    #13 0x55c8ee9f72f6 in PostgresMain
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4693:7
    #14 0x55c8ee9f62fb in PostgresSingleUserMain
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4132:2
    #15 0x55c8ee0cac35 in main
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/main/main.c:226:4
    #16 0x7f6c4116bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #17 0x7f6c4116bd64 in __libc_start_main csu/../csu/libc-start.c:360:3
    #18 0x55c8ed3c7580 in _start
(/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/tmp_install/srv/dev/install/postgres/m-dev-assert-clang-sanitizer/bin/postgres+0x21c7580)
(BuildId:8d0a126082525373680eb87eaaa6e91cb98d57a6)
 

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../../../../home/andres/src/postgresql/src/backend/nodes/nodeFuncs.c:2712:6
 

And I suspect that if I fixed that one there'd be heaps more.


So I suspect we'll need to disable this sub-sanitizer for now.

On https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html I found
-fsanitize=function, which also can be used as -fno-sanitize=function:
> -fsanitize=function: Indirect call of a function through a function pointer of the wrong type.

With that I can get past this issue.


I wish the sanitizer treated mismatches of void * arguments against a "real
type" different from other mismatches, but ...

Greetings,

Andres Freund



Re: Ubsan complaint on kestrel

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> ../../../../../home/andres/src/postgresql/src/backend/nodes/nodeFuncs.c:2712:6: runtime error: call to function
assign_query_collations_walkerthrough pointer to incorrect function type 'bool (*)(struct Node *, void *)' 
>
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/parser/parse_collate.c:127:
note:assign_query_collations_walker defined here 
>     #0 0x55c8ee0ffbd8 in query_tree_walker_impl
/srv/dev/build/postgres/m-dev-assert-clang-sanitizer/../../../../../home/andres/src/postgresql/src/backend/nodes/nodeFuncs.c:2712:6

Ugh.  So they're enforcing the C standard's position that "void *" is
not compatible with anything else.  That makes this check useless to
us.  We already decided that we're not going to take the hit of
declaring all walker/mutator callbacks with "void *", and I don't see
that this tool changes that decision.

> On https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html I found
> -fsanitize=function, which also can be used as -fno-sanitize=function:
>> -fsanitize=function: Indirect call of a function through a function pointer of the wrong type.
> With that I can get past this issue.

Good.

> I wish the sanitizer treated mismatches of void * arguments against a "real
> type" different from other mismatches, but ...

Indeed.  I think we have enough coverage of that via compile-time
checks, though -- it's not like the type match or mismatch will vary
dynamically.

            regards, tom lane



Re: Ubsan complaint on kestrel

From
Andres Freund
Date:
Hi,

On 2025-03-03 16:49:09 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I wish the sanitizer treated mismatches of void * arguments against a "real
> > type" different from other mismatches, but ...
> 
> Indeed.  I think we have enough coverage of that via compile-time
> checks, though -- it's not like the type match or mismatch will vary
> dynamically.

I think it would be useful, it's pretty easy to have more actual mismatches
that the compiler won't warn about due to casts. But it is is how it is...

kestrel is green on all branches again.

Greetings,

Andres Freund