ubsan - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | ubsan |
Date | |
Msg-id | 20220323173537.ll7klrglnp4gn2um@alap3.anarazel.de Whole thread Raw |
Responses |
Re: ubsan
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-hackers |
Hi, I tried to run postgres with ubsan to debug something. I ran into two main issues: 1) Despite Tom's recent efforts in [1], I see two ubsan failures in HEAD. These are easy enough to fix as per the attached, although the fix for the GetConfigOptionByNum() isn't great - we should probably pass a nulls array to GetConfigOptionByNum() as well, but that'd have a bunch of followup changes. So I'm inclined to go with the minimal for now. 2) When debugging issues I got very confused by the fact that *sometimes* UBSAN_OPTIONS takes effect and sometimes not. I was trying to have ubsan generate backtraces as well as a coredump with UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2" After a lot of debugging I figured out that the options took effect in postmaster, but not in any children. Which in turn is because set_ps_display() breaks /proc/$pid/environ - it's empty in all postmaster children for me. The sanitizer library use /proc/$pid/environ (from [2] to [3]), because they don't want to use getenv() because it wants to work without libc (whether that's the right way, i have my doubts). When just using undefined and alignment sanitizers, the sanitizer library is only initialized when the first error occurs, by which time we've often already called set_ps_display(). Note that ps_status.c breaks /proc/$pid/environ even if the update_process_title GUC is set to false, because init_ps_display() ignores that. Yes, that confused me for a while last night. The reason that /proc/$pid/environ is borken is fairly obvious: We overwrite it in set_ps_display() in the PS_USE_CLOBBER_ARGV case. The kernel just looks at the range set up originally, which we'll overwrite with zeroes. We could try telling the kernel about the new location of environ using prctl() PR_SET_MM_ENV_START/END but the restrictions around that sound problematic. I've included a hacky workaround: Define __ubsan_default_options, a weak symbol libsanitizer uses to get defaults from the application, and return getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we don't end up relying on a not-yet-working getenv(). This also lead me to finally figure out why /proc/$pid/cmdline of postgres children includes a lot of NULL bytes. I'd noticed this because tools like pidstat -l started to print long long status strings at some point, before it got fixed on the pidstat side. The way we overwrite doesn't trigger this special case in the kernel: https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L296 therefore the original size of the commandline arguments are printed, with a lot of boring zeroes. A test run of this in ci, with the guc.c intentionally reintroduced, shows the failure both via core dump https://api.cirrus-ci.com/v1/task/6543164315009024/logs/cores.log and in postmaster's log: https://api.cirrus-ci.com/v1/artifact/task/6543164315009024/log/src/test/regress/log/postmaster.log although that part is a bit annoying to read, because the error is interspersed with other log messages: guc.c:9801:15: runtime error: null pointer passed as argument 2, which is declared to never be null ==19253==Using libbacktrace symbolizer. 2022-03-23 17:20:35.412 UTC [19258][client backend] [pg_regress/alter_operator][14/429:0] ERROR: must be owner of operator=== ... 2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT: ALTER STATISTICS alt_stat2OWNER TO regress_alter_generic_user2; #0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801 #1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137 2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR: must be owner of statisticsobject alt_stat3 ... 2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT: ALTER STATISTICS alt_stat2OWNER TO regress_alter_generic_user2; #0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801 #1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137 2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR: must be owner of statisticsobject alt_stat3 2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] STATEMENT: ALTER STATISTICS alt_stat3RENAME TO alt_stat4; #2 0x562655c0ea86 in ExecMakeTableFunctionResult /tmp/cirrus-ci-build/src/backend/executor/execSRF.c:234 #3 0x562655c3f8be in FunctionNext /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:95 2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] ERROR: must be owner of statisticsobject alt_stat3 2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] STATEMENT: ALTER STATISTICS alt_stat3OWNER TO regress_alter_generic_user2; 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] ERROR: must be member of role"regress_alter_generic_user3" 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] STATEMENT: ALTER STATISTICS alt_stat2OWNER TO regress_alter_generic_user3; #4 0x562655c10175 in ExecScanFetch /tmp/cirrus-ci-build/src/backend/executor/execScan.c:133 #5 0x562655c10653 in ExecScan /tmp/cirrus-ci-build/src/backend/executor/execScan.c:199 #6 0x562655c3f643 in ExecFunctionScan /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:270 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] ERROR: must be owner of statisticsobject alt_stat3 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] STATEMENT: ALTER STATISTICS alt_stat3SET SCHEMA alt_nsp2; 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] ERROR: statistics object "alt_stat2"already exists in schema "alt_nsp2" 2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] STATEMENT: ALTER STATISTICS alt_stat2SET SCHEMA alt_nsp2; #7 0x562655c09bc9 in ExecProcNodeFirst /tmp/cirrus-ci-build/src/backend/executor/execProcnode.c:463 #8 0x562655bf7580 in ExecProcNode ../../../src/include/executor/executor.h:259 #9 0x562655bf7580 in ExecutePlan /tmp/cirrus-ci-build/src/backend/executor/execMain.c:1633 #10 0x562655bf78b9 in standard_ExecutorRun /tmp/cirrus-ci-build/src/backend/executor/execMain.c:362 Greetings, Andres Freund [1] https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com [2] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/ubsan/ubsan_flags.cpp;h=9a66bd37518b3a0606049b761ffdd7ddf3c3c714;hb=refs/heads/master#l68 [2] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/sanitizer_common/sanitizer_linux.cpp;h=aa59d9718ca89cc554bdf677df3e64ddd233ca59;hb=refs/heads/master#l559
Attachment
pgsql-hackers by date: