ubsan - Mailing list pgsql-hackers

From Andres Freund
Subject ubsan
Date
Msg-id 20220323173537.ll7klrglnp4gn2um@alap3.anarazel.de
Whole thread Raw
Responses Re: ubsan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

I tried to run postgres with ubsan to debug something. I ran into two main
issues:


1) Despite Tom's recent efforts in [1], I see two ubsan failures in
   HEAD.

   These are easy enough to fix as per the attached, although the fix for the
   GetConfigOptionByNum() isn't great - we should probably pass a nulls array
   to GetConfigOptionByNum() as well, but that'd have a bunch of followup
   changes. So I'm inclined to go with the minimal for now.


2) When debugging issues I got very confused by the fact that *sometimes*
   UBSAN_OPTIONS takes effect and sometimes not. I was trying to have ubsan
   generate backtraces as well as a coredump with
   UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2"

   After a lot of debugging I figured out that the options took effect in
   postmaster, but not in any children. Which in turn is because
   set_ps_display() breaks /proc/$pid/environ - it's empty in all postmaster
   children for me.

   The sanitizer library use /proc/$pid/environ (from [2] to [3]), because
   they don't want to use getenv() because it wants to work without libc
   (whether that's the right way, i have my doubts).  When just using
   undefined and alignment sanitizers, the sanitizer library is only
   initialized when the first error occurs, by which time we've often already
   called set_ps_display().

   Note that ps_status.c breaks /proc/$pid/environ even if the
   update_process_title GUC is set to false, because init_ps_display() ignores
   that. Yes, that confused me for a while last night.


   The reason that /proc/$pid/environ is borken is fairly obvious: We
   overwrite it in set_ps_display() in the PS_USE_CLOBBER_ARGV case. The
   kernel just looks at the range set up originally, which we'll overwrite
   with zeroes.

   We could try telling the kernel about the new location of environ using
   prctl() PR_SET_MM_ENV_START/END but the restrictions around that sound
   problematic.


   I've included a hacky workaround: Define __ubsan_default_options, a weak
   symbol libsanitizer uses to get defaults from the application, and return
   getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
   don't end up relying on a not-yet-working getenv().


   This also lead me to finally figure out why /proc/$pid/cmdline of postgres
   children includes a lot of NULL bytes. I'd noticed this because tools like
   pidstat -l started to print long long status strings at some point, before
   it got fixed on the pidstat side.
   The way we overwrite doesn't trigger this special case in the kernel:
   https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L296
   therefore the original size of the commandline arguments are printed, with
   a lot of boring zeroes.


A test run of this in ci, with the guc.c intentionally reintroduced, shows the
failure both via core dump
https://api.cirrus-ci.com/v1/task/6543164315009024/logs/cores.log
and
in postmaster's log:
https://api.cirrus-ci.com/v1/artifact/task/6543164315009024/log/src/test/regress/log/postmaster.log

although that part is a bit annoying to read, because the error is
interspersed with other log messages:

guc.c:9801:15: runtime error: null pointer passed as argument 2, which is declared to never be null
==19253==Using libbacktrace symbolizer.
2022-03-23 17:20:35.412 UTC [19258][client backend] [pg_regress/alter_operator][14/429:0] ERROR:  must be owner of
operator===
 
...
2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT:  ALTER STATISTICS
alt_stat2OWNER TO regress_alter_generic_user2;
 
    #0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
    #1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR:  must be owner of
statisticsobject alt_stat3
 
...
2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT:  ALTER STATISTICS
alt_stat2OWNER TO regress_alter_generic_user2;
 
    #0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
    #1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR:  must be owner of
statisticsobject alt_stat3
 
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] STATEMENT:  ALTER STATISTICS
alt_stat3RENAME TO alt_stat4;
 
    #2 0x562655c0ea86 in ExecMakeTableFunctionResult /tmp/cirrus-ci-build/src/backend/executor/execSRF.c:234
    #3 0x562655c3f8be in FunctionNext /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:95
2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] ERROR:  must be owner of
statisticsobject alt_stat3
 
2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] STATEMENT:  ALTER STATISTICS
alt_stat3OWNER TO regress_alter_generic_user2;
 
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] ERROR:  must be member of
role"regress_alter_generic_user3"
 
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] STATEMENT:  ALTER STATISTICS
alt_stat2OWNER TO regress_alter_generic_user3;
 
    #4 0x562655c10175 in ExecScanFetch /tmp/cirrus-ci-build/src/backend/executor/execScan.c:133
    #5 0x562655c10653 in ExecScan /tmp/cirrus-ci-build/src/backend/executor/execScan.c:199
    #6 0x562655c3f643 in ExecFunctionScan /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:270
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] ERROR:  must be owner of
statisticsobject alt_stat3
 
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] STATEMENT:  ALTER STATISTICS
alt_stat3SET SCHEMA alt_nsp2;
 
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] ERROR:  statistics object
"alt_stat2"already exists in schema "alt_nsp2"
 
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] STATEMENT:  ALTER STATISTICS
alt_stat2SET SCHEMA alt_nsp2;
 
    #7 0x562655c09bc9 in ExecProcNodeFirst /tmp/cirrus-ci-build/src/backend/executor/execProcnode.c:463
    #8 0x562655bf7580 in ExecProcNode ../../../src/include/executor/executor.h:259
    #9 0x562655bf7580 in ExecutePlan /tmp/cirrus-ci-build/src/backend/executor/execMain.c:1633
    #10 0x562655bf78b9 in standard_ExecutorRun /tmp/cirrus-ci-build/src/backend/executor/execMain.c:362


Greetings,

Andres Freund

[1] https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
[2]
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/ubsan/ubsan_flags.cpp;h=9a66bd37518b3a0606049b761ffdd7ddf3c3c714;hb=refs/heads/master#l68
[2]
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/sanitizer_common/sanitizer_linux.cpp;h=aa59d9718ca89cc554bdf677df3e64ddd233ca59;hb=refs/heads/master#l559

Attachment

pgsql-hackers by date:

Previous
From: "alvherre@alvh.no-ip.org"
Date:
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Next
From: Tom Lane
Date:
Subject: Re: Parameter for planner estimate of recursive queries