Re: valgrind error - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: valgrind error
Msg-id 03e424b8-8dbc-1cd6-f743-3b5b93d5fe9f@2ndQuadrant.com
In response to valgrind error  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Responses Re: valgrind error
List pgsql-hackers
On 4/18/20 9:15 AM, Andrew Dunstan wrote:
> I was just trying to revive lousyjack, my valgrind buildfarm animal
> which has been offline for 12 days, after having upgraded the machine
> (fedora 31, gcc 9.3.1, valgrind 3.15) and noticed lots of errors like this:
>
>
> 2020-04-17 19:26:03.483 EDT [63741:3] pg_regress LOG:  statement: CREATE
> DATABASE "regression" TEMPLATE=template0
> ==63717== VALGRINDERROR-BEGIN
> ==63717== Use of uninitialised value of size 8
> ==63717==    at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)
> ==63717==    by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)
> ==63717==    by 0x55A268: XLogInsert (xloginsert.c:461)
> ==63717==    by 0x8BC9E0: LogCurrentRunningXacts (standby.c:1005)
> ==63717==    by 0x8BC8F9: LogStandbySnapshot (standby.c:961)
> ==63717==    by 0x550CB3: CreateCheckPoint (xlog.c:8937)
> ==63717==    by 0x82A3B2: CheckpointerMain (checkpointer.c:441)
> ==63717==    by 0x56347D: AuxiliaryProcessMain (bootstrap.c:453)
> ==63717==    by 0x83CA18: StartChildProcess (postmaster.c:5474)
> ==63717==    by 0x83A120: reaper (postmaster.c:3045)
> ==63717==    by 0x4874B1F: ??? (in /usr/lib64/libpthread-2.30.so)
> ==63717==    by 0x5056F29: select (in /usr/lib64/libc-2.30.so)
> ==63717==    by 0x8380A0: ServerLoop (postmaster.c:1691)
> ==63717==    by 0x837A1F: PostmasterMain (postmaster.c:1400)
> ==63717==    by 0x74A71D: main (main.c:210)
> ==63717==  Uninitialised value was created by a stack allocation
> ==63717==    at 0x8BC942: LogCurrentRunningXacts (standby.c:984)
> ==63717==
> ==63717== VALGRINDERROR-END
> {
>    <insert_a_suppression_name_here>
>    Memcheck:Value8
>    fun:pg_comp_crc32c_sb8
>    fun:XLogRecordAssemble
>    fun:XLogInsert
>    fun:LogCurrentRunningXacts
>    fun:LogStandbySnapshot
>    fun:CreateCheckPoint
>    fun:CheckpointerMain
>    fun:AuxiliaryProcessMain
>    fun:StartChildProcess
>    fun:reaper
>    obj:/usr/lib64/libpthread-2.30.so
>    fun:select
>    fun:ServerLoop
>    fun:PostmasterMain
>    fun:main
> }
>
>


After many hours of testing I have a culprit for this. The error appears
with valgrind 3.15.0 when everything else is held constant; 3.14.0 does
not produce the problem. So lousyjack will be back on the air before long.


Here are the build flags it's using:


CFLAGS=-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -g -fno-omit-frame-pointer -O0 -fPIC
CPPFLAGS=-DUSE_VALGRIND  -DRELCACHE_FORCE_RELEASE -D_GNU_SOURCE
-I/usr/include/libxml2


and valgrind is invoked like this:


valgrind --quiet --trace-children=yes --track-origins=yes
--read-var-info=yes --num-callers=20 --leak-check=no
--gen-suppressions=all --error-limit=no
--suppressions=../pgsql/src/tools/valgrind.supp
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END bin/postgres -D data-C


Does anyone see anything here that needs tweaking?
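
For anyone comparing logs between valgrind versions, here is a minimal sketch of how the reports delimited by the --error-markers strings in the invocation above could be pulled out of a log and grouped by their innermost frame. It is illustrative only and not part of the buildfarm client; the names extract_reports and top_frame are hypothetical, and it assumes every valgrind line carries the usual ==PID== prefix, as in the trace quoted earlier.

```python
import re
from collections import Counter

# Marker strings taken from the --error-markers option in the invocation above.
BEGIN = "VALGRINDERROR-BEGIN"
END = "VALGRINDERROR-END"

# Valgrind prefixes its own output with ==PID==.
PREFIX = re.compile(r"^==(\d+)==\s?(.*)$")
# Stack frames look like "   at 0xADDR: function (file.c:NN)" or "   by 0xADDR: ...".
FRAME = re.compile(r"^\s*(?:at|by) 0x[0-9A-Fa-f]+: (\S+)")


def extract_reports(lines):
    """Return one list of message lines per BEGIN/END pair (hypothetical helper).

    Reports are tracked per PID because --trace-children=yes lets output
    from several processes interleave in the same log.
    """
    open_reports = {}
    reports = []
    for raw in lines:
        match = PREFIX.match(raw.strip())
        if not match:
            continue  # ordinary server log line, not valgrind output
        pid, text = match.group(1), match.group(2)
        if text.strip() == BEGIN:
            open_reports[pid] = []
        elif text.strip() == END:
            if pid in open_reports:
                reports.append(open_reports.pop(pid))
        elif pid in open_reports:
            open_reports[pid].append(text)
    return reports


def top_frame(report):
    """Return the function name of the first stack frame, or None (hypothetical helper)."""
    for text in report:
        match = FRAME.match(text)
        if match:
            return match.group(1)
    return None


# A few lines copied from the trace quoted earlier in this message.
sample = [
    "==63717== VALGRINDERROR-BEGIN",
    "==63717== Use of uninitialised value of size 8",
    "==63717==    at 0xAC5BB5: pg_comp_crc32c_sb8 (pg_crc32c_sb8.c:82)",
    "==63717==    by 0x55A98B: XLogRecordAssemble (xloginsert.c:785)",
    "==63717==",
    "==63717== VALGRINDERROR-END",
]

counts = Counter(top_frame(r) for r in extract_reports(sample))
print(counts)
```

Running the same grouping over a log from 3.14.0 and one from 3.15.0 would show whether the new reports all share the same innermost frame.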


Note that this is quite an old machine:


andrew@freddo:bf (master)*$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          16
Model:               6
Model name:          AMD Athlon(tm) II X2 215 Processor
Stepping:            2
CPU MHz:             2700.000
CPU max MHz:         2700.0000
CPU min MHz:         800.0000
BogoMIPS:            5425.13
Virtualization:      AMD-V
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl
nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save


I did not manage to reproduce this anywhere else, although I tried on
various physical, VirtualBox and Docker instances.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



