Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker - Mailing list pgsql-bugs
From | Tomas Vondra |
---|---|
Subject | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker |
Date | |
Msg-id | 20191121160906.wpq5u6yh6pxv6hyo@development Whole thread Raw |
In response to | Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker (Ondřej Jirman <ienieghapheoghaiwida@xff.cz>) |
Responses |
Re: BUG #16129: Segfault in tts_virtual_materialize in logicalreplication worker
|
List | pgsql-bugs |
On Thu, Nov 21, 2019 at 04:57:07PM +0100, Ondřej Jirman wrote: >On Thu, Nov 21, 2019 at 04:27:32PM +0100, Tomas Vondra wrote: >> On Thu, Nov 21, 2019 at 04:07:16PM +0100, Ondřej Jirman wrote: >> > I've also managed to get valgrind output on the segfaulting process: >> > >> > https://megous.com/dl/tmp/valgrind-105524-109020.log >> > >> > Though it probably doesn't show much new. >> > >> >> Is postgres actually compiled with valgrind support? What does pg_config >> say? > >I don't see any valgrind related options in configure --help, I repackaged it >just with CFLAGS set to -ggdb -Og. > You need to use -DUSE_VALGRIND or edit pg_config_manual.h. I usually just use CFLAGS="-DRANDOMIZE_ALLOCATED_MEMORY -DUSE_VALGRIND -O0 -ggdb" which does the trick. Without the USE_VALGRIND defined, we don't include the valgrind instrumentation, so I suppose the previous results are bogus. >CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc' '--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam' '--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls' '--enable-thread-safety''--disable-rpath' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong-fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' >CC = gcc >CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 >CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug >CFLAGS_SL = -fPIC >LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed >LDFLAGS_EX = >LDFLAGS_SL = >LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm >VERSION = PostgreSQL 12.1 > >> If you're building the server locally (I don't think there are packages >> with valgrind enabled), maybe try adding --enable-cassert option and >> >> CFLAGS"-DRANDOMIZE_ALLOCATED_MEMORY" >> >> which will do more extensive checks at runtime. > >Done. > >CONFIGURE = '--prefix=/usr' '--mandir=/usr/share/man' '--datadir=/usr/share/postgresql' '--sysconfdir=/etc' '--with-gssapi''--with-libxml' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-pam' '--with-system-tzdata=/usr/share/zoneinfo''--with-uuid=e2fs' '--with-icu' '--with-systemd' '--with-ldap' '--enable-nls' '--enable-thread-safety''--disable-rpath' '--enable-cassert' 'PYTHON=/usr/bin/python2' 'CFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug-DRANDOMIZE_ALLOCATED_MEMORY' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now''CPPFLAGS=-D_FORTIFY_SOURCE=2' 'CXXFLAGS=-march=x86-64 -mtune=generic-O2 -pipe -fstack-protector-strong -fno-plt -Og -ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug' >CC = gcc >CPPFLAGS = -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 >CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation-Wno-stringop-truncation -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Og-ggdb -fdebug-prefix-map=/workspace/external/pgsql/postgresql-pkg/src=/usr/src/debug -DRANDOMIZE_ALLOCATED_MEMORY >CFLAGS_SL = -fPIC >LDFLAGS = -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Wl,--as-needed >LDFLAGS_EX = >LDFLAGS_SL = >LIBS = -lpgcommon -lpgport -lpthread -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm >VERSION = PostgreSQL 12.1 > >The difference now is that _len argument value is not stable accross crashes. > >one crash... > >#0 0x00007fb90c336917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6 >#1 0x000055ff2a7d0e64 in memcpy (__len=972662513, __src=<optimized out>, __dest=0x55ff2cc7dce4) at /usr/include/bits/string_fortified.h:34 >#2 tts_virtual_materialize (slot=0x55ff2cc7d868) at execTuples.c:235 >#3 0x000055ff2a7d2a63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x55ff2cc7d868, materialize=materialize@entry=true, shouldFree=shouldFree@entry=0x7ffc80cfc607)at execTuples.c:1624 >#4 0x000055ff2a634809 in heapam_tuple_update (relation=0x7fb8f8338100, otid=0x55ff2cc7e138, slot=0x55ff2cc7d868, cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffc80cfc680, > lockmode=0x7ffc80cfc67c, update_indexes=0x7ffc80cfc6d7) at heapam_handler.c:325 >#5 0x000055ff2a675d55 in table_tuple_update (update_indexes=0x7ffc80cfc6d7, lockmode=0x7ffc80cfc67c, tmfd=0x7ffc80cfc680,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, > slot=0x55ff2cc7d868, otid=0x55ff2cc7e138, rel=0x7fb8f8338100) at ../../../../src/include/access/tableam.h:1261 >#6 simple_table_tuple_update (rel=rel@entry=0x7fb8f8338100, otid=otid@entry=0x55ff2cc7e138, slot=slot@entry=0x55ff2cc7d868,snapshot=0x0, update_indexes=update_indexes@entry=0x7ffc80cfc6d7) > at tableam.c:325 >#7 0x000055ff2a7cdf08 in ExecSimpleRelationUpdate (estate=estate@entry=0x55ff2ccd02b8, epqstate=epqstate@entry=0x7ffc80cfc730,searchslot=searchslot@entry=0x55ff2cc7e108, > slot=slot@entry=0x55ff2cc7d868) at execReplication.c:493 >#8 0x000055ff2a8eb8b2 in apply_handle_update (s=s@entry=0x7ffc80d03d00) at worker.c:750 >#9 0x000055ff2a8ebe62 in apply_dispatch (s=s@entry=0x7ffc80d03d00) at worker.c:968 >#10 0x000055ff2a8ec5ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176 >#11 0x000055ff2a8ecd1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734 >#12 0x000055ff2a8b3927 in StartBackgroundWorker () at bgworker.c:834 >#13 0x000055ff2a8c2b6b in do_start_bgworker (rw=rw@entry=0x55ff2cbbf430) at postmaster.c:5770 >#14 0x000055ff2a8c2cf7 in maybe_start_bgworkers () at postmaster.c:5996 >#15 0x000055ff2a8c3678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167 >#16 <signal handler called> >#17 0x00007fb90c384e2a in select () from /usr/lib/libc.so.6 >#18 0x000055ff2a8c40f8 in ServerLoop () at postmaster.c:1668 >#19 0x000055ff2a8c54fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377 >#20 0x000055ff2a81f88a in main (argc=3, argv=0x55ff2cb94260) at main.c:228 > >and another ... > >#0 0x00007f7a1befc917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6 >#1 0x0000559a5439ae64 in memcpy (__len=766338801, __src=<optimized out>, __dest=0x559a55626934) at /usr/include/bits/string_fortified.h:34 >#2 tts_virtual_materialize (slot=0x559a556264b8) at execTuples.c:235 >#3 0x0000559a5439ca63 in ExecFetchSlotHeapTuple (slot=slot@entry=0x559a556264b8, materialize=materialize@entry=true, shouldFree=shouldFree@entry=0x7fff7f2d5787)at execTuples.c:1624 >#4 0x0000559a541fe809 in heapam_tuple_update (relation=0x7f7a07efe100, otid=0x559a55626d88, slot=0x559a556264b8, cid=0,snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7fff7f2d5800, > lockmode=0x7fff7f2d57fc, update_indexes=0x7fff7f2d5857) at heapam_handler.c:325 >#5 0x0000559a5423fd55 in table_tuple_update (update_indexes=0x7fff7f2d5857, lockmode=0x7fff7f2d57fc, tmfd=0x7fff7f2d5800,wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, > slot=0x559a556264b8, otid=0x559a55626d88, rel=0x7f7a07efe100) at ../../../../src/include/access/tableam.h:1261 >#6 simple_table_tuple_update (rel=rel@entry=0x7f7a07efe100, otid=otid@entry=0x559a55626d88, slot=slot@entry=0x559a556264b8,snapshot=0x0, update_indexes=update_indexes@entry=0x7fff7f2d5857) > at tableam.c:325 >#7 0x0000559a54397f08 in ExecSimpleRelationUpdate (estate=estate@entry=0x559a55678dc8, epqstate=epqstate@entry=0x7fff7f2d58b0,searchslot=searchslot@entry=0x559a55626d58, > slot=slot@entry=0x559a556264b8) at execReplication.c:493 >#8 0x0000559a544b58b2 in apply_handle_update (s=s@entry=0x7fff7f2dce80) at worker.c:750 >#9 0x0000559a544b5e62 in apply_dispatch (s=s@entry=0x7fff7f2dce80) at worker.c:968 >#10 0x0000559a544b65ba in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176 >#11 0x0000559a544b6d1a in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734 >#12 0x0000559a5447d927 in StartBackgroundWorker () at bgworker.c:834 >#13 0x0000559a5448cb6b in do_start_bgworker (rw=rw@entry=0x559a5558c010) at postmaster.c:5770 >#14 0x0000559a5448ccf7 in maybe_start_bgworkers () at postmaster.c:5996 >#15 0x0000559a5448d678 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167 >#16 <signal handler called> >#17 0x00007f7a1bf4ae2a in select () from /usr/lib/libc.so.6 >#18 0x0000559a5448e0f8 in ServerLoop () at postmaster.c:1668 >#19 0x0000559a5448f4fc in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377 >#20 0x0000559a543e988a in main (argc=3, argv=0x559a55534260) at main.c:228 > >So it perhaps looks like some memory management issue in the subscribing >postmaster, rather than wal corruption on the publisher. > Right, maybe using freed memory, or something like that. >> > > Can you show us \d+ on the videos table? What's the replica identity? >> > >> > Table "public.videos" >> > Column | Type | Collation | Nullable | Default | Storage | Statstarget | Description >> > -------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+------------- >> > id | integer | | not null | nextval('videos_id_seq'::regclass) | plain | | >> > title | text | | not null | | extended | | >> > cover_image | bytea | | not null | | extended | | >> > metadata | jsonb | | not null | | extended | | >> > category | integer | | not null | | plain | | >> > published | date | | not null | | plain | | >> > added | timestamp without time zone | | not null | now() | plain | | >> > played | boolean | | not null | false | plain | | >> > Indexes: >> > "videos_pkey" PRIMARY KEY, btree (id) >> > Foreign-key constraints: >> > "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id) >> > Publications: >> > "pub" >> > >> > Table "public.videos" >> > Column | Type | Collation | Nullable | Default | Storage | Statstarget | Description >> > -------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+------------- >> > id | integer | | not null | nextval('videos_id_seq'::regclass) | plain | | >> > title | text | | not null | | extended | | >> > cover_image | bytea | | not null | | extended | | >> > metadata | jsonb | | not null | | extended | | >> > category | integer | | not null | | plain | | >> > published | date | | not null | | plain | | >> > added | timestamp without time zone | | not null | now() | plain | | >> > played | boolean | | not null | false | plain | | >> > Indexes: >> > "videos_pkey" PRIMARY KEY, btree (id) >> > Foreign-key constraints: >> > "videos_category_fkey" FOREIGN KEY (category) REFERENCES categories(id) >> > Access method: heap >> > >> > >> > Googling for relica identity, it looks like it should be shown in the \d+ >> > output, but it isn't, for me. Maybe because it's has default value? >> > >> >> Yes, I think it should be listed either for the index, or there should >> be an extra section with replica identity. But that depends on psql >> version. Aren't you accidentally using an older psql version? That >> should be mentioned right after connecting to the server, I think. > >Maybe it has something to do with my upgrade method. I dumped/restored >the replica with pg_dumpall, and then just proceded to enable subscription >and refresh publication with (copy_data=false) for all my subscriptions. > Interesting. >Now, I've tried to start fresh and replicate from the primary "hometv" database >into a new l5_hometv2 on my crashing 12.1 server and it replicates fine, no >crashes. Anyway even in this fresh replicated database, there's no REPLICA >IDENTITY in the \d+ output. > >I don't have multiple postgresql installations on the host. There's only >a difference between primary version 11.5 (ARMv7) and replica 12.1 (x86_64). > Hmmm, not sure. Have to check the code. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-bugs by date: