BUG #16226: background worker "logical replication worker" (PID ) was terminated by signal 11: Segmentation fault - Mailing list pgsql-bugs
From:      PG Bug reporting form
Subject:   BUG #16226: background worker "logical replication worker" (PID ) was terminated by signal 11: Segmentation fault
Msg-id:    16226-690712ac4467269a@postgresql.org
Responses: Re: BUG #16226: background worker "logical replication worker" (PID
           Re: BUG #16226: background worker "logical replication worker" (PID
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16226
Logged by:          Vadim Yatsenko
Email address:      vadim@postgrespro.co.il
PostgreSQL version: 12.1
Operating system:   CentOS Linux 7 (Core), kernel Linux 3.10.0-1062.9.
Description:

Hi!

We have two PostgreSQL servers with logical replication between Postgres 11.6 (Primary) and 12.1 (Logical). Some time ago, we changed the column type in two big tables from integer to text:

    oid    | nspname |     table      |   rows    |  pages  | size  | index_size | total_size
-----------+---------+----------------+-----------+---------+-------+------------+------------
 573803236 | public  | acc_rep        | 283359648 | 9276153 | 71 GB | 50 GB      | 121 GB
 586884963 | public  | acc_rep_actual | 283280448 | 8855209 | 68 GB | 27 GB      | 95 GB

This, of course, led to a full rewrite of both tables. We repeated the operation on both servers.
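The statements were roughly of the following form (a minimal sketch: the column name is omitted above, so acc_id below is a hypothetical placeholder):

    -- acc_id is a placeholder for whichever integer column was converted.
    -- Converting integer to text changes the on-disk representation, so
    -- PostgreSQL must rewrite the entire table (and rebuild its indexes).
    ALTER TABLE public.acc_rep
        ALTER COLUMN acc_id TYPE text USING acc_id::text;
    ALTER TABLE public.acc_rep_actual
        ALTER COLUMN acc_id TYPE text USING acc_id::text;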
After that, we started to get errors like "background worker "logical replication worker" (PID <pid>) was terminated by signal 11: Segmentation fault", and the server went into recovery mode. This error repeated several times, every 10-15 minutes. When we attached gdb to the logical replication worker process, we received the following:

Program received signal SIGSEGV, Segmentation fault.
0x00007f04e5060426 in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f04e5060426 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x000000000061e1cb in memcpy (__len=224588909, __src=<optimized out>, __dest=0x101b870) at /usr/include/bits/string3.h:51
#2  tts_virtual_materialize (slot=0x101b0b0) at execTuples.c:235
#3  0x000000000061f594 in ExecFetchSlotHeapTuple (slot=0x101b0b0, materialize=<optimized out>, shouldFree=0x7ffc331ae187) at execTuples.c:1624
#4  0x00000000004c868c in heapam_tuple_update (relation=0x7f04e77d8470, otid=0x101b240, slot=0x101b0b0, cid=19778, snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffc331ae210, lockmode=0x7ffc331ae20c, update_indexes=0x7ffc331ae277) at heapam_handler.c:325
#5  0x00000000004f5524 in table_tuple_update (update_indexes=0x7ffc331ae277, lockmode=0x7ffc331ae20c, tmfd=0x7ffc331ae210, wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, slot=0x101b0b0, otid=0x101b240, rel=0x7f04e77d8470) at ../../../../src/include/access/tableam.h:1261
#6  simple_table_tuple_update (rel=0x7f04e77d8470, otid=otid@entry=0x101b240, slot=slot@entry=0x101b0b0, snapshot=0x0, update_indexes=update_indexes@entry=0x7ffc331ae277) at tableam.c:325
#7  0x000000000061bbc0 in ExecSimpleRelationUpdate (estate=estate@entry=0x1031660, epqstate=epqstate@entry=0x7ffc331ae330, searchslot=searchslot@entry=0x101b210, slot=slot@entry=0x101b0b0) at execReplication.c:493
#8  0x000000000071100d in apply_handle_update (s=s@entry=0x7ffc331b5910) at worker.c:750
#9  0x0000000000711ca7 in apply_dispatch (s=0x7ffc331b5910) at worker.c:968
#10 LogicalRepApplyLoop (last_received=31760851165536) at worker.c:1176
#11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734
#12 0x00000000006e2d4a in StartBackgroundWorker () at bgworker.c:834
#13 0x00000000006ef1d3 in do_start_bgworker (rw=0xf71970) at postmaster.c:5770
#14 maybe_start_bgworkers () at postmaster.c:5996
#15 0x00000000006efde5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167
#16 <signal handler called>
#17 0x00007f04e4fff933 in __select_nocancel () from /lib64/libc.so.6
#18 0x00000000004833d4 in ServerLoop () at postmaster.c:1668
#19 0x00000000006f106f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xf45300) at postmaster.c:1377
#20 0x0000000000484cd3 in main (argc=3, argv=0xf45300) at main.c:228

Error from /var/log/messages:

Jan 21 02:01:59 dbmr01 kernel: postmaster[50093]: segfault at e76f13d ip 00007fa8f701ffa0 sp 00007ffc8ae23498 error 4 in libc-2.17.so[7fa8f6ed4000+1c2000]
Jan 21 02:01:59 dbmr01 abrt-hook-ccpp: Process 50093 (postgres) of user 26 killed by SIGSEGV - dumping core
Jan 21 02:02:15 dbmr01 abrt-server: Package 'postgresql12-server' isn't signed with proper key
Jan 21 02:02:15 dbmr01 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2020-01-21-02:01:59-50093' exited with 1
Jan 21 02:02:15 dbmr01 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2020-01-21-02:01:59-50093'

The issue was solved by removing these tables from the publication on the Primary (roughly as sketched below).

Can you advise anything? Is this a bug, or did I fail to take something into account when working with logical replication? Thank you!
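A minimal sketch of the workaround (the publication name is not given above, so my_pub is a hypothetical placeholder):

    -- my_pub is a placeholder; substitute the real publication name.
    -- Dropping the tables from the publication stops the subscriber's
    -- apply worker from receiving changes for them.
    ALTER PUBLICATION my_pub DROP TABLE public.acc_rep, public.acc_rep_actual;

On the subscriber side, an ALTER SUBSCRIPTION ... REFRESH PUBLICATION updates the subscription's view of the publication's table list after such a change.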