Ashwin Agrawal <aagrawal@pivotal.io> writes:
> For Greenplum (based on 9.4, but current master code looks the same) we
> recently hit deadlocks in walreceiver many times in CI, which I believe
> confirms the above finding.
> #0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> #1 0x00007f0637ee72bd in _int_free (av=0x7f063822bb20 <main_arena>, p=0x26bb3b0, have_lock=0) at malloc.c:3962
> #2 0x00007f0637eeb53c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
> #3 0x00007f0636629464 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #4 0x00007f0636630720 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #5 0x00007f063b5cede7 in _dl_fini () at dl-fini.c:235
> #6 0x00007f0637ea0ff8 in __run_exit_handlers (status=1, listp=0x7f063822b5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #7 0x00007f0637ea1045 in __GI_exit (status=<optimized out>) at exit.c:104
> #8 0x00000000008c72c7 in proc_exit ()
> #9 0x0000000000a75867 in errfinish ()
> #10 0x000000000089ea53 in ProcessWalRcvInterrupts ()
> #11 0x000000000089eac5 in WalRcvShutdownHandler ()
> #12 <signal handler called>
> #13 _int_malloc (av=av@entry=0x7f063822bb20 <main_arena>, bytes=bytes@entry=16384) at malloc.c:3802
> #14 0x00007f0637eeb184 in __GI___libc_malloc (bytes=16384) at malloc.c:2913
> #15 0x00000000007754c3 in makeEmptyPGconn ()
> #16 0x0000000000779686 in PQconnectStart ()
> #17 0x0000000000779b8b in PQconnectdb ()
> #18 0x00000000008aae52 in libpqrcv_connect ()
> #19 0x000000000089f735 in WalReceiverMain ()
> #20 0x00000000005c5eab in AuxiliaryProcessMain ()
> #21 0x00000000004cd5f1 in ServerLoop ()
> #22 0x000000000086fb18 in PostmasterMain ()
> #23 0x00000000004d2e28 in main ()

Cool --- that stack trace is *exactly* what you'd expect if this
were the problem. Thanks for sending it along!

Can you try applying a1a789eb5ac894b4ca4b7742f2dc2d9602116e46
to see if it fixes the problem for you?

regards, tom lane