Re: Race conditions with checkpointer and shutdown - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Race conditions with checkpointer and shutdown
Msg-id 29929.1556559359@sss.pgh.pa.us
In response to Re: Race conditions with checkpointer and shutdown  (Ashwin Agrawal <aagrawal@pivotal.io>)
Responses Re: Race conditions with checkpointer and shutdown
List pgsql-hackers
Ashwin Agrawal <aagrawal@pivotal.io> writes:
> For Greenplum (based on 9.4, but current master code looks the same) we
> have recently seen deadlocks in the walreceiver many times in CI, which
> I believe confirms the above finding.

> #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> #1  0x00007f0637ee72bd in _int_free (av=0x7f063822bb20 <main_arena>, p=0x26bb3b0, have_lock=0) at malloc.c:3962
> #2  0x00007f0637eeb53c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
> #3  0x00007f0636629464 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #4  0x00007f0636630720 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #5  0x00007f063b5cede7 in _dl_fini () at dl-fini.c:235
> #6  0x00007f0637ea0ff8 in __run_exit_handlers (status=1, listp=0x7f063822b5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
> #7  0x00007f0637ea1045 in __GI_exit (status=<optimized out>) at exit.c:104
> #8  0x00000000008c72c7 in proc_exit ()
> #9  0x0000000000a75867 in errfinish ()
> #10 0x000000000089ea53 in ProcessWalRcvInterrupts ()
> #11 0x000000000089eac5 in WalRcvShutdownHandler ()
> #12 <signal handler called>
> #13 _int_malloc (av=av@entry=0x7f063822bb20 <main_arena>, bytes=bytes@entry=16384) at malloc.c:3802
> #14 0x00007f0637eeb184 in __GI___libc_malloc (bytes=16384) at malloc.c:2913
> #15 0x00000000007754c3 in makeEmptyPGconn ()
> #16 0x0000000000779686 in PQconnectStart ()
> #17 0x0000000000779b8b in PQconnectdb ()
> #18 0x00000000008aae52 in libpqrcv_connect ()
> #19 0x000000000089f735 in WalReceiverMain ()
> #20 0x00000000005c5eab in AuxiliaryProcessMain ()
> #21 0x00000000004cd5f1 in ServerLoop ()
> #22 0x000000000086fb18 in PostmasterMain ()
> #23 0x00000000004d2e28 in main ()

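Cool, that stack trace is *exactly* what you'd expect if this were the
problem: the shutdown signal handler (WalRcvShutdownHandler, frame #11)
interrupted malloc() (frame #13) while it held the arena lock, and the
exit() reached from inside the handler then ran exit handlers that tried
to free() memory and blocked on that same lock (frames #0 to #7).
Thanks for sending it along!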

Can you try applying a1a789eb5ac894b4ca4b7742f2dc2d9602116e46
to see if it fixes the problem for you?

            regards, tom lane


