Re: BUG #16331: segfault in checkpointer with full disk - Mailing list pgsql-bugs

From	Julien Rouhaud
Subject	Re: BUG #16331: segfault in checkpointer with full disk
Date	April 1, 2020 09:04:55
Msg-id	20200401090455.GB82418@nol
In response to	BUG #16331: segfault in checkpointer with full disk (PG Bug reporting form <noreply@postgresql.org>)
Responses	Re: BUG #16331: segfault in checkpointer with full disk
List	pgsql-bugs

Hi,

On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      16331
> Logged by:          Jozef Mlich
> Email address:      jmlich83@gmail.com
> PostgreSQL version: 12.2
> Operating system:   CentOS
> Description:        
> 
> I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> yum.postgresql.org). I am using multiple extensions  (cstore, postgres_fdw,
> pgcrypto,dblink, etc.). It seems crash is related to disk run out of space
> (I am using separate partion for / and for /var/lib/pgsql). It occurs few
> times a day. According to backtrace it seems to be related to checkpointer.
> Replication is not configured. 
> 
> 
> [New LWP 26290]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: checkpointer                               
>  '.
> Program terminated with signal 6, Aborted.
> #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> 55      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> 
> Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
>         resultvar = 0
>         pid = 26290
>         selftid = 26290
> #1  0x00007fe4604c28f8 in __GI_abort () at abort.c:90
>         save_stage = 2
>         act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
> sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717,
> 39808819211026438, 20126216749056, 70394513997832, 9268713, 70403103920719,
> 17316096998686159616, 20134806683648, 140618848608704, 140618848592800}},
> sa_flags = 1615828275, sa_restorer = 0x0}
>         sigs = {__val = {32, 0 <repeats 15 times>}}
> #2  0x000000000087840a in errfinish (dummy=<optimized out>) at elog.c:552
>         edata = 0xd47040 <errordata>
>         elevel = 22
>         oldcontext = 0x171a6d0
>         econtext = 0x0
>         __func__ = "errfinish"
> #3  0x0000000000706b24 in CheckPointReplicationOrigin () at origin.c:562
>         tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
>         path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
>         tmpfd = <optimized out>
>         i = <optimized out>
>         magic = 307747550
>         crc = 4294967295
>         __func__ = "CheckPointReplicationOrigin"


That's not a bug (nor a segfault: the core shows signal 6, SIGABRT) but the
expected behavior when the checkpointer is not able to do its work.  As data
durability can't be guaranteed in such a case, the checkpointer raises a
PANIC-level message (elevel = 22 in frame #2), which calls abort() so that the
whole instance goes through an emergency restart cycle.
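
To illustrate the general pattern, here is a minimal Python sketch, not
PostgreSQL's actual code.  The backtrace shows the checkpointer working with
a ".tmp" file next to the final path; the sketch assumes the usual
write-temp, fsync, rename sequence and turns any I/O failure (such as a full
disk) into a fatal error instead of continuing.  The names PanicError and
write_checkpoint_file are hypothetical.

```python
import os


class PanicError(RuntimeError):
    """Hypothetical stand-in for a PANIC-level failure."""


def write_checkpoint_file(path: str, payload: bytes) -> None:
    """Write payload to path via a temp file, or raise PanicError."""
    tmppath = path + ".tmp"
    try:
        fd = os.open(tmppath, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            # os.write may write fewer bytes than asked, so loop until done.
            view = memoryview(payload)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Force the data to stable storage before the rename.
            os.fsync(fd)
        finally:
            os.close(fd)
        # Atomically swap the finished file into place.
        os.replace(tmppath, path)
    except OSError as exc:
        # Durability cannot be guaranteed (e.g. ENOSPC), so do not carry on.
        raise PanicError(f"could not write {path!r}: {exc}") from exc
```

In this sketch the caller is expected to treat PanicError as unrecoverable,
which mirrors why the real process aborts rather than retrying quietly.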

Do you have monitoring for this filesystem?  Do you see spikes in disk usage or
other strange behavior?
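
If nothing is in place yet, a very small check along these lines could be a
starting point.  It is only a sketch: the function names and the 90%
threshold are arbitrary assumptions, and the path would be whatever
partition holds the data directory (/var/lib/pgsql in the report).

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the used share of the filesystem holding path, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (assumed) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical usage: run from cron and alert on a True result.
    print(is_nearly_full("/var/lib/pgsql"))
```

Sampling this every minute or so and keeping the history should make short
spikes visible, which a one-off look at df can easily miss.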
