Re: BUG #16331: segfault in checkpointer with full disk - Mailing list pgsql-bugs
From | Julien Rouhaud |
---|---|
Subject | Re: BUG #16331: segfault in checkpointer with full disk |
Date | |
Msg-id | 20200401090455.GB82418@nol |
In response to | BUG #16331: segfault in checkpointer with full disk (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #16331: segfault in checkpointer with full disk |
List | pgsql-bugs |
Hi,

On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 16331
> Logged by: Jozef Mlich
> Email address: jmlich83@gmail.com
> PostgreSQL version: 12.2
> Operating system: CentOS
> Description:
>
> I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> yum.postgresql.org). I am using multiple extensions (cstore, postgres_fdw,
> pgcrypto,dblink, etc.). It seems crash is related to disk run out of space
> (I am using separate partion for / and for /var/lib/pgsql). It occurs few
> times a day. According to backtrace it seems to be related to checkpointer.
> Replication is not configured.
>
>
> [New LWP 26290]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: checkpointer
> '.
> Program terminated with signal 6, Aborted.
> #0 0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>
> Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> #0 0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> resultvar = 0
> pid = 26290
> selftid = 26290
> #1 0x00007fe4604c28f8 in __GI_abort () at abort.c:90
> save_stage = 2
> act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
> sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717,
> 39808819211026438, 20126216749056, 70394513997832, 9268713, 70403103920719,
> 17316096998686159616, 20134806683648, 140618848608704, 140618848592800}},
> sa_flags = 1615828275, sa_restorer = 0x0}
> sigs = {__val = {32, 0 <repeats 15 times>}}
> #2 0x000000000087840a in errfinish (dummy=<optimized out>) at elog.c:552
> edata = 0xd47040 <errordata>
> elevel = 22
> oldcontext = 0x171a6d0
> econtext = 0x0
> __func__ = "errfinish"
> #3 0x0000000000706b24 in CheckPointReplicationOrigin () at
> origin.c:562
> tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
> path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
> tmpfd = <optimized out>
> i = <optimized out>
> magic = 307747550
> crc = 4294967295
> __func__ = "CheckPointReplicationOrigin"

That's not a bug (nor a segfault): the process terminated with signal 6
(abort), which is the expected behavior when the checkpointer is not able to
do its work. Since data durability can't be guaranteed in that case, the
checkpointer raises a PANIC-level message, which triggers an abort so that the
whole instance goes through an emergency restart cycle.

Do you have monitoring for this filesystem? Do you see spikes in disk usage or
other strange behavior?
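
If no monitoring is in place yet, something along these lines could serve as a starting point. This is only a minimal sketch, not a replacement for a real monitoring system: the function name `free_space_report` and the 10% warning threshold are assumptions chosen for illustration, and the path to check would be whichever mount point holds the data directory (`/var/lib/pgsql` in the report above).

```python
import shutil


def free_space_report(path, warn_below_fraction=0.10):
    """Return (free_fraction, is_low) for the filesystem containing `path`.

    `warn_below_fraction` is a hypothetical threshold; tune it to how fast
    the filesystem usually fills up between two checks.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction, free_fraction < warn_below_fraction
```

Calling `free_space_report("/var/lib/pgsql")` from cron every minute or so and logging the result would at least show whether the crashes line up with short spikes in disk usage.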