Re: BUG #16331: segfault in checkpointer with full disk - Mailing list pgsql-bugs
From | Jozef Mlich |
---|---|
Subject | Re: BUG #16331: segfault in checkpointer with full disk |
Date | |
Msg-id | cb90caff210d67bee6be0752b665bcac862d1e25.camel@gmail.com |
In response to | Re: BUG #16331: segfault in checkpointer with full disk (Julien Rouhaud <rjuju123@gmail.com>) |
Responses | Re: BUG #16331: segfault in checkpointer with full disk |
List | pgsql-bugs |
On Wed, 2020-04-01 at 11:04 +0200, Julien Rouhaud wrote:
> Hi,
>
> On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      16331
> > Logged by:          Jozef Mlich
> > Email address:      jmlich83@gmail.com
> > PostgreSQL version: 12.2
> > Operating system:   CentOS
> > Description:
> >
> > I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> > yum.postgresql.org). I am using multiple extensions (cstore, postgres_fdw,
> > pgcrypto, dblink, etc.). It seems the crash is related to the disk running
> > out of space (I am using separate partitions for / and for /var/lib/pgsql).
> > It occurs a few times a day. According to the backtrace it seems to be
> > related to the checkpointer. Replication is not configured.
> >
> >
> > [New LWP 26290]
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `postgres: checkpointer '.
> > Program terminated with signal 6, Aborted.
> > #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> > 55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> >
> > Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> > #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> >         resultvar = 0
> >         pid = 26290
> >         selftid = 26290
> > #1  0x00007fe4604c28f8 in __GI_abort () at abort.c:90
> >         save_stage = 2
> >         act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717, 39808819211026438, 20126216749056, 70394513997832, 9268713, 70403103920719, 17316096998686159616, 20134806683648, 140618848608704, 140618848592800}}, sa_flags = 1615828275, sa_restorer = 0x0}
> >         sigs = {__val = {32, 0 <repeats 15 times>}}
> > #2  0x000000000087840a in errfinish (dummy=<optimized out>) at elog.c:552
> >         edata = 0xd47040 <errordata>
> >         elevel = 22
> >         oldcontext = 0x171a6d0
> >         econtext = 0x0
> >         __func__ = "errfinish"
> > #3  0x0000000000706b24 in CheckPointReplicationOrigin () at origin.c:562
> >         tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
> >         path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
> >         tmpfd = <optimized out>
> >         i = <optimized out>
> >         magic = 307747550
> >         crc = 4294967295
> >         __func__ = "CheckPointReplicationOrigin"
>
> That's not a bug (nor a segfault) but the expected behavior if the
> checkpointer is not able to do its work. As data durability can't be
> guaranteed in such case, the checkpointer raises a PANIC level
> message, which raises an abort so that the whole instance does an
> emergency restart cycle.
>
> Do you have monitoring for this filesystem? Do you see spikes in
> disk usage or other strange behavior?

That makes it clear. Thanks for the explanation, and I apologize for the false bug report. I had probably misunderstood how a segfault is distinguished from an abort.
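The distinction shows up in the signal number: abort() raises SIGABRT (signal 6, matching "Program terminated with signal 6, Aborted" in the backtrace), whereas a real segfault is SIGSEGV (signal 11). As a minimal sketch, assuming the core_pattern script is a pipe handler that receives the signal number through the `%s` specifier documented in core(5), it could label dumps by signal instead of calling every crash a segfault. The script path, the name `classify_core.py` and the helper `classify` are hypothetical, not part of PostgreSQL.

```python
#!/usr/bin/env python3
# Hypothetical core_pattern pipe handler (a sketch, not part of PostgreSQL).
# Assumed registration, using the %p (PID), %s (signal number) and
# %e (executable name) specifiers documented in core(5):
#   sysctl -w kernel.core_pattern='|/usr/local/bin/classify_core.py %p %s %e'
# The kernel runs the script with those arguments and writes the core image
# to its stdin.
import signal
import sys
import syslog


def classify(signo):
    """Return a human-readable label for the signal that caused the dump."""
    try:
        name = signal.Signals(signo).name
    except ValueError:
        return "unknown signal {}".format(signo)
    if signo == signal.SIGSEGV:
        return name + ": segmentation fault (invalid memory access)"
    if signo == signal.SIGABRT:
        return name + ": abort() was called (e.g. after a PANIC)"
    return name


def main(argv):
    pid, signo, exe = argv[1], int(argv[2]), argv[3]
    # Drain the core image from stdin; a real handler would also save it.
    size = 0
    while True:
        chunk = sys.stdin.buffer.read(1 << 20)
        if not chunk:
            break
        size += len(chunk)
    syslog.syslog(syslog.LOG_ERR, "core dump: pid={} exe={} {} ({} bytes)".format(
        pid, exe, classify(signo), size))
    return 0


# Only act when invoked by the kernel with the expected three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

With this labeling, the checkpointer dump above would be reported as SIGABRT (an intentional abort after PANIC) rather than as a segfault.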
I need to fix my kernel.core_pattern script.

Attached is a screenshot from Grafana monitoring showing the free space on the /var/lib/pgsql partition.

-- 
Jozef Mlich <jmlich83@gmail.com>