
Re: BUG #16331: segfault in checkpointer with full disk - Mailing list pgsql-bugs

From	Jozef Mlich
Subject	Re: BUG #16331: segfault in checkpointer with full disk
Date	April 1, 2020 09:51:16
Msg-id	cb90caff210d67bee6be0752b665bcac862d1e25.camel@gmail.com
In response to	Re: BUG #16331: segfault in checkpointer with full disk (Julien Rouhaud <rjuju123@gmail.com>)
Responses	Re: BUG #16331: segfault in checkpointer with full disk
List	pgsql-bugs


On Wed, 2020-04-01 at 11:04 +0200, Julien Rouhaud wrote:
> Hi,
> 
> On Wed, Apr 01, 2020 at 08:51:56AM +0000, PG Bug reporting form
> wrote:
> > The following bug has been logged on the website:
> > 
> > Bug reference:      16331
> > Logged by:          Jozef Mlich
> > Email address:      jmlich83@gmail.com
> > PostgreSQL version: 12.2
> > Operating system:   CentOS
> > Description:        
> > 
> > I can see segfaults on CentOS 7 with postgresql 12.2-2PGDG.rhel7 (from
> > yum.postgresql.org). I am using multiple extensions (cstore,
> > postgres_fdw, pgcrypto, dblink, etc.). The crash seems to be related to
> > the disk running out of space (I am using separate partitions for / and
> > for /var/lib/pgsql). It occurs a few times a day. According to the
> > backtrace, it seems to be related to the checkpointer.
> > Replication is not configured.
> > 
> > 
> > [New LWP 26290]
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `postgres:
> > checkpointer                               
> >  '.
> > Program terminated with signal 6, Aborted.
> > #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> > ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> > 55      return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> > 
> > Thread 1 (Thread 0x7fe462e148c0 (LWP 26290)):
> > #0  0x00007fe4604c1207 in __GI_raise (sig=sig@entry=6) at
> > ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> >         resultvar = 0
> >         pid = 26290
> >         selftid = 26290
> > #1  0x00007fe4604c28f8 in __GI_abort () at abort.c:90
> >         save_stage = 2
> >         act = {__sigaction_handler = {sa_handler = 0x0,
> > sa_sigaction = 0x0},
> > sa_mask = {__val = {0, 0, 0, 0, 0, 9268713, 70403103920717,
> > 39808819211026438, 20126216749056, 70394513997832, 9268713,
> > 70403103920719,
> > 17316096998686159616, 20134806683648, 140618848608704,
> > 140618848592800}},
> > sa_flags = 1615828275, sa_restorer = 0x0}
> >         sigs = {__val = {32, 0 <repeats 15 times>}}
> > #2  0x000000000087840a in errfinish (dummy=<optimized out>) at
> > elog.c:552
> >         edata = 0xd47040 <errordata>
> >         elevel = 22
> >         oldcontext = 0x171a6d0
> >         econtext = 0x0
> >         __func__ = "errfinish"
> > #3  0x0000000000706b24 in CheckPointReplicationOrigin () at
> > origin.c:562
> >         tmppath = 0x9e6fa8 "pg_logical/replorigin_checkpoint.tmp"
> >         path = 0x9e6fd0 "pg_logical/replorigin_checkpoint"
> >         tmpfd = <optimized out>
> >         i = <optimized out>
> >         magic = 307747550
> >         crc = 4294967295
> >         __func__ = "CheckPointReplicationOrigin"
> 
> That's not a bug (nor a segfault) but the expected behavior if the
> checkpointer is not able to do its work.  As data durability can't be
> guaranteed in such a case, the checkpointer raises a PANIC level
> message, which triggers an abort so that the whole instance does an
> emergency restart cycle.
> 
> Do you have monitoring for this filesystem?  Do you see spikes in
> disk usage or other strange behavior?

Then it is clear. Thanks for the explanation, and I apologize for the
false bug report.

I probably misunderstood how a segfault is distinguished from an abort.
I need to fix my kernel.core_pattern script.
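
For reference, the backtrace above reports signal 6 (SIGABRT), whereas a
segfault would be signal 11 (SIGSEGV). A minimal sketch of how a core
handler could tell the two apart, assuming the handler is registered in
kernel.core_pattern with the %s specifier so that the kernel passes the
signal number as the first argument. The function name classify_signal is
hypothetical and is not taken from the original script.

```python
import signal
import sys


def classify_signal(signum: int) -> str:
    """Return a short label for the signal that caused a core dump."""
    if signum == signal.SIGSEGV:
        return "segfault"
    if signum == signal.SIGABRT:
        return "abort"
    try:
        # Fall back to the symbolic name for any other known signal.
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"


if __name__ == "__main__":
    # Assumed invocation: the signal number arrives as argv[1].
    if len(sys.argv) > 1:
        print(classify_signal(int(sys.argv[1])))
```

Labeling reports by signal number rather than by the mere presence of a
core file keeps a PANIC-triggered abort from being reported as a segfault.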

Attached is a screenshot from Grafana monitoring with information
about the space on the /var/lib/pgsql partition.
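
The same figure can also be checked directly on the host. This is only a
rough sketch, not what the Grafana dashboard does: the function names and
the 90 percent threshold are assumptions chosen for illustration.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """Report whether usage is at or above `threshold` percent (assumed value)."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    # The partition named in this thread; adjust the path for other layouts.
    print(f"{usage_percent('/var/lib/pgsql'):.1f}% used")
```

Running such a check periodically would show whether the crashes line up
with the partition filling.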


-- 
Jozef Mlich <jmlich83@gmail.com>

Attachment

monitoring-df.png
