Re: stress test for parallel workers - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: stress test for parallel workers
Msg-id: 20032.1570808731@sss.pgh.pa.us
In response to: Re: stress test for parallel workers (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
Responses: Re: stress test for parallel workers
           Re: stress test for parallel workers
List: pgsql-hackers
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> At least on F29 I have set /proc/sys/kernel/core_pattern and it works.

FWIW, I'm not excited about that as a permanent solution.  It requires
root privilege, it affects the whole machine and not only the buildfarm,
and making it persist across reboots is even more invasive.

> I have done the same on this machine. wobbegong runs every hour, so
> let's see what happens next. With any luck the buildfarm will give us a
> stack trace without needing further action.

I already collected one manually.  It looks like this:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5114
5114    {
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26-30.fc27.ppc64le
(gdb) bt
#0  sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5114
#1  <signal handler called>
#2  0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#3  0x00000000103fad08 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:3215
#4  <signal handler called>
#5  0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#6  0x00000000103f9f98 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5275
#7  <signal handler called>
#8  0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#9  0x00000000103fad08 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:3215
#10 <signal handler called>
#11 sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5114
#12 <signal handler called>
#13 0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#14 0x00000000103f9f98 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5275
#15 <signal handler called>
#16 0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#17 0x00000000103fad08 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:3215
...
#572 <signal handler called>
#573 0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#574 0x00000000103f9f98 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5275
#575 <signal handler called>
#576 0x00007fff93923ca4 in sigprocmask () from /lib64/libc.so.6
#577 0x00000000103fad08 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:3215
#578 <signal handler called>
#579 sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5114
#580 <signal handler called>
#581 0x00007fff93a01514 in select () from /lib64/libc.so.6
#582 0x00000000103f7ad8 in ServerLoop () at postmaster.c:1682
#583 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1391
#584 0x0000000000000000 in ?? ()

What we've apparently got here is that signals were received so fast that
the postmaster ran out of stack space.  I remember Andres complaining
about this as a theoretical threat, but I hadn't seen it in the wild
before.

I haven't finished investigating though, as there are some things that
remain to be explained.  The dependency on having force_parallel_mode =
regress makes sense now, because the extra traffic to launch and reap all
those parallel workers would increase the stress on the postmaster (and
it seems likely that this stack trace corresponds exactly to alternating
launch and reap signals).  But why does it only happen during the
pg_upgrade test --- plain "make check" ought to be about the same?  I
also want to investigate why clang builds seem more prone to this than
gcc builds on the same machine; that might just be down to more or less
stack consumption, but it bears looking into.
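To see the failure mode in isolation, here is a standalone sketch (not
postmaster code; the handler and variable names are made up for
illustration) of a handler that unblocks its own signal before returning.
Any pending signal then runs the handler again on top of the still-live
frame, so a fast enough signal storm just keeps deepening the stack:

/*
 * Standalone sketch of the failure mode (illustrative names only).
 * The handler re-enables delivery of its own signal before it returns,
 * so a pending signal interrupts the handler itself and its frame never
 * unwinds.  Under a fast enough signal storm the nested frames
 * eventually exhaust the stack.
 */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t events = 0;

static void
storm_handler(int signo)
{
	sigset_t	unblock;

	events++;					/* stand-in for launch/reap work */

	/*
	 * The dangerous part: unblocking SIGUSR1 while still inside its own
	 * handler.  If another SIGUSR1 is already pending, its handler runs
	 * right here, on top of this frame, and the recursion can repeat.
	 */
	sigemptyset(&unblock);
	sigaddset(&unblock, SIGUSR1);
	sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}

int
main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = storm_handler;
	sigemptyset(&sa.sa_mask);	/* nothing extra blocked during the handler */
	sa.sa_flags = 0;
	sigaction(SIGUSR1, &sa, NULL);

	/* Main loop: wake up when signals arrive and report progress. */
	for (;;)
	{
		pause();
		printf("events so far: %d\n", (int) events);
		fflush(stdout);
	}
}

The usual defenses are to leave the signal blocked until the handler has
genuinely returned (letting sigaction's automatic masking do its job), or
to have the handler do nothing but set a flag that the main loop acts on.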
regards, tom lane