Re: [sqlsmith] Unpinning error in parallel worker - Mailing list pgsql-hackers
From | Jonathan Rudenberg |
---|---|
Subject | Re: [sqlsmith] Unpinning error in parallel worker |
Date | |
Msg-id | 1524579700.3786618.1349037056.03F1C731@webmail.messagingengine.com Whole thread Raw |
In response to | Re: [sqlsmith] Unpinning error in parallel worker (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: [sqlsmith] Unpinning error in parallel worker
|
List | pgsql-hackers |
On Fri, Apr 20, 2018, at 00:42, Thomas Munro wrote: > On Wed, Apr 18, 2018 at 11:43 AM, Jonathan Rudenberg > <jonathan@titanous.com> wrote: > > On Tue, Apr 17, 2018, at 19:31, Thomas Munro wrote: > >> On Wed, Apr 18, 2018 at 11:01 AM, Jonathan Rudenberg > >> <jonathan@titanous.com> wrote: > >> > Yep, I think I know approximately what it looked like, I've attached a lightly redacted plan. All of the hung querieswere running some variant of this plan as far as I can tell. > >> > >> Hmm, that isn't a parallel query. I was expecting to see "Gather" and > >> "Parallel" in there. > > > > Oops, I'm really sorry about that. I only have the first part of the hung queries, and there are a few variants. Here'sone that's parallel. > > I spent some time trying to reproduce this failure without any luck, > using query plans similar to your Gather plan fragment, and using some > test harness code for the allocator stuff in isolation. > > I had an idea that (1) freeing a large object that releases and unpins > a segment in one backend and then (2) freeing it again in another > backend (illegally) might produce this effect with sufficiently bad > luck. I'm still trying to reproduce that without any success, but I > get other kinds of failures which I think you'd be seeing too if that > hunch were right. Still looking... This issue happened again in production, here are the stack traces from three we grabbed before nuking the >400 hanging backends. process: /proc/24194/mem thread: 0x7f46263648c0, lwp: 24194, type: 1 #0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205 #1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181 #2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29 #3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310 #4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233 #5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493 #6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266 #7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470 #8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158 #9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #10 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #11 0x000055fccb820293 in ExecAgg+1235() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2539 #12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97 #14 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #15 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #16 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #17 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #18 0x000055fccb8201eb in ExecAgg+1067() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2347 #19 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #20 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932 #21 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773 #22 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984 #23 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405 #24 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363 #25 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228 #26 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291 #27 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres process: /proc/24231/mem thread: 0x7f46263648c0, lwp: 24231, type: 1 #0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205 #1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181 #2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29 #3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310 #4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233 #5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493 #6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266 #7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470 #8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158 #9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #10 0x000055fccb8348d6 in ExecSort+278() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #11 0x000055fccb834bd7 in ExecUnique+135() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97 #14 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #15 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932 #16 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773 #17 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984 #18 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405 #19 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363 #20 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228 #21 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291 #22 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres process: /proc/24136/mem thread: 0x7f46263648c0, lwp: 24136, type: 1 #0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205 #1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181 #2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29 #3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310 #4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233 #5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493 #6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266 #7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470 #8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158 #9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #10 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #11 0x000055fccb820293 in ExecAgg+1235() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2539 #12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97 #14 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #15 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #16 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #17 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #18 0x000055fccb8201eb in ExecAgg+1067() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2347 #19 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250 #20 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932 #21 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773 #22 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984 #23 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405 #24 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363 #25 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228 #26 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291 #27 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres
pgsql-hackers by date: