Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) |
Date | |
Msg-id | 3kp64koynvdzepbyddpkel7dugnku7ksfevkovx3rrrsle4dcp@ah7gla44mxjh |
In response to | Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) (Vladlen Popolitov <v.popolitov@postgrespro.ru>) |
Responses | Re: Current master hangs under the debugger after Parallel Seq Scan (Linux, MacOS) |
List | pgsql-hackers |
Hi,

On 2025-03-26 21:53:35 +0700, Vladlen Popolitov wrote:
> During debug session I found, that queries with Parallel Seq Scan hang
> in the current master - the leader worker waits indefinitely the signal
> from parallel workers. A query is not possible to break, the leader
> does not check interrupt status in the waiting loop.
>
> 1. How to reproduce:
> a) Create table:
>
> CREATE DATABASE expr;
> \c expr
> CREATE TABLE testexpr(
>     id INT,
>     val INT
> );
> INSERT INTO testexpr (id, val)
> SELECT serie as id , MOD(serie, 10) as val
> FROM generate_series(1,1000000) as serie;
> EXPLAIN (ANALYZE) SELECT * FROM testexpr
> WHERE val=1 AND id<30;
>
> b) start debugger for this connection
>
> c) Run command (parallel workers should be enabled as it is by default
> configuration)
> EXPLAIN (ANALYZE) SELECT * FROM testexpr
> WHERE val=1 AND id<30;
>
> d) Above query will start parallel worker(s). When worker(s) finish(es),
> it/they send SIGUSR1 that is caught by debugger. When you dismiss
> the signal message, you find that query continues to run, but really it
> waits (in latch.c or in waiteventset.c depending on commit version).

Isn't that to be expected?

If I understand correctly, your gdb is configured to intercept SIGUSR1
signals *without* passing them on to the application (i.e. postgres). We rely
on the signal being delivered. It isn't, hence the hang.

At least my gdb doesn't intercept SIGUSR1 by default. It's a newer gdb though,
so that could have been different in the past (although I don't remember a
different behaviour).

    (gdb) handle SIGUSR1
    Signal        Stop      Print   Pass to program Description
    SIGUSR1       No        No      Yes             User defined signal 1

If I change the configuration to not pass it, but print it, I can reproduce a
hang:

    handle SIGUSR1 print nopass

What does your gdb show for "handle SIGUSR1"? If it isn't what I reported, is
it possible that you set that in your .gdbinit or such?

> 2. Original commit with reproducible behaviour.
> I tracked this behaviour down to commit
>
> commit 7202d72787d3b93b692feae62ee963238580c877
>
> Date: Fri Feb 21 08:03:33 2025 +0100
>
> backend launchers void * arguments for binary data
>
> Change backend launcher functions to take void * for binary data
>
> instead of char *. This removes the need for numerous casts.
>
> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org

I also find it very hard to believe that this commit introduced this problem -
it doesn't sound like a postgres issue to me. I can reproduce it in PG 16,
after doing "handle SIGUSR1 print nopass".

Greetings,

Andres Freund
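
For reference, a minimal sketch of the gdb settings discussed above. It assumes the hang really is caused by a local "nopass" setting for SIGUSR1, which this message suggests but does not confirm for the reporter's setup. The commands can be typed at the (gdb) prompt or placed in .gdbinit; the lines starting with # are comments.

```
# Show how gdb currently treats SIGUSR1 (same query as in the message above).
handle SIGUSR1

# Do not stop or print on SIGUSR1, and deliver it to the debugged backend
# so that the leader's wait can be woken when workers signal it.
handle SIGUSR1 nostop noprint pass
```

Using "print pass" instead of "noprint pass" would still show each signal while delivering it to the backend; the keyword that produces the hang described here is "nopass".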