Re: Performance issues with parallelism and LIMIT - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: Performance issues with parallelism and LIMIT |
| Date | |
| Msg-id | c7a3cfcc-81a6-4dac-b064-7582ef9754bb@vondra.me |
| In response to | Re: Performance issues with parallelism and LIMIT (David Geier <geidav.pg@gmail.com>) |
| List | pgsql-hackers |
On 11/19/25 13:28, David Geier wrote:
> On 18.11.2025 20:37, Tomas Vondra wrote:
>> On 11/18/25 19:35, David Geier wrote:
>>> On 18.11.2025 18:31, Tomas Vondra wrote:
>>>> On 11/18/25 17:51, Tom Lane wrote:
>>>>> David Geier <geidav.pg@gmail.com> writes:
>>>>>> On 18.11.2025 16:40, Tomas Vondra wrote:
>>>>>>> It'd need code in the parallel-aware scans, i.e. seqscan, bitmap,
>>>>>>> index. I don't think you'd need code in other plans, because all
>>>>>>> parallel plans have one "driving" table.
>>>>>
>>>>> You're assuming that the planner will insert Gather nodes at
>>>>> arbitrary places in the plan, which isn't true. If it does generate
>>>>> plans that are problematic from this standpoint, maybe the answer
>>>>> is "don't parallelize in exactly that way".
>>>>
>>>> I think David has a point that nodes that "buffer" tuples (like Sort
>>>> or HashAgg) would break the approach of making this the
>>>> responsibility of the parallel-aware scan. I don't see anything
>>>> particularly wrong with such plans - plans with partial aggregation
>>>> often look like that.
>>>>
>>>> Maybe this should be the responsibility of execProcnode.c, not the
>>>> various nodes?
>
> I hadn't realized that this approach has the same limitation:
> ExecProcNode() is only called when e.g. heap_nextslot() or
> index_getnext_slot() have found a qualifying tuple. Otherwise, they
> just keep crunching without returning.

Right, that's why I suggested having a function the nodes would call in
suitable places.

>>> I like that idea, even though it would still not work while a node
>>> is doing the crunching. That is, after it has pulled all rows and
>>> before it can return the first row. During this time the node won't
>>> call ExecProcNode().
>>
>> True. Perhaps we could provide a function nodes could call in
>> suitable places to check whether to end?
>
> This function would then also be required by the base relation scans.
> And we would have to call it more or less in the same places
> CHECK_FOR_INTERRUPTS() is called today.

Yes, but I don't think CHECK_FOR_INTERRUPTS() would be a good place to
manipulate the executor state. Maybe you could do some magic with
siglongjmp(), but I have a "funny" feeling about that - I wouldn't be
surprised if that interfered with elog(), which is the only other place
using siglongjmp() AFAICS.

Which is why I suggested maybe it should be handled in execProcnode
(which would take care of cases where we produce a tuple), and then let
nodes optionally check the flag too (through a new function). I haven't
tried doing this, so maybe I'm missing something ...

> Beyond that, code such as heap_nextslot() or index_getnext_slot()
> doesn't have access to the PlanState, which would contain the stop
> flag. So that would have to be propagated downwards as well.
>
> All of that would make for a fairly big patch, which the initial patch
> avoids.

Right. I don't think we can set the flag in plan/executor state, because
that's not available in the signal handler / ProcessInterrupts() ...
It'd need to be a global variable, I guess.
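To make that concrete, here's roughly what I'm thinking of - an
untested sketch, all names made up:

```c
/* Untested sketch -- all names are made up, none of this exists today. */
#include "postgres.h"
#include "miscadmin.h"
#include "nodes/execnodes.h"
#include "storage/latch.h"

/*
 * Has to be a global flag, because the ProcSignal handler and
 * ProcessInterrupts() have no access to the executor state.
 */
static volatile sig_atomic_t EarlyStopPending = false;

/* ProcSignal callback: the leader asks the worker to stop producing. */
void
HandleEarlyStopInterrupt(void)
{
	EarlyStopPending = true;
	InterruptPending = true;
	SetLatch(MyLatch);
}

/*
 * The "new function" nodes and scans could call from roughly the same
 * places as CHECK_FOR_INTERRUPTS(), e.g. while a Sort or HashAgg is
 * still crunching and not returning any tuples.
 */
static inline bool
ExecEarlyStopRequested(void)
{
	return EarlyStopPending;
}

/*
 * And execProcnode.c could handle the per-tuple case by wrapping
 * ExecProcNode, the same way ExecProcNodeInstr wraps it for
 * instrumentation: once the flag is set, pretend the subtree is done.
 */
static TupleTableSlot *
ExecProcNodeEarlyStop(PlanState *node)
{
	if (EarlyStopPending)
		return NULL;			/* TupIsNull() => caller sees end-of-data */

	return node->ExecProcNodeReal(node);
}
```

The wrapper would get installed the way the instrumentation wrapper is,
and the nodes/scans would just sprinkle ExecEarlyStopRequested() next to
their existing CHECK_FOR_INTERRUPTS() calls.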
>> Actually, how does canceling queries with parallel workers work? Is
>> that done similarly to what your patch did?
>
> Parallel workers use the same mechanism as normal backends, except
> that parallel workers quit instead of waiting for the next query.
>
> The flow is as follows: parallel workers catch SIGINT via
> StatementCancelHandler(), which sets QueryCancelPending = true. When
> ProcessInterrupts() is called the next time, it will elog(ERROR) out.
> BackgroundWorkerMain() will catch the error and proc_exit().
>
> This mechanism is very similar to what I have in my patch, with the
> difference that (1) I use SendProcSignal() to inform the workers to
> stop and (2) that I added another sigsetjmp target around
> ExecutorRun() to be able to bail out but still call all the shutdown
> code.

OK

> (1) is necessary to be able to distinguish between query cancel and
> early stopping without cancelling.
>
> (2) is necessary because the parallel shutdown code needs to be called
> before exiting. I tried to piggyback on the existing error handling
> mechanism via siglongjmp(*PG_exception_stack, 1), skipping
> EmitErrorReport() there if got_stopped == true. That gave me the
> following error:
>
> ERROR: lost connection to parallel worker

Not sure. I have my doubts about using siglongjmp() for this.

>>> Inspectability on that end seems useful. Maybe only with VERBOSE,
>>> similarly to the extended per-worker information.
>>
>> Maybe, no opinion. But it probably needs to apply to all nodes in the
>> parallel worker, right? Or maybe it's even a per-worker detail.
>
> I was thinking of making it a per-worker detail, such as time or
> number of rows returned. Let's discuss this again once we've settled
> on a solution.

OK

regards

--
Tomas Vondra
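PS: For anyone following along, this is roughly the shape of the
sigsetjmp() dance around ExecutorRun() as I understand it from the
description above - my own untested reconstruction, not the actual
patch (only got_stopped is from the patch, everything else is
approximated), and exactly the kind of elog() interaction I'm uneasy
about:

```c
#include "postgres.h"
#include "executor/executor.h"

/* set by the patch's ProcSignal handler when the leader requests a stop */
extern volatile sig_atomic_t got_stopped;

static void
ExecutorRunWithEarlyStop(QueryDesc *queryDesc, ScanDirection direction,
						 uint64 count)
{
	sigjmp_buf	local_sigjmp_buf;
	sigjmp_buf *save_exception_stack = PG_exception_stack;

	if (sigsetjmp(local_sigjmp_buf, 0) == 0)
	{
		/* make ereport(ERROR) / the early stop land here */
		PG_exception_stack = &local_sigjmp_buf;
		ExecutorRun(queryDesc, direction, count);
	}
	else if (!got_stopped)
	{
		/* a real error: restore and re-throw to the regular handler */
		PG_exception_stack = save_exception_stack;
		PG_RE_THROW();
	}

	/*
	 * Normal completion, or an early stop via siglongjmp(). On the
	 * early-stop path a real version would presumably also need
	 * FlushErrorState() instead of EmitErrorReport() -- this interplay
	 * with elog() is the part that worries me.
	 */
	PG_exception_stack = save_exception_stack;

	/* the parallel shutdown code (ExecutorFinish/ExecutorEnd) still runs */
}
```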