Re: Performance issues with parallelism and LIMIT - Mailing list pgsql-hackers
| From | David Geier |
|---|---|
| Subject | Re: Performance issues with parallelism and LIMIT |
| Date | |
| Msg-id | 782fd9d0-99dc-4880-adcc-2df93557633d@gmail.com |
| In response to | Re: Performance issues with parallelism and LIMIT (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
Hi Tomas!

On 14.11.2025 17:00, Tomas Vondra wrote:
> On 11/13/25 23:36, Tomas Vondra wrote:
>> ...
>>
>> What I think we should do is much simpler - make the threshold in shm_mq
>> dynamic, start with a very low value and gradually ramp up (up to 1/4).
>> So we'd have
>>
>>     if (mqh->mqh_consume_pending > threshold)
>>
>> We might start with
>>
>>     threshold = (mq->mq_ring_size / 1024)
>>
>> or maybe some fixed value, like
>>
>>     threshold = 128
>>
>> And on every signal we'd double it, capping it at 1/4 of mq_ring_size:
>>
>>     threshold = Min(threshold * 2, mq->mq_ring_size / 4);
>>
>> This is very similar to other places doing this gradual ramp-up, like
>> the prefetching / read_stream, etc. It allows fast termination for low
>> LIMIT values, but quickly amortizes the cost for high LIMIT values.
>>
>
> I gave this a try today, to see if it can actually solve the regression.
> Attached is a WIP patch, and a set of benchmarking scripts. On my Ryzen
> machine I got this (timings of the queries, in milliseconds):
>
>  fill  dataset     |    14     15     16     17     18  patched
>  -----------------------------------------------------------------
>  10    random      |  64.1  319.3  328.7  340.5  344.3     79.5
>        sequential  |  54.6  323.4  347.5  350.5  399.2     78.3
>  100   random      |  11.8   42.9   42.3   42.3   68.5     18.6
>        sequential  |  10.0   44.3   45.0   44.3   60.6     20.0
>
> Clearly 15 is a significant regression, with timings ~4x higher. And the
> patch improves that quite a bit. It's not all the way back down to 14;
> there's still a ~10ms regression, for some reason.
>
> Also, I didn't measure whether this patch causes regressions for other
> queries. I don't think it does, but maybe there's some weird corner case
> affected.
>
> regards

Thanks for working on that. This is certainly an improvement. It doesn't
always work, though: you can still get into a situation where enough data
is waiting in the queues to satisfy the LIMIT, but the threshold hasn't
been reached and never will be, because no more rows will match. I'm
especially passionate about that case, because currently you can get
arbitrarily bad query runtimes with big data sets and small LIMITs.

As shared previously in this thread, I cannot reproduce any slowdown when
deactivating the late latching. The test used a very narrow row (a single
INT) and the data set fit into shared memory. I've only tried with 8
parallel workers. Could you test whether you can reproduce the slowdown,
in case you have a machine with more cores at hand? If we can somehow
reproduce the original problem, I would also like to check whether other
issues are at play that could be fixed differently (e.g. false sharing).

If that optimization is truly necessary, how about always latching when a
LIMIT clause is present? Or, in the presence of a LIMIT clause, keeping a
count of the total rows produced in shared memory and latching in all
workers once the LIMIT has been reached? The overhead of updating the
shared atomic should be negligible for reasonable LIMITs. (A sketch of
this idea follows below.)

Another alternative would be latching periodically. Given that the minimum
runtime of any parallel query is a few dozen milliseconds due to forking
and plan (de-)serialization, we could live with latching only, say, every
millisecond or so.

--
David Geier
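[Editor's note] To pull the quoted fragments together, here is a minimal sketch of how the ramp-up could look in shm_mq_receive(). The field name `mqh_read_threshold` is invented for illustration; the actual WIP patch may structure this differently.

```c
/*
 * Hedged sketch only: mqh_read_threshold is a made-up shm_mq_handle
 * field, initialized to something small (e.g. Min(128, ring_size / 4))
 * when the queue is attached.  shm_mq_inc_bytes_read() updates the
 * shared read pointer and sets the sender's latch, which is the
 * expensive part being amortized.
 */
if (mqh->mqh_consume_pending > mqh->mqh_read_threshold)
{
    /* Publish consumed bytes and wake the sender. */
    shm_mq_inc_bytes_read(mq, mqh->mqh_consume_pending);
    mqh->mqh_consume_pending = 0;

    /* Double the threshold each time, capped at 1/4 of the ring size. */
    mqh->mqh_read_threshold = Min(mqh->mqh_read_threshold * 2,
                                  mq->mq_ring_size / 4);
}
```

With a small initial threshold the receiver signals the sender almost immediately, so low-LIMIT queries can terminate quickly; the doubling then amortizes the latch traffic for queries that consume the whole stream.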
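[Editor's note] For the shared row-counter idea suggested in the reply, a minimal sketch under stated assumptions: `SharedLimitState`, `rows_emitted`, `rows_needed`, and `limit_satisfied` are all made-up names; only the `pg_atomic_*` calls are existing PostgreSQL API.

```c
#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical struct; would live in the parallel query's shared memory. */
typedef struct SharedLimitState
{
    pg_atomic_uint64 rows_emitted;  /* tuples produced by all workers */
    uint64           rows_needed;   /* LIMIT (plus OFFSET) from the plan */
} SharedLimitState;

/*
 * Each worker would call this after emitting a tuple; once it returns
 * true, the worker can stop producing and set its peers' latches.
 */
static inline bool
limit_satisfied(SharedLimitState *ls)
{
    /* pg_atomic_fetch_add_u64() returns the value before the addition. */
    uint64 produced = pg_atomic_fetch_add_u64(&ls->rows_emitted, 1) + 1;

    return produced >= ls->rows_needed;
}
```

This covers exactly the case the ramp-up cannot: rows sitting in the queues below the threshold while no further rows will ever match, since the decision to stop no longer depends on queue consumption at all.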