On Tue, 30 Apr 2024 at 10:36, Cary Huang <cary.huang@highgo.ca> wrote:
> In one of our migration scenarios, we rely on tid range scan to migrate huge table from one database to another once
thelower and upper ctid bound is determined. With the support of parallel ctid range scan, this process could be done
muchquicker.
I would have thought that the best way to migrate would be to further
divide the TID range into N segments and run N queries, one per
segment to get the data out.
From a CPU point of view, I'd hard to imagine that a SELECT * query
without any other items in the WHERE clause other than the TID range
quals would run faster with multiple workers than with 1. The problem
is the overhead of pushing tuples to the main process often outweighs
the benefits of the parallelism. However, from an I/O point of view
on a server with slow enough disks, I can imagine there'd be a
speedup.
> The attached patch is my approach to add parallel ctid range scan to PostgreSQL's planner and executor. In my tests,
Ido see an increase in performance using parallel tid range scan over the single worker tid range scan and it is also
fasterthan parallel sequential scan covering similar ranges. Of course, the table needs to be large enough to reflect
theperformance increase.
>
> below is the timing to complete a select query covering all the records in a simple 2-column table with 40 million
records,
>
> - tid range scan takes 10216ms
> - tid range scan with 2 workers takes 7109ms
> - sequential scan with 2 workers takes 8499ms
Can you share more details about this test? i.e. the query, what the
times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?).
Also, which version/commit did you patch against? I was wondering if
the read stream code added in v17 would result in the serial case
running faster because the parallelism just resulted in more I/O
concurrency.
Of course, it may be beneficial to have parallel TID Range for other
cases when more row filtering or aggregation is being done as that
requires pushing fewer tuples over from the parallel worker to the
main process. It just would be good to get to the bottom of if there's
still any advantage to parallelism when no filtering other than the
ctid quals is being done now that we've less chance of having to wait
for I/O coming from disk with the read streams code.
David