Re: How to estimate the shared memory size required for parallel scan? - Mailing list pgsql-hackers
From | Thomas Munro
---|---
Subject | Re: How to estimate the shared memory size required for parallel scan?
Date |
Msg-id | CAEepm=1Z_QY9fK6sSro-3AV+LfvGXpO52WBKrqTn7DzYyocQqQ@mail.gmail.com
In response to | Re: How to estimate the shared memory size required for parallel scan? (Masayuki Takahashi <masayuki038@gmail.com>)
Responses | Re: How to estimate the shared memory size required for parallel scan?
List | pgsql-hackers
On Sun, Aug 19, 2018 at 4:28 PM, Masayuki Takahashi
<masayuki038@gmail.com> wrote:
>> If you just supply an IsForeignScanParallelSafe function that returns
>> true, that would allow your FDW to be used inside parallel workers and
>> wouldn't need any extra shared memory, but it wouldn't be a "parallel
>> scan".  It would just be "parallel safe".  Each process that does a
>> scan of your FDW would expect a full normal scan (presumably returning
>> the same tuples in each process).
>
> I think that the parallel scan mechanism uses each worker's full
> normal scan on the partitioned records, right?
> For example, I turned IsForeignScanParallelSafe to true in cstore_fdw
> and compared partitioned and non-partitioned scans.
>
> https://gist.github.com/masayuki038/daa63a21f8c16ffa8138b50db9129ced
>
> This shows the count being computed for each partition and 'Gather
> Merge' merging the results.  As a result, the parallel scan and
> aggregation show the correct count.

Ah, so here you have a Parallel Append node.  That is a way to get
coarse-grained parallelism when you have only parallel-safe (not
parallel-aware) scans, but you have partitions.  Technically (in our
jargon) there is no parallel scan happening here, but Parallel Append
is smart enough to scan each partition in a different worker.  That
means that the 'granularity' of parallelism is whole tables
(partitions), so if you have (say) 3 partitions of approximately the
same size and 2 processes, you'll probably see that one of the
processes scans 1 partition and the other process scans 2 partitions,
so the work can be quite unbalanced.  But if you have lots of
partitions, it's good, and in any case it's certainly better than no
parallelism.

> Then, in the case of cstore_fdw, it may not be necessary to reserve
> the shared memory in EstimateDSMForeignScan.

Correct.  If all you need is parallel-safe scans, then you probably
don't need any shared memory.

BTW to be truly pedantically parallel-safe, I think it should ideally
be the case that each process has the same "snapshot" when scanning,
or subtle inconsistencies could arise (a transaction could be visible
to one process, but not to another; this would be weirder if it
applied to concurrent scans of the *same* foreign table, but it could
still be strange when scanning different partitions in a Parallel
Append).  For file_fdw, we just didn't worry about that because plain
old text files are not transactional anyway, so we shrugged and
declared its scans to be parallel safe.  I suppose that any FDW that
is backed by a non-snapshot-based system (including other RDBMSs)
would probably have no way to do better than that, and you might make
the same decision we made for file_fdw.  When the foreign table is
PostgreSQL, or an extension that is tightly integrated into our
transaction system, I suppose you might want to think harder and
maybe even give the user some options?

>> So I guess this hasn't been done before and would require some more
>> research.
>
> I agree.  I will try some query patterns.
> thanks.

Just to be clear, there I was talking about true Parallel Foreign
Scan, which is aiming a bit higher than mere parallel safety.  After
looking at this again, this time with the benefit of coffee, I *think*
it should be possible without modifying core, if you do this:

1.  As already mentioned, you need to figure out a way for cstore_fdw
to hand out a disjoint set of tuples to different processes.  That
seems quite doable, since cstore is apparently block-structured
(though I only skim-read it for about 7 seconds and could be wrong
about that).  You apparently have blocks and stripes: hopefully they
are of fixed size, so you might be able to teach each process to
advance some kind of atomic variable in shared memory so that each
process eats different blocks?
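Totally untested, but the shape I have in mind is something like the
following.  The struct and function names are made up, and how you
count blocks is up to cstore_fdw, so please treat this as a sketch
only:

    #include "postgres.h"

    #include "access/parallel.h"
    #include "foreign/fdwapi.h"
    #include "port/atomics.h"
    #include "storage/shm_toc.h"

    /* shared state: a block cursor that every participant advances */
    typedef struct CStoreParallelScanState
    {
        pg_atomic_uint64 next_block;   /* next block number to hand out */
        uint64           nblocks;      /* total number of blocks to scan */
    } CStoreParallelScanState;

    static Size
    cstoreEstimateDSMForeignScan(ForeignScanState *node,
                                 ParallelContext *pcxt)
    {
        return sizeof(CStoreParallelScanState);
    }

    static void
    cstoreInitializeDSMForeignScan(ForeignScanState *node,
                                   ParallelContext *pcxt,
                                   void *coordinate)
    {
        CStoreParallelScanState *pstate =
            (CStoreParallelScanState *) coordinate;

        pg_atomic_init_u64(&pstate->next_block, 0);
        pstate->nblocks = ...;   /* however cstore_fdw counts its blocks */
        /* stash pstate in node->fdw_state so the leader can use it too */
    }

    static void
    cstoreInitializeWorkerForeignScan(ForeignScanState *node,
                                      shm_toc *toc,
                                      void *coordinate)
    {
        /* workers attach to the same struct via the coordinate pointer */
        CStoreParallelScanState *pstate =
            (CStoreParallelScanState *) coordinate;

        /* stash pstate in node->fdw_state, as in the leader */
    }

Then in IterateForeignScan each process would claim the next unread
block with

    block = pg_atomic_fetch_add_u64(&pstate->next_block, 1);

and report end-of-scan once block >= pstate->nblocks, so that no two
processes ever read the same block.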
2.  Teach your GetForeignPaths function to do something like this:

    ForeignPath *partial_path;
    double       parallel_divisor;
    int          parallel_workers;

    ... existing code that adds the regular non-partial path here ...

    /* Should we add a partial path to enable a parallel scan? */
    partial_path = create_foreignscan_path(root, baserel, NULL,
                                           baserel->rows,
                                           startup_cost, total_cost,
                                           NIL, NULL, NULL, coptions);
    parallel_workers = compute_parallel_worker(baserel,
                                               expected_num_pages, -1,
                                               max_parallel_workers_per_gather);
    partial_path->path.parallel_workers = parallel_workers;
    partial_path->path.parallel_aware = true;
    parallel_divisor = get_parallel_divisor(&partial_path->path);
    partial_path->path.rows /= parallel_divisor;
    partial_path->path.total_cost = startup_cost +
        ((total_cost - startup_cost) / parallel_divisor);
    if (parallel_workers > 0)
        add_partial_path(baserel, (Path *) partial_path);

You don't really have to use compute_parallel_worker() and
get_parallel_divisor() if you have a smarter way of coming up with
those numbers, but I'd probably use that logic to get started.
Unfortunately get_parallel_divisor() is not an extern function, so
you'd need a clone of it, or equivalent logic (sketched below).  It's
also a bit inconvenient that it takes a Path * instead of just
parallel_workers, which would allow tidier coding here.  It's also
inconvenient that you can't say ALTER TABLE my_foreign_table SET
(parallel_workers = N) today, which compute_parallel_worker() would
otherwise respect.
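For the record, the logic you'd be cloning is only a few lines.  An
equivalent that takes the worker count directly might look like this
(based on my reading of costsize.c, so double-check against the
current sources rather than trusting my memory; the function name is
made up):

    #include "postgres.h"

    #include "optimizer/planmain.h"  /* parallel_leader_participation */

    /*
     * Local stand-in for core's get_parallel_divisor(), which is static
     * in costsize.c.  Takes the worker count directly for tidier call
     * sites.
     */
    static double
    cstore_parallel_divisor(int parallel_workers)
    {
        double      parallel_divisor = parallel_workers;

        /*
         * The leader also runs the plan, but spends some of its time
         * servicing the workers, so it counts for less than one full
         * worker, and for less as more workers are added.
         */
        if (parallel_leader_participation)
        {
            double      leader_contribution;

            leader_contribution = 1.0 - (0.3 * parallel_workers);
            if (leader_contribution > 0)
                parallel_divisor += leader_contribution;
        }

        return parallel_divisor;
    }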
--
Thomas Munro
http://www.enterprisedb.com