Re: How to estimate the shared memory size required for parallel scan? - Mailing list pgsql-hackers

From Masayuki Takahashi
Subject Re: How to estimate the shared memory size required for parallel scan?
Date
Msg-id CA+z6ocQ69eWcVqoib2sDR+A3HFWwqerbBWwUe0sRieoFE+c=FA@mail.gmail.com
In response to Re: How to estimate the shared memory size required for parallel scan?  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: How to estimate the shared memory size required for parallel scan?
List pgsql-hackers
(Sorry, I first sent this to Thomas only. This is a re-post.)

Hi Thomas,

Thank you for the excellent explanation of shared memory in parallel
scans and of 'foreign paths'.
Those are exactly the points I wanted to know. Thanks.

> If you just supply an IsForeignScanParallelSafe function that returns
> true, that would allow your FDW to be used inside parallel workers and
> wouldn't need any extra shared memory, but it wouldn't be a "parallel
> scan".  It would just be "parallel safe".  Each process that does a
> scan of your FDW would expect a full normal scan (presumably returning
> the same tuples in each process).

I think the parallel scan mechanism uses each worker's full normal
scan on partitioned records, right?
For example, I made IsForeignScanParallelSafe return true in cstore_fdw
and compared scans of a partitioned and a non-partitioned table.

https://gist.github.com/masayuki038/daa63a21f8c16ffa8138b50db9129ced

This shows that rows are counted per partition and that 'Gather Merge'
merges the results.
As a result, the parallel scan and aggregation show the correct count.
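
To illustrate why the count can come out right in the partitioned case,
here is a minimal, self-contained sketch. It is not taken from cstore_fdw
or from the gist; the Partition type and all function names are
hypothetical. It assumes each partition is handed to exactly one worker,
and contrasts that with several workers each doing a full scan of the
same unpartitioned table.

```c
#include <stddef.h>

/* Hypothetical stand-in for one foreign table (or one partition). */
typedef struct Partition
{
    size_t nrows;               /* number of rows stored in it */
} Partition;

/* A "full normal scan": visit every row and return the partial count. */
static size_t
full_scan_count(const Partition *p)
{
    size_t n = 0;

    for (size_t row = 0; row < p->nrows; row++)
        n++;
    return n;
}

/*
 * Assumption: each partition is scanned by exactly one worker, and the
 * leader adds up the partial counts. Every row is seen once.
 */
static size_t
count_partitioned(const Partition *parts, size_t nparts)
{
    size_t total = 0;

    for (size_t i = 0; i < nparts; i++)
        total += full_scan_count(&parts[i]);
    return total;
}

/*
 * Every worker runs a full scan of the SAME table. Without shared
 * coordination each row is counted once per worker.
 */
static size_t
count_duplicated(const Partition *table, size_t nworkers)
{
    size_t total = 0;

    for (size_t w = 0; w < nworkers; w++)
        total += full_scan_count(table);
    return total;
}
```

With partitions of 3, 4 and 5 rows, count_partitioned() gives 12, while
three workers each scanning one 12-row table give 36.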

So, in the case of cstore_fdw, it may not be necessary to reserve
any shared memory in EstimateDSMForeignScan.
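
For comparison, a true parallel scan as Thomas describes it would need
some shared state. The following is only a sketch under assumptions: it
simulates the idea in plain C11 and does not call any PostgreSQL API.
SharedScanState and every function name here are hypothetical. The size
a callback like EstimateDSMForeignScan reports would then be the size of
that shared struct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state a parallel-aware FDW might keep in shared memory. */
typedef struct SharedScanState
{
    atomic_uint next_block;     /* next block nobody has claimed yet */
    unsigned    nblocks;        /* total number of blocks to scan */
} SharedScanState;

/* The amount of shared memory such a scan would ask for. */
static size_t
estimate_shared_size(void)
{
    return sizeof(SharedScanState);
}

/* One-time setup by the leader before workers start. */
static void
init_shared_state(SharedScanState *s, unsigned nblocks)
{
    atomic_init(&s->next_block, 0);
    s->nblocks = nblocks;
}

/* Atomically claim the next block; returns -1 once all are taken. */
static long
claim_next_block(SharedScanState *s)
{
    unsigned b = atomic_fetch_add(&s->next_block, 1);

    return (b < s->nblocks) ? (long) b : -1L;
}

/*
 * Simulate nworkers taking turns claiming blocks. seen[b] is incremented
 * each time block b is claimed. Returns the total number of claims.
 */
static unsigned
simulate_parallel_aware_scan(unsigned nworkers, unsigned nblocks,
                             unsigned *seen)
{
    SharedScanState state;
    unsigned total = 0;
    unsigned active = nworkers;

    init_shared_state(&state, nblocks);
    while (active > 0)
    {
        active = 0;
        for (unsigned w = 0; w < nworkers; w++)
        {
            long b = claim_next_block(&state);

            if (b >= 0)
            {
                assert((unsigned) b < nblocks);
                seen[b]++;
                total++;
                active++;
            }
        }
    }
    return total;
}
```

Each worker emits an arbitrary fraction of the blocks, but every block is
claimed exactly once, so together they cover the whole scan.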

> So I guess this hasn't been done before and would require some more
> research.

I agree. I will try some query patterns.
Thanks.

On Sat, Aug 18, 2018 at 23:08, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Sun, Aug 19, 2018 at 1:40 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > A true parallel scan of an FDW would be one where each process emits
> > an arbitrary fraction of the tuples, but together they emit all of the
> > tuples.  You'd almost certainly need to use some shared memory to
> > coordinate that.  To say that you support that, I think your
> > GetForeignPaths() function would need to call add_partial_path().  And
> > unless I'm mistaken, whether or not InitializeDSMForeignScan etc are
> > called might be the only indication you get of whether you need to run
> > in parallel-aware mode.  I haven't personally heard of any FDWs that
> > can do this yet, but I just tried hacking file_fdw to register a
> > partial path and it seems to work (though of course the results are
> > duplicated because the emitted tuples are not actually partial).
>
> ... though I just noticed that my quick test used "Single Copy" mode.
> I think I see why: it looks like core's create_foreignscan_path()
> function might need to take num_workers and set parallel_aware if > 0.
> So I guess this hasn't been done before and would require some more
> research.
>
> --
> Thomas Munro
> http://www.enterprisedb.com



--
高橋 真之

