Home > mailing lists

Re: [HACKERS] Parallel Index Scans - Mailing list pgsql-hackers

From	Rahila Syed
Subject	Re: [HACKERS] Parallel Index Scans
Date	January 31, 2017 18:23:50
Msg-id	CAH2L28u-HkqXXr8bWMWQQ4TFma3azyfXZUio1yjgcbNMJrXVvw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Parallel Index Scans (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: [HACKERS] Parallel Index Scans
List	pgsql-hackers

Tree view

Hello,

>Agreed, that it makes sense to consider only the number of pages to
>scan for computation of parallel workers. I think for index scan we
>should consider both index and heap pages that need to be scanned
>(costing of index scan consider both index and heap pages). I thin
>where considering heap pages matter more is when the finally selected
>rows are scattered across heap pages or we need to apply a filter on
>rows after fetching from the heap. OTOH, we can consider just pages
>in the index as that is where mainly the parallelism works

IMO, considering just index pages will give a better estimate of work to be done

in parallel. As the amount of work/number of pages divided amongst workers is irrespective of

the number of heap pages scanned.

Thank you,

Rahila Syed

On Mon, Jan 30, 2017 at 2:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> parallel_index_opt_exec_support_v6 - Removed the usage of
>> GatherSupportsBackwardScan. Expanded the comments in
>> ExecReScanIndexScan.
>
> I looked through this and in general it looks reasonable to me.
> However, I did notice one thing that I think is wrong. In the
> parallel bitmap heap scan patch, the second argument to
> compute_parallel_worker() is the number of pages that the parallel
> scan is expected to fetch from the heap. In this patch, it's the
> total number of pages in the index. The former seems to me to be
> better, because the point of having a threshold relation size for
> parallelism is that we don't want to use a lot of workers to scan a
> small number of pages -- the distribution of work won't be even, and
> the potential savings are limited. If we've got a big index but are
> using a very selective qual to pull out only one or a small number of
> rows on a single page or a small handful of pages, we shouldn't
> generate a parallel path for that.
>

Agreed, that it makes sense to consider only the number of pages to
scan for computation of parallel workers. I think for index scan we
should consider both index and heap pages that need to be scanned
(costing of index scan consider both index and heap pages). I thin
where considering heap pages matter more is when the finally selected
rows are scattered across heap pages or we need to apply a filter on
rows after fetching from the heap. OTOH, we can consider just pages
in the index as that is where mainly the parallelism works. In the
attached patch (parallel_index_opt_exec_support_v7.patch), I have
considered both index and heap pages, let me know if you think some
other way is better. I have also prepared a separate independent
patch (compute_index_pages_v1) on HEAD to compute index pages which
can be used by parallel index scan. There is no change in
parallel_index_scan (parallel btree scan) patch, so I am not attaching
its new version.

> Now, against that theory, the GUC that controls the behavior of
> compute_parallel_worker() is called min_parallel_relation_size, which
> might make you think that the decision is supposed to be based on the
> whole size of some relation. But I think that just means we need to
> rename the GUC to something like min_parallel_scan_size. Possibly we
> also ought consider reducing the default value somewhat, because it
> seems like both sequential and index scans can benefit even when
> scanning less than 8MB.
>

Agreed, but let's consider it separately.

The order in which patches needs to be applied:
compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
parallel_index_opt_exec_support_v7.patch

[1] - https://www.postgresql.org/message-id/CAA4eK1J%3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: "Daniel Verite"
Date: 31 January 2017, 18:21:49
Subject: Re: [HACKERS] One-shot expanded output in psql using \G

From: Haribabu Kommi
Date: 31 January 2017, 18:39:53
Subject: Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

Re: [HACKERS] Parallel Index Scans - Mailing list pgsql-hackers

Previous

Next