Re: [HACKERS] why not parallel seq scan for slow functions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] why not parallel seq scan for slow functions
Date
Msg-id CA+TgmobwhrBcYNfo2y6sddUS4vgkaUs4M0vkdXxLTbi8+f-fmQ@mail.gmail.com
In response to Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] why not parallel seq scan for slow functions  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Mar 28, 2018 at 3:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Good idea, such an optimization will ensure that the cases reported
> above will not regress.  However, isn't it somewhat defeating the
> idea you are using in the patch, which is to avoid modifying the path
> in-place?

Sure, but you can't have everything.  I don't think modifying the
sortgroupref data in place is really quite the same thing as changing
the pathtarget in place; the sortgroupref stuff is some extra
information about the targets being computed, not really a change in
targets per se.  But in any case if you want to eliminate extra work
then we've gotta eliminate it...
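To make the distinction above concrete: relabeling sortgroupref data touches only bookkeeping about the targets, while swapping the targets themselves is the case where a defensive deep copy (the role copyObject() plays in the planner) earns its keep. A minimal sketch in Python -- the class and function names here are illustrative stand-ins, not PostgreSQL's actual structures:

```python
import copy

class TargetEntry:
    """Hypothetical stand-in for a target-list entry: the expression being
    computed, plus a sortgroupref 'label' recording which sort/group
    clause, if any, refers to it."""
    def __init__(self, expr, sortgroupref=0):
        self.expr = expr
        self.sortgroupref = sortgroupref

def apply_sortgrouprefs(tlist, refs):
    """Relabel in place: the targets themselves are unchanged, only the
    bookkeeping about who sorts/groups on them, so no copy is needed."""
    for entry, ref in zip(tlist, refs):
        entry.sortgroupref = ref

def replace_targets(tlist, new_exprs):
    """Change the targets themselves: here a deep copy (the analogue of a
    copyObject() call) is the safe choice, so any caller still holding
    the old list continues to see the old expressions."""
    new_tlist = copy.deepcopy(tlist)
    for entry, expr in zip(new_tlist, new_exprs):
        entry.expr = expr
    return new_tlist

tlist = [TargetEntry("ten"), TargetEntry("sum(thousand)")]
apply_sortgrouprefs(tlist, [1, 0])
print(tlist[0].sortgroupref)  # 1 -- same objects, just relabeled
```

The point of the sketch: the first operation is cheap and safe to do in place, which is why treating it like the second (copy-before-modify) is pure overhead.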

> In any case, I think one will still see regression in cases
> where this optimization doesn't apply.  For example,
>
> DO $$
> DECLARE count integer;
> BEGIN
>   FOR count IN 1..1000000 LOOP
>     EXECUTE 'EXPLAIN SELECT sum(thousand) FROM tenk1 GROUP BY ten';
>   END LOOP;
> END;
> $$;
>
> The above block takes 43700.0289 ms on HEAD and 45025.3779 ms with
> the patch, which is approximately a 3% regression.

Thanks for the analysis -- the observation that this seemed to affect
cases where CP_LABEL_TLIST gets passed to create_projection_plan
allowed me to recognize that I was doing an unnecessary copyObject()
call.  Removing that seems to have reduced this regression below 1% in
my testing.

Also, keep in mind that we're talking about extremely small amounts of
time here.  On a trivial query that you're not even executing, you're
seeing a difference of (45025.3779 - 43700.0289)/1000000 = 0.00133 ms
per execution.  Sure, it's still 3%, but it's 3% of the time in an
artificial case where you don't actually run the query.  In real life,
nobody runs EXPLAIN in a tight loop a million times without ever
running the query, because that's not a useful thing to do.  The
overhead on a realistic test case will be smaller.  Furthermore, at
least in my testing, there are now cases where this is faster than
master.  Now, I welcome further ideas for optimization, but a patch
that makes some cases slightly slower while making others slightly
faster, and also improving the quality of plans in some cases, is not
to my mind an unreasonable thing.
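For anyone who wants to check the back-of-the-envelope numbers above, the per-execution overhead follows directly from the two totals and the loop count:

```python
# Figures quoted in the thread: total wall-clock time for 1,000,000
# EXPLAIN executions, in milliseconds.
head_ms = 43700.0289
patched_ms = 45025.3779
iterations = 1_000_000

per_exec_ms = (patched_ms - head_ms) / iterations
relative = (patched_ms - head_ms) / head_ms

print(f"{per_exec_ms:.5f} ms per execution")  # 0.00133 ms per execution
print(f"{relative:.1%} regression")           # 3.0% regression
```

So the absolute cost per EXPLAIN is on the order of a microsecond, which is the basis for the argument that a realistic workload (one that actually runs the query) would see a much smaller relative overhead.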

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
