I'm working on a custom aggregate that generates a serialized data format. The preparation of the geometry before it is formatted is pretty intense, so it is probably a good thing for that work to be done in parallel, in partial aggregates. Here's an example SQL call:
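Something along these lines, assuming the usual ST_AsMVT / ST_AsMVTGeom pattern on PostGIS 3.x (the table, columns, and tile coordinates are placeholders):

    SELECT ST_AsMVT(q, 'layer', 4096, 'geom')
    FROM (
      SELECT id,
             ST_AsMVTGeom(geom, ST_TileEnvelope(15, 5000, 12000)) AS geom
      FROM some_table
      WHERE geom && ST_TileEnvelope(15, 5000, 12000)
    ) q;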
The ST_AsMVTGeom() function can be comically expensive, so it is really good when that work happens inside the parallel partial aggregates. But the cost of the function seems to be ignored when the planner decides whether to parallelize.
Whether I get a parallel aggregate seems entirely determined by the number of rows, not the cost of preparing those rows.
This is true, as far as I can tell, and unfortunate. Feeding in tables with 100ks of rows, I get parallel plans; feeding in 10ks of rows, I never do, no matter how costly the work going on within. That holds whether I change the costs of the functions on the subquery select list or of the aggregate transfn.
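For what it's worth, the kind of cost change I mean is ALTER FUNCTION ... COST, roughly like this (the signatures here are guesses from my install and may differ by PostGIS version):

    -- make the geometry-preparation function look very expensive to the planner
    ALTER FUNCTION ST_AsMVTGeom(geometry, box2d, integer, integer, boolean) COST 10000;
    -- and similarly for the aggregate's transition function
    -- (the overload names/signatures vary by PostGIS version)
    ALTER FUNCTION pgis_asmvt_transfn(internal, anyelement) COST 10000;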
When changing the number of rows in the subquery with a LIMIT, I can move from a seq scan to a parallel seq scan and finally to a parallel aggregate as the number of rows goes up.
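To be concrete, the LIMIT sits inside the subquery, roughly like this (table name and limit value are placeholders):

    EXPLAIN
    SELECT ST_AsMVT(q, 'layer')
    FROM (
      SELECT ST_AsMVTGeom(geom, ST_TileEnvelope(15, 5000, 12000)) AS geom
      FROM some_table
      LIMIT 100000   -- vary this and watch the plan change
    ) q;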
I see now that as soon as I brought the LIMIT in, the plans had to go sequential, just due to the nature of a LIMIT in a subquery. Ignore the below, sorry.
Thanks! P
An odd effect: when I have enough rows to get a parallel seq scan, I can flip it back to a seq scan by *increasing* the cost of ST_AsMVTGeom(). That seems odd and backwards.
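If I understand right, the knobs that feed that decision are the parallel-costing GUCs; zeroing them out (purely as an experiment) at least shows when the planner is willing to consider a parallel plan at all:

    SET max_parallel_workers_per_gather = 4;  -- default 2
    SET parallel_setup_cost = 0;              -- default 1000
    SET parallel_tuple_cost = 0;              -- default 0.1
    SET min_parallel_table_scan_size = 0;     -- default 8MB
    -- then re-run EXPLAIN on the aggregate query to see whether it tips parallel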
Is there a guide or rough description anywhere of how costs are used in determining parallel plans? The empirical approach starts to wear one down after a while :)