Re: Unwanted expression simplification in PG12b2 - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Unwanted expression simplification in PG12b2
Date	September 20, 2019 20:14:25
Msg-id	CA+Tgmob81L3_21YAN8Yg0HU5WUUbe4syy2hn+4it+VyhGZ3Riw@mail.gmail.com
In response to	Re: Unwanted expression simplification in PG12b2 (Darafei "Komяpa" Praliaskouski <me@komzpa.net>)
Responses	Re: Unwanted expression simplification in PG12b2
List	pgsql-hackers

On Wed, Jul 17, 2019 at 5:20 PM Darafei "Komяpa" Praliaskouski
<me@komzpa.net> wrote:
> Indeed, it seems I failed to minimize my example.
>
> Here is the actual one, on 90GB table with 16M rows:
> https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c
>
> I can share the table on request if needed, but hope that plan may be enough.

[ replying to an old thread ]

I think that this boils down to a lack of planner smarts about target
lists. The planner currently assumes that any given relation - which
for planner purposes might be an actual table or might be the result
of joining multiple tables, aggregating something, running a subquery,
etc. - more or less has one thing that it's supposed to produce. It
only tries to generate plans that produce that target list. There's
some support in there for the idea that various paths for the same
relation might produce different answers, but I don't know of
anywhere that is actually used (though it might be).

What I taught the planner to do here had to do with making the costing
more accurate for cases like this. It now figures out that if it's
going to stick a Gather in at that point, computing the expressions
below the Gather rather than above the Gather makes a difference to
the cost of that plan vs. other plans. However, it still doesn't
consider any more paths than it did before; it just costs them more
accurately. In your first example, I believe that the planner should
be able to consider both GroupAggregate -> Gather Merge -> Sort ->
Parallel Seq Scan and GroupAggregate -> Sort -> Gather -> Parallel Seq
Scan, but I think it's got a fixed idea about which fields should be
fed into the Sort. In particular, I believe it thinks that sorting
more data is so undesirable that it doesn't want to carry any
unnecessary baggage through the Sort for any reason. To solve this
problem, I think it would need to cost the second plan with projection
done both before the Sort and after the Sort and decide which one was
cheaper.
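
To make the "cost it both ways and keep the cheaper one" idea concrete,
here is a rough sketch in Python. It is a toy model, not PostgreSQL's
actual costing: the names (SortInputs, sort_cost, pick_projection_point),
the cost formulas and the constants are all made up for illustration. It
also assumes that projecting before the Sort lets the expressions be
evaluated below the Gather and split across workers, while projecting
after the Sort evaluates them serially.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SortInputs:
    """Hypothetical inputs for a toy cost comparison (not planner structs)."""
    rows: int          # number of rows reaching the Sort
    base_width: int    # bytes per row without the computed expressions
    expr_width: int    # extra bytes per row once the expressions are computed
    expr_cost: float   # assumed cost of evaluating the expressions for one row
    workers: int       # parallel workers below the Gather


def sort_cost(rows: int, width: int) -> float:
    """Toy sort cost: N log2 N comparisons, scaled up for wider rows.

    The width / 100 scaling is an arbitrary assumption for illustration.
    """
    if rows < 2:
        return 0.0
    return rows * math.log2(rows) * (1.0 + width / 100.0)


def cost_project_before_sort(i: SortInputs) -> float:
    # Expressions are evaluated in the workers (assumed evenly divided),
    # but the Sort then has to carry the wider rows.
    eval_cost = i.rows * i.expr_cost / i.workers
    return eval_cost + sort_cost(i.rows, i.base_width + i.expr_width)


def cost_project_after_sort(i: SortInputs) -> float:
    # The Sort sees only the narrow rows, but every expression is
    # evaluated serially above it.
    return sort_cost(i.rows, i.base_width) + i.rows * i.expr_cost


def pick_projection_point(i: SortInputs) -> tuple[str, float]:
    """Cost both placements and return the cheaper one with its cost."""
    before = cost_project_before_sort(i)
    after = cost_project_after_sort(i)
    return ("before", before) if before <= after else ("after", after)


if __name__ == "__main__":
    expensive = SortInputs(rows=1_000_000, base_width=32, expr_width=8,
                           expr_cost=100.0, workers=4)
    cheap_but_wide = SortInputs(rows=1_000_000, base_width=32, expr_width=200,
                                expr_cost=0.01, workers=4)
    print(pick_projection_point(expensive))
    print(pick_projection_point(cheap_but_wide))
```

In this toy model an expensive expression with a small result favors
projecting before the Sort, while a cheap expression with a wide result
favors projecting after it, which is the tradeoff a planner would have
to weigh per query.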

This class of problem is somewhat annoying in that the extra planner
cycles and complexity to deal with getting this right would be useless
for many queries, but at the same time, there are a few cases where it
can win big. I don't know what to do about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
