Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function - Mailing list pgsql-bugs

From David Rowley
Subject Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Date
Msg-id CAApHDvqKE2MLsQidBtY8AaM+J9VWN5uct3dQ92HnfDKzc4f=cQ@mail.gmail.com
Whole thread Raw
In response to BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On Sun, 20 Sep 2020 at 03:53, PG Bug reporting form
<noreply@postgresql.org> wrote:
> We are developing a tool for automatically finding performance bugs in
> PostgreSQL. Our key insight is that given a pair of semantic equivalent
> queries, a robust DBMS should return the same result within a similar
> execution time. Significant time difference suggests a potential performance
> bug in the DBMS.

It's important to differentiate between bugs and optimisations that
PostgreSQL does not perform.   I imagine the findings of running this
tool is best directed towards a wiki page on
https://wiki.postgresql.org/

It's also important to always go ahead and apply such optimisations
without any regard to the cost of checking if the optimisation can
apply.  For optimisations like this, it's not always just a simple
case of attempting to apply them regardless. Checking for such cases
will penalise queries where the optimisation cannot be applied. The
people that benefit are the ones that write bad SQL and the people
that lose out are the ones who quite good SQL. That's not a
particularly good incentive to write good SQL.  In some cases, the
cost of checking if the optimisation can be applied will be so
negligible that it's worth it as the gains are good if it can be
applied.  The answer to whether this the case for this particilar
optimisation will depend on who you ask.

> We are sharing a pair of TPC-H queries that exhibit a potential performance
> bug in this report:
>
> First query:
> SELECT "s_suppkey"
> FROM   "supplier"
> WHERE  s_suppkey > 100;
>
> Second query:
> SELECT "s_suppkey"
> FROM   "supplier"
> WHERE  s_suppkey > 100
> GROUP  BY s_suppkey;
>
> [Actual Behavior]
> We executed both queries on the TPC-H benchmark of scale factor 5: the first
> query takes only 17 millisecond, while the second query takes 42
> millisecond. We think the time difference results from different plans
> selected.

Accounting for the information on the other email that mentions
s_suppkey is the primary key of the supplier table, this is not a bug.
It's simply an optimisation that we currently don't apply. There's a
patch around that aims to implement this but it's not yet been
applied. So the optimisation may appear in some future version of
PostgreSQL. If you'd like to help with that then please look at
https://commitfest.postgresql.org/29/2433/

I'd suggest a wiki page would be a good place to note down other
possible future optimisations. There's no shortage of possible
optimisations that we don't apply, and an endless stream of bug
reports is not the way to have new optimisations added to the
PostgreSQL planner.

David



pgsql-bugs by date:

Previous
From: Xinyu Liu
Date:
Subject: Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Next
From: David Rowley
Date:
Subject: Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification