Home > mailing lists

Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2) - Mailing list pgsql-bugs

From	Dilyan Palauzov
Subject	Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
Date	September 17, 2017 06:30:11
Msg-id	a68737c6-601f-6846-2ba0-3795c942c81f@aegee.org Whole thread Raw
In response to	Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2) (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
List	pgsql-bugs

Tree view

Hello Tomas,

thanks for your answer.

While adding an ORDER BY uses the index, it does not help, as the user 
has to take care of the order of provided columns.  Whether the care is 
is taken in DISTINCT ON or in ORDER BY is secondary.

The behaviour behind DISTINCT and indexes surprises  me, as the query 
planner does reorder the columns for SELECT to determine the most 
suitable index.

My proposal to reflect this:

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1069,7 +1069,10 @@ SELECT DISTINCT ON (location) location, time, report    <para>     Currently, <literal>FOR NO
KEYUPDATE</>, <literal>FOR UPDATE</>,     <literal>FOR SHARE</> and <literal>FOR KEY SHARE</> cannot be
 
-    specified with <literal>DISTINCT</literal>.
+    specified with <literal>DISTINCT</literal>.  Contrary to
+    <literal>SELECT</>, which reorders its parameters to find a best,
+    matching index, DISTINCT ON constructs an expression, e.g. from the
+    provided rows, and checks then if an index can serve the expression.    </para>   </refsect2>


Please ignore my email from Friday, I have forgotten an ON after DISTINCT.


Greetings  Dilyan


On 09/16/2017 10:23 AM, Tomas Vondra wrote:
> Hello Dilyan,
>
> You're right - we're currently not able to use the index if it's not
> consistent with the DISTINCT ordering. That is, if you have index on
> (a,b) and DISTINCT ON (b,a) we fail to leverage the index.
>
> The reason for this simple - if you look at create_distinct_paths [1],
> which is where the Unique path comes from, you'll see it iterates over
> all paths and compares the ordering using pathkeys_is_contained [2].
>
> That however only ensures the path matches the expected Unique ordering
> (determined by the column list in DISTINCT ON clause), we don't try to
> re-shuffle the columns in any way at this point.
>
> So this is more a missing optimization than a bug, I'd guess. But it
> seems worthwhile and possibly not extremely difficult to implement, so I
> may look into it - but that's PG11 at the earliest.
>
> But, looking at the code in create_distinct_paths, ISTM you can easily
> convince the planner to use the index by simply adding a matching ORDER
> BY clause. That is
>
>     SELECT DISTINCT ON(token, id) token  FROM bayes_token
>     ORDER BY id, token;
>
> should be able to use the index on (id,token).
>
>
> [1]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4725
>
> [2]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4811
>
>
> regards
>


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

From: Bruce Momjian
Date: 17 September 2017, 03:02:06
Subject: Re: [BUGS] BUG #14798: postgres user superuser changed

From: Tomas Vondra
Date: 17 September 2017, 22:39:27
Subject: Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)

Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2) - Mailing list pgsql-bugs

Previous

Next