Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2) - Mailing list pgsql-bugs

From Dilyan Palauzov
Subject Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
Date
Msg-id a68737c6-601f-6846-2ba0-3795c942c81f@aegee.org
Whole thread Raw
In response to Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
List pgsql-bugs
Hello Tomas,

thanks for your answer.

While adding an ORDER BY uses the index, it does not help, as the user 
has to take care of the order of provided columns.  Whether the care is 
is taken in DISTINCT ON or in ORDER BY is secondary.

The behaviour behind DISTINCT and indexes surprises  me, as the query 
planner does reorder the columns for SELECT to determine the most 
suitable index.

My proposal to reflect this:

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1069,7 +1069,10 @@ SELECT DISTINCT ON (location) location, time, report    <para>     Currently, <literal>FOR NO
KEYUPDATE</>, <literal>FOR UPDATE</>,     <literal>FOR SHARE</> and <literal>FOR KEY SHARE</> cannot be
 
-    specified with <literal>DISTINCT</literal>.
+    specified with <literal>DISTINCT</literal>.  Contrary to
+    <literal>SELECT</>, which reorders its parameters to find a best,
+    matching index, DISTINCT ON constructs an expression, e.g. from the
+    provided rows, and checks then if an index can serve the expression.    </para>   </refsect2>


Please ignore my email from Friday, I have forgotten an ON after DISTINCT.


Greetings  Dilyan


On 09/16/2017 10:23 AM, Tomas Vondra wrote:
> Hello Dilyan,
>
> You're right - we're currently not able to use the index if it's not
> consistent with the DISTINCT ordering. That is, if you have index on
> (a,b) and DISTINCT ON (b,a) we fail to leverage the index.
>
> The reason for this simple - if you look at create_distinct_paths [1],
> which is where the Unique path comes from, you'll see it iterates over
> all paths and compares the ordering using pathkeys_is_contained [2].
>
> That however only ensures the path matches the expected Unique ordering
> (determined by the column list in DISTINCT ON clause), we don't try to
> re-shuffle the columns in any way at this point.
>
> So this is more a missing optimization than a bug, I'd guess. But it
> seems worthwhile and possibly not extremely difficult to implement, so I
> may look into it - but that's PG11 at the earliest.
>
> But, looking at the code in create_distinct_paths, ISTM you can easily
> convince the planner to use the index by simply adding a matching ORDER
> BY clause. That is
>
>     SELECT DISTINCT ON(token, id) token  FROM bayes_token
>     ORDER BY id, token;
>
> should be able to use the index on (id,token).
>
>
> [1]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4725
>
> [2]
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L4811
>
>
> regards
>


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [BUGS] BUG #14798: postgres user superuser changed
Next
From: Tomas Vondra
Date:
Subject: Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)