Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2) - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: [BUGS] Query planner skipping index depending on DISTINCTparameter order (2)
Date
Msg-id 580b8874-4655-1454-2459-c1cd3432c61b@2ndquadrant.com
Whole thread Raw
In response to Re: [BUGS] Query planner skipping index depending on DISTINCT parameter order (2)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 09/19/2017 05:40 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 09/17/2017 07:15 PM, Tom Lane wrote:
>>> For the DISTINCT ON case, the user-visible semantics are actually pretty
>>> tightly tied to ORDER BY, so that it would not be very reasonable to
>>> consider any other orderings than the given column order anyway.
> 
>> Tied in what sense? In the docs we explicitly say this:
> 
>>   https://www.postgresql.org/docs/10/static/sql-select.html#sql-distinct
> 
>>   SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of
>>   each set of rows where the given expressions evaluate to equal. The
>>   DISTINCT ON expressions are interpreted using the same rules as for
>>   ORDER BY (see above). Note that the “first row” of each set is
>>   unpredictable unless ORDER BY is used to ensure that the desired row
>>   appears first.
> 
> Right, so the behavior is undefined unless you have an ORDER BY
> clause that includes the DISTINCT ON columns plus some more columns.
> That's pretty tightly tied in my book.
> 

Ah, OK. I thought you're suggesting we're required to produce the data
sorted by the DISTINCT ON columns. But you meant that ORDER BY clause is
required for well-defined result, which limits our options when picking
an index. Right?

>> So if an ORDER BY is not specified, why couldn't we pick an
>> arbitrary ordering matching based on available indexes?
> 
> The case is not of any real-world use, and so I'm unwilling to
> expend the large amount of coding effort that would be needed to make
> the planner behave this way.
> 

I don't think the "no real-world use" is so clear. Imagine for example a
denormalized table where the user knows that for a given ID, the other
columns match too. Then
   SELECT DISTINCT ON (id), x, y, z FROM t;   SELECT DISTINCT ON (id), x, y, z FROM t ORDER BY id, x, y, z;   SELECT
DISTINCTid, x, y, z, FROM t;
 

will all produce the same result, except that the first case only needs
compare values in "id". If sorts on "x", "y" and "z" are expensive (say,
because those are text columns with non-C collations), that may be quite
a difference.

That being said, I'm not sure how much code would this be, and how much
overhead would it mean.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

pgsql-bugs by date:

Previous
From: Takhir Fakhrutdinov
Date:
Subject: Re: [BUGS] BUG #14819: postgres_fwd could not load library
Next
From: matioli.matheus@gmail.com
Date:
Subject: [BUGS] BUG #14820: Standby crash with "could not access status oftransaction" (for track_commit_timestamp)