Thread: Problem with ORDER BY and DISTINCT ON

Problem with ORDER BY and DISTINCT ON

From
Steve Midgley
Date:
Hi,

I'm a little baffled. I'm trying to generate a SQL statement that 
issues a DISTINCT ON using the same values as my ORDER BY statement. 
I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm 
on Pg 8.2. Here is some SQL to get you started at seeing my problem:

------------------

drop table if exists property;
create table property
( id serial,  state varchar(255),  search_rate_max decimal(8,2),  data_priority_code varchar(255)
);

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  property.id
LIMIT 10 OFFSET 0

----------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions
SQL state: 42P10
----------------

Now if you run this statement it works

------------------

SELECT DISTINCT ON
("property"."state", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  property.id
LIMIT 10 OFFSET 0

------------------

However if you run this statement it ALSO works, which tells me it's 
not just my CASE statements that are messing things up (note in this 
example, I just removed the primary key "property.id" from the ORDER BY 
and DISTINCT ON clauses:

-------------------

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max" ) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max"
LIMIT 10 OFFSET 0

--------------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions
SQL state: 42P10
--------------------

Finally, if you run this statement it works fine (removing one of the 
duplicate search_rate_max statements):

--------------------

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  property.id LIMIT 10 OFFSET 0

--------------------

What's going on here? Am I doing something that isn't legitimate SQL? I 
can't see why having a duplicate CASE statement should foul things up 
like this? It's pretty clear (from additional testing not included in 
this email) that the duplicate "search_rate_max" CASE is causing the 
problem.

Thanks for any advice or suggestions on how to get this to run 
correctly. Is this a bug?

Basically I'm doing this as an optimization - I can get much better 
performance running the DISTINCT ON in some circumstances than using 
DISTINCT, but the edge case above is breaking my tests and preventing 
me from implementing the idea. The code is generated by an application 
layer which is not really paying attention to whether or not the two 
CASE statements apply to the same field or not (sometimes they do 
sometimes they don't)..

Thanks!

Steve



Re: Problem with ORDER BY and DISTINCT ON

From
Tom Lane
Date:
Steve Midgley <public@misuse.org> writes:
> SELECT DISTINCT ON
> ("property"."state",
>   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>   property.id)
>   property.id
> FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
> ORDER BY
>    "property"."state",
>    CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>    CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
> END,"search_rate_max",
>    property.id
> LIMIT 10 OFFSET 0

> RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
> BY expressions

Interesting.  You realize of course that sorting by the same expression
twice is completely redundant?  I haven't dug through the code yet but
I think what is happening is that ORDER BY knows that and gets rid of
the duplicate entries while DISTINCT ON fails to do so.  Or some story
approximately like that.  It should be fixed, but the immediate
workaround is just to get rid of the redundant sort keys:

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END, "search_rate_max",
--  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
--  "search_rate_max", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,  "search_rate_max",
--   CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,
--   "search_rate_max",  property.id
LIMIT 10 OFFSET 0

BTW, why are you bothering with the CASEs at all?  Null values of
search_rate_max would sort high already.
        regards, tom lane


Re: Problem with ORDER BY and DISTINCT ON

From
Steve Midgley
Date:
At 07:29 AM 7/16/2008, Tom Lane wrote:
>Steve Midgley <public@misuse.org> writes:
>Interesting.  You realize of course that sorting by the same 
>expression
>twice is completely redundant?  I haven't dug through the code yet but
Thanks Tom. Yeah, I was a little embarrassed to throw this code up on 
the list b/c it's pretty weak. It's basically machine written - so 
sometimes the CASE features a different "THEN X ELSE Y" order.

>I think what is happening is that ORDER BY knows that and gets rid of
>the duplicate entries while DISTINCT ON fails to do so.  Or some story
>approximately like that.  It should be fixed, but the immediate
>workaround is just to get rid of the redundant sort keys:
I don't know if this will help track down the problem, but I figured 
out last night that ORDER BY is totally fine with having TWO duplicate 
entries, so long as I only put ONE entry in the DISTINCT ON area.

Of course removing the duplicate from both areas is the correct 
solution and I broke down and hacked that into the auto-sql-writing 
code and so my immediate problem is solved. I'm happy to file this as a 
ticket for Pg (please point me to your ticket tool as I've never used 
it). This is not a very big deal but Pg has such a high compliance with 
wacky-but-valid SQL it does seem like it should be fixed just because. 
Let me know if I can help on that.

Best,

Steve



Re: Problem with ORDER BY and DISTINCT ON

From
Tom Lane
Date:
Steve Midgley <public@misuse.org> writes:
> At 07:29 AM 7/16/2008, Tom Lane wrote:
>> I think what is happening is that ORDER BY knows that and gets rid of
>> the duplicate entries while DISTINCT ON fails to do so.

> Of course removing the duplicate from both areas is the correct 
> solution and I broke down and hacked that into the auto-sql-writing 
> code and so my immediate problem is solved. I'm happy to file this as a 
> ticket for Pg (please point me to your ticket tool as I've never used 
> it). This is not a very big deal but Pg has such a high compliance with 
> wacky-but-valid SQL it does seem like it should be fixed just because. 

I've applied a patch for this to CVS HEAD.  I doubt we'll try to fix it
in the back branches, though --- it's too much of a corner case to be
worth taking any risk of breaking other stuff.
        regards, tom lane


Re: Problem with ORDER BY and DISTINCT ON

From
Steve Midgley
Date:
At 03:51 PM 7/31/2008, Tom Lane wrote:
>Steve Midgley <public@misuse.org> writes:
> > At 07:29 AM 7/16/2008, Tom Lane wrote:
> >> I think what is happening is that ORDER BY knows that and gets rid 
> of
> >> the duplicate entries while DISTINCT ON fails to do so.
>
> > Of course removing the duplicate from both areas is the correct
> > solution and I broke down and hacked that into the auto-sql-writing 
>
> > code and so my immediate problem is solved. I'm happy to file this 
> as a
> > ticket for Pg (please point me to your ticket tool as I've never 
> used
> > it). This is not a very big deal but Pg has such a high compliance 
> with
> > wacky-but-valid SQL it does seem like it should be fixed just 
> because.
>
>I've applied a patch for this to CVS HEAD.  I doubt we'll try to fix 
>it
>in the back branches, though --- it's too much of a corner case to be
>worth taking any risk of breaking other stuff.
>
>                         regards, tom lane
Hey Tom,

That's really great - thanks. I'm impressed how quickly you are fixing 
this obscure issue. I came from MS SQL and it would be hard for me to 
put into words how much of a better job you all are doing on Pg.

Best,

Steve