Problem with ORDER BY and DISTINCT ON - Mailing list pgsql-sql

From Steve Midgley
Subject Problem with ORDER BY and DISTINCT ON
Date
Msg-id 20080716073955.EB1DF64FCBD@postgresql.org
Whole thread Raw
Responses Re: Problem with ORDER BY and DISTINCT ON  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Hi,

I'm a little baffled. I'm trying to generate a SQL statement that 
issues a DISTINCT ON using the same values as my ORDER BY statement. 
I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm 
on Pg 8.2. Here is some SQL to get you started at seeing my problem:

------------------

drop table if exists property;
create table property
( id serial,  state varchar(255),  search_rate_max decimal(8,2),  data_priority_code varchar(255)
);

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  property.id
LIMIT 10 OFFSET 0

----------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions
SQL state: 42P10
----------------

Now if you run this statement it works

------------------

SELECT DISTINCT ON
("property"."state", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  property.id
LIMIT 10 OFFSET 0

------------------

However if you run this statement it ALSO works, which tells me it's 
not just my CASE statements that are messing things up (note in this 
example, I just removed the primary key "property.id" from the ORDER BY 
and DISTINCT ON clauses:

-------------------

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max" ) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max"
LIMIT 10 OFFSET 0

--------------------
RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER 
BY expressions
SQL state: 42P10
--------------------

Finally, if you run this statement it works fine (removing one of the 
duplicate search_rate_max statements):

--------------------

SELECT DISTINCT ON
("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max", property.id) property.id
FROM property WHERE (("property"."data_priority_code" IS NOT NULL))
ORDER BY  "property"."state",  CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 
END,"search_rate_max",  property.id LIMIT 10 OFFSET 0

--------------------

What's going on here? Am I doing something that isn't legitimate SQL? I 
can't see why having a duplicate CASE statement should foul things up 
like this? It's pretty clear (from additional testing not included in 
this email) that the duplicate "search_rate_max" CASE is causing the 
problem.

Thanks for any advice or suggestions on how to get this to run 
correctly. Is this a bug?

Basically I'm doing this as an optimization - I can get much better 
performance running the DISTINCT ON in some circumstances than using 
DISTINCT, but the edge case above is breaking my tests and preventing 
me from implementing the idea. The code is generated by an application 
layer which is not really paying attention to whether or not the two 
CASE statements apply to the same field or not (sometimes they do 
sometimes they don't)..

Thanks!

Steve



pgsql-sql by date:

Previous
From: "A. Kretschmer"
Date:
Subject: Re: How to GROUP results BY month
Next
From: Ivan Sergio Borgonovo
Date:
Subject: integrity check and visibility was: COPY equivalent for updates