Problem with ORDER BY and DISTINCT ON - Mailing list pgsql-sql
From | Steve Midgley |
---|---|
Subject | Problem with ORDER BY and DISTINCT ON |
Date | |
Msg-id | 20080716073955.EB1DF64FCBD@postgresql.org Whole thread Raw |
Responses |
Re: Problem with ORDER BY and DISTINCT ON
|
List | pgsql-sql |
Hi, I'm a little baffled. I'm trying to generate a SQL statement that issues a DISTINCT ON using the same values as my ORDER BY statement. I'm using a somewhat complex CASE statement in my ORDER BY clause. I'm on Pg 8.2. Here is some SQL to get you started at seeing my problem: ------------------ drop table if exists property; create table property ( id serial, state varchar(255), search_rate_max decimal(8,2), data_priority_code varchar(255) ); SELECT DISTINCT ON ("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", property.id) property.id FROM property WHERE (("property"."data_priority_code" IS NOT NULL)) ORDER BY "property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", property.id LIMIT 10 OFFSET 0 ---------------- RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions SQL state: 42P10 ---------------- Now if you run this statement it works ------------------ SELECT DISTINCT ON ("property"."state", property.id) property.id FROM property WHERE (("property"."data_priority_code" IS NOT NULL)) ORDER BY "property"."state", property.id LIMIT 10 OFFSET 0 ------------------ However if you run this statement it ALSO works, which tells me it's not just my CASE statements that are messing things up (note in this example, I just removed the primary key "property.id" from the ORDER BY and DISTINCT ON clauses: ------------------- SELECT DISTINCT ON ("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max" ) property.id FROM property WHERE (("property"."data_priority_code" IS NOT NULL)) ORDER BY "property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max" LIMIT 10 OFFSET 0 -------------------- RESULTS: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions SQL state: 42P10 -------------------- Finally, if you run this statement it works fine (removing one of the duplicate search_rate_max statements): -------------------- SELECT DISTINCT ON ("property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", property.id) property.id FROM property WHERE (("property"."data_priority_code" IS NOT NULL)) ORDER BY "property"."state", CASE WHEN ("search_rate_max" IS NOT NULL) THEN 1 ELSE 2 END,"search_rate_max", property.id LIMIT 10 OFFSET 0 -------------------- What's going on here? Am I doing something that isn't legitimate SQL? I can't see why having a duplicate CASE statement should foul things up like this? It's pretty clear (from additional testing not included in this email) that the duplicate "search_rate_max" CASE is causing the problem. Thanks for any advice or suggestions on how to get this to run correctly. Is this a bug? Basically I'm doing this as an optimization - I can get much better performance running the DISTINCT ON in some circumstances than using DISTINCT, but the edge case above is breaking my tests and preventing me from implementing the idea. The code is generated by an application layer which is not really paying attention to whether or not the two CASE statements apply to the same field or not (sometimes they do sometimes they don't).. Thanks! Steve