Re: [HACKERS] DISTINCT and ORDER BY bug? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] DISTINCT and ORDER BY bug?
Date
Msg-id 20101.949943439@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] DISTINCT and ORDER BY bug?  (Don Baccus <dhogaza@pacifier.com>)
Responses Re: [HACKERS] DISTINCT and ORDER BY bug?  (Don Baccus <dhogaza@pacifier.com>)
List pgsql-hackers
Don Baccus <dhogaza@pacifier.com> writes:
SQL> select distinct x,y+1 from foo order by x+y+1
>                                           *
> ERROR at line 1:
> ORA-01791: not a SELECTed expression

Actually, that was a little unfair, since their parser no doubt parsed
"x+y+1" as (x+y)+1, leaving no common subexpression visible.  Do they
accept
select distinct x,y+1 from foo order by x+(y+1)

>>> At least, the rule is simple if you can compare expression trees.

>> I think we have something pretty similar for GROUP BY, actually,
>> so it may not be hard to make this work.

On further thought, I think the real implementation issue is that
doing SELECT DISTINCT ORDER BY requires either two sorting steps
(sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields)
or else some very hairy logic to figure out that ORDER BY x+1
"implies" ORDER BY x.  In fact I'm not sure it does imply it
in the general case.  In your original example, the requested sort
was ORDER BY upper(x), but that doesn't guarantee that the tuples
will be ordered adequately for duplicate-x elimination.  For example,
that ORDER BY might yield
Ansel AdamsDon BaccusDON BACCUSDon BaccusJoe Blow...

which is a valid sort by upper(x), but a uniq filter on plain x
will fail to get rid of the second occurrence of "Don Baccus" as
it should.

Possibly we could make this work by implicitly expanding the ORDER BY
to "ORDER BY upper(x), x" which would ensure that the duplicate x's
are brought together.  I am not sure this will give the right results
always, but it seems promising.  We are assuming here that upper(x)
gives equal outputs for equal inputs, so it would fall down on random(x)
--- I suppose we could refuse to do this if we see a function that is
marked non-constant-foldable in pg_proc...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Thomas Lockhart
Date:
Subject: status
Next
From: Lamar Owen
Date:
Subject: PostgreSQL 7 RPMs coming soon