Re: [HACKERS] distinct. Is this the correct behaviour? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] distinct. Is this the correct behaviour?
Date
Msg-id 15107.940468204@sss.pgh.pa.us
Whole thread Raw
In response to distinct. Is this the correct behaviour?  (Vince Vielhaber <vev@michvhf.com>)
Responses Re: [HACKERS] distinct. Is this the correct behaviour?  (Vince Vielhaber <vev@michvhf.com>)
List pgsql-hackers
Vince Vielhaber <vev@michvhf.com> writes:
> Is this the way distinct is supposed to work?  My intent is to give
> only one for each different value of x - like it does in the first
> distinct example.  But when order by is added for the date/time sort
> I get what you see in the second distinct example.

Yeah, I think it's a bug too.  It's not quite clear what to change,
though.

The "problem" is that nodeUnique is doing a bitwise compare across the
whole tuple, including the hidden ('junk') y column that is needed to do
the sorting.  So, because you have four different y values, you get four
rows out.

However, if we fix nodeUnique to ignore junk columns, then the result
becomes nondeterministic.  Consider
x    y
1    31    52    4

If we do "select distinct x from foo order by y" on this data, then the
order of the result depends on which of the two tuples with x=1 happens
to get chosen by the Unique filter.  This is not good.

SQL92 gets around this by allowing ORDER BY only on columns of the
targetlist, so that you are not allowed to specify this query in the
first place.

I think it is useful to allow ORDER BY on hidden columns, but maybe we
need to forbid it when DISTINCT is present.  If we do that then the
implementation of nodeUnique is OK as it stands, and the bug is that
the parser accepts an invalid query.

This is pretty closely related to the semantic problems of DISTINCT ON,
once you see that the trouble is having columns in the query that aren't
being used for (or aren't supposed to be used for) the DISTINCT check.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: New psql startup banner
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] translate function (BUG?)