Re: [HACKERS] distinct + order by - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [HACKERS] distinct + order by
Date	November 8, 1998 12:19:56
Msg-id	19294.910544819@sss.pgh.pa.us
In response to	Re: [HACKERS] distinct + order by (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [HACKERS] distinct + order by
List	pgsql-hackers

I said:
> If we did want to make this example behave in a rational way, then
> probably the right implementation is something like
>     * sort by i,j
>     * distinct-filter on i only, being careful to keep first row
>         in each set of duplicates
>     * sort by j
> This would ensure that the final sort by j uses, for each distinct i,
> the lowest of the j-values associated with that i.  This is a totally
> arbitrary decision, but at least it will give reproducible results.
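
The three steps quoted above can be sketched outside the database. This is
only an illustration in Python, not planner code: the sample rows and the
function name distinct_i_order_by_j are made up, and dtest is assumed to be
a two-column table (i, j).

```python
# Made-up sample rows standing in for a hypothetical dtest(i, j) table.
rows = [(1, 30), (2, 10), (1, 25), (3, 5), (2, 40)]


def distinct_i_order_by_j(rows):
    """Sketch of: sort by i,j; distinct-filter on i; sort by j."""
    # Step 1: sort by (i, j), so the lowest j for each i comes first.
    ordered = sorted(rows)

    # Step 2: distinct-filter on i only, keeping the first row
    # in each set of duplicates.
    kept = []
    for i, j in ordered:
        if not kept or kept[-1][0] != i:
            kept.append((i, j))

    # Step 3: final sort by j, which is now the lowest j seen for each i.
    kept.sort(key=lambda row: row[1])

    # Only i is returned, as in SELECT DISTINCT i.
    return [i for i, _ in kept]


result = distinct_i_order_by_j(rows)
print(result)
```

Each i appears once, ordered by the smallest j associated with it, so the
result is the same on every run for the same input.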

Some closer probing with "explain verbose" shows that
"SELECT DISTINCT i FROM dtest ORDER BY j" is actually transformed
into this:

Unique on i,j  (cost=1.10 size=0 width=0)
  ->  Sort by i,j  (cost=1.10 size=0 width=0)
        ->  Seq Scan on dtest selecting i,j  (cost=1.10 size=3 width=16)

This explains why you get the apparently duplicate i values --- they're
not duplicate when both i and j are considered.
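
A rough Python imitation of that plan shows the effect. The sample rows and
the function name unique_on_i_and_j are invented for illustration, and it is
assumed that only i is displayed to the user even though the Unique step
compares both columns.

```python
# Made-up sample rows standing in for a hypothetical dtest(i, j) table.
rows = [(1, 30), (2, 10), (1, 25), (3, 5), (1, 30)]


def unique_on_i_and_j(rows):
    """Imitate: Seq Scan selecting i,j -> Sort by i,j -> Unique on i,j."""
    # Sort by (i, j), as the Sort node does.
    ordered = sorted(rows)

    # Drop a row only when BOTH i and j match the previous row.
    kept = []
    for row in ordered:
        if not kept or kept[-1] != row:
            kept.append(row)

    # Assumption: only i is shown in the output; there is no final sort by j.
    return [i for i, _ in kept]


shown = unique_on_i_and_j(rows)
print(shown)
```

Here i = 1 is shown twice because (1, 25) and (1, 30) differ in j, and the
output is ordered by i rather than by j.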

It looks to me like someone tried to make the query tree builder deal
with this case in the way I suggest above, but didn't finish the job.
The "Unique" pass is being done on the wrong targets, and there's no
final sort.
        regards, tom lane
