On Wed, Dec 11, 2013 at 4:56 AM, hubert depesz lubaczewski
<depesz@depesz.com> wrote:
> before I'll go any further - this is only thought-experiment. I do not
> plan to use such queries in real-life applications. I was just presented
> with a question that I can't answer in any logical way.
>
> There are two simple queries:
>
> #v+
> with rok2005 (miesiac,wynik) as (VALUES (1,1),(2,2) ,(4,4),(5,NULL),(6,6))
> ,rok2004 (miesiac,wynik) as (VALUES (1,3) ,(3,3),(4,5) ,(6,6))
> SELECT
> distinct on (miesiac) *
> FROM (
> SELECT miesiac, 2005 as rok, wynik FROM rok2005
> union all
> SELECT miesiac, 2004 as rok, wynik FROM rok2004
> ) as polaczone
> ORDER BY miesiac, wynik desc;
> #v-
>
> #v+
> with rok2005 (miesiac,wynik) as (VALUES (1,1),(2,2) ,(4,4),(5,NULL),(6,6))
> ,rok2004 (miesiac,wynik) as (VALUES (1,3) ,(3,3),(4,5) ,(6,6))
> SELECT
> distinct on (miesiac) *
> FROM (
> SELECT miesiac, 2004 as rok, wynik FROM rok2004
> union all
> SELECT miesiac, 2005 as rok, wynik FROM rok2005
> ) as polaczone
> ORDER BY miesiac, wynik desc;
> #v-
>
> They differ only in order of queries in union all part.
>
> The thing is that they return the same result. Why isn't one of them returning
> "2005" for 6th "miesiac"?
The query planner sees that in order for the output ordering to match
the ORDER BY clause, it's got to sort by miesiac, wynik desc. The
DISTINCT ON clause can be implemented very cheaply after that - every
time the value of miesiac changes, it emits only the first of the rows
with the new value. So it's a good plan. However, because the sort
happens before the unique step, the results you get are dependent on
what order the sort happens to emit the rows. Our sort algorithms are
not stable, so there's no particular guarantee about the order in
which rows will pop out, beyond the fact that they must obey the sort
key.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company