Re: Limits in subqueries... - Mailing list pgsql-general

From Tom Lane
Subject Re: Limits in subqueries...
Date
Msg-id 9000.961690991@sss.pgh.pa.us
Whole thread Raw
In response to Limits in subqueries...  (Martijn van Oosterhout <kleptog@cupid.suninternet.com>)
List pgsql-general
Martijn van Oosterhout <kleptog@cupid.suninternet.com> writes:
> I want, for each customer, to get the serial of the the
> highest timestamp for that customer, and for duplicate
> timestamps, the highest serial.

If you will take a non-SQL-standard solution, this will work:

select distinct on (custid) custid, serial from test
order by custid, stamp desc, serial desc;

(In pre-7.0, omit parens around distinct-on value.)  Basically this
works by ordering the data in the right way and then taking just the
first row out of each group with the same custid.

It's a lot harder if you want to be SQL-compliant ...

> The way I thought how to do that would be:

> select custid, serial
> from test b
> where serial =
>    (select serial
>     from test a
>     where a.custid = b.custid
>     order by serial desc limit 1);

> but thats not allowed.

Nope.  Subselects can't have order-by clauses; SQL thinks you shouldn't
do anything as grotty as depending on the order in which tuples are
processed.  The spec lets you sort the final output of a SELECT, but not
any intermediate steps.

> The best I could do is:

> select custid, max(serial)
> from test
> where (custid,stamp) in
>    (select custid, max(stamp)
>     from test
>     group by custid)
> group by custid;

> Which is ugly and only works in this case because
> max does what I want. It has a horrible plan though:

> Aggregate  (cost=1.36 rows=11 width=8)
>   ->  Group  (cost=1.36 rows=11 width=8)
>         ->  Sort  (cost=1.36 rows=11 width=8)
>               ->  Seq Scan on test  (cost=1.36 rows=11 width=8)
>                     SubPlan
>                       ->  Aggregate  (cost=1.36 rows=11 width=12)
>                             ->  Group  (cost=1.36 rows=11 width=12)
>                                   ->  Sort  (cost=1.36 rows=11 width=12)
>                                         ->  Seq Scan on test  (cost=1.36 rows=11 width=12)

> In the more general case, this won't work.

It'd work as well as your other solution, seems to me, since max()
should be able to do the same calculation that "order by" would.
What's the more general case you are visualizing?

> Is there a better way of doing this? Does postgres 7.0 do
> better in this case?

7.0 is smart enough to materialize the result of the subselect (ie,
do the subselect just once and cache the results).  That'd make
a pretty considerable difference in runtime, but it's still a lot
slower than the DISTINCT ON solution.

            regards, tom lane

pgsql-general by date:

Previous
From: "Stuart Grimshaw"
Date:
Subject: PHP Install with Postgres support.
Next
From: "J.R. Belding"
Date:
Subject: Re: PHP Install with Postgres support.