Home > mailing lists

Re: Limits in subqueries... - Mailing list pgsql-general

From	Tom Lane
Subject	Re: Limits in subqueries...
Date	June 22, 2000 12:24:33
Msg-id	9000.961690991@sss.pgh.pa.us Whole thread Raw
In response to	Limits in subqueries... (Martijn van Oosterhout <kleptog@cupid.suninternet.com>)
List	pgsql-general

Tree view

Martijn van Oosterhout <kleptog@cupid.suninternet.com> writes:
> I want, for each customer, to get the serial of the the
> highest timestamp for that customer, and for duplicate
> timestamps, the highest serial.

If you will take a non-SQL-standard solution, this will work:

select distinct on (custid) custid, serial from test
order by custid, stamp desc, serial desc;

(In pre-7.0, omit parens around distinct-on value.)  Basically this
works by ordering the data in the right way and then taking just the
first row out of each group with the same custid.

It's a lot harder if you want to be SQL-compliant ...

> The way I thought how to do that would be:

> select custid, serial
> from test b
> where serial =
>    (select serial
>     from test a
>     where a.custid = b.custid
>     order by serial desc limit 1);

> but thats not allowed.

Nope.  Subselects can't have order-by clauses; SQL thinks you shouldn't
do anything as grotty as depending on the order in which tuples are
processed.  The spec lets you sort the final output of a SELECT, but not
any intermediate steps.

> The best I could do is:

> select custid, max(serial)
> from test
> where (custid,stamp) in
>    (select custid, max(stamp)
>     from test
>     group by custid)
> group by custid;

> Which is ugly and only works in this case because
> max does what I want. It has a horrible plan though:

> Aggregate  (cost=1.36 rows=11 width=8)
>   ->  Group  (cost=1.36 rows=11 width=8)
>         ->  Sort  (cost=1.36 rows=11 width=8)
>               ->  Seq Scan on test  (cost=1.36 rows=11 width=8)
>                     SubPlan
>                       ->  Aggregate  (cost=1.36 rows=11 width=12)
>                             ->  Group  (cost=1.36 rows=11 width=12)
>                                   ->  Sort  (cost=1.36 rows=11 width=12)
>                                         ->  Seq Scan on test  (cost=1.36 rows=11 width=12)

> In the more general case, this won't work.

It'd work as well as your other solution, seems to me, since max()
should be able to do the same calculation that "order by" would.
What's the more general case you are visualizing?

> Is there a better way of doing this? Does postgres 7.0 do
> better in this case?

7.0 is smart enough to materialize the result of the subselect (ie,
do the subselect just once and cache the results).  That'd make
a pretty considerable difference in runtime, but it's still a lot
slower than the DISTINCT ON solution.

            regards, tom lane

pgsql-general by date:

From: "Stuart Grimshaw"
Date: 22 June 2000, 12:19:26
Subject: PHP Install with Postgres support.

From: "J.R. Belding"
Date: 22 June 2000, 12:38:25
Subject: Re: PHP Install with Postgres support.

Re: Limits in subqueries... - Mailing list pgsql-general

Previous

Next