Thread:

From
J C Lawrence
Date:
Given a table ala:

  CREATE TABLE "node" (
    "nid" integer NOT NULL
    "type" integer NOT NULL;
    "version" integer NOT NULL,
  );

Where version defines is the count of the number of revisions to a
given nid/type tuple.

How could I construct a query which extracts only the rows with the
largest version number for each nid/type combination?  eg given:

  nid    type   version
  ---------------------
   1      1       5
   1      1       4
   1      1       3
   1      1       2
   1      1       1
   2      3       2
   2      3       1
   3      7       4
   3      7       3
   3      7       2
   3      7       1

I want a query which will return:

  nid    type   version
  ---------------------
   1      1       5
   2      3       2
   3      7       4

Is there a way without doing a temporary table and doing a bunch of
SELECT INTOs>

--
J C Lawrence
---------(*)                Satan, oscillate my metallic sonatas.
claw@kanga.nu               He lived as a devil, eh?
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.

Re:

From
"Thalis A. Kalfigopoulos"
Date:
It's a simple select/group by:

select nid,type,max(version) from node group by nid,type;

It'd help reading the postgresql SQL tutorial to freshen you up.

cheers,
thalis


On Sun, 7 Oct 2001, J C Lawrence wrote:

>
> Given a table ala:
>
>   CREATE TABLE "node" (
>     "nid" integer NOT NULL
>     "type" integer NOT NULL;
>     "version" integer NOT NULL,
>   );
>
> Where version defines is the count of the number of revisions to a
> given nid/type tuple.
>
> How could I construct a query which extracts only the rows with the
> largest version number for each nid/type combination?  eg given:
>
>   nid    type   version
>   ---------------------
>    1      1       5
>    1      1       4
>    1      1       3
>    1      1       2
>    1      1       1
>    2      3       2
>    2      3       1
>    3      7       4
>    3      7       3
>    3      7       2
>    3      7       1
>
> I want a query which will return:
>
>   nid    type   version
>   ---------------------
>    1      1       5
>    2      3       2
>    3      7       4
>
> Is there a way without doing a temporary table and doing a bunch of
> SELECT INTOs>


Re:

From
Tom Lane
Date:
"Thalis A. Kalfigopoulos" <thalis@cs.pitt.edu> writes:
> It's a simple select/group by:
> select nid,type,max(version) from node group by nid,type;

That solves the problem as stated, but most likely there are more
columns in the table and what's really wanted is the whole row
containing the max version number.  The above doesn't work in that
case.

AFAIK the only way to solve the extended problem in standard SQL is

select * from node outside
where version = (select max(version) from node inside
where outside.nid = inside.nid and outside.type = inside.type);

The fact that this is standard is the only thing going for it :-(.
It's ugly, it's likely to be horribly slow, and it gets much worse
if you might have ties in the "version" column that you need to break
somehow (eg, by then choosing the latest timestamp among the rows
with maximal version).

If you don't mind nonstandard SQL, then this is the kind of problem
that DISTINCT ON was invented for:

select distinct on (nid,type) * from node
order by nid, type, version desc;

which gets the whole job done with one sort-and-uniq pass.  See the
weather-report example in the SELECT reference page.

            regards, tom lane

Re:

From
J C Lawrence
Date:
On Mon, 08 Oct 2001 10:33:01 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Thalis A. Kalfigopoulos" <thalis@cs.pitt.edu> writes:

>> It's a simple select/group by: select nid,type,max(version) from
>> node group by nid,type;

> That solves the problem as stated, but most likely there are more
> columns in the table and what's really wanted is the whole row
> containing the max version number.  The above doesn't work in that
> case.

The problem is you're both right.

> AFAIK the only way to solve the extended problem in standard SQL
> is

> select * from node outside where version = (select max(version)
> from node inside where outside.nid = inside.nid and outside.type =
> inside.type);

Unfortunately I need to retain backward compatibility with MySQL
which doesn't support sub-selects.

At this (early) point I think I can refactor the tables
appropriately, use the simple GROUP BY Thalis suggested, and then
use that for an inner join to get what I really want.  I don't know
what the performance curves of temp tables are like, but at least in
quick testing under psql it works...

> which gets the whole job done with one sort-and-uniq pass.  See
> the weather-report example in the SELECT reference page.

Cute.  Thanks, I had not noticed that.

--
J C Lawrence
---------(*)                Satan, oscillate my metallic sonatas.
claw@kanga.nu               He lived as a devil, eh?
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.