Thread: Need SQL help, I'm stuck.

Need SQL help, I'm stuck.

From
Chris Albertson
Date:
Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do.  Let's say I have a table called
T1 with columns C1, C2, C3, C4.  It contains data as follows

  a  1  abcd  dfg
  a  2  cvfr  erg
  a  3  derg  hbg
  b  1  cccc  rth
  c  1  rdvg  egt
  c  2  derf  ett

I want a SQL query that returns these rows

  a  3  derg  hbg
  b  1  cccc  rth
  c  2  derf  ett

All I can think of is

   SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work.  What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values.  I don't even need
to know what is max(C2).

Can I join the table with itself somehow?  See: "brain lock".
This should not be hard.


Thanks,


=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

Re: Need SQL help, I'm stuck.

From
wsheldah@lexmark.com
Date:

I just had to do this, and came up with two different ways; maybe someone here
will come up with an even better one.

My first brute force attempt was:
SELECT C1, C3, C4
FROM T1
WHERE C2 = ( SELECT max(T2.C2) FROM T1 as T2 WHERE T1.C1=T2.C1);

That works, but is very slow because the subselect has to be reprocessed on
every row. I tried optimizing by replacing the subselect with an indexed temp
table:
CREATE TEMP TABLE temp_MaxC2 AS SELECT C1, max(C2) AS maxC2 FROM T1 GROUP BY C1;
CREATE INDEX tmp_idx_C1 ON temp_MaxC2 (maxC2);
SELECT C1, C3, C4 FROM T1, temp_MaxC2 TMP WHERE T1.C1= TMP.C1 and
T1.C2=TMP.maxC2;

By my benchmarks, that ran roughly four orders of magnitude faster. Temp tables
go away automatically at the end of a connection, but I'm running this under
mod_perl with Apache::DBI, which pools the connections, so there's a bit more
code to drop the table and indexes before creating them, and ignoring any errors
if they don't exist in the first place. (Wish there was an "IF EXISTS ... CREATE
..." syntax)

If anyone has a still better approach, I'd love to hear what it is.  Thanks,

--Wes Sheldahl




Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com> on
12/10/2001 04:42:54 PM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] Need SQL help, I'm stuck.


Help. I seem to have a case of "brain lock" and can't figure out
something that I should know is simple.

Here is what I am trying to do.  Let's say I have a table called
T1 with columns C1, C2, C3, C4.  It contains data as follows

  a  1  abcd  dfg
  a  2  cvfr  erg
  a  3  derg  hbg
  b  1  cccc  rth
  c  1  rdvg  egt
  c  2  derf  ett

I want a SQL query that returns these rows

  a  3  derg  hbg
  b  1  cccc  rth
  c  2  derf  ett

All I can think of is

   SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;

That does not work.  What I really want is the values for C1, C3
and C4 that are associated with the row containing the maximum
value of C2 for each group of like C1 values.  I don't even need
to know what is max(C2).

Can I join the table with itself somehow?  See: "brain lock".
This should not be hard.


Thanks,


=====
Chris Albertson
  Home:   310-376-1029  chrisalbertson90278@yahoo.com
  Cell:   310-990-7550
  Office: 310-336-5189  Christopher.J.Albertson@aero.org

__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org





Re: Need SQL help, I'm stuck.

From
"Paul Wehr"
Date:
select * from t1 a
where c2=
  (select max(c2) from t1 b
  where a.c1=b.c1)

an index on c1 might be handy for this...

hth.

-paul

> Help. I seem to have a case of "brain lock" and can't figure out
>something that I should know is simple.
>
> Here is what I am trying to do.  Let's say I have a table called
> T1 with columns C1, C2, C3, C4.  It contains data as follows
>
>   a  1  abcd  dfg
>   a  2  cvfr  erg
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  1  rdvg  egt
>   c  2  derf  ett
>
> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>
> All I can think of is
>
>    SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;
>
> That does not work.  What I really want is the values for C1, C3
> and C4 that are associated with the row containing the maximum
> value of C2 for each group of like C1 values.  I don't even need
> to know what is max(C2).
>
> Can I join the table with itself somehow?  See: "brain lock".
> This should not be hard.
>
>
> Thanks,
>
>
> =====
> Chris Albertson
>   Home:   310-376-1029  chrisalbertson90278@yahoo.com
>   Cell:   310-990-7550
>   Office: 310-336-5189  Christopher.J.Albertson@aero.org
>
> __________________________________________________
> Do You Yahoo!?
> Send your FREE holiday greetings online!
> http://greetings.yahoo.com
>
> ---------------------------(end of
> broadcast)--------------------------- TIP 1: subscribe and unsubscribe
> commands go to majordomo@postgresql.org



Re: Need SQL help, I'm stuck.

From
Stephan Szabo
Date:
On Mon, 10 Dec 2001, Chris Albertson wrote:

> Help. I seem to have a case of "brain lock" and can't figure out
> something that I should know is simple.
>
> Here is what I am trying to do.  Let's say I have a table called
> T1 with columns C1, C2, C3, C4.  It contains data as follows
>
>   a  1  abcd  dfg
>   a  2  cvfr  erg
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  1  rdvg  egt
>   c  2  derf  ett
>
> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>
> All I can think of is
>
>    SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;
>
> That does not work.  What I really want is the values for C1, C3
> and C4 that are associated with the row containing the maximum
> value of C2 for each group of like C1 values.  I don't even need
> to know what is max(C2).
>
> Can I join the table with itself somehow?  See: "brain lock".
> This should not be hard.

Maybe something like?
select t1.c1, c3, c4 from t1, (select c1, max(c2) as c2 from t1 group by
c1) foo where t1.c1=foo.c1 and t1.c2=foo.c2;



Re: Need SQL help, I'm stuck.

From
Andrew Gould
Date:
Try using a subquery to identify the max(C2) value,
then join T1 to the result of the subselect by C1 and
limit the results where C2 equals max(C2).

I think the following should work; but I've never done
a subquery, so you may have to tweak the syntax:

select C1, C2, C3, C4 from T1,(select C1 as M1,
max(C2) as M2 from T1 group by M1) as T2
where T1.C1=T2.M1 and T1.C2=T2.M2;

Best of luck,

Andrew Gould

--- Chris Albertson <chrisalbertson90278@yahoo.com>
wrote:
> Help. I seem to have a case of "brain lock" and
> can't figure out
> something that I should know is simple.
>
> Here is what I am trying to do.  Let's say I have a
> table called
> T1 with columns C1, C2, C3, C4.  It contains data as
> follows
>
>   a  1  abcd  dfg
>   a  2  cvfr  erg
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  1  rdvg  egt
>   c  2  derf  ett
>
> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>
> All I can think of is
>
>    SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;
>
> That does not work.  What I really want is the
> values for C1, C3
> and C4 that are associated with the row containing
> the maximum
> value of C2 for each group of like C1 values.  I
> don't even need
> to know what is max(C2).
>
> Can I join the table with itself somehow?  See:
> "brain lock".
> This should not be hard.
>
>
> Thanks,
>
>
> =====
> Chris Albertson
>   Home:   310-376-1029
> chrisalbertson90278@yahoo.com
>   Cell:   310-990-7550
>   Office: 310-336-5189
> Christopher.J.Albertson@aero.org
>
> __________________________________________________
> Do You Yahoo!?
> Send your FREE holiday greetings online!
> http://greetings.yahoo.com
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org


__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com

Re: Need SQL help, I'm stuck.

From
"Robert B. Easter"
Date:
This seems to work:

SELECT c1,c3,c4 from (select c1, max(c2) from t1 group by c1) as dt1 (c1,c2)
natural left join t1;

Bob

> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>
> All I can think of is
>
>    SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1;
>
> That does not work.  What I really want is the values for C1, C3
> and C4 that are associated with the row containing the maximum
> value of C2 for each group of like C1 values.  I don't even need
> to know what is max(C2).
>
> Can I join the table with itself somehow?  See: "brain lock".
> This should not be hard.
>
>
> Thanks,
>
>
> =====
> Chris Albertson
>   Home:   310-376-1029  chrisalbertson90278@yahoo.com
>   Cell:   310-990-7550
>   Office: 310-336-5189  Christopher.J.Albertson@aero.org
>
> __________________________________________________
> Do You Yahoo!?
> Send your FREE holiday greetings online!
> http://greetings.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Need SQL help, I'm stuck.

From
Martijn van Oosterhout
Date:
On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:
> Help. I seem to have a case of "brain lock" and can't figure out
> something that I should know is simple.
>
> Here is what I am trying to do.  Let's say I have a table called
> T1 with columns C1, C2, C3, C4.  It contains data as follows
>
>   a  1  abcd  dfg
>   a  2  cvfr  erg
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  1  rdvg  egt
>   c  2  derf  ett
>
> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>

How about:

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Terrorists can only take my life. Only my government can take my freedom.

Re: Need SQL help, I'm stuck.

From
wsheldah@lexmark.com
Date:

In just eyeballing the various responses, it looks like the one using DISTINCT
ON manages to avoid using a subquery at all. Would this give it the edge in
performance? I had somehow never noticed the DISTINCT ON syntax before, this
looks very handy.

Also, my first attempt was to put the subquery in the WHERE clause, but I
noticed that several put the subquery in the FROM clause. Does putting it in the
FROM clause just run it once, with the results of the run joined to the outer
tables? It certainly seemed like putting the query in the WHERE clause was
running it for every row. Thanks,

Wes Sheldahl



Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001
06:33:59 PM

Please respond to Martijn van Oosterhout
      <kleptog%svana.org@interlock.lexmark.com>

To:   Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com>
cc:   pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley
      Sheldahl/Lex/Lexmark)
Subject:  Re: [GENERAL] Need SQL help, I'm stuck.


On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:
> Help. I seem to have a case of "brain lock" and can't figure out
> something that I should know is simple.
>
> Here is what I am trying to do.  Let's say I have a table called
> T1 with columns C1, C2, C3, C4.  It contains data as follows
>
>   a  1  abcd  dfg
>   a  2  cvfr  erg
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  1  rdvg  egt
>   c  2  derf  ett
>
> I want a SQL query that returns these rows
>
>   a  3  derg  hbg
>   b  1  cccc  rth
>   c  2  derf  ett
>

How about:

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Terrorists can only take my life. Only my government can take my freedom.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org







Re: Need SQL help, I'm stuck.

From
Antonio Fiol Bonnín
Date:
What about this one, which also happens to give the right result?

select B.* from T1 A RIGHT JOIN T1 B on (A.C1=B.C1 AND A.C2>B.C2) where
A.c1 is null;

It is really amazing how many different ways there are to express the
same wishes in SQL...

Compared to the following ones, it is efficient:
SELECT A.* FROM T1 A WHERE NOT EXISTS (select * from T1 B where B.C2 >
A.C2 AND B.C1=A.C1);
SELECT * FROM T1 EXCEPT SELECT A.* FROM T1 A, T1 B where A.C1=B.C1 AND
A.C2<B.C2;

Though, the following is AMAZINGLY efficient. Only a seq scan, plus some
post processing.

select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;


I think that distinct/order by combination is best suited for your needs.

Does anyone know of a "master source of knowledge" where one could learn
to choose an appropriate formulation for a SQL query without trying all
of the imaginable possibilities with EXPLAIN?

Thank you all!

Antonio


wsheldah@lexmark.com wrote:

>
>In just eyeballing the various responses, it looks like the one using DISTINCT
>ON manages to avoid using a subquery at all. Would this give it the edge in
>performance? I had somehow never noticed the DISTINCT ON syntax before, this
>looks very handy.
>
>Also, my first attempt was to put the subquery in the WHERE clause, but I
>noticed that several put the subquery in the FROM clause. Does putting it in the
>FROM clause just run it once, with the results of the run joined to the outer
>tables? It certainly seemed like putting the query in the WHERE clause was
>running it for every row. Thanks,
>
>Wes Sheldahl
>
>
>
>Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001
>06:33:59 PM
>
>Please respond to Martijn van Oosterhout
>      <kleptog%svana.org@interlock.lexmark.com>
>
>To:   Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com>
>cc:   pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley
>      Sheldahl/Lex/Lexmark)
>Subject:  Re: [GENERAL] Need SQL help, I'm stuck.
>
>
>On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote:
>
>>Help. I seem to have a case of "brain lock" and can't figure out
>>something that I should know is simple.
>>
>>Here is what I am trying to do.  Let's say I have a table called
>>T1 with columns C1, C2, C3, C4.  It contains data as follows
>>
>>  a  1  abcd  dfg
>>  a  2  cvfr  erg
>>  a  3  derg  hbg
>>  b  1  cccc  rth
>>  c  1  rdvg  egt
>>  c  2  derf  ett
>>
>>I want a SQL query that returns these rows
>>
>>  a  3  derg  hbg
>>  b  1  cccc  rth
>>  c  2  derf  ett
>>
>
>How about:
>
>select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc;
>--
>Martijn van Oosterhout <kleptog@svana.org>
>http://svana.org/kleptog/
>




Re: Need SQL help, I'm stuck.

From
Stephan Szabo
Date:
On Tue, 11 Dec 2001 wsheldah@lexmark.com wrote:

>
>
> In just eyeballing the various responses, it looks like the one using DISTINCT
> ON manages to avoid using a subquery at all. Would this give it the edge in
> performance? I had somehow never noticed the DISTINCT ON syntax before, this
> looks very handy.

Distinct on often performs better than subquery options, however it's a
PostgreSQL addition, so you need to be careful if you want to run the
query on anything else :)

> Also, my first attempt was to put the subquery in the WHERE clause, but I
> noticed that several put the subquery in the FROM clause. Does putting it in the
> FROM clause just run it once, with the results of the run joined to the outer
> tables? It certainly seemed like putting the query in the WHERE clause was
> running it for every row. Thanks,

I've generally assumed that subselects in from are effectively a cursor
that gets scanned as opposed to running the entire query for each row.
I haven't looked to confirm that, but it seems reasonable :)