Thread: Need SQL help, I'm stuck.
Help. I seem to have a case of "brain lock" and can't figure out something that I should know is simple. Here is what I am trying to do. Let's say I have a table called T1 with columns C1, C2, C3, C4. It contains data as follows a 1 abcd dfg a 2 cvfr erg a 3 derg hbg b 1 cccc rth c 1 rdvg egt c 2 derf ett I want a SQL query that returns these rows a 3 derg hbg b 1 cccc rth c 2 derf ett All I can think of is SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; That does not work. What I really want is the values for C1, C3 and C4 that are associated with the row containing the maximum value of C2 for each group of like C1 values. I don't even need to know what is max(C2). Can I join the table with itself somehow? See: "brain lock". This should not be hard. Thanks, ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Send your FREE holiday greetings online! http://greetings.yahoo.com
I just had to do this, and came up with two different ways; maybe someone here will come up with an even better one. My first brute force attempt was: SELECT C1, C3, C4 FROM T1 WHERE C2 = ( SELECT max(T2.C2) FROM T1 as T2 WHERE T1.C1=T2.C1); That works, but is very slow because the subselect has to be reprocessed on every row. I tried optimizing by replacing the subselect with an indexed temp table: CREATE TEMP TABLE temp_MaxC2 AS SELECT C1, max(C2) AS maxC2 FROM T1 GROUP BY C1; CREATE INDEX tmp_idx_C1 ON temp_MaxC2 (maxC2); SELECT C1, C3, C4 FROM T1, temp_MaxC2 TMP WHERE T1.C1= TMP.C1 and T1.C2=TMP.maxC2; By my benchmarks, that ran roughly four orders of magnitude faster. Temp tables go away automatically at the end of a connection, but I'm running this under mod_perl with Apache::DBI, which pools the connections, so there's a bit more code to drop the table and indexes before creating them, and ignoring any errors if they don't exist in the first place. (Wish there was an "IF EXISTS ... CREATE ..." syntax) If anyone has a still better approach, I'd love to hear what it is. Thanks, --Wes Sheldahl Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com> on 12/10/2001 04:42:54 PM To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] Need SQL help, I'm stuck. Help. I seem to have a case of "brain lock" and can't figure out something that I should know is simple. Here is what I am trying to do. Let's say I have a table called T1 with columns C1, C2, C3, C4. It contains data as follows a 1 abcd dfg a 2 cvfr erg a 3 derg hbg b 1 cccc rth c 1 rdvg egt c 2 derf ett I want a SQL query that returns these rows a 3 derg hbg b 1 cccc rth c 2 derf ett All I can think of is SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; That does not work. What I really want is the values for C1, C3 and C4 that are associated with the row containing the maximum value of C2 for each group of like C1 values. I don't even need to know what is max(C2). Can I join the table with itself somehow? See: "brain lock". This should not be hard. Thanks, ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Send your FREE holiday greetings online! http://greetings.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
select * from t1 a where c2= (select max(c2) from t1 b where a.c1=b.c1) an index on c1 might be handy for this... hth. -paul > Help. I seem to have a case of "brain lock" and can't figure out >something that I should know is simple. > > Here is what I am trying to do. Let's say I have a table called > T1 with columns C1, C2, C3, C4. It contains data as follows > > a 1 abcd dfg > a 2 cvfr erg > a 3 derg hbg > b 1 cccc rth > c 1 rdvg egt > c 2 derf ett > > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > > All I can think of is > > SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; > > That does not work. What I really want is the values for C1, C3 > and C4 that are associated with the row containing the maximum > value of C2 for each group of like C1 values. I don't even need > to know what is max(C2). > > Can I join the table with itself somehow? See: "brain lock". > This should not be hard. > > > Thanks, > > > ===== > Chris Albertson > Home: 310-376-1029 chrisalbertson90278@yahoo.com > Cell: 310-990-7550 > Office: 310-336-5189 Christopher.J.Albertson@aero.org > > __________________________________________________ > Do You Yahoo!? > Send your FREE holiday greetings online! > http://greetings.yahoo.com > > ---------------------------(end of > broadcast)--------------------------- TIP 1: subscribe and unsubscribe > commands go to majordomo@postgresql.org
On Mon, 10 Dec 2001, Chris Albertson wrote: > Help. I seem to have a case of "brain lock" and can't figure out > something that I should know is simple. > > Here is what I am trying to do. Let's say I have a table called > T1 with columns C1, C2, C3, C4. It contains data as follows > > a 1 abcd dfg > a 2 cvfr erg > a 3 derg hbg > b 1 cccc rth > c 1 rdvg egt > c 2 derf ett > > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > > All I can think of is > > SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; > > That does not work. What I really want is the values for C1, C3 > and C4 that are associated with the row containing the maximum > value of C2 for each group of like C1 values. I don't even need > to know what is max(C2). > > Can I join the table with itself somehow? See: "brain lock". > This should not be hard. Maybe something like? select t1.c1, c3, c4 from t1, (select c1, max(c2) as c2 from t1 group by c1) foo where t1.c1=foo.c1 and t1.c2=foo.c2;
Try using a subquery to identify the max(C2) value, then join T1 to the result of the subselect by C1 and limit the results where C2 equals max(C2). I think the following should work; but I've never done a subquery, so you may have to tweak the syntax: select C1, C2, C3, C4 from T1,(select C1 as M1, max(C2) as M2 from T1 group by M1) as T2 where T1.C1=T2.M1 and T1.C2=T2.M2; Best of luck, Andrew Gould --- Chris Albertson <chrisalbertson90278@yahoo.com> wrote: > Help. I seem to have a case of "brain lock" and > can't figure out > something that I should know is simple. > > Here is what I am trying to do. Let's say I have a > table called > T1 with columns C1, C2, C3, C4. It contains data as > follows > > a 1 abcd dfg > a 2 cvfr erg > a 3 derg hbg > b 1 cccc rth > c 1 rdvg egt > c 2 derf ett > > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > > All I can think of is > > SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; > > That does not work. What I really want is the > values for C1, C3 > and C4 that are associated with the row containing > the maximum > value of C2 for each group of like C1 values. I > don't even need > to know what is max(C2). > > Can I join the table with itself somehow? See: > "brain lock". > This should not be hard. > > > Thanks, > > > ===== > Chris Albertson > Home: 310-376-1029 > chrisalbertson90278@yahoo.com > Cell: 310-990-7550 > Office: 310-336-5189 > Christopher.J.Albertson@aero.org > > __________________________________________________ > Do You Yahoo!? > Send your FREE holiday greetings online! > http://greetings.yahoo.com > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org __________________________________________________ Do You Yahoo!? Send your FREE holiday greetings online! http://greetings.yahoo.com
This seems to work: SELECT c1,c3,c4 from (select c1, max(c2) from t1 group by c1) as dt1 (c1,c2) natural left join t1; Bob > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > > All I can think of is > > SELECT C1, max(C2), C3, C4 FROM T1 GROUP BY C1; > > That does not work. What I really want is the values for C1, C3 > and C4 that are associated with the row containing the maximum > value of C2 for each group of like C1 values. I don't even need > to know what is max(C2). > > Can I join the table with itself somehow? See: "brain lock". > This should not be hard. > > > Thanks, > > > ===== > Chris Albertson > Home: 310-376-1029 chrisalbertson90278@yahoo.com > Cell: 310-990-7550 > Office: 310-336-5189 Christopher.J.Albertson@aero.org > > __________________________________________________ > Do You Yahoo!? > Send your FREE holiday greetings online! > http://greetings.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote: > Help. I seem to have a case of "brain lock" and can't figure out > something that I should know is simple. > > Here is what I am trying to do. Let's say I have a table called > T1 with columns C1, C2, C3, C4. It contains data as follows > > a 1 abcd dfg > a 2 cvfr erg > a 3 derg hbg > b 1 cccc rth > c 1 rdvg egt > c 2 derf ett > > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > How about: select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc; -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Terrorists can only take my life. Only my government can take my freedom.
In just eyeballing the various responses, it looks like the one using DISTINCT ON manages to avoid using a subquery at all. Would this give it the edge in performance? I had somehow never noticed the DISTINCT ON syntax before, this looks very handy. Also, my first attempt was to put the subquery in the WHERE clause, but I noticed that several put the subquery in the FROM clause. Does putting it in the FROM clause just run it once, with the results of the run joined to the outer tables? It certainly seemed like putting the query in the WHERE clause was running it for every row. Thanks, Wes Sheldahl Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001 06:33:59 PM Please respond to Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> To: Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com> cc: pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: Re: [GENERAL] Need SQL help, I'm stuck. On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote: > Help. I seem to have a case of "brain lock" and can't figure out > something that I should know is simple. > > Here is what I am trying to do. Let's say I have a table called > T1 with columns C1, C2, C3, C4. It contains data as follows > > a 1 abcd dfg > a 2 cvfr erg > a 3 derg hbg > b 1 cccc rth > c 1 rdvg egt > c 2 derf ett > > I want a SQL query that returns these rows > > a 3 derg hbg > b 1 cccc rth > c 2 derf ett > How about: select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc; -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Terrorists can only take my life. Only my government can take my freedom. ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
What about this one, which also happens to give the right result? select B.* from T1 A RIGHT JOIN T1 B on (A.C1=B.C1 AND A.C2>B.C2) where A.c1 is null; It is really amazing how many different ways there are to express the same wishes in SQL... Compared to the following ones, it is efficient: SELECT A.* FROM T1 A WHERE NOT EXISTS (select * from T1 B where B.C2 > A.C2 AND B.C1=A.C1); SELECT * FROM T1 EXCEPT SELECT A.* FROM T1 A, T1 B where A.C1=B.C1 AND A.C2<B.C2; Though, the following is AMAZINGLY efficient. Only a seq scan, plus some post processing. select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc; I think that distinct/order by combination is best suited for your needs. Does anyone know of a "master source of knowledge" where one could learn to choose an appropriate formulation for a SQL query without trying all of the imaginable possibilities with EXPLAIN? Thank you all! Antonio wsheldah@lexmark.com wrote: > >In just eyeballing the various responses, it looks like the one using DISTINCT >ON manages to avoid using a subquery at all. Would this give it the edge in >performance? I had somehow never noticed the DISTINCT ON syntax before, this >looks very handy. > >Also, my first attempt was to put the subquery in the WHERE clause, but I >noticed that several put the subquery in the FROM clause. Does putting it in the >FROM clause just run it once, with the results of the run joined to the outer >tables? It certainly seemed like putting the query in the WHERE clause was >running it for every row. Thanks, > >Wes Sheldahl > > > >Martijn van Oosterhout <kleptog%svana.org@interlock.lexmark.com> on 12/10/2001 >06:33:59 PM > >Please respond to Martijn van Oosterhout > <kleptog%svana.org@interlock.lexmark.com> > >To: Chris Albertson <chrisalbertson90278%yahoo.com@interlock.lexmark.com> >cc: pgsql-general%postgresql.org@interlock.lexmark.com (bcc: Wesley > Sheldahl/Lex/Lexmark) >Subject: Re: [GENERAL] Need SQL help, I'm stuck. > > >On Mon, Dec 10, 2001 at 01:42:54PM -0800, Chris Albertson wrote: > >>Help. I seem to have a case of "brain lock" and can't figure out >>something that I should know is simple. >> >>Here is what I am trying to do. Let's say I have a table called >>T1 with columns C1, C2, C3, C4. It contains data as follows >> >> a 1 abcd dfg >> a 2 cvfr erg >> a 3 derg hbg >> b 1 cccc rth >> c 1 rdvg egt >> c 2 derf ett >> >>I want a SQL query that returns these rows >> >> a 3 derg hbg >> b 1 cccc rth >> c 2 derf ett >> > >How about: > >select distinct on (C1) C1, C2, C3, C4 from T1 order by C1, C2 desc; >-- >Martijn van Oosterhout <kleptog@svana.org> >http://svana.org/kleptog/ >
On Tue, 11 Dec 2001 wsheldah@lexmark.com wrote: > > > In just eyeballing the various responses, it looks like the one using DISTINCT > ON manages to avoid using a subquery at all. Would this give it the edge in > performance? I had somehow never noticed the DISTINCT ON syntax before, this > looks very handy. Distinct on often performs better than subquery options, however it's a PostgreSQL addition, so you need to be careful if you want to run the query on anything else :) > Also, my first attempt was to put the subquery in the WHERE clause, but I > noticed that several put the subquery in the FROM clause. Does putting it in the > FROM clause just run it once, with the results of the run joined to the outer > tables? It certainly seemed like putting the query in the WHERE clause was > running it for every row. Thanks, I've generally assumed that subselects in from are effectively a cursor that gets scanned as opposed to running the entire query for each row. I haven't looked to confirm that, but it seems reasonable :)