Thread: Am I crazy or is this SQL not possible
I am having some serious mental block here. Here is the abstract version of my problem. I have a table like this: unique_id (PK) broadcast_id date_sent status 1 1 2005-04-04 30 2 1 2005-04-01 30 3 1 2005-05-20 10 4 2 2005-05-29 30 So it is a table that stores broadcasts including the broadcast_id, the date sent, and the status of the broadcast. What I would like to do is simply get the last date_sent and it's status for every broadcast. I can't do a GROUP BY because I can't put an aggregate on the status column. SELECT MAX(date_sent), status FROM broadcast_history GROUP BY broadcast_id How do I get the status for the most recent date_sent using GROUP BY? DISTINCT also doesn't work SELECT DISTINCT ON (email_broadcast_id) * FROM email_broadcast_history ORDER BY date_sent As you have to have the DISTINCT fields matching the ORDER BY fields. I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent I keep thinking am I missing something. Does anybody have any ideas?
Collin Peters wrote: > I am having some serious mental block here. Here is the abstract > version of my problem. I have a table like this: > > unique_id (PK) broadcast_id date_sent status > 1 1 2005-04-04 30 > 2 1 2005-04-01 30 > 3 1 2005-05-20 10 > 4 2 2005-05-29 30 > > So it is a table that stores broadcasts including the broadcast_id, > the date sent, and the status of the broadcast. > > What I would like to do is simply get the last date_sent and it's > status for every broadcast. I can't do a GROUP BY because I can't put > an aggregate on the status column. > > SELECT MAX(date_sent), status > FROM broadcast_history > GROUP BY broadcast_id > > How do I get the status for the most recent date_sent using GROUP BY? > > DISTINCT also doesn't work > > SELECT DISTINCT ON (email_broadcast_id) * > FROM email_broadcast_history > ORDER BY date_sent > > As you have to have the DISTINCT fields matching the ORDER BY fields. > I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent > > I keep thinking am I missing something. Does anybody have any ideas? > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster May not be the most efficient but seems to work here. Select broadcast_id,status from broadcast_history bh1 where bh1.date_sent = (select max(date_sent) from broadcast_history bh2 where bh1.broadcast_id=bh2.broadcast_id) order by bh1.broadcast_id; Oisin
select my_sub.max_date, broadcast_history.status
from (
SELECT MAX(date_sent) max_date, broadcast_id
FROM broadcast_history
GROUP BY broadcast_id
) my_sub
inner join broadcast_history on (
broadcast_history.broadcast_id = my_sub.broadcast_id
and broadcast_history.date_sent = my_sub.max_date
);
This should work if the combined broadcast_id, date_sent is unique. If not, you will need to decide what record to pick in case of a tie.
--
==================================================================
Aaron Bono
President Aranya Software Technologies, Inc.
http://www.aranya.com We take care of your technology needs.
Phone: (816) 695-6071
==================================================================
from (
SELECT MAX(date_sent) max_date, broadcast_id
FROM broadcast_history
GROUP BY broadcast_id
) my_sub
inner join broadcast_history on (
broadcast_history.broadcast_id = my_sub.broadcast_id
and broadcast_history.date_sent = my_sub.max_date
);
This should work if the combined broadcast_id, date_sent is unique. If not, you will need to decide what record to pick in case of a tie.
On 6/1/06, Collin Peters <cadiolis@gmail.com> wrote:
I am having some serious mental block here. Here is the abstract
version of my problem. I have a table like this:
unique_id (PK) broadcast_id date_sent status
1 1 2005-04-04 30
2 1 2005-04-01 30
3 1 2005-05-20 10
4 2 2005-05-29 30
So it is a table that stores broadcasts including the broadcast_id,
the date sent, and the status of the broadcast.
What I would like to do is simply get the last date_sent and it's
status for every broadcast. I can't do a GROUP BY because I can't put
an aggregate on the status column.
SELECT MAX(date_sent), status
FROM broadcast_history
GROUP BY broadcast_id
How do I get the status for the most recent date_sent using GROUP BY?
DISTINCT also doesn't work
SELECT DISTINCT ON (email_broadcast_id) *
FROM email_broadcast_history
ORDER BY date_sent
As you have to have the DISTINCT fields matching the ORDER BY fields.
I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent
I keep thinking am I missing something. Does anybody have any ideas?
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
==================================================================
Aaron Bono
President Aranya Software Technologies, Inc.
http://www.aranya.com We take care of your technology needs.
Phone: (816) 695-6071
==================================================================
> What I would like to do is simply get the last date_sent and it's > status for every broadcast. I can't do a GROUP BY because I can't put > an aggregate on the status column. > > SELECT MAX(date_sent), status > FROM broadcast_history > GROUP BY broadcast_id > You could try the following: select status from broadcast_history bh where bh.date_sent = (select max(bh2.date_sent) from broadcast_historybh2); This reminds me of an interview question: I was asked how to get a maximum column from a table without using max. How would you do that? Thanks, Yasir
> select max(date_sent) from table; > would equal > select date_sent from broadcast_history order by date_sent DESC limit 1; > That sounds like a hack. Is limit a SQL-99 standard? Is there are another way to do this? Sorry to take over your topic, Collin. Thanks, Yasir
"Collin Peters" <cadiolis@gmail.com> writes: > What I would like to do is simply get the last date_sent and it's > status for every broadcast. I can't do a GROUP BY because I can't put > an aggregate on the status column. You missed the key idea about how to use DISTINCT ON. SELECT DISTINCT ON (email_broadcast_id) * FROM email_broadcast_history ORDER BY email_broadcast_id, date_sent DESC You order by the DISTINCT ON fields, then one or more additional fields to select the representative row you want within each DISTINCT ON group. regards, tom lane
It is a hack, but when someone wants you to do something in a way different from the norm, aren't they asking for a hack? SQL Server does something like select top (1) from .... I am thinking this is NOT a SQL-99 standard. -Aaron On 6/1/06, Yasir Malik <ymalik@cs.stevens.edu> wrote: > > select max(date_sent) from table; > > would equal > > select date_sent from broadcast_history order by date_sent DESC limit 1; > > > That sounds like a hack. Is limit a SQL-99 standard? Is there are > another way to do this? > Sorry to take over your topic, Collin. > > Thanks, > Yasir
> It is a hack, but when someone wants you to do something in a way > different from the norm, aren't they asking for a hack? > > SQL Server does something like > select top (1) from .... > > I am thinking this is NOT a SQL-99 standard. > This was an interview with Amazon, and I don't think Amazon wanted a hack. I hope Amazon doesn't use hacks. There has to be another way. It would be cruel if they expected me know some database specific functionality. Here's what I gave them: select value from table t where t.value > (select t2.value from table t2); which would be fine if the sub-select returned simply a list, but that isn't even valid SQL (I didn't get the job, BTW). Yasir
Yasir Malik wrote: > >It is a hack, but when someone wants you to do something in a way > >different from the norm, aren't they asking for a hack? > > > >SQL Server does something like > >select top (1) from .... > > > >I am thinking this is NOT a SQL-99 standard. > > > This was an interview with Amazon, and I don't think Amazon wanted a hack. > I hope Amazon doesn't use hacks. There has to be another way. It would > be cruel if they expected me know some database specific functionality. Do they use PostgreSQL at Amazon? Maybe they wanted to know if you are aware of common non-standard tricks for whatever RDBMS they use. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2006-06-01 at 14:47, Yasir Malik wrote: > > It is a hack, but when someone wants you to do something in a way > > different from the norm, aren't they asking for a hack? > > > > SQL Server does something like > > select top (1) from .... > > > > I am thinking this is NOT a SQL-99 standard. > > > This was an interview with Amazon, and I don't think Amazon wanted a hack. > I hope Amazon doesn't use hacks. There has to be another way. It would > be cruel if they expected me know some database specific functionality. > Here's what I gave them: > select value > from table t > where t.value > > (select t2.value > from table t2); > > which would be fine if the sub-select returned simply a list, but that > isn't even valid SQL (I didn't get the job, BTW). Sometimes the only reasonable answer is a hack. Much like in the older versions of PostgreSQL where select max(id) from sometable was slower than watching grass grow, the standard "hack" was to do select id from sometable order by id desc limit 1. If the SQL way kills your server and the hack doesn't, you use the hack and move on.
Collin, I have a similar circumstance in one of my own apps. I operate under the simple presumption that the unique_id is sequential and thus the record with the highest unique_id is the most recent entry. In that case I use a query such as select * from broadcast_history where unique_id in (select broadcast_id, max(unique_id) from broadcast_historygroup by broadcast_id) which permits me to examine the entire record which is necessary in my situation. Good luck Mark On Thu, 2006-06-01 at 10:43 -0700, Collin Peters wrote: > I am having some serious mental block here. Here is the abstract > version of my problem. I have a table like this: > > unique_id (PK) broadcast_id date_sent status > 1 1 2005-04-04 30 > 2 1 2005-04-01 30 > 3 1 2005-05-20 10 > 4 2 2005-05-29 30 > > So it is a table that stores broadcasts including the broadcast_id, > the date sent, and the status of the broadcast. > > What I would like to do is simply get the last date_sent and it's > status for every broadcast. I can't do a GROUP BY because I can't put > an aggregate on the status column. > > SELECT MAX(date_sent), status > FROM broadcast_history > GROUP BY broadcast_id > > How do I get the status for the most recent date_sent using GROUP BY? > > DISTINCT also doesn't work > > SELECT DISTINCT ON (email_broadcast_id) * > FROM email_broadcast_history > ORDER BY date_sent > > As you have to have the DISTINCT fields matching the ORDER BY fields. > I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent > > I keep thinking am I missing something. Does anybody have any ideas? > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster
On Thu, 2006-06-01 at 14:13 -0400, Yasir Malik wrote: > > What I would like to do is simply get the last date_sent and it's > > status for every broadcast. I can't do a GROUP BY because I can't put > > an aggregate on the status column. > > > > SELECT MAX(date_sent), status > > FROM broadcast_history > > GROUP BY broadcast_id > > > You could try the following: > select status > from broadcast_history bh > where bh.date_sent = > (select max(bh2.date_sent) > from broadcast_history bh2); > > This reminds me of an interview question: I was asked how to get a > maximum column from a table without using max. How would you do that? Find the list of everything that isn't the highest value, then invert it. Don't expect it to perform very well though. select col from foo where col not in (select f1.col from foo as f1 join foo as f2 on (f1.col < f2.col) ); --
Is this SQL-99 compliant or a PostgreSQL specific query? I really like it and have never seen this before.
-Aaron
-Aaron
On 6/1/06, Tom Lane < tgl@sss.pgh.pa.us> wrote:
"Collin Peters" < cadiolis@gmail.com> writes:
> What I would like to do is simply get the last date_sent and it's
> status for every broadcast. I can't do a GROUP BY because I can't put
> an aggregate on the status column.
You missed the key idea about how to use DISTINCT ON.
SELECT DISTINCT ON (email_broadcast_id) *
FROM email_broadcast_history
ORDER BY email_broadcast_id, date_sent DESC
You order by the DISTINCT ON fields, then one or more additional fields
to select the representative row you want within each DISTINCT ON group.
"Aaron Bono" <postgresql@aranya.com> writes: > Is this SQL-99 compliant or a PostgreSQL specific query? I really like it > and have never seen this before. DISTINCT ON is a Postgres-ism, I'm afraid. It's pretty handy though. regards, tom lane
Collin Peters wrote: > I am having some serious mental block here. Here is the abstract > version of my problem. I have a table like this: > > unique_id (PK) broadcast_id date_sent status > 1 1 2005-04-04 30 > 2 1 2005-04-01 30 > 3 1 2005-05-20 10 > 4 2 2005-05-29 30 > > So it is a table that stores broadcasts including the broadcast_id, > the date sent, and the status of the broadcast. > > What I would like to do is simply get the last date_sent and it's > status for every broadcast. I can't do a GROUP BY because I can't put > an aggregate on the status column. > > SELECT MAX(date_sent), status > FROM broadcast_history > GROUP BY broadcast_id > > How do I get the status for the most recent date_sent using GROUP BY? > > DISTINCT also doesn't work > > SELECT DISTINCT ON (email_broadcast_id) * > FROM email_broadcast_history > ORDER BY date_sent > > As you have to have the DISTINCT fields matching the ORDER BY fields. > I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent > > I keep thinking am I missing something. Does anybody have any ideas? > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > You can also try SELECT * from broadcast_history A WHERE NOT EXISTS(SELECT * from broadcast_history B WHERE B.date_sent >A.date_sent) There isn't any PostgreSQL-ism, just a correlated subrequest wich is perfectly standars, afaik
<br />Sorry to stick my nose in here...<br />would not this work better?<br /><br />SELECT broadcast_id,date_sent,statusfrom broadcast_history where<br />unique_id in (<br />SELECT max(unique_id) from broadcast_historygroup by broadcast_id);<br /><br />Seems like a simpler option.<br /><br /><p> <font size="2">----- OriginalMessage ------<br /> <b>From:</b>Patrick Jacquot<br /> <b>Sent:</b>Friday, June 02, 2006 10:45<br /> <b>To:</b>CollinPeters cadiolis@gmail.com; <br /> <b>Cc: </b>pgsql-sql@postgresql.org; <br /> <b>Subject:</b>Re: [SQL]Am I crazy or is this SQL not possible</font><p><table><tbody><tr><td></td></tr></tbody></table><font size="2">CollinPeters wrote:<br /><br />> I am having some serious mental block here. Here is the abstract<br />> versionof my problem. I have a table like this:<br />><br />> unique_id (PK) broadcast_id date_sent status<br />>1 1 2005-04-04 30<br />> 2 1 2005-04-01 30<br />> 3 1 2005-05-20 10<br />> 4 2 2005-05-29 30<br />><br/>> So it is a table that stores broadcasts including the broadcast_id,<br />> the date sent, and the statusof the broadcast.<br />><br />> What I would like to do is simply get the last date_sent and it's<br />> statusfor every broadcast. I can't do a GROUP BY because I can't put<br />> an aggregate on the status column.<br />><br/>> SELECT MAX(date_sent), status<br />> FROM broadcast_history<br />> GROUP BY broadcast_id<br />><br/>> How do I get the status for the most recent date_sent using GROUP BY?<br />><br />> DISTINCT also doesn'twork<br />><br />> SELECT DISTINCT ON (email_broadcast_id) *<br />> FROM email_broadcast_history<br />>ORDER BY date_sent<br />><br />> As you have to have the DISTINCT fields matching the ORDER BY fields.<br />>I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent<br />><br />> I keep thinking am I missingsomething. Does anybody have any ideas?<br />><br />> ---------------------------(end of broadcast)---------------------------<br/>> TIP 2: Don't 'kill -9' the postmaster<br />><br />You can also try<br />SELECT* from broadcast_history A WHERE NOT EXISTS<br />(SELECT * from broadcast_history B WHERE B.date_sent >A.date_sent)<br/>There isn't any PostgreSQL-ism, just a correlated subrequest wich is <br />perfectly standars, afaik<br/><br /><br /> ---------------------------(end of broadcast)---------------------------<br /> TIP 9: In versionsbelow 8.0, the planner will ignore your desire to<br /> choose an index scan if your joining column's datatypesdo not<br /> match<br /> <br /></font>
Only if you assume that ordering by unique_id and by date_sent are equivalent. That may be the case but I personally hate making assumptions like that. When someone goes into the database and updates records (clean up bad data, etc.) your perfectly running query can suddenly produce bad results.
-Aaron
-Aaron
On 6/2/06, Klay Martens <kmartens@wol.co.za> wrote:
Sorry to stick my nose in here...
would not this work better?
SELECT broadcast_id,date_sent,status from broadcast_history where
unique_id in (
SELECT max(unique_id) from broadcast_history group by broadcast_id);
Seems like a simpler option.
"Collin Peters" wrote: > DISTINCT also doesn't work > > SELECT DISTINCT ON (email_broadcast_id) * > FROM email_broadcast_history > ORDER BY date_sent > > As you have to have the DISTINCT fields matching the ORDER BY fields. > I have to ORDER BY date_sent, but I can't do a DISTINCT on date_sent DISTINCT does work. You don't have to do DISTINCT on date_sent. The left part of the ORDER BY must match the DISTINCT expression(s): SELECT DISTINCT ON(broadcast_id) broadcast_id, date_sent, status FROM table ORDER BY broadcast_id, date_sent DESC;
> This reminds me of an interview question: I was asked how to get a > maximum column from a table without using max. How would you do that? > > Select my_column from my_table order by my_column desc limit 1 -- ==================================================================Aaron BonoPresident Aranya SoftwareTechnologies, Inc.http://www.aranya.com We take care of your technology needs.Phone: (816) 695-6071 ==================================================================
I think this approach will only work if each broadcast_id has the same maximum date_sent value. You really need to do the group by in a sub-query to bring the broadcast_id together with the max date_sent. On 6/1/06, Yasir Malik <ymalik@cs.stevens.edu> wrote: > > What I would like to do is simply get the last date_sent and it's > > status for every broadcast. I can't do a GROUP BY because I can't put > > an aggregate on the status column. > > > > SELECT MAX(date_sent), status > > FROM broadcast_history > > GROUP BY broadcast_id > > > You could try the following: > select status > from broadcast_history bh > where bh.date_sent = > (select max(bh2.date_sent) > from broadcast_history bh2); > ==================================================================Aaron BonoPresident Aranya SoftwareTechnologies, Inc.http://www.aranya.com We take care of your technology needs.Phone: (816) 695-6071 ==================================================================