Thread: limiting join results
I am doing a query with a 3-way join. The join and select are working fine. However, what I want is to select only the row with the smallest timediff for each distinct t1.date. This is the query (simplified): select t1.date, t1.parent, t1.id, t2.id, t3.id, t3.date, (t3.date - t1.date) as timediff from (t1 LEFT JOIN t2 ON t1.parent = t2.id) LEFT JOIN t3 ON t2.page = t3.page where t3.date < t1.date and t3.event_type = 'page' and t1.user_id = '61516' and order by t1.date, timediff; Here are my results: t1.date | parent | t1.id | t2.id | t3.id | t3.date | timediff ------------------------+--------+--------+--------+--------+--------- ---------------+---------- 2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 | 2002-11-14 14:46:11-08 | 00:00:22 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 | 2002-11-14 15:33:50-08 | 00:00:11 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 | 2002-11-14 15:33:40-08 | 00:00:21 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 | 2002-11-14 14:46:35-08 | 00:47:26 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 | 2002-11-14 14:46:11-08 | 00:47:50 (5 rows) What I want are only the first two rows. However, I don't know how many distinct t1.date values I will have. Using DISTINCT doesn't seem to change the output, and I'm not convinced it would keep the correct row if it did. My normal habit is to clean up the results in perl, but it seems to me that I should be able to do it in the SQL query and be a bit cleaner. Thank you for your assistance. Elaine Lindelef
On Friday 15 November 2002 12:14 am, Elaine Lindelef wrote: > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; > > Here are my results: > > t1.date | parent | t1.id | t2.id | t3.id | > t3.date | timediff > ------------------------+--------+--------+--------+--------+--------- > ---------------+---------- > 2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 | > 2002-11-14 14:46:11-08 | 00:00:22 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 | > 2002-11-14 15:33:50-08 | 00:00:11 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 | > 2002-11-14 15:33:40-08 | 00:00:21 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 | > 2002-11-14 14:46:35-08 | 00:47:26 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 | > 2002-11-14 14:46:11-08 | 00:47:50 > (5 rows) > > What I want are only the first two rows. However, I don't know how > many distinct t1.date values I will have. Using DISTINCT doesn't seem > to change the output, and I'm not convinced it would keep the correct > row if it did. > > My normal habit is to clean up the results in perl, but it seems to > me that I should be able to do it in the SQL query and be a bit > cleaner. > > Thank you for your assistance. > > Elaine Lindelef You try cursor begin; declare cursor c1 for select ....; fetch forward 2 from c1; commit; regards Haris Peco
On Thu, 14 Nov 2002, Elaine Lindelef wrote: > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; If you don't mind a postgres specific solution, I think select distinct on (t1.date) t1.date, ... may give you what you want. A real SQL solution is a bit more involved, I think you need to do a subselect with a group by.
Elaine Lindelef wrote: > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the smallest > timediff for each distinct t1.date. First of all, you are using left joins. I don't think that's what you want. A left join says to include all matching rows from t1, even if there's no matching row in t2; it makes a fake t2 with all nulls if necessary to match the t1 with. (Not a terribly good explanation. If you need better, do a web search or I can try again.) Likewise for t2 and t3. But you are discarding those with the t3.date comparison, because "t3.date < t1.date" will always be false if t3.date is null, as it would be for the extra stuff from the left join. You probably want an inner join (the normal kind), which is a lot less expensive also. And the way I typically say the smallest/largest/whateverest something is with a not exists clause. Something that says "there is no record with a smaller timediff than this one and matching it otherwise." I'll use a simpler example (only one table in the not exists) to demonstrate that with. Let's just say I'm interested in finding the oldest employee in each department of some business. create table department ( department_id serial primary key, name varchar(50) not null ); create table employee ( employee_id serial primary key, department_id integer references department not null, name varchar(50) not null, when_born date ); select department.name as department_name, employee.name as employee_name, when_born from employee natural join department where not exists (select 'x' from employee as older_employee where employee.department_id = older_employee.department_id and older_employee.when_born < employee.when_born); In other words, show me the department name, employee name, and birth date of every employee for whom there is no older employee in the same department. Does that help? Scott
Elaine, Are you sure of the where clause? (t3.date - t1.date) should be negative if t3.date < t1.date! JLL Elaine Lindelef wrote: > > I am doing a query with a 3-way join. The join and select are working > fine. However, what I want is to select only the row with the > smallest timediff for each distinct t1.date. > > This is the query (simplified): > > select t1.date, t1.parent, > t1.id, t2.id, t3.id, t3.date, > (t3.date - t1.date) as timediff > from (t1 LEFT JOIN t2 > ON t1.parent = t2.id) > LEFT JOIN t3 ON t2.page = t3.page > where > t3.date < t1.date and > t3.event_type = 'page' and > t1.user_id = '61516' and > order by t1.date, timediff; > > Here are my results: > > t1.date | parent | t1.id | t2.id | t3.id | > t3.date | timediff > ------------------------+--------+--------+--------+--------+--------- > ---------------+---------- > 2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 | > 2002-11-14 14:46:11-08 | 00:00:22 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 | > 2002-11-14 15:33:50-08 | 00:00:11 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 | > 2002-11-14 15:33:40-08 | 00:00:21 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 | > 2002-11-14 14:46:35-08 | 00:47:26 > 2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 | > 2002-11-14 14:46:11-08 | 00:47:50 > (5 rows) > > What I want are only the first two rows. However, I don't know how > many distinct t1.date values I will have. Using DISTINCT doesn't seem > to change the output, and I'm not convinced it would keep the correct > row if it did. > > My normal habit is to clean up the results in perl, but it seems to > me that I should be able to do it in the SQL query and be a bit > cleaner. > > Thank you for your assistance. > > Elaine Lindelef > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Thanks all! Those suggestions worked, and your help is deeply appreciated. Elaine