Thread: limiting join results

limiting join results

From

Elaine Lindelef

Date:

14 November 2002, 19:16:35

I am doing a query with a 3-way join. The join and select are working
fine. However, what I want is to select only the row with the
smallest timediff for each distinct t1.date.

This is the query (simplified):

select t1.date, t1.parent,
t1.id, t2.id, t3.id, t3.date,
(t3.date - t1.date) as timediff
  from (t1 LEFT JOIN t2
ON t1.parent = t2.id)
LEFT JOIN t3 ON t2.page = t3.page
where
t3.date < t1.date and
t3.event_type = 'page' and
t1.user_id = '61516' and
order by t1.date, timediff;

Here are my results:

           t1.date       | parent | t1.id  | t2.id  | t3.id  |
t3.date          | timediff
------------------------+--------+--------+--------+--------+---------
---------------+----------
  2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 |
2002-11-14 14:46:11-08 | 00:00:22
  2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 |
2002-11-14 15:33:50-08 | 00:00:11
  2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 |
2002-11-14 15:33:40-08 | 00:00:21
  2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 |
2002-11-14 14:46:35-08 | 00:47:26
  2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 |
2002-11-14 14:46:11-08 | 00:47:50
(5 rows)

What I want are only the first two rows. However, I don't know how
many distinct t1.date values I will have. Using DISTINCT doesn't seem
to change the output, and I'm not convinced it would keep the correct
row if it did.

My normal habit is to clean up the results in perl, but it seems to
me that I should be able to do it in the SQL query and be a bit
cleaner.

Thank you for your assistance.

Elaine Lindelef

Re: limiting join results

From

snpe

Date:

14 November 2002, 21:10:14

On Friday 15 November 2002 12:14 am, Elaine Lindelef wrote:
> I am doing a query with a 3-way join. The join and select are working
> fine. However, what I want is to select only the row with the
> smallest timediff for each distinct t1.date.
>
> This is the query (simplified):
>
> select t1.date, t1.parent,
> t1.id, t2.id, t3.id, t3.date,
> (t3.date - t1.date) as timediff
>   from (t1 LEFT JOIN t2
> ON t1.parent = t2.id)
> LEFT JOIN t3 ON t2.page = t3.page
> where
> t3.date < t1.date and
> t3.event_type = 'page' and
> t1.user_id = '61516' and
> order by t1.date, timediff;
>
> Here are my results:
>
>            t1.date       | parent | t1.id  | t2.id  | t3.id  |
> t3.date          | timediff
> ------------------------+--------+--------+--------+--------+---------
> ---------------+----------
>   2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 |
> 2002-11-14 14:46:11-08 | 00:00:22
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 |
> 2002-11-14 15:33:50-08 | 00:00:11
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 |
> 2002-11-14 15:33:40-08 | 00:00:21
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 |
> 2002-11-14 14:46:35-08 | 00:47:26
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 |
> 2002-11-14 14:46:11-08 | 00:47:50
> (5 rows)
>
> What I want are only the first two rows. However, I don't know how
> many distinct t1.date values I will have. Using DISTINCT doesn't seem
> to change the output, and I'm not convinced it would keep the correct
> row if it did.
>
> My normal habit is to clean up the results in perl, but it seems to
> me that I should be able to do it in the SQL query and be a bit
> cleaner.
>
> Thank you for your assistance.
>
> Elaine Lindelef

You try cursor
begin;
declare cursor c1 for select ....;
fetch forward 2 from c1;
commit;

regards
Haris Peco

Re: limiting join results

From

Stephan Szabo

Date:

14 November 2002, 22:00:48

On Thu, 14 Nov 2002, Elaine Lindelef wrote:

> I am doing a query with a 3-way join. The join and select are working
> fine. However, what I want is to select only the row with the
> smallest timediff for each distinct t1.date.
>
> This is the query (simplified):
>
> select t1.date, t1.parent,
> t1.id, t2.id, t3.id, t3.date,
> (t3.date - t1.date) as timediff
>   from (t1 LEFT JOIN t2
> ON t1.parent = t2.id)
> LEFT JOIN t3 ON t2.page = t3.page
> where
> t3.date < t1.date and
> t3.event_type = 'page' and
> t1.user_id = '61516' and
> order by t1.date, timediff;

If you don't mind a postgres specific solution,
I think
select distinct on (t1.date) t1.date, ...

may give you what you want.

A real SQL solution is a bit more involved, I think you need
to do a subselect with a group by.

Re: limiting join results

From

Scott Lamb

Date:

15 November 2002, 01:02:16

Elaine Lindelef wrote:
> I am doing a query with a 3-way join. The join and select are working
> fine. However, what I want is to select only the row with the smallest
> timediff for each distinct t1.date.

First of all, you are using left joins. I don't think that's what you
want. A left join says to include all matching rows from t1, even if
there's no matching row in t2; it makes a fake t2 with all nulls if
necessary to match the t1 with. (Not a terribly good explanation. If you
need better, do a web search or I can try again.) Likewise for t2 and
t3. But you are discarding those with the t3.date comparison, because
"t3.date < t1.date" will always be false if t3.date is null, as it would
be for the extra stuff from the left join. You probably want an inner
join (the normal kind), which is a lot less expensive also.

And the way I typically say the smallest/largest/whateverest something
is with a not exists clause. Something that says "there is no record
with a smaller timediff than this one and matching it otherwise."

I'll use a simpler example (only one table in the not exists) to
demonstrate that with. Let's just say I'm interested in finding the
oldest employee in each department of some business.

create table department (
     department_id            serial primary key,
     name                     varchar(50) not null
);

create table employee (
     employee_id              serial primary key,
     department_id            integer references department not null,
     name                     varchar(50) not null,
     when_born                date
);

select    department.name as department_name,
           employee.name as employee_name,
           when_born
from      employee natural join department
where     not exists
          (select    'x'
           from      employee as older_employee
           where     employee.department_id
                     = older_employee.department_id
             and     older_employee.when_born < employee.when_born);

In other words, show me the department name, employee name, and birth
date of every employee for whom there is no older employee in the same
department.

Does that help?

Scott

Re: limiting join results

From

Jean-Luc Lachance

Date:

15 November 2002, 12:15:49

Elaine,

Are you sure of the where clause? (t3.date - t1.date) should be negative
if t3.date < t1.date!

JLL


Elaine Lindelef wrote:
>
> I am doing a query with a 3-way join. The join and select are working
> fine. However, what I want is to select only the row with the
> smallest timediff for each distinct t1.date.
>
> This is the query (simplified):
>
> select t1.date, t1.parent,
> t1.id, t2.id, t3.id, t3.date,
> (t3.date - t1.date) as timediff
>   from (t1 LEFT JOIN t2
> ON t1.parent = t2.id)
> LEFT JOIN t3 ON t2.page = t3.page
> where
> t3.date < t1.date and
> t3.event_type = 'page' and
> t1.user_id = '61516' and
> order by t1.date, timediff;
>
> Here are my results:
>
>            t1.date       | parent | t1.id  | t2.id  | t3.id  |
> t3.date          | timediff
> ------------------------+--------+--------+--------+--------+---------
> ---------------+----------
>   2002-11-14 14:46:33-08 | 102846 | 100918 | 102846 | 102845 |
> 2002-11-14 14:46:11-08 | 00:00:22
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102855 |
> 2002-11-14 15:33:50-08 | 00:00:11
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102853 |
> 2002-11-14 15:33:40-08 | 00:00:21
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102847 |
> 2002-11-14 14:46:35-08 | 00:47:26
>   2002-11-14 15:34:01-08 | 102856 | 100921 | 102856 | 102845 |
> 2002-11-14 14:46:11-08 | 00:47:50
> (5 rows)
>
> What I want are only the first two rows. However, I don't know how
> many distinct t1.date values I will have. Using DISTINCT doesn't seem
> to change the output, and I'm not convinced it would keep the correct
> row if it did.
>
> My normal habit is to clean up the results in perl, but it seems to
> me that I should be able to do it in the SQL query and be a bit
> cleaner.
>
> Thank you for your assistance.
>
> Elaine Lindelef
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: limiting join results

From

Elaine Lindelef

Date:

15 November 2002, 13:29:30

Thanks all! Those suggestions worked, and your help is deeply appreciated.

Elaine