Thread: difficult JOIN
Hi, i have the following SQL-Problem: We are using 2 tables. The first, called plan, is holding planned working times for employees per tour: plan.id_tour plan.id_employee plan.begin_time plan.end_time The second table 'work' stores the actual worked times for employees per tour: work.id_tour work.id_employee work.begin_time work.end_time Employees can be multiple times assigned to one tour. One record will be created for every assignment. They can also work multiple times in one tour. Now i wanna merge this infos into one report. I wanna join the first plan entry for one employee in one tour with the first work entry for one employee in one tour and so on. How can i obtain that? A simply USING(id_tour, id_employee) -JOIN will not doit. Thanks for any hints, Thomas
On Tue, Jan 25, 2005 at 05:26:50PM +0100, Thomas Chille wrote: > Hi, > > i have the following SQL-Problem: > > We are using 2 tables. The first, called plan, is holding planned working > times for employees per > tour: > > plan.id_tour > plan.id_employee > plan.begin_time > plan.end_time > > The second table 'work' stores the actual worked times for employees per > tour: > > work.id_tour > work.id_employee > work.begin_time > work.end_time > > Employees can be multiple times assigned to one tour. One record will be > created for every > assignment. They can also work multiple times in one tour. > > Now i wanna merge this infos into one report. I wanna join the first plan > entry for one employee in > one tour with the first work entry for one employee in one tour and so on. > > How can i obtain that? A simply USING(id_tour, id_employee) -JOIN will not > doit. Hrm. So for a given tour, employee, you want to pair the first record in plan with the first record in work, and the second record in plan with the second record in work? Doing that will be pretty tricky. I'm not sure you can even do it in a single SELECT. More important, does it even make sense? What if an employee ends up not working at all for one of his/her planned times? Every record after that would be completely skewed. Wouldn't it make much more sense to either assign an ID to each record in the plan table, and correlate records in the work table using that ID, or correlate based on begin and end time? BTW, I've never seen the convention id_employee; people generally use employee_id. Is it more important to know that you're talking about an ID or that you're talking about an employee? Just food for thought. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
Hi Jim, thanks for your answer! > Hrm. So for a given tour, employee, you want to pair the first record in > plan with the first record in work, and the second record in plan with > the second record in work? Yes you understand me well, thats what i'm trying to achieve. > Doing that will be pretty tricky. I'm not sure you can even do it in a > single SELECT. I'm not sure anymore too. > More important, does it even make sense? What if an employee ends up not > working at all for one of his/her planned times? Every record after that > would be completely skewed. Wouldn't it make much more sense to either > assign an ID to each record in the plan table, and correlate records in > the work table using that ID, or correlate based on begin and end time? This report should not be the final thing. It schould only present the matched times on an easy way and the tourleader can than edit the worktimes manually. He have to do this after every tour because the employees are often not using the timerecording unit correct (i'm talking about restaurants). In the meanwhile i gave every record an position counter (per tour and employee), derrived from begin_time, per trigger, and merged them in this way: SELECT * FROM work LEFT JOIN plan USING(id_tour, id_employee, counter) UNION SELECT * FROM work RIGHT JOIN plan USING(id_tour, id_employee, counter) Thats works for me but if the tourleader change one worktime record the counters have to be recalculate and the order of the records will change. This is hard to handle and i think i will do all merging programmatically in the app or with an ppgsql function. > BTW, I've never seen the convention id_employee; people generally use > employee_id. Is it more important to know that you're talking about an > ID or that you're talking about an employee? Just food for thought. I'm not the father of this strange naming convention :) Thank you again, Thomas
On Wed, Jan 26, 2005 at 11:22:48AM +0100, Thomas Chille wrote: > > More important, does it even make sense? What if an employee ends up not > > working at all for one of his/her planned times? Every record after that > > would be completely skewed. Wouldn't it make much more sense to either > > assign an ID to each record in the plan table, and correlate records in > > the work table using that ID, or correlate based on begin and end time? > > This report should not be the final thing. It schould only present the > matched times on an easy way and the tourleader can than edit the > worktimes manually. He have to do this after every tour because the > employees are often not using the timerecording unit correct (i'm > talking about restaurants). One thing that may help is to do an inner join on (id_tour, id_employee) and then match the following: work.starttime < plan.endtime and plan.starttime < work.endtime This will join then and give a row whenever there is an overlap between the two records. Ofcourse, if there is no overlap the record doesn't appear at all, but an outer join might work here... Ah no, IIRC postgresql only supports outer joins on merge join able conditions so that wont work. But the above should be a start... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Wed, Jan 26, 2005 at 11:22:48AM +0100, Thomas Chille wrote: > In the meanwhile i gave every record an position counter (per tour and > employee), derrived from begin_time, per trigger, and merged them in > this way: How are you generating the position counter? A sequence? If you multiply by some factor it might eliminate the need to re-number everything. > SELECT * FROM work LEFT JOIN plan USING(id_tour, id_employee, counter) > UNION > SELECT * FROM work RIGHT JOIN plan USING(id_tour, id_employee, counter) > > Thats works for me but if the tourleader change one worktime record > the counters have to be recalculate and the order of the records will > change. > > This is hard to handle and i think i will do all merging > programmatically in the app or with an ppgsql function. > > > BTW, I've never seen the convention id_employee; people generally use > > employee_id. Is it more important to know that you're talking about an > > ID or that you're talking about an employee? Just food for thought. > > I'm not the father of this strange naming convention :) I hate getting stuck with other people's stupidity. :) -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"