Thread: I need help creating a query
I need a little help creating a query, I have two tables, worker and position, for simplicity sake worker only has its ID and its name, position has the ID of the worker, the name of his position, a date, and his salary/
worker: worker_id, name
position: position_id, worker_id, position, startdate, salary
If I perfom a query joining both tables, I can obtain all the workers and the positions the've had.
SELECT name, startdate, position, salary FROM worker JOIN position USING(worker_id);
worker1 | 2001-01-01 | boss | 999999
worker2 | 2001-01-01 | cleaning | 100
worker2 | 2006-04-01 | programmer | 20000
worker2 | 2006-07-04 | management | 25000
so far so good, now I need to obtain all the workers only with the position they had on a given date.
if I wanted to know the positions on '2006-05-01' it would return
worker1 | 2001-01-01 | boss | 999999
worker2 | 2006-04-01 | programmer | 20000
Maybe I only need some insight here, maybe it should be done with 2 queries, maybe it should be done with functions, I don't know.
Thanks
The query date supplied should be applied against start date and then take the minimum record from that set.
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sergio Duran
Sent: Thursday, July 13, 2006 12:20 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] I need help creating a query
Hello,
I need a little help creating a query, I have two tables, worker and position, for simplicity sake worker only has its ID and its name, position has the ID of the worker, the name of his position, a date, and his salary/
worker: worker_id, name
position: position_id, worker_id, position, startdate, salary
If I perfom a query joining both tables, I can obtain all the workers and the positions the've had.
SELECT name, startdate, position, salary FROM worker JOIN position USING(worker_id);
worker1 | 2001-01-01 | boss | 999999
worker2 | 2001-01-01 | cleaning | 100
worker2 | 2006-04-01 | programmer | 20000
worker2 | 2006-07-04 | management | 25000
so far so good, now I need to obtain all the workers only with the position they had on a given date.
if I wanted to know the positions on '2006-05-01' it would return
worker1 | 2001-01-01 | boss | 999999
worker2 | 2006-04-01 | programmer | 20000
Maybe I only need some insight here, maybe it should be done with 2 queries, maybe it should be done with functions, I don't know.
Thanks
> > worker: worker_id, name > position: position_id, worker_id, position, startdate, salary > > If I perfom a query joining both tables, I can obtain all the workers and > the positions the've had. > > SELECT name, startdate, position, salary FROM worker JOIN position > USING(worker_id); > worker1 | 2001-01-01 | boss | 999999 > worker2 | 2001-01-01 | cleaning | 100 > worker2 | 2006-04-01 | programmer | 20000 > worker2 | 2006-07-04 | management | 25000 > > so far so good, now I need to obtain all the workers only with the position > they had on a given date. > if I wanted to know the positions on '2006-05-01' it would return > worker1 | 2001-01-01 | boss | 999999 > worker2 | 2006-04-01 | programmer | 20000 > This is just a quick guess. I am not sure if the logic is correct but it could be a starting point. select P2.worker_id, P2.pdate, P1.position, P1.salary from position as P1 join (select worker_id, max(startdate) as pdate from position where startdate <= '2006-05-01' group by worker_id, position_id) as P2 on (P1.worker_id = P2.worker_id) and (P1.startdate = P2.pdate) ; Regards, Richard Broersma Jr.
My current solution is to write the nested query on the field list, like
SELECT worker.*,
(select salary
FROM position where worker_id=worker.worker_id
and fecha<='2006-04-01' LIMIT 1) as salary
FROM worker;
but I can only return 1 column from that subquery and repeating the same subquery for each column needed (position, date and salary) seems a little too much, if I write a procedure would postgres would optimize the access?
>
> worker: worker_id, name
> position: position_id, worker_id, position, startdate, salary
>
> If I perfom a query joining both tables, I can obtain all the workers and
> the positions the've had.
>
> SELECT name, startdate, position, salary FROM worker JOIN position
> USING(worker_id);
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2001-01-01 | cleaning | 100
> worker2 | 2006-04-01 | programmer | 20000
> worker2 | 2006-07-04 | management | 25000
>
> so far so good, now I need to obtain all the workers only with the position
> they had on a given date.
> if I wanted to know the positions on '2006-05-01' it would return
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2006-04-01 | programmer | 20000
>
This is just a quick guess. I am not sure if the logic is correct but it could be a starting
point.
select P2.worker_id, P2.pdate, P1.position, P1.salary
from position as P1
join
(select worker_id, max(startdate) as pdate
from position
where startdate <= '2006-05-01'
group by worker_id, position_id) as P2
on (P1.worker_id = P2.worker_id) and (P1.startdate = P2.pdate)
;
Regards,
Richard Broersma Jr.
create table worker(
name varchar(50),
position varchar(50),
startdate date,
salary numeric(9,2));
insert into worker values ('Jon', 'boss', '2001-01-1', 1000.00);
insert into worker values ('Peter', 'cleaning', '2002-01-01', 100.00);
insert into worker values ('Peter', 'programming', '2004-01-01', 300.00);
insert into worker values ('Peter', 'management', '2006-01-01', 500.00);
select * from worker;
name | position | startdate | salary
-------+-------------+------------+---------
Jon | boss | 2001-01-01 | 1000.00
Peter | cleaning | 2002-01-01 | 100.00
Peter | programming | 2004-01-01 | 300.00
Peter | management | 2006-01-01 | 500.00
I want to group by name, order by date desc and show the first grouped salary, maybe I should write an aggregate function that saves the first value and ignores the next ones. Is there already an aggregate function that does this? I havent written any aggregate functions yet, can anybody spare some pointers?
Nice, Richard, but you use max(startdate), how about the salary? i cant use max(salary) how about if he got a pay cut?
My current solution is to write the nested query on the field list, like
SELECT worker.*,
(select salary
FROM position where worker_id=worker.worker_id
and fecha<='2006-04-01' LIMIT 1) as salary
FROM worker;
but I can only return 1 column from that subquery and repeating the same subquery for each column needed (position, date and salary) seems a little too much, if I write a procedure would postgres would optimize the access?On 7/13/06, Richard Broersma Jr < rabroersma@yahoo.com> wrote:>
> worker: worker_id, name
> position: position_id, worker_id, position, startdate, salary
>
> If I perfom a query joining both tables, I can obtain all the workers and
> the positions the've had.
>
> SELECT name, startdate, position, salary FROM worker JOIN position
> USING(worker_id);
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2001-01-01 | cleaning | 100
> worker2 | 2006-04-01 | programmer | 20000
> worker2 | 2006-07-04 | management | 25000
>
> so far so good, now I need to obtain all the workers only with the position
> they had on a given date.
> if I wanted to know the positions on '2006-05-01' it would return
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2006-04-01 | programmer | 20000
>
This is just a quick guess. I am not sure if the logic is correct but it could be a starting
point.
select P2.worker_id, P2.pdate, P1.position, P1.salary
from position as P1
join
(select worker_id, max(startdate) as pdate
from position
where startdate <= '2006-05-01'
group by worker_id, position_id) as P2
on (P1.worker_id = P2.worker_id) and (P1.startdate = P2.pdate)
;
Regards,
Richard Broersma Jr.
How about if we make it simpler, only 1 table
create table worker(
name varchar(50),
position varchar(50),
startdate date,
salary numeric(9,2));
insert into worker values ('Jon', 'boss', '2001-01-1', 1000.00);
insert into worker values ('Peter', 'cleaning', '2002-01-01', 100.00);
insert into worker values ('Peter', 'programming', '2004-01-01', 300.00);
insert into worker values ('Peter', 'management', '2006-01-01', 500.00);
select * from worker;
name | position | startdate | salary
-------+-------------+------------+---------
Jon | boss | 2001-01-01 | 1000.00
Peter | cleaning | 2002-01-01 | 100.00
Peter | programming | 2004-01-01 | 300.00
Peter | management | 2006-01-01 | 500.00
I want to group by name, order by date desc and show the first grouped salary, maybe I should write an aggregate function that saves the first value and ignores the next ones. Is there already an aggregate function that does this? I havent written any aggregate functions yet, can anybody spare some pointers?
--
Seeya...Q
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
_____ / Quinton Dolan - qdolan@gmail.com
__ __/ / / __/ / /
/ __ / _/ / / Gold Coast, QLD, Australia
__/ __/ __/ ____/ / - / Ph: +61 419 729 806
_______ /
_\
> create table worker( > name varchar(50), > position varchar(50), > startdate date, > salary numeric(9,2)); > insert into worker values ('Jon', 'boss', '2001-01-1', 1000.00); > insert into worker values ('Peter', 'cleaning', '2002-01-01', 100.00); > insert into worker values ('Peter', 'programming', '2004-01-01', 300.00); > insert into worker values ('Peter', 'management', '2006-01-01', 500.00); > select * from worker; > name | position | startdate | salary > -------+-------------+------------+--------- > Jon | boss | 2001-01-01 | 1000.00 > Peter | cleaning | 2002-01-01 | 100.00 > Peter | programming | 2004-01-01 | 300.00 > Peter | management | 2006-01-01 | 500.00 > > I want to group by name, order by date desc and show the first grouped > salary, maybe I should write an aggregate function that saves the first > value and ignores the next ones. Is there already an aggregate function that > does this? I havent written any aggregate functions yet, can anybody spare > some pointers? This query didn't give you the max salary. First, the subselect give your maximum start date for each employee the occurred before your given date '2006-05-01', regardless if they get a raise or a cut. Then we join the result of the sub-select to the main table to get the specific records that meet the criteria of the sub-select. select W2.name, W1.position, W2.pdate, w1.salary from worker as W1 join (select name, max(startdate) as pdate from worker where startdate <= '2005-01-01' group by name ) as W2 on (W1.name = W2.name) and (W1.startdate = W2.pdate) ; name | position | pdate | salary -------+-------------+------------+--------- Jon | boss | 2001-01-01 | 1000.00 Peter | programming | 2004-01-01 | 300.00 So with this query, we get what everyones salary would be on the date of '2005-01-01' regardless of raises or cuts. Regards, Richard Broersma Jr.
create or replace function first_accum(anyelement, anyelement) returns anyelement as $$
BEGIN
IF $1 IS NOT NULL THEN return $1;
ELSE return $2;
END IF;
END' language plpgsql;
then I created the aggregate
first_accum is basically the same thing as coalesce, but CREATE AGGREGATE wasn't allowing me to use coalesce.
now I can get the salaries and positions each worker had on a given date.
SELECT name, first(startdate), first(salary)
FROM worker
JOIN position ON position.worker_id=worker.worker_id
WHERE fecha<='2006-05-01'
ORDER BY fecha DESC;
I'd appreciate some feedback, I hope there's a better way to do this. (maybe without creating the plpgsql function but using an internal function)
On 7/14/06, Sergio Duran <sergioduran@gmail.com> wrote:
Nice, Richard, but you use max(startdate), how about the salary? i cant use max(salary) how about if he got a pay cut?
My current solution is to write the nested query on the field list, like
SELECT worker.*,
(select salary
FROM position where worker_id=worker.worker_id
and fecha<='2006-04-01' LIMIT 1) as salary
FROM worker;
but I can only return 1 column from that subquery and repeating the same subquery for each column needed (position, date and salary) seems a little too much, if I write a procedure would postgres would optimize the access?On 7/13/06, Richard Broersma Jr < rabroersma@yahoo.com> wrote:>
> worker: worker_id, name
> position: position_id, worker_id, position, startdate, salary
>
> If I perfom a query joining both tables, I can obtain all the workers and
> the positions the've had.
>
> SELECT name, startdate, position, salary FROM worker JOIN position
> USING(worker_id);
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2001-01-01 | cleaning | 100
> worker2 | 2006-04-01 | programmer | 20000
> worker2 | 2006-07-04 | management | 25000
>
> so far so good, now I need to obtain all the workers only with the position
> they had on a given date.
> if I wanted to know the positions on '2006-05-01' it would return
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2006-04-01 | programmer | 20000
>
This is just a quick guess. I am not sure if the logic is correct but it could be a starting
point.
select P2.worker_id, P2.pdate, P1.position, P1.salary
from position as P1
join
(select worker_id, max(startdate) as pdate
from position
where startdate <= '2006-05-01'
group by worker_id, position_id) as P2
on (P1.worker_id = P2.worker_id) and (P1.startdate = P2.pdate)
;
Regards,
Richard Broersma Jr.
----- Original Message ----- From: "Sergio Duran" <sergioduran@gmail.com> To: <pgsql-general@postgresql.org> Sent: Thursday, July 13, 2006 9:20 PM Subject: [GENERAL] I need help creating a query > Hello, > > I need a little help creating a query, I have two tables, worker and > position, for simplicity sake worker only has its ID and its name, position > has the ID of the worker, the name of his position, a date, and his salary/ > > worker: worker_id, name > position: position_id, worker_id, position, startdate, salary > > If I perfom a query joining both tables, I can obtain all the workers and > the positions the've had. > > SELECT name, startdate, position, salary FROM worker JOIN position > USING(worker_id); > worker1 | 2001-01-01 | boss | 999999 > worker2 | 2001-01-01 | cleaning | 100 > worker2 | 2006-04-01 | programmer | 20000 > worker2 | 2006-07-04 | management | 25000 > > so far so good, now I need to obtain all the workers only with the position > they had on a given date. > if I wanted to know the positions on '2006-05-01' it would return > worker1 | 2001-01-01 | boss | 999999 > worker2 | 2006-04-01 | programmer | 20000 > This should work: select distinct on(W.worker_id) W.name,P.position,P.salary from worker W,position P where P.worker_id=W.worker_id and 'SOME DATE' >= P.startdate order by W.worker_id,P.startdate Cheers Marcin
I think I'll stick with Marcin Mank's query for now, I'll also try to work further with Richard Broersma's query later.
Thank you guys, you were really helpful.
----- Original Message -----
From: "Sergio Duran" <sergioduran@gmail.com>
To: <pgsql-general@postgresql.org >
Sent: Thursday, July 13, 2006 9:20 PM
Subject: [GENERAL] I need help creating a query
> Hello,
>
> I need a little help creating a query, I have two tables, worker and
> position, for simplicity sake worker only has its ID and its name,
position
> has the ID of the worker, the name of his position, a date, and his
salary/
>
> worker: worker_id, name
> position: position_id, worker_id, position, startdate, salary
>
> If I perfom a query joining both tables, I can obtain all the workers and
> the positions the've had.
>
> SELECT name, startdate, position, salary FROM worker JOIN position
> USING(worker_id);
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2001-01-01 | cleaning | 100
> worker2 | 2006-04-01 | programmer | 20000
> worker2 | 2006-07-04 | management | 25000
>
> so far so good, now I need to obtain all the workers only with the
position
> they had on a given date.
> if I wanted to know the positions on '2006-05-01' it would return
> worker1 | 2001-01-01 | boss | 999999
> worker2 | 2006-04-01 | programmer | 20000
>
This should work:
select distinct on(W.worker_id) W.name,P.position,P.salary
from worker W,position P
where P.worker_id=W.worker_id
and 'SOME DATE' >= P.startdate
order by W.worker_id,P.startdate
Cheers
Marcin