Thread: [SQL] Please advice on query optimization

[SQL] Please advice on query optimization

From
Muhannad Shubita
Date:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.

Re: [SQL] Please advice on query optimization

From
Samed YILDIRIM
Date:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 


İyi çalışmalar.
Samed YILDIRIM



20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.

Re: [SQL] Please advice on query optimization

From
Muhannad Shubita
Date:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day  Employee Total-Hours  First-Sign-In   device-id-of-first-sign-in Last-Sign-In    device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25                     152d234b70..


On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 


İyi çalışmalar.
Samed YILDIRIM



20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.



--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From
Samed YILDIRIM
Date:
Hi Muhannad,
 
first_value and last_value functions can be used for these purpose. But query should be rewrited as below. Also for this case I'm not sure which one is better, sub query or window functions.
 
SELECT DISTINCT employee_id, day, sum(total_hours) over w1, min(sign_in) OVER w1 as first_sign_in, max(sign_in) OVER w1 as last_sign_in, first_value(device_id) OVER w2 as first_sign_in_device, first_value(device_id) OVER w3 as last_sign_in_device
FROM bioemployee
WHERE sign_in BETWEEN X and Y
WINDOW w1 as (PARTITION BY employee_id,day),w2 as (PARTITION BY employee_id,day ORDER BY sign_in ASC),w3 as (PARTITION BY employee_id,day ORDER BY sign_in DESC)
ORDER BY day;
 
Best regards.
 
İyi çalışmalar.
Samed YILDIRIM
 
20.04.2017, 20:35, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Thanks Samed,
 
one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:
 
Day  Employee Total-Hours  First-Sign-In   device-id-of-first-sign-in Last-Sign-In    device-id-of-last-sign-in
 
20/4 emp1 4 8:22 202cb962ac5.. 3:25                     152d234b70..
 
 
On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 
 
 
İyi çalışmalar.
Samed YILDIRIM
 
 
 
20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,
 
I have a table with biometric info about employees (let's call it bioemployee):
 
id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER
 
I want to display details grouped by day & employee Id, for example:
 
 
Day Employee Total-Hours First-Sign-In Last-Sign-In
 
20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11
 
 
I created a pgsql function to get the details through the below query:
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:
 
for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
 
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP
 
but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill
 
 
is there a better way to do this? 
 
 
Thanks.
 
 
--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From
"David G. Johnston"
Date:
Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.


On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day  Employee Total-Hours  First-Sign-In   device-id-of-first-sign-in Last-Sign-In    device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25                     152d234b70..


On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 


İyi çalışmalar.
Samed YILDIRIM



20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.



--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From
Muhannad Shubita
Date:
Thanks folks for your help, I will benchmark the suggested solutions & see what's the best fit here.

Awesome mailing list :)

On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.


On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day  Employee Total-Hours  First-Sign-In   device-id-of-first-sign-in Last-Sign-In    device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25                     152d234b70..


On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 


İyi çalışmalar.
Samed YILDIRIM



20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.



--
Regards,
Muhannad




--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From
Hector Vass
Date:
or perhaps ...

select
  day,
  employee,
  sum(total_hours),
  max(case when r=1 then sign_in else null end) as first,
  max(case when rr=1 then sign_in else null end) as last
from (
  select
       *,
       row_number() over(partition by day,employee order by sign_in) as r,
       row_number() over(partition by day,employee order by sign_in desc) as rr
   from bioemployee
)x
group by day,employee



On 20 Apr 2017 9:09 p.m., Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks folks for your help, I will benchmark the suggested solutions & see what's the best fit here.

Awesome mailing list :)

On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.


On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day  Employee Total-Hours  First-Sign-In   device-id-of-first-sign-in Last-Sign-In    device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25                     152d234b70..


On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,
 
Did you try using MIN and MAX function? I guess that following query solves your problem.
 
select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;
 
Best regards.
 
 


İyi çalışmalar.
Samed YILDIRIM



20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id  -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:


Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11


I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In  &  Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill


is there a better way to do this? 


Thanks.



--
Regards,
Muhannad




--
Regards,
Muhannad