Home > mailing lists

Thread: [SQL] Please advice on query optimization

[SQL] Please advice on query optimization

From

Muhannad Shubita

Date:

20 April 2017, 19:43:54

Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key

employee_id -- foreign key references employee table

day char(8) -- YY-MM-DD

sign_in TIMESTAMP

total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25

21/4 emp1 7 9:00 4:11

21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name

from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP

--First-Sign-In

select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;

--Last-Sign-In

select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);

END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

Re: [SQL] Please advice on query optimization

From

Samed YILDIRIM

Date:

20 April 2017, 20:18:26

Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in

from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.

Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:

Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

Re: [SQL] Please advice on query optimization

From

Muhannad Shubita

Date:

20 April 2017, 20:35:54

Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..

On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:

Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.
Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

Regards,

Muhannad

Re: [SQL] Please advice on query optimization

From

Samed YILDIRIM

Date:

20 April 2017, 22:29:47

Hi Muhannad,

first_value and last_value functions can be used for these purpose. But query should be rewrited as below. Also for this case I'm not sure which one is better, sub query or window functions.

SELECT DISTINCT employee_id, day, sum(total_hours) over w1, min(sign_in) OVER w1 as first_sign_in, max(sign_in) OVER w1 as last_sign_in, first_value(device_id) OVER w2 as first_sign_in_device, first_value(device_id) OVER w3 as last_sign_in_device

FROM bioemployee

WHERE sign_in BETWEEN X and Y

WINDOW w1 as (PARTITION BY employee_id,day),w2 as (PARTITION BY employee_id,day ORDER BY sign_in ASC),w3 as (PARTITION BY employee_id,day ORDER BY sign_in DESC)

ORDER BY day;

Best regards.

İyi çalışmalar.

Samed YILDIRIM

20.04.2017, 20:35, "Muhannad Shubita" <muhannadshubita@gmail.com>:

Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..

On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.
Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From

"David G. Johnston"

Date:

20 April 2017, 22:37:03

Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *

FROM unique_day_employee

LEFT JOIN aggregates_query USING (day, employee)

LEFT JOIN first_signin_query USING (day, employee)

LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.

On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:

Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..

On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.
Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

--
Regards,
Muhannad

Re: [SQL] Please advice on query optimization

From

Muhannad Shubita

Date:

20 April 2017, 23:07:35

Thanks folks for your help, I will benchmark the suggested solutions & see what's the best fit here.

Awesome mailing list :)

On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.

On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..

On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.
Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;

return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

--
Regards,
Muhannad

Regards,

Muhannad

Re: [SQL] Please advice on query optimization

From

Hector Vass

Date:

21 April 2017, 01:04:01

or perhaps ...

select

day,

employee,

sum(total_hours),

max(case when r=1 then sign_in else null end) as first,

max(case when rr=1 then sign_in else null end) as last

from (

select

row_number() over(partition by day,employee order by sign_in) as r,

row_number() over(partition by day,employee order by sign_in desc) as rr

from bioemployee

group by day,employee

On 20 Apr 2017 9:09 p.m., Muhannad Shubita <muhannadshubita@gmail.com> wrote:

Thanks folks for your help, I will benchmark the suggested solutions & see what's the best fit here.

Awesome mailing list :)

On Thu, Apr 20, 2017 at 11:37 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
Much easier to follow bottom-posting but I'll be consistent here.

It seems like you want a join of 4 sub-queries.

SELECT *
FROM unique_day_employee
LEFT JOIN aggregates_query USING (day, employee)
LEFT JOIN first_signin_query USING (day, employee)
LEFT JOIN last_signin_query USING (day, employee)

Where the definition of the first/last sign-in sub-queries are like

SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time ASC -- first
SELECT DISTINCT ON (day, employee) day, employee, time, device FROM tbl ORDER BY day, employee, time DESC -- last

David J.

On Thu, Apr 20, 2017 at 10:35 AM, Muhannad Shubita <muhannadshubita@gmail.com> wrote:
Thanks Samed,

one more question, what if I had other columns that cannot be used with an aggregate function (TEXT-based for example) but still needed to be paired with the first & last sign in? for instance, a randomly generated md5-ed ID by the sign in device that needs to be displayed as a reference:

Day Employee Total-Hours First-Sign-In device-id-of-first-sign-in Last-Sign-In device-id-of-last-sign-in

20/4 emp1 4 8:22 202cb962ac5.. 3:25 152d234b70..

On Thu, Apr 20, 2017 at 9:18 PM, Samed YILDIRIM <samed@reddoc.net> wrote:
Hi Muhannad,

Did you try using MIN and MAX function? I guess that following query solves your problem.

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name, min(sign_in) as first_sign_in, max(sign_in) as last_sign_in
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

Best regards.

İyi çalışmalar.
Samed YILDIRIM

20.04.2017, 19:56, "Muhannad Shubita" <muhannadshubita@gmail.com>:
Good day,

I have a table with biometric info about employees (let's call it bioemployee):

id serial not null -- primary key
employee_id -- foreign key references employee table
day char(8) -- YY-MM-DD
sign_in TIMESTAMP
total_hours INTEGER

I want to display details grouped by day & employee Id, for example:

Day Employee Total-Hours First-Sign-In Last-Sign-In

20/4 emp1 4 8:22 3:25
21/4 emp1 7 9:00 4:11
21/4 emp2 2 11:00 01:11

I created a pgsql function to get the details through the below query:

select employee_id, day, sum(total_hours) as total_hours, (select name from employee where id = employee_id) as emp_name
from bioemployee where sign_in BETWEEN X and Y GROUP BY employee_id, day ORDER BY day;

now the problem is with getting First-Sign-In & Last-Sign-In per group (employee & day), I have currently implemented it in a FOR loop:

for RECORD in query LOOP
--First-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in LIMIT 1;
--Last-Sign-In
select sign_in from bioemployee where employee_id = -- and day = -- and sign_in BETWEEN X and Y ORDER BY sign_in DESC LIMIT 1;
return next json_build_object(first_sign_in, last_sign_in, ..rest of details);
END LOOP

but If I had 100 employees over a span of 30 days, this would be 6000 queries inside the loop! which I am sure you would agree is an overkill

is there a better way to do this?

Thanks.

--
Regards,
Muhannad

--
Regards,
Muhannad