Thread: Time differences between rows, not columns?

Time differences between rows, not columns?

From

Date:

30 August 2005, 06:45:34

Using postgresSQL 8.0 for windows, running on windows 2003 server 16gb
RAM, 3Ghz dual p4.
Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar

I have an application with 3 tables (in this context that is)

Table bills
bill_id NOT NULL serial ... And other columns

Table bill_status
bill_id (references bills.bill_id) statusid int4 (references bill_statuslookup.statusid) statustime datetime

Table bill_statuslookup
statusid serial not null statusname varchar(255)

The application basically tracks a workflow of bills flowing fromone
department to another.
Everytime the bill moves one step, an entry is made into the bill_status
table.

Eg.
Bills table
-------------
Bill_id otherfield1

1 ........

Bill_status table:
---------------------
Bill_id statusid statustime

1 10 2005-04-04 00:34:31
1 20 2005-04-05 00:55:00

Bill_statuslookup table:
-------------------------
Statusid Statusname

10 submitted
20 received
30 rejected
40 accepted
..
..

Now my problem is this:

1. Find the time taken for each bill to reach from status 10 to status
20 , given the time of status 10 should be between t1 and t2.

Eg I want to know how much time it took for a bill to be accepted after
it was submitted (criteria: submitted between yesterday and today)

2. I want to know how many bills took <7 days, how many tok 7-15 days,
how many took >15 days etc.

The status is a lookup table because the workflow constantly changes,
and I can't have submitted_on, recd_on accepted_on times etc in the main
bills table as columns because there are way too many statuses in the
life of a bill (read invoice).

Hope its clear as to what I'm looking for.
Any help is greatly appreciated!!

Regards,
Amit

Re: Time differences between rows, not columns?

From

"Anthony Molinaro"

Date:

30 August 2005, 08:29:00

Amit, You say "I want to know how much time it took for a bill to be
accepted after it was submitted"

So, do you want between 10 and 40, not 10 and 20?
I assume you meant 10 and 40.
Ok, there's a few approaches to your questions, first
To get "how much time it took for a bill to be accepted after
it was submitted":

select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end))     from bill_status where status_id in ( 10,40 )group by bill_id; 

that will give you the amt of time (simple date arithmetic, so, in days)
from submission to being accepted.

You also, want a simple distribution, you can use the technique above
But with SUM:

select sum(case when diff < 7 then 1 else 0 end) as lt_7,      sum(case when diff between 7 and 15 then 1 else 0 end)
as
btw_7_15,      sum(case when diff > 15 then 1 else 0 end) as gt_15 from (
select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end)) as diff     from bill_status where status_id in ( 10,40 )group by bill_id
)x; 


I have a recipe in my upcoming book ('The SQL Cookbook' by O'Reilly)
So, I'd like to know if what I suggested works out for you.
Obviously, you'll need whatever tweaks to make it perfect for your
system, the feedback I am concerned about is the technique.

hope that helps, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of
Amit_Wadhwa@Dell.com
Sent: Tuesday, August 30, 2005 5:45 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Time differences between rows, not columns?
Importance: High

Using postgresSQL 8.0 for windows, running on windows 2003 server 16gb
RAM, 3Ghz dual p4.
Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar


I have an application with 3 tables (in this context that is)


Table bills
 bill_id NOT NULL serial ... And other columns


Table bill_status
 bill_id (references bills.bill_id) statusid int4 (references bill_statuslookup.statusid) statustime datetime


Table bill_statuslookup
 statusid serial not null statusname varchar(255)




The application basically tracks a workflow of bills flowing fromone
department to another.
Everytime the bill moves one step, an entry is made into the bill_status
table.

Eg.
Bills table
-------------
Bill_id         otherfield1

1            ........


Bill_status table:
---------------------
Bill_id     statusid    statustime

1             10         2005-04-04 00:34:31
1             20         2005-04-05 00:55:00


Bill_statuslookup table:
-------------------------
Statusid          Statusname

10                submitted
20                received
30                rejected
40                accepted
..
..



Now my problem is this:

1. Find the time taken for each bill to reach from status 10 to status
20 , given the time of status 10 should be between t1 and t2.

Eg I want to know how much time it took for a bill to be accepted after
it was submitted (criteria: submitted between yesterday and today)

2. I want to know how many bills took <7 days, how many tok 7-15 days,
how many took >15 days etc.


The status is a lookup table because the workflow constantly changes,
and I can't have submitted_on, recd_on accepted_on times etc in the main
bills table as columns because there are way too many statuses in the
life of a bill (read invoice).


Hope its clear as to what I'm looking for.
Any help is greatly appreciated!!

Regards,
Amit

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's
datatypesdo not      match

Re: Time differences between rows, not columns?

From

Date:

30 August 2005, 09:09:03

Thanks!

<snip>Amit,You say "I want to know how much time it took for a bill to
be accepted after it was submitted" So, do you want between 10 and 40,
not 10 and 20? I assume you meant 10 and 40... </snip>
...Could be any status to any status, I wanted to generalize the concept
for future usage.


I tried the below:
1.

select bill_id,(max(case when statusid = 40 then statustime end) -
max(case when statusid = 10 then statustime end)) from
ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id;
Took 16 ms...

But meanwhile I also tried:
select a.bill_id, b.statustime-a.statustime from ap.vits_statusupdate a,
ap.vits_statusupdate b where a.statusid = 10 and b.statusid = 40 and
a.bill_id = b.bill_id

Took 15 ms

The second query is taking approximately 1 ms lesser time.. (have tried
about 30 times in pgadmin)

Got about 10 records in my test resultset.

Surprisingly, if I add the order by bill_id cluase at the end of both
queries, then your query performs 1 ms faster than mine, Don't know why,
Is there any performance issues if this is run over 10000+ rows?
Which (yours or mine?) do you think will be faster, and more efficient?

2.

select sum(case when diff < 7 then 1 else 0 end) as lt_7,sum(case when
diff between 7 and 15 then 1 else 0 end) as btw_7_15,sum(case when diff
> 15 then 1 else 0 end) as gt_15 from (select bill_id,(max(case when statusid = 40 then statustime end)
- max(case when statusid = 10 then statustime end)) as diff  from
ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id) x;

Tried this one too with both the queries in the sub-select (yours and
mine), and works perfectly well for me, with accurate answers, and
exactly what I'm looking for! shows 0 ms (??)..

Next question.. Which one should I use? The joined query? Or the max()
query?

Thanks a lot for your time!

Regards,
Amit



-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Anthony Molinaro
Sent: Tuesday, August 30, 2005 4:59 PM
To: Wadhwa, Amit; pgsql-sql@postgresql.org
Subject: Re: [SQL] Time differences between rows, not columns?

Amit, You say "I want to know how much time it took for a bill to be
accepted after it was submitted"

So, do you want between 10 and 40, not 10 and 20?
I assume you meant 10 and 40.
Ok, there's a few approaches to your questions, first
To get "how much time it took for a bill to be accepted after
it was submitted":

select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end))     from bill_status where status_id in ( 10,40 )group by bill_id; 

that will give you the amt of time (simple date arithmetic, so, in days)
from submission to being accepted.

You also, want a simple distribution, you can use the technique above
But with SUM:

select sum(case when diff < 7 then 1 else 0 end) as lt_7,      sum(case when diff between 7 and 15 then 1 else 0 end)
as
btw_7_15,      sum(case when diff > 15 then 1 else 0 end) as gt_15 from (
select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end)) as diff     from bill_status where status_id in ( 10,40 )group by bill_id
)x; 


I have a recipe in my upcoming book ('The SQL Cookbook' by O'Reilly)
So, I'd like to know if what I suggested works out for you.
Obviously, you'll need whatever tweaks to make it perfect for your
system, the feedback I am concerned about is the technique.

hope that helps, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of
Amit_Wadhwa@Dell.com
Sent: Tuesday, August 30, 2005 5:45 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Time differences between rows, not columns?
Importance: High

Using postgresSQL 8.0 for windows, running on windows 2003 server 16gb
RAM, 3Ghz dual p4.
Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar


I have an application with 3 tables (in this context that is)


Table bills
 bill_id NOT NULL serial ... And other columns


Table bill_status
 bill_id (references bills.bill_id) statusid int4 (references bill_statuslookup.statusid) statustime datetime


Table bill_statuslookup
 statusid serial not null statusname varchar(255)




The application basically tracks a workflow of bills flowing fromone
department to another.
Everytime the bill moves one step, an entry is made into the bill_status
table.

Eg.
Bills table
-------------
Bill_id         otherfield1

1            ........


Bill_status table:
---------------------
Bill_id     statusid    statustime

1             10         2005-04-04 00:34:31
1             20         2005-04-05 00:55:00


Bill_statuslookup table:
-------------------------
Statusid          Statusname

10                submitted
20                received
30                rejected
40                accepted
..
..



Now my problem is this:

1. Find the time taken for each bill to reach from status 10 to status
20 , given the time of status 10 should be between t1 and t2.

Eg I want to know how much time it took for a bill to be accepted after
it was submitted (criteria: submitted between yesterday and today)

2. I want to know how many bills took <7 days, how many tok 7-15 days,
how many took >15 days etc.


The status is a lookup table because the workflow constantly changes,
and I can't have submitted_on, recd_on accepted_on times etc in the main
bills table as columns because there are way too many statuses in the
life of a bill (read invoice).


Hope its clear as to what I'm looking for.
Any help is greatly appreciated!!

Regards,
Amit

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's
datatypesdo not      match 

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: Time differences between rows, not columns?

From

"Anthony Molinaro"

Date:

30 August 2005, 09:33:57

Amit, Glad it worked out :)

As for the performance, lemme say that while I'm a huge fan
Of postgres, my experience in regards to optimization in a production
Environment is limited to Oracle and DB2.

In oracle for example, if you have an index on a numeric field
and perform min/max on it, there's a very cool algorithm
that allows very fast retrieval of that data (basically the index
scan is optimized cuz oracle knows you want only the extreme values).
So, on oracle I'd use the MAX and SUM versions I suggested because
There's a specific optimization to facilitate it and the query, since
It accesses the table only once, is doing less logical reads.

So, at the very least, in the version I suggested, since you are not
Joining, you are performing less logical reads, which is always good.
But, ultimately, you have to test and see what works for you.
The self join might be just fine.
10k rows should not be a problem for either method,
assuming you have an index on statusid and bill_id.

Give it a spin and update this thread.

Regards, Anthony

-----Original Message-----
From: Amit_Wadhwa@Dell.com [mailto:Amit_Wadhwa@Dell.com]
Sent: Tuesday, August 30, 2005 8:08 AM
To: Anthony Molinaro; pgsql-sql@postgresql.org
Subject: RE: [SQL] Time differences between rows, not columns?
Importance: High

Thanks!

<snip>Amit,You say "I want to know how much time it took for a bill to
be accepted after it was submitted" So, do you want between 10 and 40,
not 10 and 20? I assume you meant 10 and 40... </snip>
...Could be any status to any status, I wanted to generalize the concept
for future usage.


I tried the below:
1.

select bill_id,(max(case when statusid = 40 then statustime end) -
max(case when statusid = 10 then statustime end)) from
ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id;
Took 16 ms...

But meanwhile I also tried:
select a.bill_id, b.statustime-a.statustime from ap.vits_statusupdate a,
ap.vits_statusupdate b where a.statusid = 10 and b.statusid = 40 and
a.bill_id = b.bill_id

Took 15 ms

The second query is taking approximately 1 ms lesser time.. (have tried
about 30 times in pgadmin)

Got about 10 records in my test resultset.

Surprisingly, if I add the order by bill_id cluase at the end of both
queries, then your query performs 1 ms faster than mine, Don't know why,
Is there any performance issues if this is run over 10000+ rows?
Which (yours or mine?) do you think will be faster, and more efficient?

2.

select sum(case when diff < 7 then 1 else 0 end) as lt_7,sum(case when
diff between 7 and 15 then 1 else 0 end) as btw_7_15,sum(case when diff
> 15 then 1 else 0 end) as gt_15 from (select bill_id,(max(case when statusid = 40 then statustime end)
- max(case when statusid = 10 then statustime end)) as diff  from
ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id) x;

Tried this one too with both the queries in the sub-select (yours and
mine), and works perfectly well for me, with accurate answers, and
exactly what I'm looking for! shows 0 ms (??)..

Next question.. Which one should I use? The joined query? Or the max()
query?

Thanks a lot for your time!

Regards,
Amit



-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Anthony Molinaro
Sent: Tuesday, August 30, 2005 4:59 PM
To: Wadhwa, Amit; pgsql-sql@postgresql.org
Subject: Re: [SQL] Time differences between rows, not columns?

Amit, You say "I want to know how much time it took for a bill to be
accepted after it was submitted"

So, do you want between 10 and 40, not 10 and 20?
I assume you meant 10 and 40.
Ok, there's a few approaches to your questions, first
To get "how much time it took for a bill to be accepted after
it was submitted":

select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end))     from bill_status where status_id in ( 10,40 )group by bill_id; 

that will give you the amt of time (simple date arithmetic, so, in days)
from submission to being accepted.

You also, want a simple distribution, you can use the technique above
But with SUM:

select sum(case when diff < 7 then 1 else 0 end) as lt_7,      sum(case when diff between 7 and 15 then 1 else 0 end)
as
btw_7_15,      sum(case when diff > 15 then 1 else 0 end) as gt_15 from (
select bill_id,      (max(case when status_id = 40                then statustime end) -        max(case when status_id
=10                then statustime end)) as diff     from bill_status where status_id in ( 10,40 )group by bill_id
)x; 


I have a recipe in my upcoming book ('The SQL Cookbook' by O'Reilly)
So, I'd like to know if what I suggested works out for you.
Obviously, you'll need whatever tweaks to make it perfect for your
system, the feedback I am concerned about is the technique.

hope that helps, Anthony

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of
Amit_Wadhwa@Dell.com
Sent: Tuesday, August 30, 2005 5:45 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] Time differences between rows, not columns?
Importance: High

Using postgresSQL 8.0 for windows, running on windows 2003 server 16gb
RAM, 3Ghz dual p4.
Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar


I have an application with 3 tables (in this context that is)


Table bills
 bill_id NOT NULL serial ... And other columns


Table bill_status
 bill_id (references bills.bill_id) statusid int4 (references bill_statuslookup.statusid) statustime datetime


Table bill_statuslookup
 statusid serial not null statusname varchar(255)




The application basically tracks a workflow of bills flowing fromone
department to another.
Everytime the bill moves one step, an entry is made into the bill_status
table.

Eg.
Bills table
-------------
Bill_id         otherfield1

1            ........


Bill_status table:
---------------------
Bill_id     statusid    statustime

1             10         2005-04-04 00:34:31
1             20         2005-04-05 00:55:00


Bill_statuslookup table:
-------------------------
Statusid          Statusname

10                submitted
20                received
30                rejected
40                accepted
..
..



Now my problem is this:

1. Find the time taken for each bill to reach from status 10 to status
20 , given the time of status 10 should be between t1 and t2.

Eg I want to know how much time it took for a bill to be accepted after
it was submitted (criteria: submitted between yesterday and today)

2. I want to know how many bills took <7 days, how many tok 7-15 days,
how many took >15 days etc.


The status is a lookup table because the workflow constantly changes,
and I can't have submitted_on, recd_on accepted_on times etc in the main
bills table as columns because there are way too many statuses in the
life of a bill (read invoice).


Hope its clear as to what I'm looking for.
Any help is greatly appreciated!!

Regards,
Amit

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's
datatypesdo not      match 

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings