Re: Time differences between rows, not columns? - Mailing list pgsql-sql
From | |
---|---|
Subject | Re: Time differences between rows, not columns? |
Date | |
Msg-id | 4538E1203665624F97A8494BCF2716E5C07E2B@blrx2kmbgl102.blr.amer.dell.com Whole thread Raw |
In response to | Time differences between rows, not columns? (<Amit_Wadhwa@Dell.com>) |
List | pgsql-sql |
Thanks! <snip>Amit,You say "I want to know how much time it took for a bill to be accepted after it was submitted" So, do you want between 10 and 40, not 10 and 20? I assume you meant 10 and 40... </snip> ...Could be any status to any status, I wanted to generalize the concept for future usage. I tried the below: 1. select bill_id,(max(case when statusid = 40 then statustime end) - max(case when statusid = 10 then statustime end)) from ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id; Took 16 ms... But meanwhile I also tried: select a.bill_id, b.statustime-a.statustime from ap.vits_statusupdate a, ap.vits_statusupdate b where a.statusid = 10 and b.statusid = 40 and a.bill_id = b.bill_id Took 15 ms The second query is taking approximately 1 ms lesser time.. (have tried about 30 times in pgadmin) Got about 10 records in my test resultset. Surprisingly, if I add the order by bill_id cluase at the end of both queries, then your query performs 1 ms faster than mine, Don't know why, Is there any performance issues if this is run over 10000+ rows? Which (yours or mine?) do you think will be faster, and more efficient? 2. select sum(case when diff < 7 then 1 else 0 end) as lt_7,sum(case when diff between 7 and 15 then 1 else 0 end) as btw_7_15,sum(case when diff > 15 then 1 else 0 end) as gt_15 from (select bill_id,(max(case when statusid = 40 then statustime end) - max(case when statusid = 10 then statustime end)) as diff from ap.vits_statusupdate where statusid in ( 10,40 ) group by bill_id) x; Tried this one too with both the queries in the sub-select (yours and mine), and works perfectly well for me, with accurate answers, and exactly what I'm looking for! shows 0 ms (??).. Next question.. Which one should I use? The joined query? Or the max() query? Thanks a lot for your time! Regards, Amit -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Anthony Molinaro Sent: Tuesday, August 30, 2005 4:59 PM To: Wadhwa, Amit; pgsql-sql@postgresql.org Subject: Re: [SQL] Time differences between rows, not columns? Amit, You say "I want to know how much time it took for a bill to be accepted after it was submitted" So, do you want between 10 and 40, not 10 and 20? I assume you meant 10 and 40. Ok, there's a few approaches to your questions, first To get "how much time it took for a bill to be accepted after it was submitted": select bill_id, (max(case when status_id = 40 then statustime end) - max(case when status_id =10 then statustime end)) from bill_status where status_id in ( 10,40 )group by bill_id; that will give you the amt of time (simple date arithmetic, so, in days) from submission to being accepted. You also, want a simple distribution, you can use the technique above But with SUM: select sum(case when diff < 7 then 1 else 0 end) as lt_7, sum(case when diff between 7 and 15 then 1 else 0 end) as btw_7_15, sum(case when diff > 15 then 1 else 0 end) as gt_15 from ( select bill_id, (max(case when status_id = 40 then statustime end) - max(case when status_id =10 then statustime end)) as diff from bill_status where status_id in ( 10,40 )group by bill_id )x; I have a recipe in my upcoming book ('The SQL Cookbook' by O'Reilly) So, I'd like to know if what I suggested works out for you. Obviously, you'll need whatever tweaks to make it perfect for your system, the feedback I am concerned about is the technique. hope that helps, Anthony -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Amit_Wadhwa@Dell.com Sent: Tuesday, August 30, 2005 5:45 AM To: pgsql-sql@postgresql.org Subject: [SQL] Time differences between rows, not columns? Importance: High Using postgresSQL 8.0 for windows, running on windows 2003 server 16gb RAM, 3Ghz dual p4. Language: Java - JDBC postgres driver: postgresql-8.0-310.jdbc3.jar I have an application with 3 tables (in this context that is) Table bills bill_id NOT NULL serial ... And other columns Table bill_status bill_id (references bills.bill_id) statusid int4 (references bill_statuslookup.statusid) statustime datetime Table bill_statuslookup statusid serial not null statusname varchar(255) The application basically tracks a workflow of bills flowing fromone department to another. Everytime the bill moves one step, an entry is made into the bill_status table. Eg. Bills table ------------- Bill_id otherfield1 1 ........ Bill_status table: --------------------- Bill_id statusid statustime 1 10 2005-04-04 00:34:31 1 20 2005-04-05 00:55:00 Bill_statuslookup table: ------------------------- Statusid Statusname 10 submitted 20 received 30 rejected 40 accepted .. .. Now my problem is this: 1. Find the time taken for each bill to reach from status 10 to status 20 , given the time of status 10 should be between t1 and t2. Eg I want to know how much time it took for a bill to be accepted after it was submitted (criteria: submitted between yesterday and today) 2. I want to know how many bills took <7 days, how many tok 7-15 days, how many took >15 days etc. The status is a lookup table because the workflow constantly changes, and I can't have submitted_on, recd_on accepted_on times etc in the main bills table as columns because there are way too many statuses in the life of a bill (read invoice). Hope its clear as to what I'm looking for. Any help is greatly appreciated!! Regards, Amit ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypesdo not match ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings