Thread: Counting days ...
Hi all, A bit stuck here with something I know I can do with output / loops / filtering in the (web)application but want to do in SQL or within PostgreSQL. Simply said, count days of accommodation for a given time period. E.g. res_id 1, start_day 2008-01-25, end_day 2008-02-15, number of persons 6 res_id 2, start_day 2008-02-10, end_day 2008-02-15, number of persons 4 for the period from 2008-02-01 to 2008-02-29 these two rows would give a total of 15 days x 6 persons + 4 days x 5 persons = 110 days SELECT SUM( CASE WHEN res_start_day >= '2008-01-01' THEN (res_end_day - res_start_day) ELSE (res_end_day - (DATE '2008-01-01' - INTEGER '1')) END * group_size) AS days_in_period FROM product_res pr WHERE res_end_day >= '2008-01-01' AND res_end_day <= '2008-12-31'; days_in_period ---------------- 68383 (1 row) Ok, but a reservation can be of any nationality / country: SELECT count(country_id) FROM countries;count ------- 243 (1 row) Country_id is also stored in the product_res table. I would like to, or need to, get the total split into different nationalities, like: FI 12345 RU 9876 DE 4321 ... Anyone ? With very best regards, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---
Aarni Ruuhimäki wrote: > Hi all, > > A bit stuck here with something I know I can do with output / loops / > filtering in the (web)application but want to do in SQL or within PostgreSQL. > > Simply said, count days of accommodation for a given time period. > > E.g. > > res_id 1, start_day 2008-01-25, end_day 2008-02-15, number of persons 6 > res_id 2, start_day 2008-02-10, end_day 2008-02-15, number of persons 4 > > for the period from 2008-02-01 to 2008-02-29 these two rows would give a total > of > > 15 days x 6 persons + 4 days x 5 persons = 110 days > > SELECT SUM( > CASE > WHEN res_start_day >= '2008-01-01' THEN > (res_end_day - res_start_day) > ELSE (res_end_day - (DATE '2008-01-01' - INTEGER '1')) > END > * group_size) AS days_in_period > FROM product_res pr > WHERE res_end_day >= '2008-01-01' AND res_end_day <= '2008-12-31'; > This appears fraught with off-by-one and other errors. For res_id 1 limited to the month of February you do indeed have 6 persons and 15 days = 90 person-days as you are including day 1 and day 15. If you use the same inclusive counting of days for res_id 2, you have 4 persons (don't know where 5 came from) and 6 days for 24 person-days. I'm making an assumption that you have reservations with arbitrary start and end dates (assumed to be inclusive of both start and end) along with group size and you want to see the person-days utilized within a specified period. First, to simply establish upper and lower bounds, date_larger/date_smaller seems a lot easier - ie. for February inclusive dates: select sum ( ((date_smaller(res_end_day, '2008-02-29'::date) - date_larger(res_start_day, '2008-02-01'::date))::int + 1) * group_size ) as person_days; > Country_id is also stored in the product_res table. > > I would like to, or need to, get the total split into different nationalities, > like: > > FI 12345 > RU 9876 > DE 4321 > ... > OK. select country_id, sum ( ((date_smaller(res_end_day, '2008-02-29'::date) - date_larger(res_start_day, '2008-02-01'::date))::int + 1) * group_size ) as person_days group by country_id; Add where-clauses to either for efficiency. Cheers, Steve
This was superfast, thank you ! On Thursday 13 March 2008 20:58, Steve Crawford wrote: > Aarni Ruuhimäki wrote: > > res_id 2, start_day 2008-02-10, end_day 2008-02-15, number of persons 4 > > > > If you use the same inclusive counting of days for res_id 2, you have 4 > persons (don't know where 5 came from) and 6 days for 24 person-days. > Sorry, my bad, I meant 5 days x 4 persons = 20 person-days. From 10 to 15 it is 5 'days' ? Arrival 10th and departure 15th makes 5 'nights' or 'days', whatever one might call it, statistical accommodation units. Likewise, res_id x, start_day some time before Feb 08, end_day 2008-02-16 for a statistical period Feb 08 makes 16 units ? > > First, to simply establish upper and lower bounds, > date_larger/date_smaller seems a lot easier - ie. for February inclusive > dates: > > select > sum ( > ((date_smaller(res_end_day, '2008-02-29'::date) > - date_larger(res_start_day, '2008-02-01'::date))::int + 1) * group_size > ) as person_days; > > > Country_id is also stored in the product_res table. > > > > I would like to, or need to, get the total split into different > > nationalities, like: > > > > FI 12345 > > RU 9876 > > DE 4321 > > ... > > OK. > > select > country_id, > sum ( > ((date_smaller(res_end_day, '2008-02-29'::date) > - date_larger(res_start_day, '2008-02-01'::date))::int + 1) * group_size > ) as person_days > group by country_id; Ok, thanks, I'll dig into this. > > Add where-clauses to either for efficiency. AND region_id = x<if>IsDefined(company_id), AND company_id = x</if> <if>IsDefined(product_id), AND product_id = x</if> > > Cheers, > Steve days_in_period | country ----------------+-------------------- 5519 | Unknown 16 | Germany 18 | Estonia 3061| Russian Federation 1491 | Suomi 20 | Ukraine (6 rows) Getting there. Cheers to all you amazingly helpful folks out there, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---
> Sorry, my bad, I meant 5 days x 4 persons = 20 person-days. From 10 to 15 it > is 5 'days' ? Arrival 10th and departure 15th makes 5 'nights' or 'days', > whatever one might call it, statistical accommodation units. > > Likewise, res_id x, start_day some time before Feb 08, end_day 2008-02-16 for > a statistical period Feb 08 makes 16 units ? > > >> First, to simply establish upper and lower bounds, >> date_larger/date_smaller seems a lot easier - ie. for February inclusive >> dates: >> >> select >> sum ( >> ((date_smaller(res_end_day, '2008-02-29'::date) >> - date_larger(res_start_day, '2008-02-01'::date))::int + 1) * group_size >> ) as person_days; >> >> Check my work, but I think the sum part of the query simply becomes: sum ( ( date_smaller(res_end_day, '2008-02-29'::date) - date_larger(res_start_day, '2008-01-31'::date) ) * group_size ) Basically remove the "+1" so we don't include both start and end dates but move the start base back one day so anyone starting prior to Feb 1 gets the extra day added. Cheers, Steve
> > Check my work, but I think the sum part of the query simply becomes: > > sum ( > ( > date_smaller(res_end_day, '2008-02-29'::date) - > date_larger(res_start_day, '2008-01-31'::date) > ) * group_size > ) > > Basically remove the "+1" so we don't include both start and end dates > but move the start base back one day so anyone starting prior to Feb 1 > gets the extra day added. > > Cheers, > Steve Thanks Steve, I'm not sure if I quite grasped this. It gives a bit funny results: SELECT sum ((date_smaller(res_end_day, '2007-12-31'::date) - date_larger(res_start_day, '2006-12-31'::date)) * group_size) AS days_in_period, c.country_name AS country FROM product_res pr LEFT JOIN countries c ON pr.country_id = c.country_id WHERE group_id = 1 AND res_end_day >= '2007-01-01' AND res_end_day <= '2008-12-31' group by pr.country_id, c.country_name;days_in_period | country ----------------+-------------------- -441137 | -30 | Germany -28 | Estonia 60 | Bulgaria -25003 | Russian Federation -207670 | Suomi 256 | Ukraine -6566 | Latvia -280 | United States -1889 | Switzerland 114 | Lithuania 36 | Norway -66 | Sweden 170 | Kazakhstan 72 | Belarus (15 rows) Anyway, I have to rethink and elaborate the query. I know that it will usually be on a monthly or yearly basis, but a reservation can actually be any of the following in relation to the given (arbitrary) period: 1. start_day before period_start, end_day = period_start 2. start_day before period_start, end_day in period 3. start_day before period_start, end_day = period_end 4. start_day = period_start, end_day in period 5. start_day in period, end_day in period 6. start_day = period_start, end_day = period_end 7. start_day in period, end_day = period_end 8. start_day in period, end_day after period_end 9. start_day = period_start, end_day = period_end 10 start_day before period_start, end_day after period_end Hmm ... Best regards, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---
Aarni Ruuhimäki wrote: >> Check my work, but I think the sum part of the query simply becomes: >> >> sum ( >> ( >> date_smaller(res_end_day, '2008-02-29'::date) - >> date_larger(res_start_day, '2008-01-31'::date) >> ) * group_size >> ) >> >> Basically remove the "+1" so we don't include both start and end dates >> but move the start base back one day so anyone starting prior to Feb 1 >> gets the extra day added. >> >> Cheers, >> Steve > > Thanks Steve, > > I'm not sure if I quite grasped this. It gives a bit funny results: > > SELECT sum ((date_smaller(res_end_day, '2007-12-31'::date) - > date_larger(res_start_day, '2006-12-31'::date)) * group_size) AS > days_in_period, > c.country_name AS country > FROM product_res pr > LEFT JOIN countries c ON pr.country_id = c.country_id > WHERE group_id = 1 AND res_end_day >= '2007-01-01' AND res_end_day <= > '2008-12-31' group by pr.country_id, c.country_name; > days_in_period | country > ----------------+-------------------- > -441137 | > -30 | Germany > -28 | Estonia > 60 | Bulgaria > -25003 | Russian Federation > -207670 | Suomi > 256 | Ukraine > -6566 | Latvia > -280 | United States > -1889 | Switzerland > 114 | Lithuania > 36 | Norway > -66 | Sweden > 170 | Kazakhstan > 72 | Belarus > (15 rows) > > Anyway, I have to rethink and elaborate the query. I know that it will usually > be on a monthly or yearly basis, but a reservation can actually be any of the > following in relation to the given (arbitrary) period: > > 1. start_day before period_start, end_day = period_start > 2. start_day before period_start, end_day in period > 3. start_day before period_start, end_day = period_end > 4. start_day = period_start, end_day in period > 5. start_day in period, end_day in period > 6. start_day = period_start, end_day = period_end > 7. start_day in period, end_day = period_end > 8. start_day in period, end_day after period_end > 9. start_day = period_start, end_day = period_end > 10 start_day before period_start, end_day after period_end > > Hmm ... > > Best regards, #6 and #9 are the same. You missed these: a start_day before period_start, end_day before period_start b start_day = period_start, end_day = period_start c start_day = period_start, end_day after period_end d start_day = period_end, end_day = period_end e start_day = period_end, end_day after period_end f start_day after period_end, end_day after period_end Granted, a & f should not match where clause; but then groups 10,c,e don't meet your where clause either. Your where clause should probably be: WHERE group_id = 1 AND (res_start_day >= '2007-01-01' AND res_end_day <= '2008-12-31') Are you sure that your database does not have any rows where start_day is after end_day? These rows could certainly skew results. I would suggest that you identify a few rows that meet each of these conditions. Change the where clause to select rows in one group at a time. You might consider using a unique row identifier in where clause during these tests to make sure you are processing the rows you think you are. When all test cases work properly; then run your generalized query again.
Frank Bax wrote: > Aarni Ruuhimäki wrote: >> Anyway, I have to rethink and elaborate the query. I know that it will >> usually be on a monthly or yearly basis, but a reservation can >> actually be any of the following in relation to the given (arbitrary) >> period: >> >> 1. start_day before period_start, end_day = period_start >> 2. start_day before period_start, end_day in period >> 3. start_day before period_start, end_day = period_end >> 4. start_day = period_start, end_day in period >> 5. start_day in period, end_day in period >> 6. start_day = period_start, end_day = period_end >> 7. start_day in period, end_day = period_end >> 8. start_day in period, end_day after period_end >> 9. start_day = period_start, end_day = period_end >> 10 start_day before period_start, end_day after period_end >> > > > #6 and #9 are the same. You missed these: > > a start_day before period_start, end_day before period_start > b start_day = period_start, end_day = period_start > c start_day = period_start, end_day after period_end > d start_day = period_end, end_day = period_end > e start_day = period_end, end_day after period_end > f start_day after period_end, end_day after period_end > > Granted, a & f should not match where clause; but then groups 10,c,e > don't meet your where clause either. Your where clause should probably be: > > WHERE group_id = 1 AND (res_start_day >= '2007-01-01' AND res_end_day <= > '2008-12-31') > > Are you sure that your database does not have any rows where start_day > is after end_day? These rows could certainly skew results. > > I would suggest that you identify a few rows that meet each of these > conditions. Change the where clause to select rows in one group at a > time. You might consider using a unique row identifier in where clause > during these tests to make sure you are processing the rows you think > you are. When all test cases work properly; then run your generalized > query again. > Change 10,c,e to 8,10,c,e - Group 8 also does not meet your initial WHERE clause. My suggestion for WHERE clause also does not work. This might work better (although it still could be wrong): WHERE group_id = 1 AND (res_start_day BETWEEN '2007-01-01' AND '2008-12-31' OR res_end_day BETWEEN '2007-01-01' AND '2008-12-31') In case I still have it wrong, try each test group separately and you'll soon find out if the WHERE clause is correct or not.
Thanks Frank, Top and between posting ... On Friday 14 March 2008 15:58, Frank Bax wrote: > Frank Bax wrote: > > Aarni Ruuhimäki wrote: > >> Anyway, I have to rethink and elaborate the query. I know that it will > >> usually be on a monthly or yearly basis, but a reservation can > >> actually be any of the following in relation to the given (arbitrary) > >> period: > >> > >> 1. start_day before period_start, end_day = period_start > >> 2. start_day before period_start, end_day in period > >> 3. start_day before period_start, end_day = period_end > >> 4. start_day = period_start, end_day in period > >> 5. start_day in period, end_day in period > >> 6. start_day = period_start, end_day = period_end > >> 7. start_day in period, end_day = period_end > >> 8. start_day in period, end_day after period_end > >> 9. start_day = period_start, end_day = period_end > >> 10 start_day before period_start, end_day after period_end > > > > #6 and #9 are the same. You missed these: Whoops, 9 should be c. > > > > a start_day before period_start, end_day before period_start This I don't have to care about as it is not in the period we are looking at. > > b start_day = period_start, end_day = period_start Is zero days/nights, ignored. Not even possible to insert in the application. end_day must be greater than start_day. > > c start_day = period_start, end_day after period_end Yes. Number 9 now. > > d start_day = period_end, end_day = period_end Is zero days, ignored. like b. > > e start_day = period_end, end_day after period_end Is outside the period. Day changes / the night starts at midnight, so this would go in the 'next' period. Like number 1 comes in this period as one day. > > f start_day after period_end, end_day after period_end This is also outside the period we are looking at. > > > > Granted, a & f should not match where clause; but then groups 10,c,e > > don't meet your where clause either. Your where clause should probably > > be: > > > > WHERE group_id = 1 AND (res_start_day >= '2007-01-01' AND res_end_day <= > > '2008-12-31') > > > > Are you sure that your database does not have any rows where start_day > > is after end_day? These rows could certainly skew results. Yes, the application does not allow this. SELECT res_id FROM product_res WHERE res_start_day > res_end_day;res_id -------- (0 rows) > > > > I would suggest that you identify a few rows that meet each of these > > conditions. Change the where clause to select rows in one group at a > > time. You might consider using a unique row identifier in where clause > > during these tests to make sure you are processing the rows you think > > you are. When all test cases work properly; then run your generalized > > query again. > > Change 10,c,e to 8,10,c,e - Group 8 also does not meet your initial > WHERE clause. My suggestion for WHERE clause also does not work. This > might work better (although it still could be wrong): > > WHERE group_id = 1 AND (res_start_day BETWEEN '2007-01-01' AND > '2008-12-31' OR res_end_day BETWEEN '2007-01-01' AND '2008-12-31') > > In case I still have it wrong, try each test group separately and you'll > soon find out if the WHERE clause is correct or not. I think I need more ORs in the WHERE clause to find all res_ids I want to count according to the 10 rules. Ie. if one or more days of a reservation is 'inside' the given period. Testing ... Best regards, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---
Aarni Ruuhimäki wrote: > > Thanks Steve, > > I'm not sure if I quite grasped this. It gives a bit funny results: > > SELECT sum ((date_smaller(res_end_day, '2007-12-31'::date) - > date_larger(res_start_day, '2006-12-31'::date)) * group_size) AS > days_in_period, > c.country_name AS country > FROM product_res pr > LEFT JOIN countries c ON pr.country_id = c.country_id > WHERE group_id = 1 AND res_end_day >= '2007-01-01' AND res_end_day <= > '2008-12-31' group by pr.country_id, c.country_name; > days_in_period | country > ----------------+-------------------- > -441137 | > -30 | Germany > -28 | Estonia > ... I see one error in my logic. It doesn't account for the situation where res_end_day is prior to the start of the period you are viewing. You can fix this by limiting records with the appropriate where-clause or by wrapping the date_smaller inside a date_larger (and vice-versa) to ensure that all dates stay inside the desired period. Or you can fix it by using an appropriate where-clause. Yours appears broken - I think you want res_end_day >2006-12-31 (or >=2007-01-01 - I prefer mine as you can use the same date in multiple places in the query) which is what you have. But I think you want the end of period to be limited to res_start_day <=2007-12-31. IOW, if your *end* date is *before* the period of interest or your *start* date is *after* the period of interest, skip the record. My guess is that you have records with res_start_day > 2007-12-31. After applying the larger and smaller functions, this will end up with a res_end_day of 2007-12-31 giving an end_day < start_day. (I'm presuming you have appropriate constraints to prevent end_day from being earlier than start_day. If not, check for that and add the constraints.) Cheers, Steve
Aarni Ruuhimäki wrote: > Thanks Frank, > >>> a start_day before period_start, end_day before period_start > > This I don't have to care about as it is not in the period we are looking at. > >>> b start_day = period_start, end_day = period_start > > Is zero days/nights, ignored. Not even possible to insert in the application. > end_day must be greater than start_day. You should still consider rows that are "out of range" or "zero nights" in your test cases to make sure your report processes them correctly.
On Friday 14 March 2008 18:09, Frank Bax wrote: > > > > Is zero days/nights, ignored. Not even possible to insert in the > > application. end_day must be greater than start_day. > > You should still consider rows that are "out of range" or "zero nights" > in your test cases to make sure your report processes them correctly. For the WHERE clause for Jan 08 (will be $date1 and $date2), getting there. 1. start_day before period_start, end_day = period_start #1 2. start_day before period_start, end_day in period #1 3. start_day before period_start, end_day = period_end #1 4. start_day = period_start, end_day in period #2 5. start_day in period, end_day in period #3 6. start_day = period_start, end_day = period_end #2 7. start_day in period, end_day = period_end #3 8. start_day in period, end_day after period_end #3 9. start_day = period_start, end_day after period_end #2 10. start_day before period_start, end_day after period_end #1 SELECT res_id, to_char(res_start_day, 'DD.MM.YYYY'), to_char(res_end_day, 'DD.MM.YYYY') FROM product_res WHERE group_id = 1 AND res_start_day < '2008-01-01' AND res_end_day >= '2008-01-01' # covers 1,2,3,10 OR group_id = 1 AND res_start_day = '2008-01-01' AND res_end_day >= '2008-01-01' # covers 4,6,9 OR group_id = 1 AND res_start_day >= '2008-01-01' AND res_start_day < '2008-01-31' AND res_end_day >= '2008-01-01'; # covers 5,7,8 (499 rows) not yet summing up or grouping by. But is this getting too heavy ? I have three more (optional) parameters to pass into the query, which narrow down the result. All values are stored also in the product_res table. 1. Area/region ID from dropdown, populated by areas that have products 2. Company ID from dropdown, dynamically populated according to the optional area selection with companies that have products in the selected area 3. Product ID from dropdown, dynamically populated by the optional company selection with the selected company's products in the selected area So the WHERE clause would go like: group_id = 1 AND res_start_day < '$date1' AND res_end_day >= '$date1' [AND region_id = $region_id] [AND company_id = $company_id] [AND product_id = $product_id] OR group_id = 1 AND res_start_day = '$date1' AND res_end_day >= '$date1' [AND region_id = $region_id] [AND company_id = $company_id] [AND product_id = $product_id] OR group_id = 1 AND res_start_day >= '$date1' AND res_start_day < '$date2' AND res_end_day >= '$date1' [AND region_id = $region_id] [AND company_id = $company_id] [AND product_id = $product_id] Cheerio, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---
Aarni Ruuhimäki wrote: > So the WHERE clause would go like: > > group_id = 1 AND res_start_day < '$date1' AND res_end_day >= '$date1' [AND > region_id = $region_id] [AND company_id = $company_id] [AND product_id = > $product_id] > OR > group_id = 1 AND res_start_day = '$date1' AND res_end_day >= '$date1' [AND > region_id = $region_id] [AND company_id = $company_id] [AND product_id = > $product_id] > OR > group_id = 1 AND res_start_day >= '$date1' AND res_start_day < '$date2' AND > res_end_day >= '$date1' [AND region_id = $region_id] [AND company_id = > $company_id] [AND product_id = $product_id] This is smaller; and should be equivalent: group_id = 1 AND ( res_start_day <= '$date1' AND res_end_day >= '$date1' OR res_start_day >= '$date1' AND res_start_day < '$date2' ) [AND region_id = $region_id] [AND company_id = $company_id] [AND product_id = $product_id]
On Saturday 15 March 2008 18:05, Frank Bax wrote: > This is smaller; and should be equivalent: > > group_id = 1 AND > ( res_start_day <= '$date1' AND res_end_day >= '$date1' > OR > res_start_day >= '$date1' AND res_start_day < '$date2' ) > [AND region_id = $region_id] > [AND company_id = $company_id] > [AND product_id = $product_id] You're quite right. Here's the whole thing in cfml as it is now. Explain analyze for year 2007 gives runtime 49.675 ms, which is not bad I think. The total page rendering time to browser is 950 ms. SELECT SUM( CASE WHEN res_start_day < '#date1#' AND res_end_day = '#date1#' THEN (res_end_day - (DATE '#date1#' - INTEGER '1')) WHEN res_start_day < '#date1#' AND res_end_day >= '#date1#' AND res_end_day <= '#date2#' THEN (res_end_day - (DATE '#date1#' - INTEGER '1')) WHEN res_start_day < '#date1#' AND res_end_day = '#date2#' THEN (res_end_day - (DATE '#date1#' - INTEGER '1')) WHEN res_start_day = '#date1#' AND res_end_day >= '#date1#' AND res_end_day <= '#date2#' THEN (res_end_day - '#date1#') WHEN res_start_day >= '#date1#' AND res_start_day <= '#date2#' AND res_end_day >= '#date1#' AND res_end_day <= '#date2#' THEN (res_end_day - res_start_day) WHEN res_start_day >= '#date1#' AND res_start_day <= '#date2#' AND res_end_day > '#date2#' THEN ('#date2#' - res_start_day) WHEN res_start_day = '#date1#' AND res_end_day > '#date2#' THEN ('#date2#' - res_start_day) WHEN res_start_day < '#date1#' AND res_end_day > '#date2#' THEN ('#date2#' - (DATE '#date1#' - INTEGER '1')) END * group_size) AS person_days_in_period, c.country_name AS country FROM product_res pr LEFT JOIN countries c ON pr.country_id = c.country_id WHERE group_id = 1 AND group_size > 0 AND res_start_day <= '#date1#' AND res_end_day >= '#date1#' AND res_end_day > res_start_day <cfif form.region GT 0>AND region_id = #form.region#</cfif> <cfif form.company GT 0>AND company_id = #form.companyt#</cfif> <cfif form.product GT 0>AND product_id = #form.product#</cfif> AND res_cancelled IS NOT TRUE OR group_id = 1 AND group_size > 0 AND res_start_day >= '#date1#' AND res_start_day < '#date2#' AND res_end_day >= '#date1#' AND res_end_day > res_start_day <cfif form.region GT 0>AND region_id = #form.region#</cfif> <cfif form.company GT 0>AND company_id = #form.companyt#</cfif> <cfif form.product GT 0>AND product_id = #form.product#</cfif> AND res_cancelled IS NOT TRUE group by pr.country_id, c.country_name; Thank you guys again, -- Aarni Ruuhimäki --- Burglars usually come in through your windows. ---