Thread: group by week (ww), plus -S performance
I thought I'd pass along a work-around I came up with for the limits in 'ww' support (7.0 final). Maybe this would be a useful example for the docs? They're a little lean on date/time examples, IMO. So is the new book. Task: Select a count of records from a table, grouped by the week of the record. The table is something like CREATE table EVENTS (event varchar(128) not null, stamp datetime default now()); And I want the output to use human-readable dates, like week | count ------------+------- 2000-03-06 | 4 2000-03-13 | 5 2000-03-20 | 3 My immediate solution was something like SELECT to_date(date_part('year',stamp)||'-'||date_part('week',stamp), 'yyyy-ww'),count(*) FROM EVENTS GROUP BY to_date; but to_date() doesn't seem to support 'ww' format for text-to-date translation (not documented, AFAIK). The solution I eventually found was SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp) as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; I haven't tested the '-5' kludge very extensively. It seems to work ok, for the dates I tested in 2000. I'm sure it would run into trouble with calendar-change weirdness pre-1900. I'd also love to hear any suggestions for performance improvements - it's cpu-bound on my system, and takes about 70 sec to run with 86,000 rows. The query plan is Aggregate (cost=9155.76..9584.66 rows=8578 width=20) -> Group (cost=9155.76..9370.21 rows=85780 width=20) -> Sort (cost=9155.76..9155.76 rows=85780 width=20) -> Seq Scan on events (cost=0.00..2126.80 rows=85780 width=20) The order-by clause doesn't seem to add much overhead - the query plan is the same with or without it. I'm running with "-i -N 64 -B 1024 -o '-F'", and I've tried up to '-S 8192' without seeing any noticeable improvement. At higher values, performance actually went down by almost 50% - something to do with shmem segment sizes? This is on Solaris 2.6, and I compiled PG7.0 using gcc 2.95. The -S does reduce disk I/O, but I think that's only about 5% of the work that's going on (90% user time). An index on 'stamp' doesn't seem to help, either. thanks, -- Mike
Michael Blakeley <mike@blakeley.com> writes: > SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp) > as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; Seems like the hard way. You are relying on an implicit conversion from the float8 output of date_part() to the text input to_date expects, which is kind of slow (and IMHO it's a bug in 7.0 that it will do such a conversion silently, anyhow). Better to use date_trunc to save the conversion step: SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp) as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; But the real problem is that this is going to use a start-of-week day that is offset five days from whatever day-of-the-week 1 January is. If you'd tried a few other years than 2000 you'd likely have been dissatisfied with the results... It seems like the right answer is that date_trunc('week',stamp) ought to do what you want, but it doesn't seem to be implemented. That's definitely a shortcoming --- anyone want to fix it? > The order-by clause doesn't seem to add much overhead - the query > plan is the same with or without it. Right, the GROUP BY requires a sort anyway, so the planner knows there's no need to sort again on the same key. regards, tom lane
At 12:54 AM -0400 5/29/2000, Tom Lane wrote: >Michael Blakeley <mike@blakeley.com> writes: >> SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp) >> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; > >Seems like the hard way. You are relying on an implicit conversion from >the float8 output of date_part() to the text input to_date expects, >which is kind of slow (and IMHO it's a bug in 7.0 that it will do such >a conversion silently, anyhow). Better to use date_trunc to save the >conversion step: > >SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp) > as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; Thanks - that seems to be about 10% faster. I also got a 25% boost by recompiling PG7 with -O3. >But the real problem is that this is going to use a start-of-week day >that is offset five days from whatever day-of-the-week 1 January is. >If you'd tried a few other years than 2000 you'd likely have been >dissatisfied with the results... Here's something that works a bit better, at the expense of falling in line with 'dow' and starting each week on Sunday, which I don't mind. I'd only gone with Monday in the first place because that was what 'ww' seemed to do. $ cal 12 1999 December 1999 S M Tu W Th F S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 $ cal 1 2000 January 2000 S M Tu W Th F S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 $ cal 1 2001 January 2001 S M Tu W Th F S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 $ D="'2000-01-01'"; psql -c "SELECT $D::date-date_part('dow',$D::date);" ?column? ------------ 1999-12-26 (1 row) $ D="'2001-01-01'"; psql -c "SELECT $D::date-date_part('dow',$D::date);" ?column? ------------ 2000-12-31 (1 row) $ D="'2000-01-15'"; psql -c "SELECT $D::date-date_part('dow',$D::date);" ?column? ------------ 2000-01-09 (1 row) $ D="'2000-01-16'"; psql -c "SELECT $D::date-date_part('dow',$D::date);" ?column? ------------ 2000-01-16 (1 row) It's also about 35% faster, and hits me with a healthy "duh!" factor since it's so obvious and so much cleaner. The total time for my original query has been cut by more than half - from 70 sec to 28 sec. Thanks! >It seems like the right answer is that date_trunc('week',stamp) ought >to do what you want, but it doesn't seem to be implemented. That's >definitely a shortcoming --- anyone want to fix it? You know what they say - if you want the right answer, just post the wrong answer and wait to be corrected :-). If you can't find fault with the query above, maybe it could be plugged into date_trunc('week', stamp) as-is? I haven't even looked at the source code, myself.... -- Mike
> Thanks - that seems to be about 10% faster. I also got a 25% boost > by recompiling PG7 with -O3. Do you mean you got a 25% boost by recompiling with -O3 instead of -O2, or instead of -g? If going from -O2 to -O3 gives a 25% boost, time for me to recompile. I'm just now moving from MySQL to PostgreSQL, and I'm also totally changing the data structures into something much less efficient, so I need all the speed boosts I can get. Recompiling is much cheaper than buying a new computer. e
On Sun, 28 May 2000, Michael Blakeley wrote: > At 12:54 AM -0400 5/29/2000, Tom Lane wrote: > >Michael Blakeley <mike@blakeley.com> writes: > >> SELECT to_date(date_part('year',stamp),'yyyy')-5+7*date_part('week',stamp) > >> as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; > > > >Seems like the hard way. You are relying on an implicit conversion from > >the float8 output of date_part() to the text input to_date expects, > >which is kind of slow (and IMHO it's a bug in 7.0 that it will do such > >a conversion silently, anyhow). Better to use date_trunc to save the > >conversion step: > > > >SELECT date_trunc('year',stamp)-5+7*date_part('week',stamp) > > as week,count(*) FROM EVENTS GROUP BY week ORDER BY week; > > Thanks - that seems to be about 10% faster. I also got a 25% boost by > recompiling PG7 with -O3. Well, support for 'WW' in 'to_date()' is just in my TODO. Probably will more faster select to_date( to_char(now(), 'YYYY-WW'), 'YYYY-WW'); than 3 operators and 2 functions in one query. Or best way will add support for 'start-date' per week / month / quarter to 'to_char()'. Hmm, I try it. But how format-pictires use for this? SWW / SMM / SQ ? IMHO, the postgreSQL will first DB with this feature :-) Karel