Thread: Temporal Units
I would like to store a temporal frequency as NUMERIC, without units, and have the application's front end (or middleware) transform the number to the appropriate interval name. I'm having difficulties figuring out how to do this. For example, an event might occur once per work shift, day, week, fortnight, or month. For comparison and checking purposes, it would be easier to store a single frequency (e.g., days), but have the users enter and display using other units (for example, a radiobox that presents the units as days or other times). Has this been done before? Or, do I need two columns (frequency and name) and do the conversion to a common time unit behind the scenes? Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On 4/28/07, Rich Shepard <rshepard@appl-ecosys.com> wrote: > I would like to store a temporal frequency as NUMERIC, without units, and > have the application's front end (or middleware) transform the number to the > appropriate interval name. I'm having difficulties figuring out how to do > this. This is a common enough problem. Three factors come to mind: (1) Can all your intervals be expressed in absolute time units, such as number of days? "Work shift" is a human concept whose length is defined by context. (2) When expressed as absolute time units, are all intervals valid for your data, or do you only permit subsets of the total set of possible intervals? In other words, if your user interface allows "1 week" today, but you remove this option in the future, is old data referring to this interval invalidated, or is that fine? (3) Do you need to refer to specific months or years? The length of these units vary according to month and leap year, and cannot be reliably encoded as "n days". The exception is when working with native PostgreSQL intervals; see below. If possible, I recommend dealing with absolute units and avoiding #2 and #3 altogether. PostgreSQL does have an interval data type that was designed for this very problem: create table intervals (name text, value interval); insert into intervals ('day', '1 day'::interval), ('week', '1 week'::interval), ('month', '1 month'::interval); Calculations on intervals are internally consistent with the Gregorian calendar system: # select current_date; 2007-04-29 00:00:00 # select current_date + '1 month'::interval; 2007-05-29 00:00:00 # select current_date + '1 month'::interval * 3; 2007-07-29 00:00:00 Based on this, you could create the table above as a lookup table for symbolic constants. Alexander.
On Sun, 29 Apr 2007, Alexander Staubo wrote: > This is a common enough problem. Three factors come to mind: > > (1) Can all your intervals be expressed in absolute time units, such as > number of days? "Work shift" is a human concept whose length is defined by > context. Alexander, If the requirements were in days, weeks, and months I could probably find a time unit that worked -- including the PostgreSQL-specific solution. However, when a regulatory agencies requires sampling once per shift, there may be 1, 2, or 3 shifts per day. I don't want to hard code the current solution because the company may add or subtract shifts depending on business conditions. I have a second column for 'interval_unit', but while that works it strikes me as less-than-elegant SQL. However, it might be the pragmatic answer, despite making it very difficult to do temporal arithmetic to see if required sampling was done and entered in the application. Thanks, Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Rich Shepard <rshepard@appl-ecosys.com> writes: > If the requirements were in days, weeks, and months I could probably find > a time unit that worked -- including the PostgreSQL-specific solution. What gives you the idea that type INTERVAL is Postgres-specific? It's in the SQL standard. regards, tom lane
On Sun, 29 Apr 2007, Tom Lane wrote: > What gives you the idea that type INTERVAL is Postgres-specific? It's in > the SQL standard. Tom, I know that and that was not to what I referred. Perhaps I mis-understood Alexander's reference to the internal postgres interval support as different from the SQL standard INTERVAL. If so, it's my mis-writing. Regulatory requirements are that monitoring is to be done 'once per shift,' 'daily,' 'weekly,' 'twice each month,' or 'monthly.' That does not mean the same day of the week or the same date each month. Even more vague is the requirement for monitoring after a storm event that results in 0.5 inch of rain or more. I've not dealt with this type of temporal situation before. I've recorded monitoring dates and have calculated intervals from them. But, now I want to design the tables so as to make it easy to determine compliance with the monitoring criteria. Perhaps I am in a deep rut and cannot see the obvious solution just over the rim. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Sun, Apr 29, 2007 at 07:43:52AM -0700, Rich Shepard wrote: > Alexander's reference to the internal postgres interval support as different > from the SQL standard INTERVAL. If so, it's my mis-writing. > > Regulatory requirements are that monitoring is to be done 'once per > shift,' 'daily,' 'weekly,' 'twice each month,' or 'monthly.' That does not > mean the same day of the week or the same date each month. Even more vague > is the requirement for monitoring after a storm event that results in 0.5 > inch of rain or more. Then I'm afraid you havn't indicated your requirements properly. All I can see is that the interval type does exactly what you want. It can store days, weeks, months or any combination thereof. You can multiply them and add them to dates and all works as expected. If this isn't what you want, you need to give some actual examples of what you want to store into the database and how you expect it to be presented to you again. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On Sun, 29 Apr 2007, Martijn van Oosterhout wrote: > Then I'm afraid you havn't indicated your requirements properly. All I can > see is that the interval type does exactly what you want. It can store > days, weeks, months or any combination thereof. You can multiply them and > add them to dates and all works as expected. How does one define 'shift' with intervals? 0.33 DAY? Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Sun, 29 Apr 2007, Rich Shepard wrote: > How does one define 'shift' with intervals? 0.33 DAY? On further reflection, I understand how to make the interval 'day' work by comparing the current timestamp with the month and hour. If there's no record within the necessary range, a message is e-mailed to the client noting the deficiency. Thanks all, Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Sun, 29 Apr 2007 22:33:37 +0200, Rich Shepard <rshepard@appl-ecosys.com> wrote: > On Sun, 29 Apr 2007, Martijn van Oosterhout wrote: > >> Then I'm afraid you havn't indicated your requirements properly. All I >> can >> see is that the interval type does exactly what you want. It can store >> days, weeks, months or any combination thereof. You can multiply them >> and >> add them to dates and all works as expected. > > How does one define 'shift' with intervals? 0.33 DAY? Yeah, that's the problem. An Interval in the mathematical sense is a range (say, [1...2], or [monday 10AM ... thursday 10PM]) which means two known endpoints. INTERVAL in postgres is simply a convenient way to express a time difference in a way which is very helpful to solve practical problems (ie. today + '1 month' behaves as expected whatever the current month), but which might not be suited to your problem. I see your shifts as ranges expressed over a modulo something set : example, morning shift, day shift, night shift, are [ begin hour .. end hour ] modulo 24 hour, since they repeat every day. Work days are [monday .. friday] modulo 7 days. Postgres intervals can't express this, since they have no fixed beginning or end points, they are simply differences. So if you want to know how many times a thing has been monitored each month, maybe count(*) GROUP BY EXTRACT( month FROM monitoring_time ); same thing for week and weekdays, and more funky formulations will be needed for shifts...
Listmail wrote: > On Sun, 29 Apr 2007 22:33:37 +0200, Rich Shepard > <rshepard@appl-ecosys.com> wrote: > >> On Sun, 29 Apr 2007, Martijn van Oosterhout wrote: >> >>> Then I'm afraid you havn't indicated your requirements properly. All >>> I can >>> see is that the interval type does exactly what you want. It can store >>> days, weeks, months or any combination thereof. You can multiply >>> them and >>> add them to dates and all works as expected. >> >> How does one define 'shift' with intervals? 0.33 DAY? > If I'm following this correctly, then interval & extract timepart can be used to provide all the required functionality: If you know what hours your shifts (periods) cover, and you want to check that you have a value for that shift today (ignoring for now issues with shifts only on 5 day working weeks & holidays, ...) then you can do something like: if (select count(*) from table where extract day = today and extract hour (now) >= extract hour from (shift start timestamp) and extract hour <= extract hour from(shift start timestamp + interval shift length) ) =0 then a value is missing So for any check, you want to ensure you have no periods without a value, which can only be done at the end of each period. If you have a table defining each period, a cron job can run (for example) hourly, identifying any periods which ended in the last hour and did not have a value. Or periods about to end in the next "interval" which do not yet have a value, to prompt before the period ends. The trickier part is how to specify your periods, and which days/hours/months/etc are included. Each period needs to be defined by data which allows a start and a finish date/time expressed in a generic way which is relative to "now" to be determined. So for any "now" we can evaluate which periods are about to end or have just ended. Cheers, Brent Wood
On Mon, 30 Apr 2007, Brent Wood wrote: > If I'm following this correctly, then interval & extract timepart can be > used to provide all the required functionality: Thanks, Brent. Your suggestions complete the approach I was considering. There is no need for real-time response, to checking after each shift or day -- or other time period -- will be sufficient. I wonder if a workweek/holiday calendar table for PostgreSQL already exists. If not I need to track down the procedure for creating one as Joe Celko references such a calendar in his books. I think that any schema that has temporal components needs such a table. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Rich Shepard wrote: > I wonder if a workweek/holiday calendar table for PostgreSQL already > exists. If not I need to track down the procedure for creating one > as Joe > Celko references such a calendar in his books. I think that any > schema that > has temporal components needs such a table. There was a brief discussion of this just last week, with a few solutions suggested: http://archives.postgresql.org/pgsql-general/2007-04/msg01098.php - John D. Burger MITRE
On 4/28/07, Rich Shepard <rshepard@appl-ecosys.com> wrote: > I would like to store a temporal frequency as NUMERIC, without units, and > have the application's front end (or middleware) transform the number to the > appropriate interval name. I'm having difficulties figuring out how to do > this. This is a common enough problem. Three factors come to mind: (1) Can all your intervals be expressed in absolute time units, such as number of days? "Work shift" is a human concept whose length is defined by context. (2) When expressed as absolute time units, are all intervals valid for your data, or do you only permit subsets of the total set of possible intervals? In other words, if your user interface allows "1 week" today, but you remove this option in the future, is old data referring to this interval invalidated, or is that fine? (3) Do you need to refer to specific months or years? The length of these units vary according to month and leap year, and cannot be reliably encoded as "n days". The exception is when working with native PostgreSQL intervals; see below. If possible, I recommend dealing with absolute units and avoiding #2 and #3 altogether. PostgreSQL does have an interval data type that was designed for this very problem: create table intervals (name text, value interval); insert into intervals ('day', '1 day'::interval), ('week', '1 week'::interval), ('month', '1 month'::interval); Calculations on intervals are internally consistent with the Gregorian calendar system: # select current_date; 2007-04-29 00:00:00 # select current_date + '1 month'::interval; 2007-05-29 00:00:00 # select current_date + '1 month'::interval * 3; 2007-07-29 00:00:00 Based on this, you could create the table above as a lookup table for symbolic constants. Alexander.
I am not sure I see why it would be good to do this using SQL, but I do know that I have used a number of Perl packages for this sort of thing. When I have done this in the past, I'd do the date and time calculations in Perl and feed the result to whatever RDBMS I happen to be using (PostgreSQL, MySQL, MS SQL Server 2005). I suppose that if it is nessary to do it within an SQL script, resort could be made to functions that in turm use the Perl packages.
But a question: Why would any schema that includes temporal components need a calendar table?
I use temporal components all the time and have yet to need a calendar table. In fact, some of my database applications are multitemporal, keeping track of edits to data that correct or update data, so that during an audit script, one can determine what a decision maker knew at the time he made a decision. This is so that a decision that was bad, but based on good data can be distinguished from a decision that had been based on bad data, but which would have been a good decision had the data been correct. The first option warrants correction of the decision maker while the latter warrants examination of the data entry process.
I have found my Perl scripts adequate for those few instances where use of my tyemporal data depended on a calendar.
I am not arguing with you. I just want to know in what circumstances my schemas can be improved by a calendar table, and how it provides a benefit over my more usual Perl functions.
Cheers,
Ted
Rich Shepard <rshepard@appl-ecosys.com> wrote:
Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Mon, 30 Apr 2007, Brent Wood wrote:
> If I'm following this correctly, then interval & extract timepart can be
> used to provide all the required functionality:
Thanks, Brent. Your suggestions complete the approach I was considering.
There is no need for real-time response, to checking after each shift or day
-- or other time period -- will be sufficient.
I wonder if a workweek/holiday calendar table for PostgreSQL already
exists. If not I need to track down the procedure for creating one as Joe
Celko references such a calendar in his books. I think that any schema that
has temporal components needs such a table.
Rich
--
Richard B. Shepard, Ph.D. | The Environmental Permitting
Applied Ecosystem Services, Inc. | Accelerator(TM)
Voice: 503-667-4517 Fax: 503-667-8863
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
On Mon, 30 Apr 2007, John D. Burger wrote: > There was a brief discussion of this just last week, with a few solutions > suggested: > > http://archives.postgresql.org/pgsql-general/2007-04/msg01098.php John, That thread asked how to find business days between any two specified dates. I would like to create a calendar table that includes business days, holidays, Julianized dates, and other interesting tid-bits that are of value in a business application. Then dates can be looked up in the table to learn their attributes and the calculations don't need to be done each time. A Google search with the terms "sql calendar table" returns > 1.4 million hits. They're almost all SQL Server, T-SQL, and similar. I am really surprised at not finding a postgresql solution among all these (but I looked only at the first 40 hits). Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Mon, 30 Apr 2007, Ted Byers wrote: > I am not sure I see why it would be good to do this using SQL, but I do > know that I have used a number of Perl packages for this sort of thing. > I am not arguing with you. I just want to know in what circumstances my > schemas can be improved by a calendar table, and how it provides a > benefit over my more usual Perl functions. Ted, Having never used such a table -- or having written an application that had such a heavy use of temporal data rather than scientific data -- I have no idea in what circumstances your schemas might be improved with a calendar table. I suspect, however, that a SQL table lookup may well be quicker than running a script (or compiled function) in another language, and the table is available for use in multiple apps. Isn't it faster or more efficient to run SELECT queries with table lookups rather then use stored procedures? For this web-based application, the UI and communications between client and server are being written in Ruby (with Rails) while the report generation is written in Python using ReportLab. If most of the queries can be done with SQL, I think it will be much easier to maintain, modify, and expand. Could be wrong, of course. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Rich,
I would think that as an ecologist, you would have a better sense than most here of the kinds of things I'd be doing. After all, I am a mathematical ecologist by training and the majority of applications I have developed have been either for agricultural consultants or environmental consultants. As a very simplistic example of where I find the kind of auditing I mentioned being priceless, consider a decision support tool for which the input data involves concentrations of various contaminants in water leaving some facility (at this point we don't care what the facility is). While I would prefer that the input come from a laboratory information system, and fully automated, imagine that the data is recorded and input by some lab tech. as each test is completed. On April 1, 2006, there is a typo leading the decision maker to believe that the concentration of contaminant X is 1 mg/l, and that this is an error and the true value is 1 g/l. The decision maker may decide to do nothing because 1 mg/l is considered safe. But doing nothing, some bad environmental effect occurs, and it is discovered days after the decision was made that the data entered is in error. Who is to blame? Where does responsibility lay and what corrective action is needed to ensure the problem does not recurr? When there is an issue of accountability and responsibility involved in environmental issues, every scientific measurement taken becomes a candidate for the kind of multitemporal data I routinely use.
Now, for lookup tables vs functions, there is always a tradeoff even within a given programming language (I am most familiar with FORTRAN, C++ and Java). The tradeoff is between memory use (including how that may affect performance if the table is large relative to available memory) and the speed or size of the function. In the case of the date and time functions I typically use in my perl scripts, they are so short, I doubt the performance impact is significant or easily measurable.
On a side note, how do you like Ruby and Python? How would you compare them to the other options (such as C++, Java, Perl, &c.)?
Cheers,
Ted
Rich Shepard <rshepard@appl-ecosys.com> wrote:
Rich Shepard <rshepard@appl-ecosys.com> wrote:
On Mon, 30 Apr 2007, Ted Byers wrote:
> I am not sure I see why it would be good to do this using SQL, but I do
> know that I have used a number of Perl packages for this sort of thing.
> I am not arguing with you. I just want to know in what circumstances my
> schemas can be improved by a calendar table, and how it provides a
> benefit over my more usual Perl functions.
Ted,
Having never used such a table -- or having written an application that
had such a heavy use of temporal data rather than scientific data -- I have
no idea in what circumstances your schemas might be improved with a calendar
table.
I suspect, however, that a SQL table lookup may well be quicker than
running a script (or compiled function) in another language, and the table
is available for use in multiple apps. Isn't it faster or more efficient to
run SELECT queries with table lookups rather then use stored procedures?
For this web-based application, the UI and communications between client
and server are being written in Ruby (with Rails) while the report
generation is written in Python using ReportLab. If most of the queries can
be done with SQL, I think it will be much easier to maintain, modify, and
expand. Could be wrong, of course.
Rich
--
Richard B. Shepard, Ph.D. | The Environmental Permitting
Applied Ecosystem Services, Inc. | Accelerator(TM)
Voice: 503-667-4517 Fax: 503-667-8863
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
John D. Burger wrote: >> There was a brief discussion of this just last week, with a few solutions >> suggested: >> >> http://archives.postgresql.org/pgsql-general/2007-04/msg01098.php Rich Shepard wrote: > That thread asked how to find business days between any two specified > dates. I would like to create a calendar table that includes business days, > holidays, Julianized dates, and other interesting tid-bits that are of > value > in a business application. Then dates can be looked up in the table to > learn > their attributes and the calculations don't need to be done each time. The best solution I've encountered so far to this type of problem is to have a table of "days" with columns like isWeekday, isHoliday, julianDay, otherTidbit, ... Then you select or join the days within the interval of interest and factor out weekdays, or holidays, or whatever. -- Lew
On Wed, 2 May 2007, Lew wrote: > The best solution I've encountered so far to this type of problem is to > have a table of "days" with columns like isWeekday, isHoliday, julianDay, > otherTidbit, ... > > Then you select or join the days within the interval of interest and > factor out weekdays, or holidays, or whatever. Thank you, Lew. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863