Thread: timestamps, formatting, and internals
I'm trying to debug an intermittent problem I'm seeing in one of our
rollup scripts. I'll try to summarize.

A table has a measured_at field, from which I calculate another time
value, called solarnoon, based on that field and a longitude value, and
I summarize min/max values grouped around this solarnoon. While I'm
doing this I also calculate the minimum time difference between the
calculated solar noon value and all the measured_at times within the
group. I then join this summary back to the original table it
summarizes, trying to pick out the one record whose measured_at value is
closest to the solarnoon value of the grouping.

Clear as mud? Anyway, there seems to be a problem with that last part.
I'm thinking the join on these date values is a bit funky. Perhaps
things aren't matching up on microsecond values, but it's hard to know
with queries if I'm seeing what the DB is seeing, as date values are
stored in seconds and what queries give you is a format of that.

So one question I have is: is there a way to set PG the way Oracle does
it -- set nls_date_format = 'YYYY...' -- so I can query and see exactly
what PG is seeing, even down to the microseconds? Is there a config
parameter I can set in PG so that calculations are done only to the
second? It seems this join doesn't always find a record that's closest
to solar noon, and therefore drops the summary and join record
altogether.

PG 9.0, Linux

Here's the immediate code I think is in question:

    SELECT DISTINCT on ( site_id, solarnoon )
      yada
    WHERE sds.site_id = sd.site_id
      and calculate_local_solar_noon( sd.measured_at, sds.longitude ) = sds.solarnoon
      -- match with the record that's closest to solarnoon. At this point
      -- we know the time difference, but not whether it's more or less.
      -- The higher-level DISTINCT clause removes any duplicates caused
      -- should solarnoon fall exactly between two measured_at times.
      and enough_measurements > 0
      and ( ( sd.measured_at = (
                calculate_local_solar_noon(sd.measured_at,sds.longitude) +
                ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) )
            or
            ( sd.measured_at = (
                calculate_local_solar_noon(sd.measured_at,sds.longitude) -
                ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) )
          ) ) end_distinct
Actually, figured I'd post the whole function, painful as it might be
for anyone to read. If anyone sees something that's a bit of a risk
( like perhaps the whole thing ;) )

On 5/18/12 5:19 PM, David Salisbury wrote:
> I'm trying to debug an intermittent problem I'm seeing in one of our
> rollup scripts.
>
> [snip -- full problem statement quoted]
> PG 9.0, Linux

execute " CREATE OR REPLACE FUNCTION rollup_and_insert_subdays() returns void as $$
declare
  v_created_at timestamp without time zone;
BEGIN
  v_created_at := now();
  INSERT INTO air_temp_dailies (
    uuid, site_id, organizationid, protocol_id, measured_at,
    current_temp_c, maximum_temp_c, minimum_temp_c, created_at, comments )
  SELECT
    subd_summary.uuid,
    subd_summary.site_id,
    subd_summary.organizationid,
    subd_summary.protocol_id,
    subd_summary.measured_at,
    subd_summary.current_temp_c,
    subd_summary.max_temp_c,
    subd_summary.min_temp_c,
    v_created_at as updated_at,
    'automated station rollup' as comments
  FROM (
    SELECT DISTINCT on ( site_id, solarnoon )
      site_id, organizationid, protocol_id, uuid, solarnoon,
      measured_at, current_temp_c, max_temp_c, min_temp_c
    FROM (
      SELECT
        sd.site_id, sd.organizationid, sd.protocol_id, sd.uuid,
        sds.solarnoon, sd.measured_at, sd.current_temp_c,
        sds.max_temp_c as max_temp_c,
        sds.min_temp_c as min_temp_c
      FROM (
        SELECT
          site_id,
          longitude,
          calculate_local_solar_noon(measured_at,longitude) as solarnoon,
          max(current_temp_c) as max_temp_c,
          min(current_temp_c) as min_temp_c,
          min( abs( cast( extract( epoch FROM (
            measured_at - calculate_local_solar_noon(measured_at,longitude)
          ) ) as integer ) ) ) as minimum_time_between_measured_and_solarnoon,
          trunc(count(*)/93) as enough_measurements
        FROM (
          SELECT site_id, current_temp_c, measured_at, ST_X(point) as longitude
          FROM air_temp_sub_days
          INNER JOIN sites on ( air_temp_sub_days.site_id = sites.id )
        ) as appending_longitude_to_sub_day_values
        WHERE measured_at is not null
        GROUP BY
          site_id,
          calculate_local_solar_noon(measured_at,longitude),
          -- we assume that for any site, for any given solarnoon, the
          -- longitude is constant, so adding this value to the group by
          -- has no effect other than allowing these values to percolate
          -- to the outer select
          longitude
      ) sds,
      air_temp_sub_days sd
      WHERE sds.site_id = sd.site_id
        and calculate_local_solar_noon( sd.measured_at, sds.longitude ) = sds.solarnoon
        -- match with the record that's closest to solarnoon. At this point
        -- we know the time difference, but not whether it's more or less.
        -- The higher-level DISTINCT clause removes any duplicates caused
        -- should solarnoon fall exactly between two measured_at times.
        and enough_measurements > 0
        and ( ( sd.measured_at = (
                  calculate_local_solar_noon(sd.measured_at,sds.longitude) +
                  ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) )
              or
              ( sd.measured_at = (
                  calculate_local_solar_noon(sd.measured_at,sds.longitude) -
                  ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) )
            )
    ) end_distinct
  ) subd_summary
  LEFT OUTER JOIN air_temp_dailies on
        subd_summary.site_id = air_temp_dailies.site_id
    and subd_summary.measured_at = air_temp_dailies.measured_at
  WHERE air_temp_dailies.site_id is null
    and air_temp_dailies.measured_at is null
  order by subd_summary.site_id, subd_summary.measured_at;
END
$$ LANGUAGE plpgsql; "
Oh, and while I'm polluting this list (sorry): it's a timestamp field
without a time zone.

Thanks for any ideas,

-Dave
On 05/18/2012 04:19 PM, David Salisbury wrote:
> Clear as mud? Anyways, there seems to be a problem on that last part.
> I'm thinking the join on these date values is a bit funky. Perhaps
> things aren't matching up on microsecond values, but it's hard to know
> with queries if I'm seeing what the DB is seeing, as date values are
> stored in seconds and what queries give you is a format of that.

I am not sure I follow. Timestamps (which is what I think you are
referring to) can be stored with up to microsecond precision, and the
values will be returned at the precision specified. See here:

http://www.postgresql.org/docs/9.0/interactive/datatype-datetime.html

> So one question I have is if there a way to set PG in the way Oracle
> does it.. set nls_date_format = 'YYYY...' so I can query and see
> exactly what PG is seeing, even to the microseconds?

Maybe you are looking for data formatting?:

http://www.postgresql.org/docs/9.0/interactive/functions-formatting.html

> Is there a config parameter I can set in PG so that calculations are
> done only to the second? It seems this join doesn't always find a
> record that's closest to solar noon, and therefore drops the summary
> and join record altogether.
>
> PG 9.0, Linux

-- 
Adrian Klaver
adrian.klaver@gmail.com
On 05/19/2012 10:34 AM, David Salisbury wrote:

CCing the list.

> On 5/19/12 8:12 AM, Adrian Klaver wrote:
>
> I hope no one looks further into the problem, as the case is closed.
> It was a coding problem and not a time-matchup problem. Late Friday
> afternoons just aren't my most shining moments. ;)
>
> But I do still wonder... Is there a setting where I can set a default
> formatting of the date? If I do a "select timestamp '01-jan-2012'" I
> would want it to always return everything, down to the last
> microsecond. A "Give me everything!" setting, without doing some
> to_char ugliness. It would just return..
>
> 01-jan-2012 00:00:00.000

<NOTE: A timestamp contains a date, but a date is not a timestamp
(actually it is one fixed at midnight). To avoid confusion you might
want to be specific about what type of time you are working with.>

It will return what is provided :) In other words, it depends on how the
timestamp field is set up. That is determined by the precision parameter
passed to the timestamp type when creating or altering the field. As far
as an ad-hoc timestamp as shown above, the best you can do is change the
DateStyle, but that only goes to two decimal places:

test=# SET datestyle ='SQL';
SET
test=# SELECT timestamp '01-jan-2012';
      timestamp
---------------------
 01/01/2012 00:00:00
(1 row)

In either case it will return all significant digits:

test=# SELECT '01-jan-2012 00:00:00.000012'::timestamp;
         timestamp
----------------------------
 2012-01-01 00:00:00.000012

> And conversely, is there a setting so that any time value will always
> be rounded to the second, shaving off the microseconds? So that
> "select now();" would return, and store _internally_! ..
>
> 2012-05-19 11:25:12.000

Change the precision of the timestamp field to 0. See below:

test=# \d timestamp_test
              Table "public.timestamp_test"
 Column  |            Type             | Modifiers
---------+-----------------------------+-----------
 id      | integer                     | not null
 txt_fld | text                        |
 ts_fld  | timestamp with time zone    |
 ts_fld2 | timestamp(0) with time zone |
Indexes:
    "timestamp_test_pkey" PRIMARY KEY, btree (id)

test=# SELECT * from timestamp_test;
 id | txt_fld |            ts_fld             |        ts_fld2
----+---------+-------------------------------+------------------------
  7 | test4   | 2009-12-24 13:37:32.499764-08 | 2009-12-24 13:37:32-08
  8 | t       | 2010-05-20 08:13:28.157027-07 | 2010-05-20 12:13:28-07
  9 | t       | 2010-05-20 08:13:43.265383-07 | 2010-05-20 10:13:43-07
 10 | t       | 2010-05-20 08:13:53.718519-07 | 2010-05-20 13:13:54-07
 11 | s       | 2011-03-25 09:00:00.124-07    | 2011-03-25 14:15:13-07
 12 | s       | 2011-03-25 09:12:00.124-07    | 2011-03-25 14:16:27-07

> Thanks for the reply and any help!
>
> -ds

-- 
Adrian Klaver
adrian.klaver@gmail.com
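[Editor's note: per the PostgreSQL datetime docs, a value stored into a
timestamp(0) column is rounded to the nearest second rather than
truncated. A minimal Python sketch of that behavior, with made-up input
values; this mimics the documented rule, it is not PostgreSQL's code:]

```python
from datetime import datetime, timedelta

def round_to_second(ts: datetime) -> datetime:
    """Round half up to the nearest whole second, as a timestamp(0) column does."""
    if ts.microsecond >= 500_000:
        ts += timedelta(seconds=1)
    return ts.replace(microsecond=0)

print(round_to_second(datetime(2009, 12, 24, 13, 37, 32, 499764)))  # 2009-12-24 13:37:32
print(round_to_second(datetime(2010, 5, 20, 8, 13, 53, 718519)))    # 2010-05-20 08:13:54
```

Note the second example rounds *up*: a .718519 fraction becomes the next
whole second, which is rounding, not the truncation David asked about.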
David Salisbury <salisbury@globe.gov> writes:
> Actually, figured I'd post the whole function, painful as it
> might be for anyone to read. If anyone sees something that's a bit
> of a risk ( like perhaps the whole thing ;)

Well, I don't know exactly what's causing your issue, but I see a few
things that seem rather dubious:

>   min( abs( cast( extract( epoch FROM (
>     measured_at - calculate_local_solar_noon(measured_at,longitude)
>   ) ) as integer ) ) ) as minimum_time_between_measured_and_solarnoon,

Is there a really good reason to force the interval value to integer
here? I forget offhand whether you get truncation or rounding when you
do that, but in either case it's entirely likely that the computed min()
will be less than the actual difference for *any* specific real site, if
the smallest such difference has a fractional part. I'd lose the CAST
step and see what happens.

>   ( ( sd.measured_at = (
>         calculate_local_solar_noon(sd.measured_at,sds.longitude) +
>         ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) )
>     or
>     ( sd.measured_at = (
>         calculate_local_solar_noon(sd.measured_at,sds.longitude) -
>         ( sds.minimum_time_between_measured_and_solarnoon::text || ' secs' )::interval ) ) )

Because of the CAST above, these tests are guaranteed to fail if the
measured_at value has a fractional-second part, and I'm not sure why you
are assuming that that should be zero. Also, the above is an expensive,
grotty, imprecise way to convert a number back to an interval. Consider
using

  sds.minimum_time_between_measured_and_solarnoon * interval '1 second'

or even better, what about

  abs( extract( epoch FROM (
    sd.measured_at - calculate_local_solar_noon(sd.measured_at,sds.longitude)
  ) ) ) <= sds.minimum_time_between_measured_and_solarnoon

which seems to me to be a lot more naturally related to what you're
doing to compute minimum_time_between_measured_and_solarnoon in the
first place.

			regards, tom lane
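[Editor's note: Tom's diagnosis can be reproduced outside the database.
A minimal Python sketch; the timestamps below are invented for
illustration, and Python's datetime carries microseconds much like an
unrounded PG timestamp:]

```python
from datetime import datetime, timedelta

# A solar-noon value with fractional seconds, and the reading closest to it.
solar_noon  = datetime(2012, 5, 18, 12, 0, 0, 250000)
measured_at = datetime(2012, 5, 18, 12, 0, 3)

# What the rollup computes: the absolute difference, forced to whole seconds.
min_diff = round(abs((measured_at - solar_noon).total_seconds()))  # 2.75 -> 3

# The join then demands an *exact* match at solar noon +/- min_diff seconds...
candidates = (solar_noon + timedelta(seconds=min_diff),
              solar_noon - timedelta(seconds=min_diff))

# ...but both candidates are off by the discarded 0.25 s, so no row matches
# and the summary record is silently dropped.
print(measured_at in candidates)  # False
```

Tom's `abs(extract(epoch ...)) <= minimum_...` rewrite sidesteps this
exact-equality trap entirely.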
On 2012-05-18, David Salisbury <salisbury@globe.gov> wrote:
> So one question I have is if there a way to set PG in the way Oracle
> does it..

Probably not.

> set nls_date_format = 'YYYY...' so I can query and see exactly what PG
> is seeing, even to the microseconds?

set datestyle to 'ISO';

> Is there a config parameter I can set in PG so that calculations
> are done only to the second?

No, but you can truncate explicitly:

  date_trunc('second', your_timestamp_expression)

The query "show integer_datetimes;" should return 'on', which means
timestamps are microsecond precision. If it returns 'off', your database
was built with floating-point timestamps and equality tests will be
unreliable.

> It seems this join doesn't always find a record that's closest to
> solar noon, and therefore drops the summary and join record altogether.

You didn't show the actual join, only the WHERE clause. Given your task
I would create CTEs finding the first record before and after local
solar noon, then combine them with a UNION and feed the result of that
to another DISTINCT ON to filter the most appropriate record for each
site.

-- 
⚂⚃ 100% natural
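[Editor's note: applied to the problematic join, Jasen's explicit
date_trunc('second', ...) makes microsecond mismatches irrelevant when
both sides are truncated before comparison. A sketch of the idea in
Python, with invented values; this mirrors date_trunc's documented
behavior, not its implementation:]

```python
from datetime import datetime

def trunc_to_second(ts: datetime) -> datetime:
    # Python analogue of SQL date_trunc('second', ts): drop the fraction.
    return ts.replace(microsecond=0)

# Two values that differ only in their microsecond parts:
measured_at     = datetime(2012, 5, 18, 12, 0, 3, 123456)
solar_noon_plus = datetime(2012, 5, 18, 12, 0, 3, 250000)

print(measured_at == solar_noon_plus)                                    # False
print(trunc_to_second(measured_at) == trunc_to_second(solar_noon_plus))  # True
```

Note that date_trunc truncates, whereas a timestamp(0) column rounds, so
the two can disagree for fractions of 0.5 s and above.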
On 5/27/12 12:25 AM, Jasen Betts wrote:
> The query "show integer_datetimes;" should return 'on', which means
> timestamps are microsecond precision. If it returns 'off', your
> database was built with floating-point timestamps and equality tests
> will be unreliable.

I find that rather interesting. I was told that I was losing
microseconds when I extracted an epoch from the difference between two
timestamps and cast that value to an integer. So if I have integer
timestamps ( your case above ) I get microseconds, but integer epochs
are without microseconds?

Thanks,

-ds
On 05/29/2012 04:28 PM, David Salisbury wrote:
> I find that rather interesting. I was told that I was losing
> microseconds when I extracted an epoch from the difference between two
> timestamps and cast that value to an integer. So if I have integer
> timestamps ( your case above ) I get microseconds, but integer epochs
> are without microseconds?

test=> SELECT extract(epoch from(now() - (now() - interval '1.345577 sec')));
 date_part
-----------
  1.345577

test=> SELECT extract(epoch from(now() - (now() - interval '1.345577 sec')))::int;
 date_part
-----------
         1

An integer is an integer, so you will lose all the fractional parts :)

-- 
Adrian Klaver
adrian.klaver@gmail.com
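[Editor's note: the same loss is easy to show in Python. One caveat
worth hedging: PostgreSQL's cast to integer rounds to the nearest whole
number, while Python's int() truncates toward zero; for 1.345577 both
give 1:]

```python
from datetime import timedelta

diff = timedelta(seconds=1, microseconds=345577)
epoch = diff.total_seconds()

print(epoch)       # 1.345577 -- the microseconds live in the fraction
print(int(epoch))  # 1        -- gone once forced to an integer
```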
On 05/29/2012 07:08 PM, Adrian Klaver wrote:
> On 05/29/2012 04:28 PM, David Salisbury wrote:
>> I find that rather interesting. I was told that I was losing
>> microseconds when I extracted an epoch from the difference between
>> two timestamps and cast that value to an integer. So if I have
>> integer timestamps ( your case above ) I get microseconds, but
>> integer epochs are without microseconds?

I think I realize where the confusion is now. When Jasen mentioned
integer datetimes he was referring to the internal storage format
Postgres uses to record the datetime value. Via the magic of
programming (others will have to fill that part in), the internal format
can represent time down to microseconds even though the value is
actually stored as an eight-byte integer. When you do an explicit cast
of a timestamp value to integer you are asking that the value be only a
whole number, and the decimal portion is discarded. In other words, the
internal integer encodes the decimal values; the external integer does
not.

-- 
Adrian Klaver
adrian.klaver@gmail.com
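[Editor's note: the "magic" is just fixed-point arithmetic: the 8-byte
integer counts microseconds, not seconds. A Python sketch of the idea;
PostgreSQL's internal epoch for timestamps is 2000-01-01, but the
encode/decode helpers here are illustrative, not PG's implementation:]

```python
from datetime import datetime, timedelta

PG_EPOCH = datetime(2000, 1, 1)  # Postgres timestamps count from 2000-01-01

def encode(ts: datetime) -> int:
    # Whole microseconds since the epoch -- fits comfortably in an int8.
    return (ts - PG_EPOCH) // timedelta(microseconds=1)

def decode(n: int) -> datetime:
    return PG_EPOCH + timedelta(microseconds=n)

ts = datetime(2012, 5, 30, 14, 25, 28, 719475)
n = encode(ts)

print(decode(n) == ts)  # True -- the fraction survives the integer encoding
print(n % 1_000_000)    # 719475 -- the microseconds sit in the low digits
```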
On 5/30/12 9:42 AM, Adrian Klaver wrote:
> Think I realize where the confusion is now. When Jasen mentioned
> integer datetimes he was referring to the internal storage format
> Postgres uses to record the datetime value.
> [snip]
> In other words the internal integer encodes the decimal values; the
> external integer does not.

Thanks! I was looking for some sort of verification along these lines.
In my mind, the internal storage of a timestamp would be the number of
milliseconds since 1970 ( or similar ). But to me, if I cast something
that is an integer into an integer it would still be an integer ;) , and
still hold the milliseconds. Perhaps if I cast a datetime into a bigint
it'll still hold the number of ms? Some sort of parameter setting for
dates would be nice, to be able to default a date/time format down to
the ms, w/o having to explicitly format it with every select... imho.

-ds
On 05/30/2012 01:48 PM, David Salisbury wrote:
> Thanks! I was looking for some sort of verification along these lines.
> In my mind, the internal storage of a timestamp would be the number of
> milliseconds since 1970 ( or similar ). But to me, if I cast something
> that is an integer into an integer it would still be an integer ;) ,
> and still hold the milliseconds. Perhaps if I cast a datetime into a
> bigint it'll still hold the number of ms? Some sort of parameter
> setting for dates would be nice, to be able to default a date/time
> format down to the ms, w/o having to explicitly format it with every
> select... imho

The client does not see the internal value. That value is decoded when
presented to the client, and encoded when the client presents a datetime
value. Any integer casting you do as the client will not change that. As
to datetime formatting, I believe that was covered in another thread :)

test=> SELECT now();
              now
-------------------------------
 2012-05-30 14:25:28.719475-07
(1 row)

test=> SELECT '2012-05-30'::TIMESTAMP;
      timestamp
---------------------
 2012-05-30 00:00:00
(1 row)

test=> SELECT '2012-05-30 2:26:45.56789'::TIMESTAMP;
         timestamp
---------------------------
 2012-05-30 02:26:45.56789
(1 row)

A timestamp will return the values that are significant.

> -ds

-- 
Adrian Klaver
adrian.klaver@gmail.com
On 2012-05-29, David Salisbury <salisbury@globe.gov> wrote:
> I find that rather interesting. I was told that I was losing
> microseconds when I extracted an epoch from the difference between two
> timestamps and cast that value to an integer. So if I have integer
> timestamps ( your case above ) I get microseconds, but integer epochs
> are without microseconds?

Yeah, the microseconds appear as fractions of seconds, so in the
conversion to integer epoch they get rounded off.

-- 
⚂⚃ 100% natural
On 2012-05-30, David Salisbury <salisbury@globe.gov> wrote:
> In my mind, the internal storage of a timestamp would be the number of
> milliseconds since 1970 ( or similar ). But to me, if I cast something
> that is an integer into an integer it would still be an integer ;) ,
> and still hold the milliseconds.

It's internally stored as int8, but treated arithmetically as a number
of millionths. "Fixed point" is the appropriate term, I think.

> Perhaps if I cast a datetime into a bigint it'll still hold the number
> of ms?

Only if you multiply it by 1000000.

> Some sort of parameter setting for dates would be nice, to be able to
> default a date/time format down to the ms, w/o having to explicitly
> format it with every select... imho.

Sounds like a potential foot-gun to me.

-- 
⚂⚃ 100% natural
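[Editor's note: Jasen's "multiply first" point, sketched in Python with
an invented timestamp. Exact integer arithmetic on timedelta is used to
avoid float rounding; the 1970 epoch here is the Unix one, not PG's
internal 2000 epoch:]

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)
ts = datetime(2012, 5, 30, 14, 25, 28, 719475)

# A bigint of whole seconds: the fraction is already gone.
secs = (ts - UNIX_EPOCH) // timedelta(seconds=1)

# Scale to microseconds *before* taking the integer: nothing is lost.
micros = (ts - UNIX_EPOCH) // timedelta(microseconds=1)

print(micros - secs * 1_000_000)  # 719475 -- the part the plain cast drops
```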
Jasen Betts wrote:
> On 2012-05-29, David Salisbury <salisbury@globe.gov> wrote:
>> I find that rather interesting. I was told that I was losing
>> microseconds when I extracted an epoch from the difference between
>> two timestamps and cast that value to an integer. So if I have
>> integer timestamps ( your case above ) I get microseconds, but
>> integer epochs are without microseconds?
>
> Yeah, the microseconds appear as fractions of seconds, so in the
> conversion to integer epoch they get rounded off.

I think you need to consider what you're actually computing and
measuring. My understanding is that Meeus's Equation of Time calculation
is good to something like 250 msec, so that's the limit of your
accuracy; but as soon as you start taking refraction and atmospheric
turbulence into account -- even with the Sun high above the horizon --
you're going to degrade that.

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]