Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
Date
Msg-id EB0DA992-5D41-470B-A521-41706A6E3EBA@gmail.com
Whole thread Raw
In response to Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones  (Lincoln Swaine-Moore <lswainemoore@gmail.com>)
Responses Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones  (Lincoln Swaine-Moore <lswainemoore@gmail.com>)
List pgsql-general
> On 4 Oct 2023, at 17:58, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
>
> > SELECT
> >     sub.gs AS ts_in_utc
> >     ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> >     ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York',
> > '2023-01-01')
> > FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz,
> > '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> > WHERE
> >     sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND
> > sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz
>
> I believe this query will be funky around DST borders, because `sub.gs AT TIME ZONE 'America/New_York'` will be
localizedin a way that erases the difference between hours with different offsets, which are genuinely different. For
instance,I ran this and there are two rows within it that look like:  
>
> ` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
> and
> ` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
>
> I think that the non-unique second column will pose an issue for the date binning at a resolution finer than 1 day.

What I do in such cases is to add an extra column with the UTC timestamp to serve as a linear scale to the local
timestamps.That also helps with ordering buckets in reports and such during DST changes (especially the ones where an
hourrepeats). 

Filtering in the queries occurs on the UTC scale, with the local timestamps calculated back to UTC, so that it doesn’t
matterwhether the local time has 23, 24, 25 or 24.5 or 23.5 or whatever number of hours on a date-range - it all maps
backbecause UTC always has 24 hours. 

Something that I also do is to create calendar tables and views for the buckets, with 2 timestamps per bucket: the
startof the bucket and the start of the next bucket. That gives you a range to put actual timestamps between (not
BETWEENbetween, because that’s inclusive). You can store and index that, as opposed to generated results using
generate_series- basically I materialise those. 

For hours and quarter hours I found it to be fairly convenient to base a view on a join between a date calendar and an
(quarterof an) hour per UTC day table, but materialising that with some indexes may perform better (at the cost of disk
space).I do materialise that currently, but our database server doesn’t have a lot of memory so I’m often not hitting
thecache and performance suffers a bit (infrastructure is about to change for the better though). 

Regards,

Alban Hertroys
--
There is always an exception to always.







pgsql-general by date:

Previous
From: Steve Crawford
Date:
Subject: Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
Next
From: Lincoln Swaine-Moore
Date:
Subject: Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones