Re: truncating timestamps on arbitrary intervals - Mailing list pgsql-hackers

From John Naylor
Subject Re: truncating timestamps on arbitrary intervals
Date
Msg-id CAFBsxsGUDP2jky8SLk_zyWXFzj_A0G6BLR7R2kVjOLc5vNROQw@mail.gmail.com
Whole thread Raw
In response to Re: truncating timestamps on arbitrary intervals  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: truncating timestamps on arbitrary intervals  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now.  Are there any more corner cases and other
> areas with unclear behavior that you are aware of?

Hi Peter,

Thanks for taking a look!

I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want support.

> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go.  As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.

As far as the behaviors, I'm not sure exactly what you what you were thinking of, but here are two issues off the top of my head:

- If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately.
- In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days.

> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about.  It's obviously fairly straightforward
> to change it.

Effectively, it puts timestamps into bins, so maybe date_bin() or something like that?

> - There were various issues with the stride interval having months and
> years.  I'm not sure we even need that.  It could be omitted unless you
> are confident that your implementation is now sufficient.

Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia.

> - Also, negative intervals could be prohibited, but I suppose that
> matters less.

Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that.

> - I'm curious about the origin being set to 0001-01-01.  This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday.  Something to
> do with Julian vs. Gregorian calendar?

Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations.

> Maybe we should choose a date
> that is a bit more recent and easier to reason with.

2001-01-01 would also be a Monday aligned with centuries and millenia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.

> - Then again, I'm thinking that maybe we should make the origin
> mandatory.  Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00

Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right.

> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.

If we didn't allow months and years to be units, it seems the default would always make sense?

> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant.  Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?

I based this on 600b04d6b5ef6 for date_trunc(), whose message states:

date_trunc(field, timestamptz, zone_name)

is the same as

date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name

so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output.

--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Online checksums patch - once again
Next
From: "Corbit, Dann"
Date:
Subject: Re: Connection using ODBC and SSL