Thread: DecodeInterval fixes

DecodeInterval fixes

From
Joseph Koshakow
Date:
Hi all,

This patch does three things in the DecodeInterval function:

1) Removes dead code for handling unit type RESERVE. There used to be
a unit called "invalid" that was of type RESERVE. At some point that
unit was removed and there were no more units of type RESERVE.
Therefore, the code for RESERVE unit handling is unreachable.

2) Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.

3) Error when the user has multiple consecutive units or a unit without
an accompanying value. I spent a lot of time trying to come up with
robust ways to detect this and ultimately settled on using the "type"
field. I'm not entirely happy with this approach, because it involves
having to look ahead to the next field in a couple of places. The other
approach I was considering was to introduce a boolean flag called
"unhandled_unit". After parsing a unit it would be set to true, after
applying the unit to a number it would be set to false. If it was true
right before parsing a unit, then we would error. Please let me know
if you have any suggestions here.

There is one more problem I noticed, but didn't fix. We allow multiple
"@" to be sprinkled anywhere in the input, even though the docs [0]
only allow it to appear at the beginning of the input. For example,
the following query works fine:

    # SELECT INTERVAL '1 @ year @ @ @ 6 days @ 10 @ minutes';
            interval        
    ------------------------
     1 year 6 days 00:10:00
    (1 row)

Unfortunately, this can't be fixed in DecodeInterval, because all of
the "@" fields are filtered out before this method. Additionally, I
believe this means that the lines

     if (type == IGNORE_DTF)
         continue;

in DecodeInterval, that appears right after decoding the units, are unreachable since
"@" is the only unit of type IGNORE_DTF. Since "@" is filtered out,
we'll never decode a unit of type IGNORE_DTF.

For reference, I previously merged a couple similar patches to this
one, but for other date-time types [1], [2].

Thanks,
Joe Koshakow
Attachment

Re: DecodeInterval fixes

From
Jacob Champion
Date:
Hi Joe, here's a partial review:

On Sun, Apr 9, 2023 at 5:44 PM Joseph Koshakow <koshy44@gmail.com> wrote:
> 1) Removes dead code for handling unit type RESERVE.

Looks good. From a quick skim it looks like the ECPG copy of this code
(ecpg/pgtypeslib/interval.c) might need to be updated as well?

> 2) Restrict the unit "ago" to only appear at the end of the
> interval. According to the docs [0], this is the only valid place to
> put it, but we allowed it multiple times at any point in the input.

Also looks reasonable to me. (Same note re: ECPG.)

> 3) Error when the user has multiple consecutive units or a unit without
> an accompanying value. I spent a lot of time trying to come up with
> robust ways to detect this and ultimately settled on using the "type"
> field. I'm not entirely happy with this approach, because it involves
> having to look ahead to the next field in a couple of places. The other
> approach I was considering was to introduce a boolean flag called
> "unhandled_unit". After parsing a unit it would be set to true, after
> applying the unit to a number it would be set to false. If it was true
> right before parsing a unit, then we would error. Please let me know
> if you have any suggestions here.

I'm new to this code, but I agree that the use of `type` and the
lookahead are not particularly obvious/intuitive. At the very least,
they'd need some more explanation in the code. Your boolean flag idea
sounds reasonable, though.

> There is one more problem I noticed, but didn't fix. We allow multiple
> "@" to be sprinkled anywhere in the input, even though the docs [0]
> only allow it to appear at the beginning of the input.

(No particular opinion on this.)

It looks like this patch needs a rebase for the CI, too, but there are
no conflicts.

Thanks!
--Jacob



Re: DecodeInterval fixes

From
Tom Lane
Date:
Jacob Champion <jchampion@timescale.com> writes:
> Hi Joe, here's a partial review:
> On Sun, Apr 9, 2023 at 5:44 PM Joseph Koshakow <koshy44@gmail.com> wrote:
>> 1) Removes dead code for handling unit type RESERVE.

> Looks good. From a quick skim it looks like the ECPG copy of this code
> (ecpg/pgtypeslib/interval.c) might need to be updated as well?

The ECPG datetime datatype support code has been basically unmaintained
for years, and has diverged quite far at this point from the core code.
I wouldn't expect that a patch to the core code necessarily applies
easily to ECPG, nor would I expect somebody patching the core to bother
trying.

Perhaps modernizing/resyncing that ECPG code would be a worthwhile
undertaking, but it'd be a mighty large one, and I'm not sure about
the size of the return.  In the meantime, benign neglect is the policy.

            regards, tom lane



Re: DecodeInterval fixes

From
Joseph Koshakow
Date:


Jacob Champion <jchampion@timescale.com> writes:
> Hi Joe, here's a partial review:

Thanks so much for the review!

> I'm new to this code, but I agree that the use of `type` and the
> lookahead are not particularly obvious/intuitive. At the very least,
> they'd need some more explanation in the code. Your boolean flag idea
> sounds reasonable, though.

I've updated the patch with the boolean flag idea. I think it's a
bit cleaner and more readable.

>> There is one more problem I noticed, but didn't fix. We allow multiple
>> "@" to be sprinkled anywhere in the input, even though the docs [0]
>> only allow it to appear at the beginning of the input.
>
> (No particular opinion on this.)

I looked into this a bit. The reason this works is because the date
time lexer filters out all punctuation. That's what allows us to parse
things like `SELECT date 'January 8, 1999';`. It's probably not worth
trying to be smarter about what punctuation we allow where, at least
for now. Maybe in the future we can exclude "@" from the punctuation
that get's filtered out.

> It looks like this patch needs a rebase for the CI, too, but there are
> no conflicts.

The attached patch is rebased against master.

Thanks,
Joe Koshakow
Attachment

Re: DecodeInterval fixes

From
Gurjeet Singh
Date:
On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The ECPG datetime datatype support code has been basically unmaintained
> for years, and has diverged quite far at this point from the core code.

I was under the impression that anything in the postgresql.git
repository is considered core, and hence maintained as one unit, from
release to release. An example of this, to me, were all the contrib/*
modules.

> I wouldn't expect that a patch to the core code necessarily applies
> easily to ECPG, nor would I expect somebody patching the core to bother
> trying.

The above statement makes me think that only the code inside
src/backend/ is considered core. Is that the right assumption?

> Perhaps modernizing/resyncing that ECPG code would be a worthwhile
> undertaking, but it'd be a mighty large one, and I'm not sure about
> the size of the return.  In the meantime, benign neglect is the policy.

Benign neglect doesn't sound nice from a user's/consumer's
perspective. Can it be labeled (i.e. declared as such in docs) as
deprecated.

Knowing that the tool you use has now been deprecated would be a
better message for someone still using it, even if it was left marked
deprecated indefinitely. Discovering benign neglect for the tool you
depend on, from secondary sources (like this list, forums, etc.), does
not evoke a lot of confidence.

Best regards,
Gurjeet
http://Gurje.et



Re: DecodeInterval fixes

From
Tom Lane
Date:
Gurjeet Singh <gurjeet@singh.im> writes:
> On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The ECPG datetime datatype support code has been basically unmaintained
>> for years, and has diverged quite far at this point from the core code.

> I was under the impression that anything in the postgresql.git
> repository is considered core, and hence maintained as one unit, from
> release to release.

When I say that ecpglib is next door to unmaintained, I'm just stating
the facts on the ground; project policy is irrelevant.  That situation
is not likely to change until somebody steps up to do (a lot of) work
on it, which is probably unlikely to happen unless we start getting
actual complaints from ECPG users.  For the meantime, what's there now
seems to be satisfactory to whoever is using it ... which might be
nobody?

In any case, you don't have to look far to notice that some parts of
the tree are maintained far more actively than others.  ecpglib is
just one of the more identifiable bits that's receiving little love.
The quality of the code under contrib/ is wildly variable, and even
the server code itself has backwaters.  (For instance, the "bit" types,
which aren't even in the standard anymore; or the geometric types,
or "money".)

By and large, I don't see this unevenness of maintenance effort as
a problem.  It's more like directing our limited resources into the
most useful places.  Code that isn't getting worked on is either not
used at all by anybody, or it serves the needs of those who use it
well enough already.  Since it's difficult to tell which of those
cases applies, removing code just because it's not been improved
lately is a hard choice to sell.  But so is putting maintenance effort
into code that there might be little audience for.  In the end we
solve this via the principle of "scratch your own itch": if somebody
is concerned enough about a particular piece of code to put their own
time into improving it, then great, it'll get improved.

> Benign neglect doesn't sound nice from a user's/consumer's
> perspective. Can it be labeled (i.e. declared as such in docs) as
> deprecated.

Deprecation would imply that we're planning to remove it, which
we are not.

            regards, tom lane



Re: DecodeInterval fixes

From
Gurjeet Singh
Date:
On Sat, Jul 8, 2023 at 1:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Gurjeet Singh <gurjeet@singh.im> writes:
> > On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The ECPG datetime datatype support code has been basically unmaintained
> >> for years, and has diverged quite far at this point from the core code.
>
> > I was under the impression that anything in the postgresql.git
> > repository is considered core, and hence maintained as one unit, from
> > release to release.
>
> When I say that ecpglib is next door to unmaintained, I'm just stating
> the facts on the ground; project policy is irrelevant.  That situation
> is not likely to change until somebody steps up to do (a lot of) work
> on it, which is probably unlikely to happen unless we start getting
> actual complaints from ECPG users.  For the meantime, what's there now
> seems to be satisfactory to whoever is using it ... which might be
> nobody?
>
> In any case, you don't have to look far to notice that some parts of
> the tree are maintained far more actively than others.  ecpglib is
> just one of the more identifiable bits that's receiving little love.
> The quality of the code under contrib/ is wildly variable, and even
> the server code itself has backwaters.  (For instance, the "bit" types,
> which aren't even in the standard anymore; or the geometric types,
> or "money".)

Thanks for sharing your view on different parts of the code. This give
a clear direction if someone would be interested in stepping up.

As part of my mentoring a GSoC 2023 participant, last night we were
looking at the TODO wiki page, for something for the mentee to pick up
next. I feel the staleness/deficiencies you mention above are not
captured in the TODO wiki page. It'd be nice if these were documented,
so that newcomers to the community can pick up work that they feel is
an easy lift for them.

> By and large, I don't see this unevenness of maintenance effort as
> a problem.  It's more like directing our limited resources into the
> most useful places.  Code that isn't getting worked on is either not
> used at all by anybody, or it serves the needs of those who use it
> well enough already.  Since it's difficult to tell which of those
> cases applies, removing code just because it's not been improved
> lately is a hard choice to sell.  But so is putting maintenance effort
> into code that there might be little audience for.  In the end we
> solve this via the principle of "scratch your own itch": if somebody
> is concerned enough about a particular piece of code to put their own
> time into improving it, then great, it'll get improved.
>
> > Benign neglect doesn't sound nice from a user's/consumer's
> > perspective. Can it be labeled (i.e. declared as such in docs) as
> > deprecated.
>
> Deprecation would imply that we're planning to remove it, which
> we are not.

Good to know. Sorry I took "benign neglect" to mean that there's no
willingness to improve it. This makes it clear that community wants to
improve and maintain ECPG, it's just a matter of someone volunteering,
and better use of the resources available.



Best regards,
Gurjeet
http://Gurje.et



Re: DecodeInterval fixes

From
Joseph Koshakow
Date:
On Sat, Jul 8, 2023 at 5:06 PM Gurjeet Singh <gurjeet@singh.im> wrote:

> I feel the staleness/deficiencies you mention above are not
> captured in the TODO wiki page. It'd be nice if these were documented,
> so that newcomers to the community can pick up work that they feel is
> an easy lift for them.

I think that's a good idea. I've definitely been confused by this in
previous patches I've submitted.


I've broken up this patch into three logical commits and attached them.
None of the actual code has changed.

Thanks,
Joe Koshakow
Attachment

Re: DecodeInterval fixes

From
reid.thompson@crunchydata.com
Date:
On Sat, 2023-07-08 at 13:18 -0400, Joseph Koshakow wrote:
> Jacob Champion <jchampion@timescale.com> writes:
> > Hi Joe, here's a partial review:
>
> Thanks so much for the review!
>
> > I'm new to this code, but I agree that the use of `type` and the
> > lookahead are not particularly obvious/intuitive. At the very
> > least,
> > they'd need some more explanation in the code. Your boolean flag
> > idea
> > sounds reasonable, though.
>
> I've updated the patch with the boolean flag idea. I think it's a
> bit cleaner and more readable.
>
> > > There is one more problem I noticed, but didn't fix. We allow
> > > multiple
> > > "@" to be sprinkled anywhere in the input, even though the docs
> > > [0]
> > > only allow it to appear at the beginning of the input.
> >
> > (No particular opinion on this.)
>
> I looked into this a bit. The reason this works is because the date
> time lexer filters out all punctuation. That's what allows us to
> parse
> things like `SELECT date 'January 8, 1999';`. It's probably not worth
> trying to be smarter about what punctuation we allow where, at least
> for now. Maybe in the future we can exclude "@" from the punctuation
> that get's filtered out.
>
> > It looks like this patch needs a rebase for the CI, too, but there
> > are
> > no conflicts.
>
> The attached patch is rebased against master.
>
> Thanks,
> Joe Koshakow

Apologies, I'm posting a little behind the curve here. My initial
thoughts on the original patch mirrored Jacob's re 1 and 2 - that it looked
good, did we need to consider the modified ecpg copy (which has been
answered by Tom). I didn't have have any strong thought re 3) or the '@'.

The updated patch looks good to me. Seems a little clearer/cleaner.

Thanks,
Reid







Re: DecodeInterval fixes

From
Reid Thompson
Date:
On Sun, 2023-07-09 at 13:24 -0400, Joseph Koshakow wrote:
>
> I've broken up this patch into three logical commits and attached
> them.
> None of the actual code has changed.
>
> Thanks,
> Joe Koshakow

I made a another pass through the separated patches, it looks good to
me.

Thanks,
Reid





Re: DecodeInterval fixes

From
Jacob Champion
Date:
On Mon, Jul 10, 2023 at 10:19 AM Reid Thompson <jreidthompson@nc.rr.com> wrote:
> I made a another pass through the separated patches, it looks good to
> me.

LGTM too. (Thanks Tom for the clarification on ECPG.)

Marked RfC.

--Jacob



Re: DecodeInterval fixes

From
Michael Paquier
Date:
On Mon, Jul 10, 2023 at 10:42:31AM -0700, Jacob Champion wrote:
> On Mon, Jul 10, 2023 at 10:19 AM Reid Thompson <jreidthompson@nc.rr.com> wrote:
>> I made a another pass through the separated patches, it looks good to
>> me.
>
> LGTM too. (Thanks Tom for the clarification on ECPG.)

+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
[...]
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';

0002 and 0003 make this stuff fail, but isn't there a risk that this
breaks applications that relied on these accidental behaviors?
Assuming that this is OK makes me nervous.
--
Michael

Attachment

Re: DecodeInterval fixes

From
Jacob Champion
Date:
On Mon, Aug 21, 2023 at 10:39 PM Michael Paquier <michael@paquier.xyz> wrote:
> 0002 and 0003 make this stuff fail, but isn't there a risk that this
> breaks applications that relied on these accidental behaviors?
> Assuming that this is OK makes me nervous.

I wouldn't argue for backpatching, for sure, but I guess I saw this as
falling into the same vein as 5b3c5953 and bcc704b52 which were
already committed.

--Jacob



Re: DecodeInterval fixes

From
Joseph Koshakow
Date:


On Tue, Aug 22, 2023 at 12:58 PM Jacob Champion <jchampion@timescale.com> wrote:
>
> On Mon, Aug 21, 2023 at 10:39 PM Michael Paquier <michael@paquier.xyz> wrote:
> > 0002 and 0003 make this stuff fail, but isn't there a risk that this
> > breaks applications that relied on these accidental behaviors?
> > Assuming that this is OK makes me nervous.
>
> I wouldn't argue for backpatching, for sure, but I guess I saw this as
> falling into the same vein as 5b3c5953 and bcc704b52 which were
> already committed.

I agree, I don't think we should try and backport this. As Jacob
highlighted, we've merged similar patches for other date time types.
If applications were relying on this behavior, the upgrade may be a
good time for them to re-evaluate their usage since it's outside the
documented spec and they may not be getting the units they're expecting
from intervals like '1 day month'.

Thanks,
Joe Koshakow

Re: DecodeInterval fixes

From
Michael Paquier
Date:
On Sun, Aug 27, 2023 at 04:14:00PM -0400, Joseph Koshakow wrote:
> On Tue, Aug 22, 2023 at 12:58 PM Jacob Champion <jchampion@timescale.com>
> wrote:
>> I wouldn't argue for backpatching, for sure, but I guess I saw this as
>> falling into the same vein as 5b3c5953 and bcc704b52 which were
>> already committed.
>
> I agree, I don't think we should try and backport this. As Jacob
> highlighted, we've merged similar patches for other date time types.
> If applications were relying on this behavior, the upgrade may be a
> good time for them to re-evaluate their usage since it's outside the
> documented spec and they may not be getting the units they're expecting
> from intervals like '1 day month'.

I felt like asking anyway.  I have looked at the patch series and the
past compatibility changes in this area, and I kind of agree that this
seems like an improvement against confusing interval values.  So, I
have applied 0001, 0002 and 0003 after more review.

0002 was a bit careless with the code indentation.

In 0003, I was wondering a bit if we'd better set parsing_unit_val to
false for AGO, but as we do a backward lookup and because after 0002
AGO can only be last, I've let the code as you have suggested, relying
on the initial value of this variable.
--
Michael

Attachment