Thread: Remove dependence on integer wrapping
Hi,
In [0] Andres suggested enabling -ftrapv in assert enabled builds. While
I vastly underestimated the complexity of updating `configure` to do
this, I was able to enable the flag locally. Enabling this flag causes
our existing regression tests to trap and fail in multiple different
spots. The attached patch resolves all of these overflows so that all
of our existing tests will pass with the -ftrapv flag enabled.
Some notes on the patch itself are:
I originally added the helper functions to int.h thinking I'd find
many more instances of overflow due to integer negation, however I
didn't find that many. So let me know if you think we'd be better
off without the functions.
I considered using #ifdef to rely on wrapping when -fwrapv was
enabled. This would save us some unnecessary branching when we could
rely on wrapping behavior, but it would mean that we could only enable
-ftrapv when -fwrapv was disabled, greatly reducing its utility.
The following comment was in the code for parsing timestamps:
/* check for just-barely overflow (okay except time-of-day wraps) */
/* caution: we want to allow 1999-12-31 24:00:00 */
I wasn't able to fully understand it even after staring at it for
a while. Is the comment suggesting that it is ok for the months field,
for example, to wrap around? That doesn't sound right to me. I tested
the supplied timestamp, 1999-12-31 24:00:00, and it behaves the same
before and after the patch.
Thanks,
Joe Koshakow
[0] https://www.postgresql.org/message-id/20240213191401.jjhsic7et4tiahjs%40awork3.anarazel.de
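For readers unfamiliar with the flags: -fwrapv makes signed overflow wrap (defined two's-complement behavior), while -ftrapv makes the process abort at the overflowing operation. The overflow-aware style the patch moves toward never executes the overflowing operation at all, so it behaves identically under either flag. A minimal sketch, assuming a GCC/Clang toolchain (the helper name is illustrative, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Report int32 addition overflow without ever executing a wrapping
 * addition, so the code is safe under both -fwrapv and -ftrapv.
 * __builtin_add_overflow is a GCC/Clang builtin that stores the
 * wrapped result and returns true iff overflow occurred. */
static bool
add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
    return __builtin_add_overflow(a, b, result);
}
```

Built with -ftrapv, a plain `INT32_MAX + 1` would trap at runtime, whereas the helper above reports the overflow to the caller instead.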
On Sun, Jun 09, 2024 at 04:59:22PM -0400, Joseph Koshakow wrote:
> I originally added the helper functions to int.h thinking I'd find
> many more instances of overflow due to integer negation, however I
> didn't find that many. So let me know if you think we'd be better
> off without the functions.

I'd vote for the functions, even if it's just future-proofing at the
moment. IMHO it helps with readability, too.

> The following comment was in the code for parsing timestamps:
>
> /* check for just-barely overflow (okay except time-of-day wraps) */
> /* caution: we want to allow 1999-12-31 24:00:00 */
>
> I wasn't able to fully understand it even after staring at it for
> a while. Is the comment suggesting that it is ok for the months field,
> for example, to wrap around? That doesn't sound right to me. I tested
> the supplied timestamp, 1999-12-31 24:00:00, and it behaves the same
> before and after the patch.

I haven't stared at this for a while like you, but I am likewise
confused at first glance. This dates back to commit 84df54b, and it
looks like this comment was present in the first version of the patch
in the thread [0]. I CTRL+F'd for any discussion about this but
couldn't immediately find anything.

> /* check the negative equivalent will fit without overflowing */
> if (unlikely(tmp > (uint16) (-(PG_INT16_MIN + 1)) + 1))
> goto out_of_range;
> +
> + /*
> + * special case the minimum integer because its negation cannot be
> + * represented
> + */
> + if (tmp == ((uint16) PG_INT16_MAX) + 1)
> + return PG_INT16_MIN;
> return -((int16) tmp);

My first impression is that there appears to be two overflow checks,
one of which sends us to out_of_range, and another that just returns a
special result. Why shouldn't we add a pg_neg_s16_overflow() and
replace this whole chunk with something like this?

if (unlikely(pg_neg_s16_overflow(tmp, &tmp)))
goto out_of_range;
else
return tmp;

> + return ((uint32) INT32_MAX) + 1;
> + return ((uint64) INT64_MAX) + 1;

nitpick: Any reason not to use PG_INT32_MAX/PG_INT64_MAX for these?

[0] https://postgr.es/m/flat/CAFj8pRBwqALkzc%3D1WV%2Bh5e%2BDcALY2EizjHCvFi9vHbs%2Bz1OhjA%40mail.gmail.com

--
nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Sun, Jun 09, 2024 at 04:59:22PM -0400, Joseph Koshakow wrote:
>> The following comment was in the code for parsing timestamps:
>> /* check for just-barely overflow (okay except time-of-day wraps) */
>> /* caution: we want to allow 1999-12-31 24:00:00 */
>>
>> I wasn't able to fully understand it even after staring at it for
>> a while. Is the comment suggesting that it is ok for the months field,
>> for example, to wrap around? That doesn't sound right to me. I tested
>> the supplied timestamp, 1999-12-31 24:00:00, and it behaves the same
>> before and after the patch.

> I haven't stared at this for a while like you, but I am likewise confused
> at first glance. This dates back to commit 84df54b, and it looks like this
> comment was present in the first version of the patch in the thread [0]. I
> CTRL+F'd for any discussion about this but couldn't immediately find
> anything.

I believe this is a copy-and-paste from 841b4a2d5, which added this:

+ *result = (date * INT64CONST(86400000000)) + time;
+ /* check for major overflow */
+ if ((*result - time) / INT64CONST(86400000000) != date)
+ return -1;
+ /* check for just-barely overflow (okay except time-of-day wraps) */
+ if ((*result < 0) ? (date >= 0) : (date < 0))
+ return -1;

I think you could replace the whole thing by using overflow-aware
multiplication and addition primitives in the result calculation.
Lines 2-4 basically check for mult overflow and 5-7 for addition
overflow.

BTW, while I approve of trying to get rid of our need for -fwrapv,
I'm quite scared of actually doing it. Whatever cases you may have
discovered by running the regression tests will be at best the
tip of the iceberg. Is there any chance of using static analysis
to find all the places of concern?

regards, tom lane
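As a sketch of Tom's suggestion, assuming the overflow-aware primitives are the pg_mul_s64_overflow()/pg_add_s64_overflow() style of helper from src/include/common/int.h (modeled here with the GCC/Clang builtins so the example is self-contained), the whole check could collapse to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USECS_PER_DAY INT64_C(86400000000)

/* Compute date * 86400000000 + time with overflow-aware primitives
 * instead of after-the-fact division and sign checks.  The first check
 * replaces the "major overflow" test, the second the "just-barely
 * overflow" test.  Returns false on overflow. */
static bool
tm2timestamp_calc(int64_t date, int64_t time, int64_t *result)
{
    int64_t usecs;

    if (__builtin_mul_overflow(date, USECS_PER_DAY, &usecs))
        return false;           /* multiplication overflowed */
    if (__builtin_add_overflow(usecs, time, result))
        return false;           /* addition overflowed */
    return true;
}
```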
On Sun, Jun 09, 2024 at 09:57:54PM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> On Sun, Jun 09, 2024 at 04:59:22PM -0400, Joseph Koshakow wrote:
>>> The following comment was in the code for parsing timestamps:
>>> /* check for just-barely overflow (okay except time-of-day wraps) */
>>> /* caution: we want to allow 1999-12-31 24:00:00 */
>>>
>>> I wasn't able to fully understand it even after staring at it for
>>> a while. Is the comment suggesting that it is ok for the months field,
>>> for example, to wrap around? That doesn't sound right to me. I tested
>>> the supplied timestamp, 1999-12-31 24:00:00, and it behaves the same
>>> before and after the patch.
>
>> I haven't stared at this for a while like you, but I am likewise confused
>> at first glance. This dates back to commit 84df54b, and it looks like this
>> comment was present in the first version of the patch in the thread [0]. I
>> CTRL+F'd for any discussion about this but couldn't immediately find
>> anything.
>
> I believe this is a copy-and-paste from 841b4a2d5, which added this:
>
> + *result = (date * INT64CONST(86400000000)) + time;
> + /* check for major overflow */
> + if ((*result - time) / INT64CONST(86400000000) != date)
> + return -1;
> + /* check for just-barely overflow (okay except time-of-day wraps) */
> + if ((*result < 0) ? (date >= 0) : (date < 0))
> + return -1;
>
> I think you could replace the whole thing by using overflow-aware
> multiplication and addition primitives in the result calculation.
> Lines 2-4 basically check for mult overflow and 5-7 for addition
> overflow.

Ah, I see. Joe's patch does that in one place. It's probably worth
doing that in the other places this "just-barely overflow" comment
appears IMHO.

I was still confused by the comment about 1999, but I tracked it down
to commit 542eeba [0]. IIUC it literally means that we need special
handling for that date because POSTGRES_EPOCH_JDATE is 2000-01-01.

[0] https://postgr.es/m/CABUevEx5zUO%3DKRUg06a9qnQ_e9EvTKscL6HxAM_L3xj71R7AQw%40mail.gmail.com

--
nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Sun, Jun 09, 2024 at 09:57:54PM -0400, Tom Lane wrote:
>> I think you could replace the whole thing by using overflow-aware
>> multiplication and addition primitives in the result calculation.

> I was still confused by the comment about 1999, but I tracked it down to
> commit 542eeba [0]. IIUC it literally means that we need special handling
> for that date because POSTGRES_EPOCH_JDATE is 2000-01-01.

Yeah, I think so, and I think we probably don't need any special care
if we switch to direct tests of overflow-aware primitives. (Though
it'd be worth checking that '1999-12-31 24:00:00'::timestamp still
works. It doesn't look like I actually added a test case for that.)

regards, tom lane
Hi,

On 2024-06-09 21:57:54 -0400, Tom Lane wrote:
> BTW, while I approve of trying to get rid of our need for -fwrapv,
> I'm quite scared of actually doing it.

I think that's a quite fair concern. One potentially relevant datapoint
is that we actually don't have a -fwrapv equivalent on all platforms,
and I don't recall a lot of complaints from windows users. Of course
it's quite possible that they'd never notice...

I think this is a good argument for enabling -ftrapv in development
builds. That gives us at least a *chance* of seeing these issues.

> Whatever cases you may have discovered by running the regression tests will
> be at best the tip of the iceberg. Is there any chance of using static
> analysis to find all the places of concern?

The last time I tried removing -fwrapv both gcc and clang found quite a
few issues. I think I fixed most of those though, so presumably the
issues that remain are ones less easily found by simple static
analysis. I wonder if coverity would find more if we built without
-fwrapv.

GCC has -Wstrict-overflow=n, which at least tells us where the compiler
optimizes based on the assumption that there cannot be overflow. It
does generate a bunch of noise, but I think it's almost exclusively due
to our MemSet() macro.

Greetings,

Andres Freund
>> /* check the negative equivalent will fit without overflowing */
>> if (unlikely(tmp > (uint16) (-(PG_INT16_MIN + 1)) + 1))
>> goto out_of_range;
>> +
>> + /*
>> + * special case the minimum integer because its negation cannot be
>> + * represented
>> + */
>> + if (tmp == ((uint16) PG_INT16_MAX) + 1)
>> + return PG_INT16_MIN;
>> return -((int16) tmp);
>
> My first impression is that there appears to be two overflow checks, one of
> which sends us to out_of_range, and another that just returns a special
> result. Why shouldn't we add a pg_neg_s16_overflow() and replace this
> whole chunk with something like this?
>
> if (unlikely(pg_neg_s16_overflow(tmp, &tmp)))
> goto out_of_range;
> else
> return tmp;
tmp is a uint16 here; it seems like you might have read it as an
int16? We would need some helper method like
static inline bool
pg_neg_u16_overflow(uint16 a, int16 *result);
and then we could replace that whole chunk with something like
if (unlikely(pg_neg_u16_overflow(tmp, &result)))
goto out_of_range;
else
return result;
That pattern shows up a lot in this file, but I was worried that it
wasn't useful as a general purpose function. Happy to add it
though if you still feel otherwise.
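A self-contained sketch of the helper signature described above, assuming its semantics follow the existing open-coded checks (the patch's actual naming and style may differ):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Negate an unsigned 16-bit magnitude into a signed int16, reporting
 * overflow instead of relying on wrapping.  The largest magnitude that
 * can be negated into int16 is 32768, i.e. (uint16_t) INT16_MAX + 1,
 * which maps to INT16_MIN. */
static bool
pg_neg_u16_overflow(uint16_t a, int16_t *result)
{
    if (a > (uint16_t) INT16_MAX + 1)
        return true;            /* -a would not fit in int16 */
    if (a == (uint16_t) INT16_MAX + 1)
        *result = INT16_MIN;    /* special case: -32768 */
    else
        *result = (int16_t) -(int16_t) a;
    return false;
}
```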
>> + return ((uint32) INT32_MAX) + 1;
>>
>> + return ((uint64) INT64_MAX) + 1;
>
> nitpick: Any reason not to use PG_INT32_MAX/PG_INT64_MAX for these?
Carelessness, sorry about that, it's been fixed in the attached patch.
>> I believe this is a copy-and-paste from 841b4a2d5, which added this:
>>
>> + *result = (date * INT64CONST(86400000000)) + time;
>> + /* check for major overflow */
>> + if ((*result - time) / INT64CONST(86400000000) != date)
>> + return -1;
>> + /* check for just-barely overflow (okay except time-of-day wraps) */
>> + if ((*result < 0) ? (date >= 0) : (date < 0))
>> + return -1;
>>
>> I think you could replace the whole thing by using overflow-aware
>> multiplication and addition primitives in the result calculation.
>> Lines 2-4 basically check for mult overflow and 5-7 for addition
>> overflow.
>
> Ah, I see. Joe's patch does that in one place. It's probably worth doing
> that in the other places this "just-barely overflow" comment appears IMHO.
>
> I was still confused by the comment about 1999, but I tracked it down to
> commit 542eeba [0]. IIUC it literally means that we need special handling
> for that date because POSTGRES_EPOCH_JDATE is 2000-01-01.
>
> [0] https://postgr.es/m/CABUevEx5zUO%3DKRUg06a9qnQ_e9EvTKscL6HxAM_L3xj71R7AQw%40mail.gmail.com
> Yeah, I think so, and I think we probably don't need any special care
> if we switch to direct tests of overflow-aware primitives. (Though
> it'd be worth checking that '1999-12-31 24:00:00'::timestamp still
> works. It doesn't look like I actually added a test case for that.)
The only other place I found this comment was in
`make_timestamp_internal`. I've updated that function and added some
tests. I also manually verified that the behavior matches before and
after this patch.
>> BTW, while I approve of trying to get rid of our need for -fwrapv,
>> I'm quite scared of actually doing it.
>
> I think that's a quite fair concern. One potentially relevant datapoint is
> that we actually don't have -fwrapv equivalent on all platforms, and I don't
> recall a lot of complaints from windows users. Of course it's quite possible
> that they'd never notice...
>
> I think this is a good argument for enabling -ftrapv in development
> builds. That gives us at least a *chance* of seeing these issues.
+1, I wouldn't recommend removing -fwrapv immediately after this
commit. However, if we can enable -ftrapv in development builds, then
we can find overflows much more easily.
> Whatever cases you may have discovered by running the regression tests will
> be at best the tip of the iceberg.
Agreed.
> Is there any chance of using static
> analysis to find all the places of concern?
I'm not personally familiar with any static analysis tools, but I can
try and do some research. Andres had previously suggested SQLSmith. I
think any kind of fuzz testing with -ftrapv enabled will reveal a lot
of issues. Honestly just grepping for +,-,* in certain directories
(like backend/utils/adt) would probably be fairly fruitful for anyone
with the patience. My previous overflow patch was the result of looking
through all the arithmetic in datetime.c.
Thanks,
Joe Koshakow
On Tue, Jun 11, 2024 at 09:31:39AM -0400, Joseph Koshakow wrote:
> tmp is a uint16 here; it seems like you might have read it as an
> int16? We would need some helper method like
>
> static inline bool
> pg_neg_u16_overflow(uint16 a, int16 *result);
>
> and then we could replace that whole chunk with something like
>
> if (unlikely(pg_neg_u16_overflow(tmp, &result)))
> goto out_of_range;
> else
> return result;
>
> That pattern shows up a lot in this file, but I was worried that it
> wasn't useful as a general purpose function. Happy to add it
> though if you still feel otherwise.

I personally find that much easier to read. Since the existing
open-coded overflow check is apparently insufficient, I think there's a
reasonably strong case for centralizing this sort of thing so that we
don't continue to make the same mistakes.

>> Ah, I see. Joe's patch does that in one place. It's probably worth doing
>> that in the other places this "just-barely overflow" comment appears
>> IMHO.
>
> The only other place I found this comment was in
> `make_timestamp_internal`. I've updated that function and added some
> tests. I also manually verified that the behavior matches before and
> after this patch.

tm2timestamp() in src/interfaces/ecpg/pgtypeslib/timestamp.c has the
same comment. The code there looks very similar to the code for
tm2timestamp() in the other timestamp.c...

--
nathan
On Tue, Jun 11, 2024 at 12:22 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I personally find that much easier to read. Since the existing open-coded
> overflow check is apparently insufficient, I think there's a reasonably
> strong case for centralizing this sort of thing so that we don't continue
> to make the same mistakes.
Sounds good, the attached patch has these changes.
> tm2timestamp() in src/interfaces/ecpg/pgtypeslib/timestamp.c has the same
> comment. The code there looks very similar to the code for tm2timestamp()
> in the other timestamp.c...
The attached patch has updated this file too. For some reason I was
under the impression that I should leave the ecpg/ files alone, though
I can't remember why.
Thanks,
Joe Koshakow
On Tue, Jun 11, 2024 at 09:10:44PM -0400, Joseph Koshakow wrote:
> The attached patch has updated this file too. For some reason I was
> under the impression that I should leave the ecpg/ files alone, though
> I can't remember why.

Thanks. This looks pretty good to me after a skim, so I'll see about
committing/back-patching it in the near future. IIUC there is likely
more to come in this area, but I see no need to wait.

(I'm chuckling that we are adding Y2K tests in 2024...)

--
nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> Thanks. This looks pretty good to me after a skim, so I'll see about
> committing/back-patching it in the near future. IIUC there is likely more
> to come in this area, but I see no need to wait.

Uh ... what of this would we back-patch? It seems like v18
material to me.

regards, tom lane
On Wed, Jun 12, 2024 at 11:45:20AM -0400, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
>> Thanks. This looks pretty good to me after a skim, so I'll see about
>> committing/back-patching it in the near future. IIUC there is likely more
>> to come in this area, but I see no need to wait.
>
> Uh ... what of this would we back-patch? It seems like v18
> material to me.

D'oh. I was under the impression that the numutils.c changes were
arguably bug fixes. Even in that case, we should probably split out the
other stuff for v18. But you're probably right that we could just wait
for all of it.

--
nathan
On Mon, Jun 10, 2024 at 2:30 PM Andres Freund <andres@anarazel.de> wrote:
> On 2024-06-09 21:57:54 -0400, Tom Lane wrote:
>> BTW, while I approve of trying to get rid of our need for -fwrapv,
>> I'm quite scared of actually doing it.

IMV it's perfectly fine to defensively assume that we need -fwrapv,
even given a total lack of evidence that removing it would cause harm.
Doing it to "be more in line with the standard" is a terrible argument.

> I think that's a quite fair concern. One potentially relevant datapoint is
> that we actually don't have -fwrapv equivalent on all platforms, and I don't
> recall a lot of complaints from windows users.

That might just be because MSVC inherently doesn't do optimizations
that rely on signed wrap being undefined behavior. Last I checked MSVC
just doesn't rely on the standard's strict aliasing rules, even as an
opt-in thing.

> Of course it's quite possible
> that they'd never notice...

Removing -fwrapv is something that I see as a potential optimization.
It should be justified in about the same way as any other optimization.
I suspect that it doesn't actually have any clear benefits for us. But
if I'm wrong about that then the benefit might still be achievable
through other means (something far short of just removing -fwrapv
globally).

> I think this is a good argument for enabling -ftrapv in development
> builds. That gives us at least a *chance* of seeing these issues.

+1. I'm definitely prepared to say that code that actively relies on
-fwrapv is broken code.

--
Peter Geoghegan
10.06.2024 04:57, Tom Lane wrote:
> BTW, while I approve of trying to get rid of our need for -fwrapv,
> I'm quite scared of actually doing it. Whatever cases you may have
> discovered by running the regression tests will be at best the
> tip of the iceberg. Is there any chance of using static analysis
> to find all the places of concern?

Let me remind you of bug #18240. Yes, that was about float8, but with
-ftrapv we can get into the trap with:

SELECT 1_000_000_000::money * 1_000_000_000::int;
server closed the connection unexpectedly

Also there are several trap-producing cases with date types:
SELECT to_date('100000000', 'CC');
SELECT to_timestamp('1000000000,999', 'Y,YYY');
SELECT make_date(-2147483648, 1, 1);

And one more with array...
CREATE TABLE t (ia int[]);
INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}');

I think it's not the whole iceberg too.

Best regards,
Alexander
On Thu, Jun 13, 2024 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> Let me remind you of bug #18240. Yes, that was about float8, but with
> -ftrapv we can get into the trap with:
> SELECT 1_000_000_000::money * 1_000_000_000::int;
> server closed the connection unexpectedly
Interesting, it looks like there's no overflow handling of any money
arithmetic. I've attached
v4-0002-Handle-overflow-in-money-arithmetic.patch which adds some
overflow checks and tests. I didn't address the float multiplication
because I didn't see any helper methods in int.h. I did see some
useful helpers in float.h, but they raise an error directly instead
of returning a bool. Would those be appropriate for use with the
money type? If not I can refactor out the inner parts into a new method
that returns a bool.
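For the integer side, the shape of such a check might look like the following sketch. The Cash typedef and helper name here are illustrative only; real code would use pg_mul_s64_overflow() from common/int.h and ereport() an out-of-range error rather than return a bool (the builtin stands in so the example is self-contained):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Cash;           /* money is a 64-bit integer of cents */

/* Multiply a cash value by an integer, reporting overflow instead of
 * wrapping.  Returns false (and leaves *result unspecified) if the
 * product does not fit in int64. */
static bool
cash_mul_int_checked(Cash c, int32_t i, Cash *result)
{
    return !__builtin_mul_overflow(c, (int64_t) i, result);
}
```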
v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I
just incremented the version number.
> Also there are several trap-producing cases with date types:
> SELECT to_date('100000000', 'CC');
> SELECT to_timestamp('1000000000,999', 'Y,YYY');
> SELECT make_date(-2147483648, 1, 1);
>
> And one more with array...
> CREATE TABLE t (ia int[]);
> INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}');
I'll try and get patches to address these too in the next couple of
weeks unless someone beats me to it.
> I think it's not the whole iceberg too.
+1
Thanks,
Joe Koshakow
On Thu, Jun 13, 2024 at 10:48 PM Joseph Koshakow <koshy44@gmail.com> wrote:
> I've attached
> v4-0002-Handle-overflow-in-money-arithmetic.patch which adds some
> overflow checks and tests. I didn't address the float multiplication
> because I didn't see any helper methods in int.h. I did some some
> useful helpers in float.h, but they raise an error directly instead
> of returning a bool. Would those be appropriate for use with the
> money type? If not I can refactor out the inner parts into a new method
> that returns a bool.
> v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I
> just incremented the version number.
Oops I left a careless mistake in that last patch, my apologies. It's
fixed in the attached patches.
Thanks,
Joe Koshakow
14.06.2024 05:48, Joseph Koshakow wrote:
> v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I
> just incremented the version number.
>
>> Also there are several trap-producing cases with date types:
>> SELECT to_date('100000000', 'CC');
>> SELECT to_timestamp('1000000000,999', 'Y,YYY');
>> SELECT make_date(-2147483648, 1, 1);
>>
>> And one more with array...
>> CREATE TABLE t (ia int[]);
>> INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}');
>
> I'll try and get patches to address these too in the next couple of
> weeks unless someone beats me to it.
>
>> I think it's not the whole iceberg too.
>
> +1

After sending my message, I toyed with -ftrapv a little more and found
other cases:

SELECT '[]'::jsonb -> -2147483648;

#4 0x00007efe232d67f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000055e8fde9f211 in __negvsi2 ()
#6 0x000055e8fdcca62c in jsonb_array_element (fcinfo=0x55e8fec28220) at jsonfuncs.c:948
(gdb) f 6
#6 0x000055e14cb9362c in jsonb_array_element (fcinfo=0x55e14d493220) at jsonfuncs.c:948
948 if (-element > nelements)
(gdb) p element
$1 = -2147483648

---

SELECT jsonb_delete_path('{"a":[]}', '{"a",-2147483648}');

#4 0x00007f1873bef7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000564a009d2211 in __negvsi2 ()
#6 0x0000564a00807c89 in setPathArray (it=0x7fff865c7380, path_elems=0x564a017baf20,
   path_nulls=0x564a017baf40, path_len=2, st=0x7fff865c7388, level=1, newval=0x0,
   nelems=2, op_type=2) at jsonfuncs.c:5407
(gdb) f 6
#6 0x000055985e823c89 in setPathArray (it=0x7ffc22258fe0, path_elems=0x559860286f20,
   path_nulls=0x559860286f40, path_len=2, st=0x7ffc22258fe8, level=1, newval=0x0,
   nelems=0, op_type=2) at jsonfuncs.c:5407
5407 if (-idx > nelems)
(gdb) p idx
$1 = -2147483648

---

CREATE FUNCTION check_foreign_key () RETURNS trigger AS .../refint.so' LANGUAGE C;
CREATE TABLE t (i int4 NOT NULL);
CREATE TRIGGER check_fkey BEFORE DELETE ON t FOR EACH ROW EXECUTE PROCEDURE
  check_foreign_key (2147483647, 'cascade', 'i', "ft", "i");
INSERT INTO t VALUES (1);
DELETE FROM t;

#4 0x00007f57f0bef7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007f57f1671351 in __addvsi3 () from .../src/test/regress/refint.so
#6 0x00007f57f1670234 in check_foreign_key (fcinfo=0x7ffebf523650) at refint.c:321
(gdb) f 6
#6 0x00007f3400ef9234 in check_foreign_key (fcinfo=0x7ffd6e16a600) at refint.c:321
321 nkeys = (nargs - nrefs) / (nrefs + 1);
(gdb) p nargs
$1 = 3
(gdb) p nrefs
$2 = 2147483647

---

And the most interesting case to me:

SET temp_buffers TO 1000000000;
CREATE TEMP TABLE t(i int PRIMARY KEY);
INSERT INTO t VALUES(1);

#4 0x00007f385cdd37f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005620071c4f51 in __addvsi3 ()
#6 0x0000562007143f3c in init_htab (hashp=0x562008facb20, nelem=610070812) at dynahash.c:720
(gdb) f 6
#6 0x0000560915207f3c in init_htab (hashp=0x560916039930, nelem=1000000000) at dynahash.c:720
720 hctl->high_mask = (nbuckets << 1) - 1;
(gdb) p nbuckets
$1 = 1073741824

Best regards,
Alexander
On Thu, Jun 13, 2024 at 10:56 PM Joseph Koshakow <koshy44@gmail.com> wrote:
On Thu, Jun 13, 2024 at 10:48 PM Joseph Koshakow <koshy44@gmail.com> wrote:
> I've attached
> v4-0002-Handle-overflow-in-money-arithmetic.patch which adds some
> overflow checks and tests. I didn't address the float multiplication
> because I didn't see any helper methods in int.h. I did see some
> useful helpers in float.h, but they raise an error directly instead
> of returning a bool. Would those be appropriate for use with the
> money type? If not I can refactor out the inner parts into a new method
> that returns a bool.
> v4-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I
> just incremented the version number.
I added overflow handling for float arithmetic to the `money` type.
v6-0002-Handle-overflow-in-money-arithmetic.patch is ready for review.
v6-0001-Remove-dependence-on-integer-wrapping.patch is unchanged, I
just incremented the version number.
Thanks,
Joe Koshakow
On Thu, Jun 13, 2024 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> And one more with array...
> CREATE TABLE t (ia int[]);
> INSERT INTO t(ia[2147483647:2147483647]) VALUES ('{}');
I've added another patch, 0003, to resolve this wrap-around. In fact I
discovered a bug that the following statement is accepted and inserts
an empty array into the table.
INSERT INTO t(ia[-2147483648:2147483647]) VALUES ('{}');
My patch resolves this bug as well.
The other patches, 0001 and 0002, are unchanged but have their version
number incremented.
As a reminder, 0001 is reviewed and waiting for v18 and a committer.
0002 and 0003 are unreviewed. So, I'm going to mark this as waiting for
a reviewer.
Thanks,
Joe Koshakow
On Thu, Jun 13, 2024 at 12:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> SELECT '[]'::jsonb -> -2147483648;
>
> #4 0x00007efe232d67f3 in __GI_abort () at ./stdlib/abort.c:79
> #5 0x000055e8fde9f211 in __negvsi2 ()
> #6 0x000055e8fdcca62c in jsonb_array_element (fcinfo=0x55e8fec28220) at jsonfuncs.c:948
>
> (gdb) f 6
> #6 0x000055e14cb9362c in jsonb_array_element (fcinfo=0x55e14d493220) at jsonfuncs.c:948
> 948 if (-element > nelements)
> (gdb) p element
> $1 = -2147483648
>
> ---
> SELECT jsonb_delete_path('{"a":[]}', '{"a",-2147483648}');
>
> #4 0x00007f1873bef7f3 in __GI_abort () at ./stdlib/abort.c:79
> #5 0x0000564a009d2211 in __negvsi2 ()
> #6 0x0000564a00807c89 in setPathArray (it=0x7fff865c7380, path_elems=0x564a017baf20, path_nulls=0x564a017baf40,
> path_len=2, st=0x7fff865c7388, level=1, newval=0x0, nelems=2, op_type=2) at jsonfuncs.c:5407
>
> (gdb) f 6
> #6 0x000055985e823c89 in setPathArray (it=0x7ffc22258fe0, path_elems=0x559860286f20, path_nulls=0x559860286f40,
> path_len=2, st=0x7ffc22258fe8, level=1, newval=0x0, nelems=0, op_type=2) at jsonfuncs.c:5407
> 5407 if (-idx > nelems)
> (gdb) p idx
> $1 = -2147483648
I've added another patch, 0004, to resolve the jsonb wrap-arounds.
The other patches, 0001, 0002, and 0003 are unchanged but have their
version number incremented.
Thanks,
Joe Koshakow
On Sat, Jul 06, 2024 at 07:04:38PM -0400, Joseph Koshakow wrote:
> I've added another patch, 0004, to resolve the jsonb wrap-arounds.
>
> The other patches, 0001, 0002, and 0003 are unchanged but have their
> version number incremented.

IIUC some of these changes are bug fixes. Can we split out the bug fixes
to their own patches so that they can be back-patched?

--
nathan
On Fri, Jul 12, 2024 at 12:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Sat, Jul 06, 2024 at 07:04:38PM -0400, Joseph Koshakow wrote:
>> I've added another patch, 0004, to resolve the jsonb wrap-arounds.
>>
>> The other patches, 0001, 0002, and 0003 are unchanged but have their
>> version number incremented.
>
> IIUC some of these changes are bug fixes. Can we split out the bug fixes
> to their own patches so that they can be back-patched?
They happen to already be split out into their own patches. 0002 and
0003 are both bug fixes (in the sense that they fix queries that
produce incorrect results even with -fwrapv). They also both apply
cleanly to master. If it would be useful, I can re-order the patches so
that the bug-fixes are first.
Thanks,
Joe Koshakow
On Sat, Jul 13, 2024 at 10:24:16AM -0400, Joseph Koshakow wrote:
> On Fri, Jul 12, 2024 at 12:49 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> IIUC some of these changes are bug fixes. Can we split out the bug fixes
>> to their own patches so that they can be back-patched?
>
> They happen to already be split out into their own patches. 0002 and
> 0003 are both bug fixes (in the sense that they fix queries that
> produce incorrect results even with -fwrapv). They also both apply
> cleanly to master. If it would be useful, I can re-order the patches so
> that the bug-fixes are first.

Oh, thanks. I'm planning on taking a closer look at this one next week.

--
nathan
I took a closer look at 0002.

+	if (unlikely(isinf(f) || isnan(f)))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("invalid float value")));
+
+	fresult = rint(f * c);

+	if (unlikely(f == 0.0))
+		ereport(ERROR,
+				(errcode(ERRCODE_DIVISION_BY_ZERO),
+				 errmsg("division by zero")));
+	if (unlikely(isinf(f) || isnan(f)))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("invalid float value")));
+
+	fresult = rint(c / f);

I'm curious why you aren't using float8_mul/float8_div here, i.e.,

	fresult = rint(float8_mul((float8) c, f));
	fresult = rint(float8_div((float8) c, f));

nitpick: I'd name the functions something like "cash_mul_float8" and
"cash_div_float8". Perhaps we could also add functions like
"cash_mul_int64" and "cash_sub_int64" so that we don't need several
copies of the same "money out of range" ERROR.

--
nathan
Thanks for the review!
On Mon, Jul 15, 2024 at 11:31 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I took a closer look at 0002.
>
> I'm curious why you aren't using float8_mul/float8_div here, i.e.,
>
> fresult = rint(float8_mul((float8) c, f));
> fresult = rint(float8_div((float8) c, f));
I wrongly assumed that it was only meant to be used to implement
multiplication and division for the built-in float types. I've updated
the patch to use these functions.
> nitpick: I'd name the functions something like "cash_mul_float8" and
> "cash_div_float8". Perhaps we could also add functions like
> "cash_mul_int64"
Done in the updated patch.
> and "cash_sub_int64"
Did you mean "cash_div_int64"? There's only a single function that
subtracts cash and an integer, but there's multiple functions that
divide cash by an integer. I've added a "cash_div_int64" in the updated
patch.
The other patches, 0001, 0003, and 0004 are unchanged but have their
version number incremented.
Thanks,
Joe Koshakow
On Mon, Jul 15, 2024 at 07:55:22PM -0400, Joseph Koshakow wrote:
> On Mon, Jul 15, 2024 at 11:31 AM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> I'm curious why you aren't using float8_mul/float8_div here, i.e.,
>>
>> fresult = rint(float8_mul((float8) c, f));
>> fresult = rint(float8_div((float8) c, f));
>
> I wrongly assumed that it was only meant to be used to implement
> multiplication and division for the built-in float types. I've updated
> the patch to use these functions.

The reason I suggested this is so that we could omit all the prerequisite
isinf(), isnan(), etc. checks in the cash_mul_float8() and friends. The
checks are slightly different, but from a quick glance it just looks like
we might end up relying on the FLOAT8_FITS_IN_INT64 check in more cases.

>> and "cash_sub_int64"
>
> Did you mean "cash_div_int64"? There's only a single function that
> subtracts cash and an integer, but there's multiple functions that
> divide cash by an integer. I've added a "cash_div_int64" in the updated
> patch.

My personal preference would be to add helper functions for each of these
so that all the overflow, etc. checks are centralized in one place and
don't clutter the calling code. Plus, it might help ensure error
handling/messages remain consistent.

+static Cash
+cash_mul_float8(Cash c, float8 f)

nitpick: Can you mark these "inline"? I imagine most compilers inline them
without any prompting, but we might as well make our intent clear.

--
nathan
On Tue, Jul 16, 2024 at 1:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Mon, Jul 15, 2024 at 07:55:22PM -0400, Joseph Koshakow wrote:
>> On Mon, Jul 15, 2024 at 11:31 AM Nathan Bossart <nathandbossart@gmail.com>
>> wrote:
>>> I'm curious why you aren't using float8_mul/float8_div here, i.e.,
>>>
>>> fresult = rint(float8_mul((float8) c, f));
>>> fresult = rint(float8_div((float8) c, f));
>>
>> I wrongly assumed that it was only meant to be used to implement
>> multiplication and division for the built-in float types. I've updated
>> the patch to use these functions.
>
> The reason I suggested this is so that we could omit all the prerequisite
> isinf(), isnan(), etc. checks in the cash_mul_float8() and friends. The
> checks are slightly different, but from a quick glance it just looks like we
> might end up relying on the FLOAT8_FITS_IN_INT64 check in more cases.
I don't think we can omit the prerequisite isnan() checks. Neither
float8_mul() nor float8_div() reject nan inputs/result, and
FLOAT8_FITS_IN_INT64 has the following to say about inf and nan
These macros will do the right thing for Inf, but not necessarily for NaN,
so check isnan(num) first if that's a possibility.
Though, I think you're right that we can remove the isinf() check from
cash_mul_float8(). That check is fully covered by FLOAT8_FITS_IN_INT64,
since all infinite inputs will result in an infinite output. That also
makes the infinite result check in float8_mul() redundant.
Additionally, I believe that the underflow check in float8_mul() is
unnecessary. val1 is an int64 casted to a float8, so it can never be
-1 < val < 1, so it can never cause an underflow to 0. So I went ahead
and removed float8_mul() since all of its checks are redundant.
For cash_div_float8() we have a choice. The isinf() input check
protects against the following, which is not rejected by any of
the other checks.
test=# SELECT '5'::money / 'inf'::float8;
?column?
----------
$0.00
(1 row)
For now, I've kept the isinf() input check to reject the above query,
let me know if you think we should allow this.
The infinite check in float8_div() is redundant because it's covered
by FLOAT8_FITS_IN_INT64. Also, the underflow check in float8_div() is
unnecessary for similar reasons to float8_mul(). So if we continue to
have a divide by zero check in cash_div_float8(), then we can remove
float8_div() as well.
>>> and "cash_sub_int64"
>>
>> Did you mean "cash_div_int64"? There's only a single function that
>> subtracts cash and an integer, but there's multiple functions that
>> divide cash by an integer. I've added a "cash_div_int64" in the updated
>> patch.
>
> My personal preference would be to add helper functions for each of these
> so that all the overflow, etc. checks are centralized in one place and
> don't clutter the calling code. Plus, it might help ensure error
> handling/messages remain consistent.
Ah, OK. I've added helpers for both subtraction and addition then.
> +static Cash
> +cash_mul_float8(Cash c, float8 f)
>
> nitpick: Can you mark these "inline"? I imagine most compilers inline them
> without any prompting, but we might as well make our intent clear.
Updated in the attached patch.
Once again, the other patches, 0001, 0003, and 0004 are unchanged but
have their version number incremented.
Thanks,
Joe Koshakow
On Tue, Jul 16, 2024 at 09:23:27PM -0400, Joseph Koshakow wrote:
> On Tue, Jul 16, 2024 at 1:57 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> The reason I suggested this is so that we could omit all the prerequisite
>> isinf(), isnan(), etc. checks in the cash_mul_float8() and friends. The
>> checks are slightly different, but from a quick glance it just looks like
>> we might end up relying on the FLOAT8_FITS_IN_INT64 check in more cases.
>
> I don't think we can omit the prerequisite isnan() checks. Neither
> float8_mul() nor float8_div() reject nan inputs/result, and
> FLOAT8_FITS_IN_INT64 has the following to say about inf and nan
>
>     These macros will do the right thing for Inf, but not necessarily for
>     NaN, so check isnan(num) first if that's a possibility.

My instinct would be to just let float8_mul(), float8_div(), etc. handle
the inputs and then to add an additional isnan() check for the result.
This is how it's done elsewhere (e.g., dtoi8() in int8.c). If something
about the float8 helper functions is inadequate, let's go fix those
instead.

> Though, I think you're right that we can remove the isinf() check from
> cash_mul_float8(). That check is fully covered by FLOAT8_FITS_IN_INT64,
> since all infinite inputs will result in an infinite output. That also
> makes the infinite result check in float8_mul() redundant.
> Additionally, I believe that the underflow check in float8_mul() is
> unnecessary. val1 is an int64 casted to a float8, so it can never be
> -1 < val < 1, so it can never cause an underflow to 0. So I went ahead
> and removed float8_mul() since all of its checks are redundant.

TBH I'd rather keep this stuff super simple by farming out the corner case
handling whenever we can, even if it is redundant in a few cases. That
keeps it centralized and consistent with the rest of the tree so that any
fixes apply everywhere. If there's a performance angle, then maybe it's
worth considering the added complexity, though...

> For cash_div_float8() we have a choice. The isinf() input check
> protects against the following, which is not rejected by any of
> the other checks.
>
> test=# SELECT '5'::money / 'inf'::float8;
>  ?column?
> ----------
>     $0.00
> (1 row)
>
> For now, I've kept the isinf() input check to reject the above query,
> let me know if you think we should allow this.

We appear to allow this for other data types, so I see no reason to
disallow it for the money type.

I've attached an editorialized version of 0002 based on my thoughts above.

--
nathan
On Tue, Jul 16, 2024 at 11:17 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I've attached an editorialized version of 0002 based on my thoughts above.
Looks great, thanks!
Thanks,
Joe Koshakow
On Wed, Jul 17, 2024 at 9:23 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> Updated in the attached patch.
>
> Once again, the other patches, 0001, 0003, and 0004 are unchanged but
> have their version number incremented.

+-- Test for overflow in array slicing
+CREATE temp table arroverflowtest (i int[]);
+INSERT INTO arroverflowtest(i[-2147483648:2147483647]) VALUES ('{}');
+INSERT INTO arroverflowtest(i[1:2147483647]) VALUES ('{}');
+INSERT INTO arroverflowtest(i[2147483647:2147483647]) VALUES ('{}');
+INSERT INTO arroverflowtest(i[2147483647:2147483647]) VALUES ('{}');

For this example, the error on master is
    ERROR: source array too small
while with your patch it is
    ERROR: array size exceeds the maximum allowed (134217727)

INSERT INTO arroverflowtest(i[2147483647:2147483647]) VALUES ('{1}');

master:
    ERROR: array lower bound is too large: 2147483647
your patch:
    ERROR: array size exceeds the maximum allowed (134217727)

I think "INSERT INTO arroverflowtest(i[2147483647:2147483647]) VALUES
('{}');" means to insert one element (size) into customized lower/upper
bounds.
On Wed, Jul 17, 2024 at 9:31 PM jian he <jian.universality@gmail.com> wrote:
>
> i think "INSERT INTO arroverflowtest(i[2147483647:2147483647]) VALUES ('{}');"
> means to insert one element (size) to a customized lower/upper bounds.
Ah, thank you, I mistakenly understood that as an array with size
2147483647, with the first 2147483646 elements NULL.
I've updated the first calculation (upper_bound + 1) to return an error
saying "array upper bound is too large: %d" when it overflows. This
will change some of the existing error messages, but is just as correct
and doesn't require us to check the source array. Are there backwards
compatibility guarantees on error messages, or is that acceptable?
For the second calculation ((upper_bound + 1) - lower_bound), I've kept the
existing error of "array size exceeds the maximum allowed (%d)". The
only way for that to underflow is if the upper bound is very negative
and the lower bound is very positive. I'm not entirely sure how to
interpret this scenario, but it's consistent with similar scenarios.
# INSERT INTO arroverflowtest(i[10:-999999]) VALUES ('{1,2,3}');
ERROR: array size exceeds the maximum allowed (134217727)
As a reminder:
- 0001 is reviewed.
- 0002 is reviewed and a bug fix.
- 0003 is currently under review and a bug fix.
- 0004 needs a review.
Thanks,
Joe Koshakow
On Thu, Jul 18, 2024 at 09:08:30PM -0400, Joseph Koshakow wrote:
> As a reminder:
> - 0001 is reviewed.
> - 0002 is reviewed and a bug fix.
> - 0003 is currently under review and a bug fix.
> - 0004 needs a review.

I've committed/back-patched 0002. I plan to review 0003 next.

--
nathan
I took a look at 0003.

+			/* dim[i] = 1 + upperIndx[i] - lowerIndx[i]; */
+			if (pg_add_s32_overflow(1, upperIndx[i], &dim[i]))
+				ereport(ERROR,
+						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+						 errmsg("array upper bound is too large: %d",
+								upperIndx[i])));
+			if (pg_sub_s32_overflow(dim[i], lowerIndx[i], &dim[i]))
+				ereport(ERROR,
+						(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+						 errmsg("array size exceeds the maximum allowed (%d)",
+								(int) MaxArraySize)));

I think the problem with fixing it this way is that it prohibits more than
is necessary. For example, doing the subtraction first might prevent the
addition from overflowing, and doing the addition first can prevent the
subtraction from overflowing. Granted, this is probably not really worth
worrying about too much, but we're already dealing with "absurd slice
ranges," so we might as well set an example for elsewhere.

An easy way to deal with this problem is to first perform the calculation
with everything cast to an int64. Before setting dim[i], you'd check that
the result is in [PG_INT32_MIN, PG_INT32_MAX] and fail if needed.

	int64		newdim;

	...

	newdim = (int64) 1 + (int64) upperIndx[i] - (int64) lowerIndx[i];
	if (unlikely(newdim < PG_INT32_MIN || newdim > PG_INT32_MAX))
		ereport(ERROR,
				...
	dim[i] = (int32) newdim;

--
nathan
On Fri, Jul 19, 2024 at 2:45 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> + /* dim[i] = 1 + upperIndx[i] - lowerIndx[i]; */
> + if (pg_add_s32_overflow(1, upperIndx[i], &dim[i]))
> + ereport(ERROR,
> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> + errmsg("array upper bound is too large: %d",
> + upperIndx[i])));
> + if (pg_sub_s32_overflow(dim[i], lowerIndx[i], &dim[i]))
> + ereport(ERROR,
> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> + errmsg("array size exceeds the maximum allowed (%d)",
> + (int) MaxArraySize)));
>
> I think the problem with fixing it this way is that it prohibits more than
> is necessary.
My understanding is that 2147483647 (INT32_MAX) is not a valid upper
bound, which is what the first overflow check is checking. Any query of
the form
`INSERT INTO arroverflowtest(i[<lb>:2147483647]) VALUES ('{...}');`
will fail with an error of
`ERROR: array lower bound is too large: <lb>`
The reason is the following bounds check found in arrayutils.c
/*
* Verify sanity of proposed lower-bound values for an array
*
* The lower-bound values must not be so large as to cause overflow when
* calculating subscripts, e.g. lower bound 2147483640 with length 10
* must be disallowed. We actually insist that dims[i] + lb[i] be
* computable without overflow, meaning that an array with last subscript
* equal to INT_MAX will be disallowed.
*
* It is assumed that the caller already called ArrayGetNItems, so that
* overflowed (negative) dims[] values have been eliminated.
*/
void
ArrayCheckBounds(int ndim, const int *dims, const int *lb)
{
(void) ArrayCheckBoundsSafe(ndim, dims, lb, NULL);
}
/*
* This entry point can return the error into an ErrorSaveContext
* instead of throwing an exception.
*/
bool
ArrayCheckBoundsSafe(int ndim, const int *dims, const int *lb,
struct Node *escontext)
{
int i;
for (i = 0; i < ndim; i++)
{
/* PG_USED_FOR_ASSERTS_ONLY prevents variable-isn't-read warnings */
int32 sum PG_USED_FOR_ASSERTS_ONLY;
if (pg_add_s32_overflow(dims[i], lb[i], &sum))
ereturn(escontext, false,
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("array lower bound is too large: %d",
lb[i])));
}
return true;
}
Specifically "We actually insist that dims[i] + lb[i] be computable
without overflow, meaning that an array with last subscript equal to
INT32_MAX will be disallowed." If the upper bound is INT32_MAX,
then there's no lower bound where (lower_bound + size) won't overflow.
It might be possible to remove this restriction, but it's probably
easier to keep it.
> An easy way to deal with this problem is to first perform the calculation
> with everything cast to an int64. Before setting dim[i], you'd check that
> the result is in [PG_INT32_MIN, PG_INT32_MAX] and fail if needed.
>
> int64 newdim;
>
> ...
>
> newdim = (int64) 1 + (int64) upperIndx[i] - (int64) lowerIndx[i];
> if (unlikely(newdim < PG_INT32_MIN || newdim > PG_INT32_MAX))
> ereport(ERROR,
> ...
> dim[i] = (int32) newdim;
I've rebased my patches and updated 0002 with this approach if this is
still the approach you want to go with. I went with the array size too
large error for similar reasons as the previous version of the patch.
Since the patches have been renumbered, here's an overview of their
status:
- 0001 is reviewed and waiting for v18.
- 0002 is under review and a bug fix.
- 0003 needs review.
Thanks,
Joseph Koshakow
On Fri, Jul 19, 2024 at 07:32:18PM -0400, Joseph Koshakow wrote:
> On Fri, Jul 19, 2024 at 2:45 PM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> + /* dim[i] = 1 + upperIndx[i] - lowerIndx[i]; */
>> + if (pg_add_s32_overflow(1, upperIndx[i], &dim[i]))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>> + errmsg("array upper bound is too large: %d",
>> + upperIndx[i])));
>> + if (pg_sub_s32_overflow(dim[i], lowerIndx[i], &dim[i]))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>> + errmsg("array size exceeds the maximum allowed (%d)",
>> + (int) MaxArraySize)));
>>
>> I think the problem with fixing it this way is that it prohibits more than
>> is necessary.
>
> My understanding is that 2147483647 (INT32_MAX) is not a valid upper
> bound, which is what the first overflow check is checking. Any query of
> the form
> `INSERT INTO arroverflowtest(i[<lb>:2147483647]) VALUES ('{...}');`
> will fail with an error of
> `ERROR: array lower bound is too large: <lb>`
>
> The reason is the following bounds check found in arrayutils.c
>
> /*
>  * Verify sanity of proposed lower-bound values for an array
>  *
>  * The lower-bound values must not be so large as to cause overflow when
>  * calculating subscripts, e.g. lower bound 2147483640 with length 10
>  * must be disallowed. We actually insist that dims[i] + lb[i] be
>  * computable without overflow, meaning that an array with last subscript
>  * equal to INT_MAX will be disallowed.

I see. I'm still not sure that this is the right place to enforce this
check, especially given we explicitly check the bounds later on in
construct_md_array(). Am I understanding correctly that the main
behavioral difference between these two approaches is that users will see
different error messages?

--
nathan
On Mon, Jul 22, 2024 at 11:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, Jul 19, 2024 at 07:32:18PM -0400, Joseph Koshakow wrote:
>> On Fri, Jul 19, 2024 at 2:45 PM Nathan Bossart <nathandbossart@gmail.com>
>> wrote:
>>> + /* dim[i] = 1 + upperIndx[i] - lowerIndx[i]; */
>>> + if (pg_add_s32_overflow(1, upperIndx[i], &dim[i]))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>>> + errmsg("array upper bound is too large: %d",
>>> + upperIndx[i])));
>>> + if (pg_sub_s32_overflow(dim[i], lowerIndx[i], &dim[i]))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>>> + errmsg("array size exceeds the maximum allowed
>> (%d)",
>>> + (int) MaxArraySize)));
>
> Am I understanding correctly that the main
> behavioral difference between these two approaches is that users will see
> different error messages?
Yes, you are understanding correctly. The approach written above will
have the error message "array upper bound is too large", while the
approach attached in patch
v13-0002-Remove-overflow-from-array_set_slice.patch will have the error
message "array lower bound is too large".
Thanks,
Joseph Koshakow
On Mon, Jul 22, 2024 at 05:20:15PM -0400, Joseph Koshakow wrote:
> On Mon, Jul 22, 2024 at 11:17 AM Nathan Bossart <nathandbossart@gmail.com>
> wrote:
>> Am I understanding correctly that the main
>> behavioral difference between these two approaches is that users will see
>> different error messages?
>
> Yes, you are understanding correctly. The approach written above will
> have the error message "array upper bound is too large", while the
> approach attached in patch
> v13-0002-Remove-overflow-from-array_set_slice.patch will have the error
> message "array lower bound is too large".

Okay. I'll plan on committing v13-0002 in the next couple of days, then.

--
nathan
On Mon, Jul 22, 2024 at 5:52 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Also there are several trap-producing cases with date types:
> SELECT to_date('100000000', 'CC');
Hi, I’ve attached a patch that fixes the to_date() overflow. Patches 1 through 3 remain unchanged.
Thank you,
Matthew Kim
On Mon, Jul 22, 2024 at 04:36:33PM -0500, Nathan Bossart wrote:
> Okay. I'll plan on committing v13-0002 in the next couple of days, then.

Actually, I think my concerns about prohibiting more than necessary go
away if we do the subtraction first. If "upperIndx[i] - lowerIndx[i]"
overflows, we know the array size is too big. Similarly, if adding one to
that result overflows, we again know the array size is too big. This
appears to be how the surrounding code handles this problem (e.g.,
ReadArrayDimensions()). Thoughts?

--
nathan
On Mon, Jul 22, 2024 at 6:27 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Actually, I think my concerns about prohibiting more than necessary go away
> if we do the subtraction first. If "upperIndx[i] - lowerIndx[i]"
> overflows, we know the array size is too big. Similarly, if adding one to
> that result overflows, we again know the array size is too big. This
> appears to be how the surrounding code handles this problem (e.g.,
> ReadArrayDimensions()). Thoughts?
I like that approach! It won't reject any valid bounds and is
consistent with the surrounding code. Also statements of the following
format will maintain the same error messages they had previously:
# INSERT INTO arroverflowtest(i[2147483646:2147483647]) VALUES ('{1,2}');
ERROR: array lower bound is too large: 2147483646
The specific bug that this patch fixes is preventing the following
statement:
# INSERT INTO arroverflowtest(i[-2147483648:2147483647]) VALUES ('{1}');
So we may want to add that test back in.
Thanks,
Joseph Koshakow
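For concreteness, the subtraction-first check can be sketched standalone as follows (the helper bodies below are illustrative stand-ins for the pg_sub_s32_overflow()/pg_add_s32_overflow() routines in src/include/common/int.h, and array_dim_size is a hypothetical name, not code from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the helpers in src/include/common/int.h. */
static bool
pg_sub_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_sub_overflow(a, b, result);
}

static bool
pg_add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_add_overflow(a, b, result);
}

/*
 * Compute upperIndx - lowerIndx + 1, doing the subtraction first.  If
 * either step overflows, the requested array size is too big, so return
 * false instead of wrapping.
 */
static bool
array_dim_size(int32_t lowerIndx, int32_t upperIndx, int32_t *size)
{
	if (pg_sub_s32_overflow(upperIndx, lowerIndx, size))
		return false;			/* upperIndx - lowerIndx overflowed */
	if (pg_add_s32_overflow(*size, 1, size))
		return false;			/* adding one overflowed */
	return true;
}
```

This rejects bounds like i[-2147483648:2147483647] (the difference alone exceeds INT32_MAX) while still accepting valid ones like i[2147483646:2147483647].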
On Tue, Jul 23, 2024 at 6:56 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> The specific bug that this patch fixes is preventing the following
> statement:
>
> # INSERT INTO arroverflowtest(i[-2147483648:2147483647]) VALUES ('{1}');
>
> So we may want to add that test back in.
>
I agree with you.

Also, in v13-0003-Remove-dependence-on-integer-wrapping-for-jsonb.patch,
in setPathArray we can change it to:

if (idx == PG_INT32_MIN || -idx > nelems)
{
    /*
     * If asked to keep elements position consistent, it's not allowed
     * to prepend the array.
     */
    if (op_type & JB_PATH_CONSISTENT_POSITION)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("path element at position %d is out of range: %d",
                        level + 1, idx)));
    idx = PG_INT32_MIN;
}
On Mon, Jul 22, 2024 at 6:07 PM Matthew Kim <matthewkmkim@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 5:52 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>
>> Also there are several trap-producing cases with date types:
>> SELECT to_date('100000000', 'CC');
>
> Hi, I’ve attached a patch that fixes the to_date() overflow. Patches 1 through 3 remain unchanged.
Thanks for the contribution Matthew!
On Tue, Jul 23, 2024 at 2:14 AM jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Jul 23, 2024 at 6:56 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>>
>> The specific bug that this patch fixes is preventing the following
>> statement:
>>
>> # INSERT INTO arroverflowtest(i[-2147483648:2147483647]) VALUES ('{1}');
>>
>> So we may want to add that test back in.
>>
> I agree with you.
I've updated the patch to add this test back in.
> also v13-0003-Remove-dependence-on-integer-wrapping-for-jsonb.patch
> in setPathArray we can change it to:
>
> if (idx == PG_INT32_MIN || -idx > nelems)
> {
> /*
> * If asked to keep elements position consistent, it's not allowed
> * to prepend the array.
> */
> if (op_type & JB_PATH_CONSISTENT_POSITION)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("path element at position %d is out of
> range: %d",
> level + 1, idx)));
> idx = PG_INT32_MIN;
> }
Done in the attached patch.
Attachment
On Tue, Jul 23, 2024 at 05:41:18PM -0400, Joseph Koshakow wrote:
> On Tue, Jul 23, 2024 at 2:14 AM jian he <jian.universality@gmail.com> wrote:
>> On Tue, Jul 23, 2024 at 6:56 AM Joseph Koshakow <koshy44@gmail.com> wrote:
>>> The specific bug that this patch fixes is preventing the following
>>> statement:
>>>
>>> # INSERT INTO arroverflowtest(i[-2147483648:2147483647]) VALUES ('{1}');
>>>
>>> So we may want to add that test back in.
>>>
>> I agree with you.
>
> I've updated the patch to add this test back in.

I've committed/back-patched the fix for array_set_slice(). I ended up
fiddling with the test cases a bit more because they generate errors that
include a platform-dependent value (MaxArraySize). To handle that, I moved
the new tests to the section added in commit 18b5851 where VERBOSITY is set
to "sqlstate".

-- nathan
On Mon, Jul 22, 2024 at 5:52 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Also there are several trap-producing cases with date types:
> SELECT make_date(-2147483648, 1, 1);
Hi, I’ve attached a patch that fixes the make_date overflow. I’ve upgraded the patches accordingly to reflect Joseph’s committed fix.
Thank you,
Matthew Kim
Attachment
On Fri, Jun 14, 2024 at 8:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>
> And the most interesting case to me:
> SET temp_buffers TO 1000000000;
>
> CREATE TEMP TABLE t(i int PRIMARY KEY);
> INSERT INTO t VALUES(1);
>
> #4 0x00007f385cdd37f3 in __GI_abort () at ./stdlib/abort.c:79
> #5 0x00005620071c4f51 in __addvsi3 ()
> #6 0x0000562007143f3c in init_htab (hashp=0x562008facb20, nelem=610070812) at dynahash.c:720
>
> (gdb) f 6
> #6 0x0000560915207f3c in init_htab (hashp=0x560916039930, nelem=1000000000) at dynahash.c:720
> 720 hctl->high_mask = (nbuckets << 1) - 1;
> (gdb) p nbuckets
> $1 = 1073741824
Alex, are you able to get a full stack trace for this panic? I'm unable
to reproduce this because I don't have enough memory in my system. I've
tried reducing `BLCKSZ` to 1024, which is the lowest value allowed per
my understanding, and I still don't have enough memory.
Here's what it looks like is happening:
1. When inserting into the table, we create a new dynamic hash table
and set `nelem` equal to `temp_buffers`, which is 1000000000.
2. `nbuckets` is then set to the next highest power of 2 from
`nelem`, which is 1073741824.
/*
* Allocate space for the next greater power of two number of buckets,
* assuming a desired maximum load factor of 1.
*/
nbuckets = next_pow2_int(nelem);
3. Shift `nbuckets` to the left by 1. This would equal 2147483648,
which is larger than `INT_MAX`, which causes an overflow.
hctl->high_mask = (nbuckets << 1) - 1;
The max value allowed for `temp_buffers` is `INT_MAX / 2` (1073741823),
so any value of `temp_buffers` in the range (536870912, 1073741823]
would cause this overflow. Without `-ftrapv`, `nbuckets` would wrap
around to -2147483648, which is likely to cause all sorts of havoc, I'm
just not sure what exactly.
Also, `nbuckets = next_pow2_int(nelem);`, by itself is a bit sketchy
considering that `nelem` is a `long` and `nbuckets` is an `int`.
Potentially, the fix here is to just convert `nbuckets` to a `long`. I
plan on checking if that's feasible.
I also found this commit [0] that increased the max of `nbuckets` from
`INT_MAX / BLCKSZ` to `INT_MAX / 2`, which introduced the possibility
of this overflow. So I plan on reading through that as well.
Thanks,
Joseph Koshakow
[0] https://github.com/postgres/postgres/commit/0007490e0964d194a606ba79bb11ae1642da3372
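To make the arithmetic above concrete, here is a standalone sketch (not the actual dynahash.c code; next_pow2 and high_mask_for are hypothetical stand-ins for next_pow2_int() and the init_htab() logic) showing that computing the mask in 64-bit arithmetic avoids the int overflow:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for next_pow2_int(): smallest power of 2 >= v. */
static int64_t
next_pow2(int64_t v)
{
	int64_t		p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * The high_mask computation from init_htab(), performed in 64 bits.  With
 * nelem = 1000000000, nbuckets is 1073741824 and nbuckets << 1 is
 * 2147483648, which overflows a 32-bit int but is fine in an int64.
 */
static int64_t
high_mask_for(int64_t nelem)
{
	int64_t		nbuckets = next_pow2(nelem);

	return (nbuckets << 1) - 1;
}
```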
On Wed, Jul 24, 2024 at 05:49:41PM -0400, Matthew Kim wrote:
> Hi, I've attached a patch that fixes the make_date overflow. I've upgraded
> the patches accordingly to reflect Joseph's committed fix.

cfbot is unhappy with this one on Windows [0], and I think I see why. While
the patch uses pg_mul_s32_overflow(), it doesn't check its return value, so
we'll just proceed with a bogus value in the event of an overflow. Windows
apparently doesn't have HAVE__BUILTIN_OP_OVERFLOW defined, so
pg_mul_s32_overflow() sets the result to 0x5eed (24,301 in decimal), which
is where I'm guessing the year -24300 in the ERROR is from.

[0] https://api.cirrus-ci.com/v1/artifact/task/5872294914949120/testrun/build/testrun/regress/regress/regression.diffs

-- nathan
I started looking at 0001 again with the intent of committing it, and this
caught my eye:

- /* make the amount positive for digit-reconstruction loop */
- value = -value;
+ /*
+  * make the amount positive for digit-reconstruction loop, we can
+  * leave INT64_MIN unchanged
+  */
+ pg_neg_s64_overflow(value, &value);

The comment mentions that we can leave the minimum value unchanged, but it
doesn't explain why. Can we explain why?

+static inline bool
+pg_neg_s64_overflow(int64 a, int64 *result)
+{
+	if (unlikely(a == PG_INT64_MIN))
+	{
+		return true;
+	}
+	else
+	{
+		*result = -a;
+		return false;
+	}
+}

Can we add a comment that these routines do not set "result" when true is
returned?

-- nathan
On Wed, Aug 7, 2024 at 11:08 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> I started looking at 0001 again with the intent of committing it, and this
> caught my eye:
>
> - /* make the amount positive for digit-reconstruction loop */
> - value = -value;
> + /*
> + * make the amount positive for digit-reconstruction loop, we can
> + * leave INT64_MIN unchanged
> + */
> + pg_neg_s64_overflow(value, &value);
>
> The comment mentions that we can leave the minimum value unchanged, but it
> doesn't explain why. Can we explain why?
I went back to try and figure this out and realized that it would be
much simpler to just convert value to an unsigned integer and not worry
about overflow. So I've updated the patch to do that.
> +static inline bool
> +pg_neg_s64_overflow(int64 a, int64 *result)
> +{
> + if (unlikely(a == PG_INT64_MIN))
> + {
> + return true;
> + }
> + else
> + {
> + *result = -a;
> + return false;
> + }
> +}
>
> Can we add a comment that these routines do not set "result" when true is
> returned?
I've added a comment to the top of the file where we describe the
return values of the other functions.
I also updated the implementations of the pg_abs_sX() functions to
something a bit simpler. This was based on feedback in another patch
[0], and more closely matches similar logic in other places.
Thanks,
Joseph Koshakow
[0] https://postgr.es/m/CAAvxfHdTsMZPWEHUrZ=h3cky9Ccc3Mtx2whUHygY+ABP-mCmUw@mail.gmail.co
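As a standalone illustration of the unsigned-conversion idea (int64_digits is a hypothetical helper, not the function from the patch): casting to uint64 before negating makes |INT64_MIN| representable, so the digit-reconstruction loop needs no signed negation at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Write the decimal representation of value into buf.  The negation is
 * done in unsigned arithmetic (0 - (uint64_t) value), which is
 * well-defined even for INT64_MIN, so -ftrapv has nothing to trap.
 */
static void
int64_digits(int64_t value, char *buf)
{
	uint64_t	uvalue;
	char		tmp[24];
	int			i = 0;
	int			j = 0;

	if (value < 0)
		buf[j++] = '-';
	uvalue = (value < 0) ? 0 - (uint64_t) value : (uint64_t) value;
	do
	{
		tmp[i++] = (char) ('0' + uvalue % 10);
		uvalue /= 10;
	} while (uvalue != 0);
	while (i > 0)
		buf[j++] = tmp[--i];
	buf[j] = '\0';
}
```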
Attachment
I've updated patch 0004 to check the return value of pg_mul_s32_overflow(). Since tm.tm_year overflowed, the error message is hardcoded.
Thanks,
Matthew Kim
Attachment
On Fri, Aug 9, 2024 at 6:16 AM Matthew Kim <matthewkmkim@gmail.com> wrote:
>
> I've updated patch 0004 to check the return value of pg_mul_s32_overflow().
> Since tm.tm_year overflowed, the error message is hardcoded.
>

--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -257,7 +257,10 @@ make_date(PG_FUNCTION_ARGS)
 	if (tm.tm_year < 0)
 	{
 		bc = true;
-		tm.tm_year = -tm.tm_year;
+		if (pg_mul_s32_overflow(tm.tm_year, -1, &tm.tm_year))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+					 errmsg("date field value out of range")));
 	}

Should the error about integers be out of range?

SELECT make_date(-2147483648, 1, 1);
"-2147483648" is not an allowed integer.

\df make_date
                                 List of functions
   Schema   |   Name    | Result data type |           Argument data types            | Type
------------+-----------+------------------+------------------------------------------+------
 pg_catalog | make_date | date             | year integer, month integer, day integer | func
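The failure mode (an unchecked pg_mul_s32_overflow() proceeding with a bogus value) and the fix can be sketched standalone (pg_mul_s32_overflow here is an illustrative stand-in built on the compiler builtin, and negate_year is a hypothetical wrapper, not code from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for pg_mul_s32_overflow() from common/int.h. */
static bool
pg_mul_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_mul_overflow(a, b, result);
}

/*
 * Negate a year as in the quoted make_date() hunk, but surface the
 * overflow (year == INT32_MIN) to the caller instead of silently
 * continuing with a bogus value.  Returns false on overflow.
 */
static bool
negate_year(int32_t year, int32_t *out)
{
	return !pg_mul_s32_overflow(year, -1, out);
}
```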
On Thu, Aug 8, 2024 at 9:01 PM jian he <jian.universality@gmail.com> wrote:
>
> Should the error about integers be out of range?
>
> SELECT make_date(-2147483648, 1, 1);
> "-2147483648" is not an allowed integer.
>
> \df make_date
> List of functions
> Schema | Name | Result data type | Argument data
> types | Type
> ------------+-----------+------------------+------------------------------------------+------
> pg_catalog | make_date | date | year integer, month
> integer, day integer | func
Are you saying that with the patch applied you're seeing the above
error? If so, I see a different error.
test=# SELECT make_date(-2147483648, 1, 1);
ERROR: date field value out of range
Or are you saying that we should change the code in the patch so that
it returns the above error? If so, I'm not sure I understand the
reasoning. -2147483648 is an allowed integer, it's the minimum allowed
value for integers.
test=# SELECT (-2147483648)::integer;
int4
-------------
-2147483648
(1 row)
Thanks,
Joseph Koshakow
On Mon, Jul 22, 2024 at 5:52 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Also there are several trap-producing cases with date types:
> SELECT to_timestamp('1000000000,999', 'Y,YYY');
I attached patch 5 that fixes the to_timestamp overflow.
Thank you,
Matthew Kim
Attachment
- v18-0003-Handle-overflows-in-do_to_timestamp.patch
- v18-0004-Handle-overflow-when-taking-absolute-value-of-BC-yea.patch
- v18-0002-Remove-dependence-on-integer-wrapping-for-jsonb.patch
- v18-0001-Remove-dependence-on-integer-wrapping.patch
- v18-0005-Prevent-overflow-when-converting-years-to-millenia.patch
On Sat, Aug 10, 2024 at 11:41 PM Joseph Koshakow <koshy44@gmail.com> wrote:
>
> Or are you saying that we should change the code in the patch so that
> it returns the above error? If so, I'm not sure I understand the
> reasoning. -2147483648 is an allowed integer, it's the minimum allowed
> value for integers.
>
> test=# SELECT (-2147483648)::integer;
>     int4
> -------------
>  -2147483648
> (1 row)
>
Sorry, I mixed up

    select (-2147483648)::int;

with

    select -2147483648::int;

Looks good to me. Maybe make it more explicit:

    errmsg("date field (year) value out of range")));

I don't have a huge opinion, though.
I've been preparing 0001 for commit. I've attached what I have so far.

The main changes are the implementations of pg_abs_* and pg_neg_*. For the
former, I've used abs()/i64abs() for the short/int implementations. For
the latter, I've tried to use __builtin_sub_overflow() when possible, as
that appears to produce slightly better code. When
__builtin_sub_overflow() is not available, the values are upcasted before
negation, and we check that result before casting to the return type. That
approach more closely matches the surrounding functions. (One exception is
pg_neg_u64_overflow() when we have neither HAVE__BUILTIN_OP_OVERFLOW nor
HAVE_INT128. In that case, we have to hand-roll everything.)

Thoughts?

-- nathan
Attachment
On Tue, Aug 13, 2024 at 04:46:34PM -0500, Nathan Bossart wrote:
> I've been preparing 0001 for commit. I've attached what I have so far.
>
> The main changes are the implementations of pg_abs_* and pg_neg_*. For the
> former, I've used abs()/i64abs() for the short/int implementations. For
> the latter, I've tried to use __builtin_sub_overflow() when possible, as
> that appears to produce slightly better code. When
> __builtin_sub_overflow() is not available, the values are upcasted before
> negation, and we check that result before casting to the return type. That
> approach more closely matches the surrounding functions. (One exception is
> pg_neg_u64_overflow() when we have neither HAVE__BUILTIN_OP_OVERFLOW nor
> HAVE_INT128. In that case, we have to hand-roll everything.)

And here's a new version of the patch in which I've attempted to fix the
silly mistakes.

-- nathan
Attachment
On 14/08/2024 06:07, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 04:46:34PM -0500, Nathan Bossart wrote:
>> I've been preparing 0001 for commit. I've attached what I have so far.
>>
>> The main changes are the implementations of pg_abs_* and pg_neg_*. For the
>> former, I've used abs()/i64abs() for the short/int implementations. For
>> the latter, I've tried to use __builtin_sub_overflow() when possible, as
>> that appears to produce slightly better code. When
>> __builtin_sub_overflow() is not available, the values are upcasted before
>> negation, and we check that result before casting to the return type. That
>> approach more closely matches the surrounding functions. (One exception is
>> pg_neg_u64_overflow() when we have neither HAVE__BUILTIN_OP_OVERFLOW nor
>> HAVE_INT128. In that case, we have to hand-roll everything.)
>
> And here's a new version of the patch in which I've attempted to fix the
> silly mistakes.

LGTM, just a few small comments:

>  * - If a * b overflows, return true, otherwise store the result of a * b
>  *   into *result. The content of *result is implementation defined in case of
>  *   overflow.
> + * - If -a overflows, return true, otherwise store the result of -a into
> + *   *result. The content of *result is implementation defined in case of
> + *   overflow.
> + * - Return the absolute value of a as an unsigned integer of the same
> + *   width.
>  *---------
>  */

The last "Return the absolute value of a ..." sentence feels a bit weird.
In all the preceding sentences, 'a' is part of an "If a" sentence that
defines what 'a' is. In the last one, it's kind of just hanging there.

> +static inline uint16
> +pg_abs_s16(int16 a)
> +{
> +	return abs(a);
> +}
> +

This is correct, but it took me a while to understand why. Maybe some
comments would be in order.

The function it calls is "int abs(int)". So this first widens the int16 to
int32, and then narrows the result from int32 to uint16.

The man page for abs() says "Trying to take the absolute value of the most
negative integer is not defined." That's OK in this case, because that
refers to the most negative int32 value, and the argument here is int16.
But that's why the pg_abs_s64(int64) function needs the special check for
the most negative value.

There's also some code in libpq's pqCheckOutBufferSpace() function that
could use these functions.

-- Heikki Linnakangas
Neon (https://neon.tech)
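Heikki's reading can be written out explicitly in a standalone sketch (int16_t/uint16_t stand in for PostgreSQL's int16/uint16 typedefs; the casts are the implicit conversions he describes, made visible):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * abs() takes and returns int, so the int16_t argument is first widened
 * to int.  abs(-32768) is then well-defined (32768 fits in int), and
 * narrowing the result to uint16_t is also safe, since 32768 fits there
 * too.
 */
static uint16_t
pg_abs_s16(int16_t a)
{
	return (uint16_t) abs((int) a);
}
```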
Thanks for the improvements Nathan. The current iteration LGTM, just a
single comment on `pg_abs_s64`
> +static inline uint64
> +pg_abs_s64(int64 a)
> +{
> + if (unlikely(a == PG_INT64_MIN))
> + return (uint64) PG_INT64_MAX + 1;
> + if (a < 0)
> + return -a;
> + return a;
> +}
Since we know that a does not equal PG_INT64_MIN, could we shorten the
last three lines and do the following?
static inline uint64
pg_abs_s64(int64 a)
{
if (unlikely(a == PG_INT64_MIN))
return (uint64) PG_INT64_MAX + 1;
return i64_abs(a);
}
Thanks,
Joseph Koshakow
On 14/08/2024 20:20, Nathan Bossart wrote:
> On Wed, Aug 14, 2024 at 10:02:28AM +0300, Heikki Linnakangas wrote:
>> The last "Return the absolute value of a ..." sentence feels a bit weird.
>> In all the preceding sentences, 'a' is part of an "If a" sentence that
>> defines what 'a' is. In the last one, it's kind of just hanging there.
>
> How about:
>
>   If a is negative, return -a, otherwise return a. Overflow cannot occur
>   because the return value is an unsigned integer with the same width as
>   the argument.

Hmm, that still doesn't say what operation it's referring to. The existing
comments say "a + b", "a - b" or "a * b", but this one isn't referring to
anything at all. IMHO the existing comments are not too clear on that
either though. How about something like this:

/*---------
 *
 * The following guidelines apply to all the *_overflow routines:
 *
 * If the result overflows, return true, otherwise store the result into
 * *result. The content of *result is implementation defined in case of
 * overflow.
 *
 * bool pg_add_*_overflow(a, b, *result)
 *
 *    Calculate a + b
 *
 * bool pg_sub_*_overflow(a, b, *result)
 *
 *    Calculate a - b
 *
 * bool pg_mul_*_overflow(a, b, *result)
 *
 *    Calculate a * b
 *
 * bool pg_neg_*_overflow(a, *result)
 *
 *    Calculate -a
 *
 *
 * In addition, this file contains:
 *
 * <unsigned int type> pg_abs_*(<signed int type> a)
 *
 *    Calculate the absolute value of a. Unlike the standard library abs()
 *    and labs() functions, the return type is unsigned, and the operation
 *    cannot overflow.
 *---------
 */

>>> +static inline uint16
>>> +pg_abs_s16(int16 a)
>>> +{
>>> +	return abs(a);
>>> +}
>>> +
>>
>> This is correct, but it took me a while to understand why. Maybe some
>> comments would be in order.
>>
>> The function it calls is "int abs(int)". So this first widens the int16 to
>> int32, and then narrows the result from int32 to uint16.
>>
>> The man page for abs() says "Trying to take the absolute value of the most
>> negative integer is not defined." That's OK in this case, because that
>> refers to the most negative int32 value, and the argument here is int16.
>> But that's why the pg_abs_s64(int64) function needs the special check for
>> the most negative value.
>
> Yeah, I've added some casts/comments to make this clear. I got too excited
> about trimming it down and ended up obfuscating these important details.

That's better, thanks!

-- Heikki Linnakangas
Neon (https://neon.tech)
On Wed, Aug 14, 2024 at 10:29:39PM +0300, Heikki Linnakangas wrote: > Hmm, that still doesn't say what operation it's referring to. They existing > comments say "a + b", "a - b" or "a * b", but this one isn't referring to > anything at all. IMHO the existing comments are not too clear on that either > though. How about something like this: Yeah, this crossed my mind, too. I suppose now is as good a time as any to improve it. Your suggestion looks good to me, so I will modify the patch accordingly before committing. -- nathan
On Thu, Aug 15, 2024 at 2:16 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

> +static inline bool
> +pg_neg_u64_overflow(uint64 a, int64 *result)
> +{
> +#if defined(HAVE__BUILTIN_OP_OVERFLOW)
> +	return __builtin_sub_overflow(0, a, result);
> +#elif defined(HAVE_INT128)
> +	uint128		res = -((int128) a);
> +
> +	if (unlikely(res < PG_INT64_MIN))
> +	{
> +		*result = 0x5EED;		/* to avoid spurious warnings */
> +		return true;
> +	}
> +	*result = res;
> +	return false;
> +#else
> +	if (unlikely(a > (uint64) PG_INT64_MAX + 1))
> +	{
> +		*result = 0x5EED;		/* to avoid spurious warnings */
> +		return true;
> +	}
> +	if (unlikely(a == (uint64) PG_INT64_MAX + 1))
> +		*result = PG_INT64_MIN;
> +	else
> +		*result = -((int64) a);
> +	return false;
> +#endif

Sorry to bother you. I am confused by

    +#elif defined(HAVE_INT128)
    +	uint128		res = -((int128) a);

I thought "unsigned" means non-negative, therefore uint128 means
non-negative; "int128 res = -((int128) a);" makes sense to me instead.

Also, in the HAVE_INT128 branch, do we need to cast int128 to int64, like

    *result = (int64) res;
Hello Joe,

05.08.2024 02:55, Joseph Koshakow wrote:
> On Fri, Jun 14, 2024 at 8:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> >
> > And the most interesting case to me:
> > SET temp_buffers TO 1000000000;
> >
> > CREATE TEMP TABLE t(i int PRIMARY KEY);
> > INSERT INTO t VALUES(1);
> >
> ...
> Alex, are you able to get a full stack trace for this panic? I'm unable
> to reproduce this because I don't have enough memory in my system. I've
> tried reducing `BLCKSZ` to 1024, which is the lowest value allowed per
> my understanding, and I still don't have enough memory.

Yes, please take a look at it (sorry for the late reply):
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140438687430464) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140438687430464) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140438687430464, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fba70025476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fba7000b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000563945aed511 in __addvsi3 ()
#6  0x0000563945a6c106 in init_htab (hashp=0x563947700980, nelem=1000000000) at dynahash.c:720
#7  0x0000563945a6bd22 in hash_create (tabname=0x563945c591d9 "Local Buffer Lookup Table", nelem=1000000000, info=0x7ffd4d394620, flags=40) at dynahash.c:567
#8  0x00005639457f2760 in el () at localbuf.c:635
#9  0x00005639457f19e3 in ExtendBufferedRelLocal (bmr=..., fork=MAIN_FORKNUM, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d3947ac) at localbuf.c:326
#10 0x00005639457e8851 in ExtendBufferedRelCommon (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, extend_upto=4294967295, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d39488c) at bufmgr.c:2175
#11 0x00005639457e6850 in ExtendBufferedRelBy (bmr=..., fork=MAIN_FORKNUM, strategy=0x0, flags=8, extend_by=1, buffers=0x7ffd4d3948e0, extended_by=0x7ffd4d39488c) at bufmgr.c:923
#12 0x00005639452d8ae6 in RelationAddBlocks (relation=0x7fba650abd78, bistate=0x0, num_pages=1, use_fsm=true, did_unlock=0x7ffd4d394a3d) at hio.c:341
#13 0x00005639452d944a in RelationGetBufferForTuple (relation=0x7fba650abd78, len=32, otherBuffer=0, options=0, bistate=0x0, vmbuffer=0x7ffd4d394ac4, vmbuffer_other=0x0, num_pages=1) at hio.c:767
#14 0x00005639452be996 in heap_insert (relation=0x7fba650abd78, tup=0x5639476ecfc0, cid=0, options=0, bistate=0x0) at heapam.c:2019
#15 0x00005639452cee84 in heapam_tuple_insert (relation=0x7fba650abd78, slot=0x5639476ecf30, cid=0, options=0, bistate=0x0) at heapam_handler.c:251
#16 0x00005639455b3b07 in table_tuple_insert (rel=0x7fba650abd78, slot=0x5639476ecf30, cid=0, options=0, bistate=0x0) at ../../../src/include/access/tableam.h:1405
#17 0x00005639455b5c60 in ExecInsert (context=0x7ffd4d394d20, resultRelInfo=0x5639476ec390, slot=0x5639476ecf30, canSetTag=true, inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1139
#18 0x00005639455ba942 in ExecModifyTable (pstate=0x5639476ec180) at nodeModifyTable.c:4077
#19 0x0000563945575425 in ExecProcNodeFirst (node=0x5639476ec180) at execProcnode.c:469
#20 0x0000563945568095 in ExecProcNode (node=0x5639476ec180) at ../../../src/include/executor/executor.h:274
#21 0x000056394556af65 in ExecutePlan (estate=0x5639476ebf00, planstate=0x5639476ec180, use_parallel_mode=false, operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x5639476f5470, execute_once=true) at execMain.c:1646
#22 0x00005639455687e3 in standard_ExecutorRun (queryDesc=0x5639476f3e70, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363
#23 0x00005639455685b9 in ExecutorRun (queryDesc=0x5639476f3e70, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:304
#24 0x000056394584986e in ProcessQuery (plan=0x5639476f5310, sourceText=0x56394760d610 "INSERT INTO t VALUES(1);", params=0x0, queryEnv=0x0, dest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:160
#25 0x000056394584b445 in PortalRunMulti (portal=0x56394768ab20, isTopLevel=true, setHoldSnapshot=false, dest=0x5639476f5470, altdest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:1278
#26 0x000056394584a93c in PortalRun (portal=0x56394768ab20, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5639476f5470, altdest=0x5639476f5470, qc=0x7ffd4d395180) at pquery.c:791
#27 0x0000563945842fd9 in exec_simple_query (query_string=0x56394760d610 "INSERT INTO t VALUES(1);") at postgres.c:1284
#28 0x0000563945848536 in PostgresMain (dbname=0x563947644900 "regression", username=0x5639476448e8 "law") at postgres.c:4766
#29 0x000056394583eb67 in BackendMain (startup_data=0x7ffd4d395404 "", startup_data_len=4) at backend_startup.c:107
#30 0x000056394574e00e in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7ffd4d395404 "", startup_data_len=4, client_sock=0x7ffd4d395450) at launch_backend.c:274
#31 0x0000563945753f74 in BackendStartup (client_sock=0x7ffd4d395450) at postmaster.c:3414
#32 0x00005639457515eb in ServerLoop () at postmaster.c:1648
#33 0x0000563945750eaa in PostmasterMain (argc=3, argv=0x5639476087b0) at postmaster.c:1346
#34 0x00005639455f738a in main (argc=3, argv=0x5639476087b0) at main.c:197

> Here's what it looks like is happening:
> ...
> The max value allowed for `temp_buffers` is `INT_MAX / 2` (1073741823),
> so any value of `temp_buffers` in the range (536870912, 1073741823]
> would cause this overflow. Without `-ftrapv`, `nbuckets` would wrap
> around to -2147483648, which is likely to cause all sorts of havoc, I'm
> just not sure what exactly.

Yeah, the minimum value that triggers the trap is 536870913 and the
maximum accepted is 1073741823. Without -ftrapv, hctl->high_mask is set to
2147483647 on my machine, when nbuckets is 1073741824, and the INSERT
apparently succeeds.

> Also, `nbuckets = next_pow2_int(nelem);`, by itself is a bit sketchy
> considering that `nelem` is a `long` and `nbuckets` is an `int`.
> Potentially, the fix here is to just convert `nbuckets` to a `long`. I
> plan on checking if that's feasible.

Yes, it works for me; with s/int nbuckets;/long nbuckets;/ I see no issue
on 64-bit Linux.

Best regards,
Alexander
On Thu, Aug 15, 2024 at 02:56:00PM +0800, jian he wrote:
> i am confused with
> "
> +#elif defined(HAVE_INT128)
> +	uint128		res = -((int128) a);
> "
> I thought "unsigned" means non-negative, therefore uint128 means non-negative.
> therefore "int128 res = -((int128) a);" makes sense to me.

Ah, that's a typo, thanks for pointing it out.

> also in HAVE_INT128 branch
> do we need cast int128 to int64, like
>
> *result = (int64) res;

I don't think we need an explicit cast here since *result is known to be
an int64. But it certainly wouldn't hurt anything...

-- nathan
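With the typo fixed as discussed, the HAVE_INT128 branch might read like this standalone sketch (GCC/Clang's __int128 stands in for PostgreSQL's int128 typedef; this is an illustration, not the committed code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Negate a uint64, storing the result as an int64.  The intermediate is
 * a *signed* 128-bit value so that -a is always representable before the
 * range check; with the original uint128 declaration the res < INT64_MIN
 * comparison could never be true.
 */
static bool
pg_neg_u64_overflow(uint64_t a, int64_t *result)
{
	__int128	res = -((__int128) a);

	if (res < INT64_MIN)
		return true;			/* *result deliberately left unset */
	*result = (int64_t) res;
	return false;
}
```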
On Thu, Aug 15, 2024 at 5:34 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Now to 0002...
>
> - if (-element > nelements)
> + if (element == PG_INT32_MIN || -element > nelements)
>
> This seems like a good opportunity to use our new pg_abs_s32() function,
> and godbolt.org [0] seems to indicate that it might produce better code,
> too (at least to my eye).
This updated version LGTM, I agree it's a good use of pg_abs_s32().
Thanks,
Joseph Koshakow
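To see why the PG_INT32_MIN guard (or pg_abs_s32()) is needed at all: negating INT32_MIN in 32-bit signed arithmetic overflows, which traps under -ftrapv. A standalone sketch of the widen-then-negate idea follows; the function name is illustrative, not the real pg_abs_s32() from src/include/common/int.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of an overflow-safe 32-bit absolute value: widen to 64 bits
 * before negating, so abs(INT32_MIN) = 2147483648 is representable and
 * fits in the uint32 return type.  This makes a comparison like
 * "abs_s32(element) > nelements" safe even for element == INT32_MIN,
 * unlike the original "-element > nelements".
 */
static uint32_t
abs_s32(int32_t a)
{
	return (uint32_t) (a < 0 ? -((int64_t) a) : a);
}
```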
On Thu, Aug 15, 2024 at 10:49:46PM -0400, Joseph Koshakow wrote:
> This updated version LGTM, I agree it's a good use of pg_abs_s32().

Committed.

--
nathan
Hello Nathan and Joe,

16.08.2024 19:52, Nathan Bossart wrote:
> On Thu, Aug 15, 2024 at 10:49:46PM -0400, Joseph Koshakow wrote:
>> This updated version LGTM, I agree it's a good use of pg_abs_s32().
> Committed.

Thank you for working on that issue!

I've tried `make check` with CC=gcc-13 CPPFLAGS="-O0 -ftrapv" and got a
server crash:
2024-08-16 17:14:36.102 UTC postmaster[1982703] LOG: (PID 1983867) was terminated by signal 6: Aborted
2024-08-16 17:14:36.102 UTC postmaster[1982703] DETAIL: Failed process was running: select '[]'::jsonb ->> -2147483648;

with the stack trace
...
#5  0x0000556aec224a11 in __negvsi2 ()
#6  0x0000556aec046238 in jsonb_array_element_text (fcinfo=0x556aedd70240) at jsonfuncs.c:993
#7  0x0000556aebc90b68 in ExecInterpExpr (state=0x556aedd70160, econtext=0x556aedd706a0, isnull=0x7ffdf82211e4) at execExprInterp.c:765
...
(gdb) f 6
#6  0x0000556aec046238 in jsonb_array_element_text (fcinfo=0x556aedd70240) at jsonfuncs.c:993
993             if (-element > nelements)
(gdb) p element
$1 = -2147483648

So it looks like jsonb_array_element_text() still needs the same
treatment as jsonb_array_element().

Moreover, I tried to use "-ftrapv" on 32-bit Debian and came across
another failure:
select '9223372036854775807'::int8 * 2147483648::int8;
server closed the connection unexpectedly
...
#4  0xb722226a in __GI_abort () at ./stdlib/abort.c:79
#5  0x004cb2e1 in __mulvdi3.cold ()
#6  0x00abe7ab in pg_mul_s64_overflow (a=9223372036854775807, b=2147483648, result=0xbff1da68)
    at ../../../../src/include/common/int.h:264
#7  0x00abfbff in int8mul (fcinfo=0x14d9d04) at int8.c:496
#8  0x00782675 in ExecInterpExpr (state=0x14d9c4c, econtext=0x14da15c, isnull=0xbff1dc3f) at execExprInterp.c:765

Whilst
select '9223372036854775807'::int8 * 2147483647::int8;
emits
ERROR:  bigint out of range

I've also discovered another trap-triggering case for a 64-bit platform:
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1;
server closed the connection unexpectedly
...
#5  0x00005576cfb1c9f3 in __negvdi2 ()
#6  0x00005576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at bitmapset.c:691
#7  0x00005576cf72be0f in fix_append_rel_relids (root=0x5576d09df198, varno=31, subrelids=0x5576d09f7fb0) at prepjointree.c:3830
#8  0x00005576cf7278c2 in pull_up_simple_subquery (root=0x5576d09df198, jtnode=0x5576d09f7470, rte=0x5576d09de300, lowest_outer_join=0x0, containing_appendrel=0x5576d09f7368) at prepjointree.c:1277
...
(gdb) f 6
#6  0x00005576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at bitmapset.c:691
691         if (result >= 0 || HAS_MULTIPLE_ONES(w))
(gdb) p/x w
$1 = 0x8000000000000000

Best regards,
Alexander
On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
> So it looks like jsonb_array_element_text() still needs the same
> treatment as jsonb_array_element().

D'oh. I added a test for that but didn't actually fix the code. I think
we just need something like the following.

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 1f8ea51e6a..69cdd84393 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -990,7 +990,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
 	{
 		uint32		nelements = JB_ROOT_COUNT(jb);

-		if (-element > nelements)
+		if (pg_abs_s32(element) > nelements)
 			PG_RETURN_NULL();
 		else
 			element += nelements;

> Moreover, I tried to use "-ftrapv" on 32-bit Debian and came across
> another failure:
> select '9223372036854775807'::int8 * 2147483648::int8;
> server closed the connection unexpectedly
> ...
> #4  0xb722226a in __GI_abort () at ./stdlib/abort.c:79
> #5  0x004cb2e1 in __mulvdi3.cold ()
> #6  0x00abe7ab in pg_mul_s64_overflow (a=9223372036854775807, b=2147483648, result=0xbff1da68)
>     at ../../../../src/include/common/int.h:264
> #7  0x00abfbff in int8mul (fcinfo=0x14d9d04) at int8.c:496
> #8  0x00782675 in ExecInterpExpr (state=0x14d9c4c, econtext=0x14da15c, isnull=0xbff1dc3f) at execExprInterp.c:765

Hm. It looks like that is pointing to __builtin_mul_overflow(), which
seems strange.

> #6  0x00005576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at bitmapset.c:691
> 691         if (result >= 0 || HAS_MULTIPLE_ONES(w))

At a glance, this appears to be caused by the RIGHTMOST_ONE macro:

#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))

--
nathan
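For readers unfamiliar with the builtin in question: __builtin_mul_overflow() (GCC/Clang) performs the multiply and reports overflow via its return value, so it should not itself trap under -ftrapv. That is what makes the 32-bit backtrace landing in the trapping libgcc routine __mulvdi3 surprising. A minimal sketch of a wrapper in the style of pg_mul_s64_overflow() (illustrative, not the actual implementation in src/include/common/int.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of a checked 64-bit multiply built on the GCC/Clang builtin.
 * Returns true (and leaves *result unspecified per the builtin's
 * contract) on overflow; the multiply itself never traps.
 */
static bool
mul_s64_overflow(int64_t a, int64_t b, int64_t *result)
{
	return __builtin_mul_overflow(a, b, result);
}
```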
On Fri, Aug 16, 2024 at 01:35:01PM -0500, Nathan Bossart wrote:
> On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> #6  0x00005576cf627c68 in bms_singleton_member (a=0x5576d09f7fb0) at bitmapset.c:691
>> 691         if (result >= 0 || HAS_MULTIPLE_ONES(w))
>
> At a glance, this appears to be caused by the RIGHTMOST_ONE macro:
>
> #define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))

I believe hand-rolling the two's complement calculation should be
sufficient to avoid depending on -fwrapv here. godbolt.org indicates
that it produces roughly the same code, too.

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index cd05c642b0..d37a997c0e 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -67,7 +67,7 @@
  * we get zero.
  *----------
  */
-#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))
+#define RIGHTMOST_ONE(x) ((bitmapword) (x) & (~((bitmapword) (x)) + 1))

 #define HAS_MULTIPLE_ONES(x)	((bitmapword) RIGHTMOST_ONE(x) != (x))

--
nathan
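The identity being relied on here: for an unsigned x, ~x + 1 is x's two's complement, so x & (~x + 1) isolates the lowest set bit without ever negating a signed value (unsigned arithmetic wraps by definition, so -ftrapv has nothing to trap). A quick standalone check, using a 64-bit stand-in for bitmapword (which in PostgreSQL may be 32 or 64 bits):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned rewrite of RIGHTMOST_ONE: x & (~x + 1) keeps only the
 * lowest set bit.  Works even when only the sign bit is set, the case
 * (w = 0x8000000000000000) that trapped in bms_singleton_member().
 */
#define RIGHTMOST_ONE_U64(x) ((uint64_t) (x) & (~((uint64_t) (x)) + 1))
```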
On Fri, Aug 16, 2024 at 01:35:01PM -0500, Nathan Bossart wrote:
> On Fri, Aug 16, 2024 at 09:00:00PM +0300, Alexander Lakhin wrote:
>> So it looks like jsonb_array_element_text() still needs the same
>> treatment as jsonb_array_element().
>
> D'oh. I added a test for that but didn't actually fix the code. I think
> we just need something like the following.
>
> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
> index 1f8ea51e6a..69cdd84393 100644
> --- a/src/backend/utils/adt/jsonfuncs.c
> +++ b/src/backend/utils/adt/jsonfuncs.c
> @@ -990,7 +990,7 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
>  	{
>  		uint32		nelements = JB_ROOT_COUNT(jb);
>
> -		if (-element > nelements)
> +		if (pg_abs_s32(element) > nelements)
>  			PG_RETURN_NULL();
>  		else
>  			element += nelements;

I've committed this one.

--
nathan
Hello Joe,

17.08.2024 22:16, Joseph Koshakow wrote:
> Hi,
>
> I wanted to take this opportunity to provide a brief summary of
> outstanding work.
>
>> Also there are several trap-producing cases with date types:
>> SELECT to_date('100000000', 'CC');
>> SELECT to_timestamp('1000000000,999', 'Y,YYY');
>> SELECT make_date(-2147483648, 1, 1);
>
> This is resolved with Matthew's patches, which I've rebased, squashed
> and attached to this email. They still require a review.

I've filed a separate bug report about date/time conversion issues
yesterday. Maybe it was excessive, but it also demonstrates other
problematic cases:
https://www.postgresql.org/message-id/18585-db646741dd649abd%40postgresql.org

Best regards,
Alexander
18.08.2024 00:52, Joseph Koshakow wrote:
> The largest possible (theoretical) value for `nbuckets` is
> `1073741824`, the largest power of 2 that fits into an `int`. So, the
> largest possible value for `nbuckets << 1` is `2147483648`. This can
> fully fit in a `uint32`, so the simple fix for this case is to cast
> `nbuckets` to a `uint32` before shifting. I've attached this fix,
> Alexander if you have time I would appreciate if you were able to test
> it.

Yes, I've tested v25-0002-*.patch and can confirm that this fix works
as well.

Best regards,
Alexander
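A standalone illustration of that fix (the function and variable names are made up for the sketch): shifting 2^30 left by one in signed int arithmetic produces a value outside int's range, which is undefined behavior and traps under -ftrapv, while the same shift on a uint32 yields 2^31 = 2147483648 with well-defined unsigned semantics.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the cast-before-shift fix: nbuckets is at most 2^30 (the
 * largest power of two that fits in an int), so doubling it can reach
 * 2^31, which only fits in a uint32.  Casting first keeps the shift
 * entirely in unsigned arithmetic.
 */
static uint32_t
double_nbuckets(int nbuckets)
{
	return ((uint32_t) nbuckets) << 1;
}
```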
On Wed, Aug 21, 2024 at 10:00:00AM +0300, Alexander Lakhin wrote:
> I'd like to add some info to show how big the iceberg is.

Hm. It seems pretty clear that removing -fwrapv won't be happening
anytime soon. I don't mind trying to fix a handful of cases from time
to time, but unless there's a live bug, I'm probably not going to treat
this stuff as high priority.

--
nathan
On Wed, Aug 21, 2024 at 11:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Hm. It seems pretty clear that removing -fwrapv won't be happening anytime
> soon. I don't mind trying to fix a handful of cases from time to time, but
> unless there's a live bug, I'm probably not going to treat this stuff as
> high priority.
I think I'm also going to take a step back because I'm a bit
fatigued on the overflow work. My goal here wasn't necessarily to
remove -fwrapv, because I think it will always be a useful safeguard.
Instead I wanted to add -ftrapv to builds with asserts enabled to try
and prevent future overflow based bugs. Though, it looks like that
won't happen anytime soon either.
FWIW, Matthew's patch actually does resolve a bug with `to_timestamp`
and `to_date`. It converts the following incorrect queries
test=# SELECT to_timestamp('2147483647,999', 'Y,YYY');
to_timestamp
---------------------------------
0001-01-01 00:00:00-04:56:02 BC
(1 row)
test=# SELECT to_date('-2147483648', 'CC');
to_date
------------
0001-01-01
(1 row)
into errors
test=# SELECT to_timestamp('2147483647,999', 'Y,YYY');
ERROR: invalid input string for "Y,YYY"
test=# SELECT to_date('-2147483648', 'CC');
ERROR: date out of range: "-2147483648"
So, it might be worth committing only his changes before moving on.
Thanks,
Joseph Koshakow