Thread: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval havedifferent constraints
[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval havedifferent constraints
From
Vitaly Burovoy
Date:
On 1/4/17, Pantelis Theodosiou <ypercube@gmail.com> wrote: > On Wed, Jan 4, 2017 at 3:03 PM, <web+postgresql@modin.io> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 14486 >> Logged by: Per Modin >> Email address: web+postgresql@modin.io >> PostgreSQL version: 9.6.1 >> Operating system: Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0 >> Description: >> >> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's >> in >> there as well. A minimum working example would be as follows: >> >> ``` >> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1 second' >> col; TABLE tbl; >> SELECT 1 >> ERROR: interval out of range >> ``` >> >> ``` >> postgres=# SELECT count(*) FROM tbl; >> count >> ------- >> 1 >> (1 row) >> ``` >> >> It seems that inserting and retrieving data have different constraints. >> As >> you >> can see from query 2, the data still gets inserted. >> >> ``` >> postgres=# select version(); >> version >> ------------------------------------------------------------ >> ------------------------------ >> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian >> 4.9.2-10) >> 4.9.2, 64-bit >> (1 row) >> ``` >> >> Best regards, >> Per Modin >> >> > And these work without error: > > postgres=# select col - 9223372036854 * interval '1 second' from tbl ; > ?column? > ---------- > 00:00:00 > (1 row) > > postgres=# select col from xx where col < interval '100000 year' ; > col > ----- > (0 rows) > Yes, it is a bug, but it is not a constraint, it is just different internal checks. Moreover even if a function does not raise an error, output could be wrong (pay attention to the duplicated '-' sign) postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval; ?column? ----------------------2147483648:00:00 (1 row) I've written a patch which fixes that bug (in attachment). Should it be registered in the CF? -- Best regards, Vitaly Burovoy
[HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval havedifferent constraints
From
Vitaly Burovoy
Date:
On 1/5/17, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > On 1/4/17, Pantelis Theodosiou <ypercube@gmail.com> wrote: >> On Wed, Jan 4, 2017 at 3:03 PM, <web+postgresql@modin.io> wrote: >> >>> The following bug has been logged on the website: >>> >>> Bug reference: 14486 >>> Logged by: Per Modin >>> Email address: web+postgresql@modin.io >>> PostgreSQL version: 9.6.1 >>> Operating system: Linux 303a92173594 4.8.15-moby #1 SMP Sat Dec 17 0 >>> Description: >>> >>> Found this bug in 9.4.8, tried it in docker towards psql 9.6.1 and it's >>> in >>> there as well. A minimum working example would be as follows: >>> >>> ``` >>> postgres=# CREATE TABLE tbl AS SELECT 9223372036854 * interval '1 >>> second' >>> col; TABLE tbl; >>> SELECT 1 >>> ERROR: interval out of range >>> ``` >>> >>> ``` >>> postgres=# SELECT count(*) FROM tbl; >>> count >>> ------- >>> 1 >>> (1 row) >>> ``` >>> >>> It seems that inserting and retrieving data have different constraints. >>> As >>> you >>> can see from query 2, the data still gets inserted. >>> >>> ``` >>> postgres=# select version(); >>> version >>> ------------------------------------------------------------ >>> ------------------------------ >>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian >>> 4.9.2-10) >>> 4.9.2, 64-bit >>> (1 row) >>> ``` >>> >>> Best regards, >>> Per Modin >>> >>> >> And these work without error: >> >> postgres=# select col - 9223372036854 * interval '1 second' from tbl ; >> ?column? >> ---------- >> 00:00:00 >> (1 row) >> >> postgres=# select col from xx where col < interval '100000 year' ; >> col >> ----- >> (0 rows) >> > > Yes, it is a bug, but it is not a constraint, it is just different > internal checks. > Moreover even if a function does not raise an error, output could be wrong > (pay attention to the duplicated '-' sign) > postgres=# SELECT '-2147483647:59:59'::interval - '1s'::interval; > ?column? > -------------------- > --2147483648:00:00 > (1 row) > > I've written a patch which fixes that bug (in attachment). > Should it be registered in the CF? Oops. Forgot to attach the patch. Fixed. -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints
From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >> I've written a patch which fixes that bug (in attachment). >> Should it be registered in the CF? > Oops. Forgot to attach the patch. Fixed. I suspect that many of these SAMESIGN() tests you've added are not actually adequate/useful. That's only sufficient when the output could be at most a factor of 2 out-of-range. If it could overflow past the sign bit then you need to work harder. (By the same token, the existing SAMESIGN test in interval2tm is wrong.) Possibly we should consider alternatives before plowing ahead in this direction, since adding guards to interval_in and interval computations doesn't help with oversize values that are already stored in a database. We could think about replacing interval2tm's output format with some other struct that uses a TimeOffset for hours and so cannot overflow. I'm not sure though how far the effects would propagate; it might be more work than we want to put into this. regards, tom lane
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selectinginterval have different constraints
From
Vitaly Burovoy
Date:
On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >>> I've written a patch which fixes that bug (in attachment). >>> Should it be registered in the CF? > >> Oops. Forgot to attach the patch. Fixed. > > I suspect that many of these SAMESIGN() tests you've added are not > actually adequate/useful. That's only sufficient when the output could be > at most a factor of 2 out-of-range. If it could overflow past the sign > bit then you need to work harder. I disagree. These SAMESIGN() tests cover addition where result can not be more than one out-of-range. Multiplications are covered just before. > (By the same token, the existing SAMESIGN test in interval2tm is > wrong.) > > Possibly we should consider alternatives before plowing ahead in this > direction, since adding guards to interval_in and interval computations > doesn't help with oversize values that are already stored in a database. We can do nothing with values are already stored in a database. But we can prevent appearing them later. > We could think about replacing interval2tm's output format with some > other struct that uses a TimeOffset for hours and so cannot overflow. > I'm not sure though how far the effects would propagate; it might be > more work than we want to put into this. If values with overflow are already in a database, what do you expect a new output function should fix? -- Best regards, Vitaly Burovoy
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints
From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: > On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We could think about replacing interval2tm's output format with some >> other struct that uses a TimeOffset for hours and so cannot overflow. >> I'm not sure though how far the effects would propagate; it might be >> more work than we want to put into this. > If values with overflow are already in a database, what do you expect > a new output function should fix? My point is that ideally, any value that can physically fit into struct Interval ought to be considered valid. The fact that interval_out can't cope is a bug in interval_out, which ideally we would fix without artificially restricting the range of the datatype. Now, the problem with that of course is that it's not only interval_out but multiple other places. But your proposed patch also requires touching nearly everything interval-related, so I'm not sure it has any advantage that way over the less restrictive answer. regards, tom lane
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selectinginterval have different constraints
From
Vitaly Burovoy
Date:
On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >> On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> We could think about replacing interval2tm's output format with some >>> other struct that uses a TimeOffset for hours and so cannot overflow. >>> I'm not sure though how far the effects would propagate; it might be >>> more work than we want to put into this. > >> If values with overflow are already in a database, what do you expect >> a new output function should fix? > > My point is that ideally, any value that can physically fit into struct > Interval ought to be considered valid. The fact that interval_out can't > cope is a bug in interval_out, which ideally we would fix without > artificially restricting the range of the datatype. > > Now, the problem with that of course is that it's not only interval_out > but multiple other places. But your proposed patch also requires touching > nearly everything interval-related, so I'm not sure it has any advantage > that way over the less restrictive answer. OK, I try to limit values from 9223372036854775807/3600000000/24/365=292471.2 years to 7730941132800000000/3600000000/24/365=245146.5 years i.e. cut 47325 years or ~16%, but it is only for interval's part measured in seconds. Am I correct that we are still limited by ECPG which is limited by the system "tm" struct? Also users who use a binary protocol can also use the "tm" struct and can not expect overflow. The docs say[1] the highest value of the interval type is 178_000_000 years which is significantly bigger than both of values (current and limited) measured in seconds. I don't think applying that limitation is a big deal. I tried to find functions should be touched and there are not so many of them: -- internal functions: interval_check_overflow(Interval *i) tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span) interval2tm(Interval span, struct pg_tm * tm, fsec_t *fsec) intervaltypmodleastfield(int32 typmod) -- output functions: interval_out(PG_FUNCTION_ARGS) -- skip output interval_send(PG_FUNCTION_ARGS) -- skip output -- main input/computing functions: interval_in(PG_FUNCTION_ARGS) -- forgot to cover interval_recv(PG_FUNCTION_ARGS) -- covered make_interval(PG_FUNCTION_ARGS) -- covered AdjustIntervalForTypmod(Interval *interval, int32 typmod) -- can lead to overflow interval_justify_interval(PG_FUNCTION_ARGS) -- can lead to overflow interval_justify_hours(PG_FUNCTION_ARGS) -- can lead to overflow interval_justify_days(PG_FUNCTION_ARGS) -- can lead to overflow interval_um(PG_FUNCTION_ARGS) -- covered interval_pl(PG_FUNCTION_ARGS) -- covered interval_mi(PG_FUNCTION_ARGS) -- covered interval_mul(PG_FUNCTION_ARGS) -- covered -- can not lead to overflow interval_transform(PG_FUNCTION_ARGS) interval_div(PG_FUNCTION_ARGS) interval_trunc(PG_FUNCTION_ARGS) interval_part(PG_FUNCTION_ARGS) -- based on other functions interval_scale(PG_FUNCTION_ARGS) -- based on AdjustIntervalForTypmod interval_accum(PG_FUNCTION_ARGS) -- based on interval_pl interval_accum_inv(PG_FUNCTION_ARGS) -- based on interval_mi interval_avg(PG_FUNCTION_ARGS) -- based on interval_pl interval_combine(PG_FUNCTION_ARGS) -- based on interval_pl mul_d_interval(PG_FUNCTION_ARGS) -- based on interval_mul -- checking, not changing: interval_finite(PG_FUNCTION_ARGS) interval_smaller(PG_FUNCTION_ARGS) interval_larger(PG_FUNCTION_ARGS) interval_cmp_value(const Interval *interval) interval_cmp_internal(Interval *interval1, Interval *interval2) interval_eq(PG_FUNCTION_ARGS) interval_ne(PG_FUNCTION_ARGS) interval_lt(PG_FUNCTION_ARGS) interval_gt(PG_FUNCTION_ARGS) interval_le(PG_FUNCTION_ARGS) interval_ge(PG_FUNCTION_ARGS) interval_cmp(PG_FUNCTION_ARGS) interval_hash(PG_FUNCTION_ARGS) -- not applicable: intervaltypmodin(PG_FUNCTION_ARGS) intervaltypmodout(PG_FUNCTION_ARGS) timestamp_pl_interval(PG_FUNCTION_ARGS) timestamp_mi_interval(PG_FUNCTION_ARGS) timestamptz_pl_interval(PG_FUNCTION_ARGS) timestamptz_mi_interval(PG_FUNCTION_ARGS) ===== In fact, only six of them (not "touching nearly everything interval-related") should be covered: * interval_in * interval_out (yes, the SAMESIGN there is useless) * AdjustIntervalForTypmod * interval_justify_interval * interval_justify_hours * interval_justify_days [1]https://www.postgresql.org/docs/current/static/datatype-datetime.html -- Best regards, Vitaly Burovoy
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selecting interval have different constraints
From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: > On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My point is that ideally, any value that can physically fit into struct >> Interval ought to be considered valid. The fact that interval_out can't >> cope is a bug in interval_out, which ideally we would fix without >> artificially restricting the range of the datatype. > Am I correct that we are still limited by ECPG which is limited by the > system "tm" struct? I'm not really that concerned about whether ECPG can deal with enormous intervals. If it bothers you, and you want to go fix it, more power to you --- but I think fixing the backend is much higher priority. > Also users who use a binary protocol can also use the "tm" struct and > can not expect overflow. If they store an enormous interval value, its really up to them not to choke on it when they read it back. Not our problem. > The docs say[1] the highest value of the interval type is 178_000_000 > years which is ... irrelevant really. That's talking about the largest possible value of the "months" field, viz (2^31-1)/12. Perhaps we ought to document the other field limits, but right now there's nothing there about how large the hours field can get. regards, tom lane
Re: [HACKERS] Re: [BUGS][PATCH] BUG #14486: Inserting and selectinginterval have different constraints
From
Vitaly Burovoy
Date:
On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vitaly Burovoy <vitaly.burovoy@gmail.com> writes: >> On 1/5/17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> My point is that ideally, any value that can physically fit into struct >>> Interval ought to be considered valid. The fact that interval_out can't >>> cope is a bug in interval_out, which ideally we would fix without >>> artificially restricting the range of the datatype. > >> Am I correct that we are still limited by ECPG which is limited by the >> system "tm" struct? > > I'm not really that concerned about whether ECPG can deal with enormous > intervals. If it bothers you, and you want to go fix it, more power to > you --- but I think fixing the backend is much higher priority. I really do not think it is possible since it uses system struct. My point - ECPG is a part of Postgres, we can't fix server side and leave the rest. >> Also users who use a binary protocol can also use the "tm" struct and >> can not expect overflow. > > If they store an enormous interval value, its really up to them not to > choke on it when they read it back. Not our problem. Those who store and those who read it back can be different groups of people. For example, the second group are libraries' writers. My point - that interval is big enough and limiting it can help people from errors. Because finally * either they have an error in their data and that change will not break their reslut (since it is wrong because of wrong source data) * or they still get overflow and they have to find a solution - with that patch or without it * or they get result in that 16% interval between fitting hours into "int" and "seconds" in PG_INT64, get silently corrupted result because of ECPG or a library. A solution for the third case can be the same as for the second one because these groups can be the same (just with different data). >> The docs say[1] the highest value of the interval type is 178_000_000 >> years which is > > ... irrelevant really. That's talking about the largest possible value of > the "months" field, viz (2^31-1)/12. Perhaps we ought to document the > other field limits, but right now there's nothing there about how large > the hours field can get. No, it was bad explanation of a point - now there is nothing documented about "seconds" part of the "interval" type. And we can just declare limit. Users don't mind reason of limitation, they just will plan workaround if their data do not fit in limits whatever they are. -- Best regards, Vitaly Burovoy