Thread: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Bill Moran
Date:
I'm working on a project converting a bunch of code from another database system to PostgreSQL. One of the issues is that the previous system accepted integers and binary data in the same hex format as C ... i.e. 0xff I understand that the proper way to handle this in postgres is x'ff', but the problem is that a large number of #define values use the 0xff syntax and these macros are used both in C code, as well as in SQL. The simple fact is that a LOT of code does this, and correcting it all and ensuring that the modified code is correct is a BIG job. Just to clarify some of the complexity: there is about 80,000 lines of PL/PGSQL code that contains these macros, then is run through the C preprocessor to substitute actual values for them before being loaded into Postgres. Obviously, there are many options for fixing this. One of those options is modifying PostgreSQL to accept the 0xff syntax ... and evaluating that option is the reason for my post. So, one of my questions is: does anyone have an existing simple answer on how to fix this? My other question: is there a specific reason why PostgreSQL doesn't support this syntax, aside from "nobody has bothered to add such support"? Because I'm considering writing a patch to Postgres and submitting it, but I'm not going to go down that path if there's a specific reason why supporting this syntax would be _bad_. Personally, I feel like it would be a good thing, as it seems like a lot of other database systems support it, and even though it's not ANSI, it's pretty much the de-facto standard. -- Bill <wmoran@potentialtech.com>
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes: > My other question: is there a specific reason why PostgreSQL doesn't support > this syntax, aside from "nobody has bothered to add such support"? Because > I'm considering writing a patch to Postgres and submitting it, but I'm not > going to go down that path if there's a specific reason why supporting this > syntax would be _bad_. Personally, I feel like it would be a good thing, as > it seems like a lot of other database systems support it, and even though > it's not ANSI, it's pretty much the de-facto standard. How many is "a lot", and do any of the responsible vendors sit on the SQL standards committee? One large concern about doing anything like this is whether future versions of the SQL standard might blindside us with some not-terribly-compatible interpretation of that syntax. If we do something that is also in Oracle or DB2 or one of the other big boys, then we can probably rely on the assumption that they'll block anything really incompatible from becoming standardized ;-). OTOH, if the actual meaning of "a lot" is "MySQL", I'd be pretty worried about this scenario. regards, tom lane
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
"Brown, Joseph E."
Date:
Unsubscribe pgsql-general -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane Sent: Thursday, May 21, 2015 1:57 PM To: Bill Moran Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for? Bill Moran <wmoran@potentialtech.com> writes: > My other question: is there a specific reason why PostgreSQL doesn't > support this syntax, aside from "nobody has bothered to add such > support"? Because I'm considering writing a patch to Postgres and > submitting it, but I'm not going to go down that path if there's a > specific reason why supporting this syntax would be _bad_. Personally, > I feel like it would be a good thing, as it seems like a lot of other > database systems support it, and even though it's not ANSI, it's pretty much the de-facto standard. How many is "a lot", and do any of the responsible vendors sit on the SQL standards committee? One large concern about doing anything like this is whether future versions of the SQL standard might blindside us with somenot-terribly-compatible interpretation of that syntax. If we do something that is also in Oracle or DB2 or one of theother big boys, then we can probably rely on the assumption that they'll block anything really incompatible from becomingstandardized ;-). OTOH, if the actual meaning of "a lot" is "MySQL", I'd be pretty worried about this scenario. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Jan de Visser
Date:
On May 21, 2015 06:04:37 PM Brown, Joseph E. wrote:
> Unsubscribe pgsql-general
This doesn't work. See the footer of the posts to the mailing list.
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Bill Moran
Date:
On Thu, 21 May 2015 13:57:24 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bill Moran <wmoran@potentialtech.com> writes: > > My other question: is there a specific reason why PostgreSQL doesn't support > > this syntax, aside from "nobody has bothered to add such support"? Because > > I'm considering writing a patch to Postgres and submitting it, but I'm not > > going to go down that path if there's a specific reason why supporting this > > syntax would be _bad_. Personally, I feel like it would be a good thing, as > > it seems like a lot of other database systems support it, and even though > > it's not ANSI, it's pretty much the de-facto standard. > > How many is "a lot", and do any of the responsible vendors sit on the SQL > standards committee? Well, I've personally worked with (in addition to PostgreSQL) Microsoft SQL Server, MySQL, and Sybase -- PostgreSQL is the only one of those 4 that doesn't support the 0xff syntax. I did a litle research and it appears that neither Oracle nor db2 supports the 0xff syntax ... so not _quite_ as common as it seemed to me. > One large concern about doing anything like this is whether future > versions of the SQL standard might blindside us with some > not-terribly-compatible interpretation of that syntax. If we do something > that is also in Oracle or DB2 or one of the other big boys, then we can > probably rely on the assumption that they'll block anything really > incompatible from becoming standardized ;-). I assume that Microsoft is big enough to prevent anything that would hurt SQL Server's compatibility from becomming a standard? > OTOH, if the actual meaning of "a lot" is "MySQL", I'd be pretty worried > about this scenario. Well, MySQL _does_ support that syntax ... but I couldn't care less. MySQL also throws away your data instead of giving you errors and I would never ask PostgreSQL to start behaving like that. With all that being said, if I were to build a patch, would it be likely to be accepted into core? -- Bill Moran
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Alban Hertroys
Date:
On 22 May 2015 at 04:46, Bill Moran <wmoran@potentialtech.com> wrote: > I did a litle research and it appears that neither Oracle nor db2 supports > the 0xff syntax ... so not _quite_ as common as it seemed to me. > With all that being said, if I were to build a patch, would it be likely > to be accepted into core? Wouldn't you also need to support similar syntax for octal numbers for the patch to be complete? Or are those already supported (ISTR that's '077' for decimal 63)? Not that I care at all about octal numbers, but supporting one and not the other just doesn't seem right. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Tom Lane
Date:
Alban Hertroys <haramrae@gmail.com> writes: > On 22 May 2015 at 04:46, Bill Moran <wmoran@potentialtech.com> wrote: >> With all that being said, if I were to build a patch, would it be likely >> to be accepted into core? > Wouldn't you also need to support similar syntax for octal numbers for > the patch to be complete? Or are those already supported (ISTR that's > '077' for decimal 63)? A patch that made 077 be interpreted as octal would certainly get rejected out of hand, because that's valid syntax right now and it doesn't mean 63. A similar objection might be raised to 0x..., as that is also valid syntax; it's read as 0 followed by an identifier: regression=# select 0xff; xff ----- 0 (1 row) (Yet another place where the fact that AS is optional yields surprising results...) So there would be a backwards compatibility break here, and no you can't fix it with a GUC. Still, it might be a small enough break that we could get away with it. I'm not personally very excited but other people might be. Other questions you'd have to think about: what is the data type of 0xffffffff; what do you do with 0xffffffffffffffffffffffff (too big even for int8). And it'd likely behoove you to check how Microsoft answers those questions, if you want to point to SQL Server as what's going to keep you out of standards-compatibility problems. (IOW, if 0x ever did get standardized, the text might well match what SQL Server does.) regards, tom lane
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Dennis Jenkins
Date:
On Fri, May 22, 2015 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alban Hertroys <haramrae@gmail.com> writes:
> On 22 May 2015 at 04:46, Bill Moran <wmoran@potentialtech.com> wrote:
>> With all that being said, if I were to build a patch, would it be likely
>> to be accepted into core?
How feasible would it be to write a network proxy, like pg_bouncer, to handle converting the values on the fly, so that you need to change neither your original code base (with respect to this issue), nor change PostgreSQL itself?
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Bill Moran
Date:
On Fri, 22 May 2015 11:02:47 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alban Hertroys <haramrae@gmail.com> writes: > > On 22 May 2015 at 04:46, Bill Moran <wmoran@potentialtech.com> wrote: > >> With all that being said, if I were to build a patch, would it be likely > >> to be accepted into core? > > > Wouldn't you also need to support similar syntax for octal numbers for > > the patch to be complete? Or are those already supported (ISTR that's > > '077' for decimal 63)? > > A patch that made 077 be interpreted as octal would certainly get rejected > out of hand, because that's valid syntax right now and it doesn't mean 63. You'll get no objection from me on that point. > A similar objection might be raised to 0x..., as that is also valid > syntax; it's read as 0 followed by an identifier: > > regression=# select 0xff; > xff > ----- > 0 > (1 row) > > (Yet another place where the fact that AS is optional yields surprising > results...) So there would be a backwards compatibility break here, > and no you can't fix it with a GUC. Still, it might be a small enough > break that we could get away with it. I hadn't even considered that issue. I really hate the fact that AS is optional, and I'm irritated by code that omits it ... but nobody's asking me ... > I'm not personally very excited > but other people might be. > > Other questions you'd have to think about: what is the data type of > 0xffffffff; what do you do with 0xffffffffffffffffffffffff (too big > even for int8). And it'd likely behoove you to check how Microsoft > answers those questions, if you want to point to SQL Server as what's > going to keep you out of standards-compatibility problems. (IOW, > if 0x ever did get standardized, the text might well match what > SQL Server does.) MSSQL seems to use it specifically for the equivalent of BYTEA types, and it seems to me that should be how it works in PostgreSQL. Does anyone watching this thread have access to a MSSQL server to verify? If an implicit cast from a 4-byte BYTEA to int works now, then it should work ... otherwise an explicit cast would be needed, with the same behavior if you tried to specify a number that overflows an int in any other way. MySQL is more liberal in that you can use it to specify ints as well, but I don't think MySQL is a good example of proper behavior. -- Bill Moran
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Bill Moran
Date:
On Fri, 22 May 2015 11:27:49 -0500 Dennis Jenkins <dennis.jenkins.75@gmail.com> wrote: > On Fri, May 22, 2015 at 10:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Alban Hertroys <haramrae@gmail.com> writes: > > > On 22 May 2015 at 04:46, Bill Moran <wmoran@potentialtech.com> wrote: > > >> With all that being said, if I were to build a patch, would it be likely > > >> to be accepted into core? > > How feasible would it be to write a network proxy, like pg_bouncer, to > handle converting the values on the fly, so that you need to change neither > your original code base (with respect to this issue), nor change PostgreSQL > itself? Certainly feasible, but absolutely undesirable. The system I'm working on is needlessly complex as it is ... I'd rather convince my bosses to let me rewrite 80,000 lines of code than add another compatibility shim to the mess. I brought it up because I saw an opportunity to benefit my employer and the PostgreSQL community at the same time. I have about 4 fallback plans if there's a reason not to do this one. Quite frankly, adding a compatibility shim isn't even on that list. -- Bill Moran
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Other questions you'd have to think about: what is the data type of >> 0xffffffff; what do you do with 0xffffffffffffffffffffffff (too big >> even for int8). And it'd likely behoove you to check how Microsoft >> answers those questions, if you want to point to SQL Server as what's >> going to keep you out of standards-compatibility problems. (IOW, >> if 0x ever did get standardized, the text might well match what >> SQL Server does.) > MSSQL seems to use it specifically for the equivalent of BYTEA types, > and it seems to me that should be how it works in PostgreSQL. Oh really? Wow, I'd just assumed you wanted this as a way to write integers. That's certainly the use-case I would have personally. I'm not even sure I like the idea of being able to write byteas without quotes --- they seem like strings to me, not numbers. > If an implicit cast from a 4-byte BYTEA to int works now, then it > should work ... otherwise an explicit cast would be needed, with the > same behavior if you tried to specify a number that overflows an int > in any other way. There's no cast at all from bytea to int. For one thing, it's quite unclear what endianness should be assumed for such a cast. (To get unsurprising behavior from what you're describing, I think we'd have to use a big-endian interpretation of the bytea; but that would be a pain for a lot of other scenarios, or even for this case if you'd written a bytea of length other than 4 or 8 bytes.) regards, tom lane
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Bill Moran
Date:
On Fri, 22 May 2015 12:44:40 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bill Moran <wmoran@potentialtech.com> writes: > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Other questions you'd have to think about: what is the data type of > >> 0xffffffff; what do you do with 0xffffffffffffffffffffffff (too big > >> even for int8). And it'd likely behoove you to check how Microsoft > >> answers those questions, if you want to point to SQL Server as what's > >> going to keep you out of standards-compatibility problems. (IOW, > >> if 0x ever did get standardized, the text might well match what > >> SQL Server does.) > > > MSSQL seems to use it specifically for the equivalent of BYTEA types, > > and it seems to me that should be how it works in PostgreSQL. > > Oh really? Wow, I'd just assumed you wanted this as a way to write > integers. That's certainly the use-case I would have personally. > I'm not even sure I like the idea of being able to write byteas without > quotes --- they seem like strings to me, not numbers. Arrgh ... it's good that you're bringing this up, but you're making me realize that there's more to figure out than I originally thought ... My focus had been on it being used for BYTEA columns, but there _are_ plenty of places in the code that do things like: WHERE int_col & 0x04 = 0x04 Which means that Sybase will implicitly cast that to an int, which probably means that MSSQL will as well. Once I take that into consideration, I start thinking that int_col should actualy be a bit string. which means that: WHERE bit_varying_col & 0x04 = 0x04 should probably work without explicit casts as well. > > If an implicit cast from a 4-byte BYTEA to int works now, then it > > should work ... otherwise an explicit cast would be needed, with the > > same behavior if you tried to specify a number that overflows an int > > in any other way. > > There's no cast at all from bytea to int. For one thing, it's quite > unclear what endianness should be assumed for such a cast. (To get > unsurprising behavior from what you're describing, I think we'd have > to use a big-endian interpretation of the bytea; but that would be > a pain for a lot of other scenarios, or even for this case if you'd > written a bytea of length other than 4 or 8 bytes.) As an implicit cast, obviously anything too large to fit in the target data type should be an error. But the subject of endianness becomes damn important. It's more complex than I original thought, but it still seems like it can be done without creating idiocy in the way things are cast. I'll think about it some more and try to come up with some more specific use scenarios to see what behavior seems the most POLA to me. -- Bill Moran
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Arjen Nienhuis
Date:
On Fri, May 22, 2015 at 6:57 PM, Bill Moran <wmoran@potentialtech.com> wrote: > On Fri, 22 May 2015 12:44:40 -0400 > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Bill Moran <wmoran@potentialtech.com> writes: >> > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Other questions you'd have to think about: what is the data type of >> >> 0xffffffff; what do you do with 0xffffffffffffffffffffffff (too big >> >> even for int8). And it'd likely behoove you to check how Microsoft >> >> answers those questions, if you want to point to SQL Server as what's >> >> going to keep you out of standards-compatibility problems. (IOW, >> >> if 0x ever did get standardized, the text might well match what >> >> SQL Server does.) >> >> > MSSQL seems to use it specifically for the equivalent of BYTEA types, >> > and it seems to me that should be how it works in PostgreSQL. >> >> Oh really? Wow, I'd just assumed you wanted this as a way to write >> integers. That's certainly the use-case I would have personally. >> I'm not even sure I like the idea of being able to write byteas without >> quotes --- they seem like strings to me, not numbers. > > Arrgh ... it's good that you're bringing this up, but you're making me > realize that there's more to figure out than I originally thought ... > My focus had been on it being used for BYTEA columns, but there _are_ > plenty of places in the code that do things like: > > WHERE int_col & 0x04 = 0x04 > > Which means that Sybase will implicitly cast that to an int, which > probably means that MSSQL will as well. > > Once I take that into consideration, I start thinking that int_col > should actualy be a bit string. which means that: > > WHERE bit_varying_col & 0x04 = 0x04 > > should probably work without explicit casts as well. > >> > If an implicit cast from a 4-byte BYTEA to int works now, then it >> > should work ... otherwise an explicit cast would be needed, with the >> > same behavior if you tried to specify a number that overflows an int >> > in any other way. >> >> There's no cast at all from bytea to int. For one thing, it's quite >> unclear what endianness should be assumed for such a cast. (To get >> unsurprising behavior from what you're describing, I think we'd have >> to use a big-endian interpretation of the bytea; but that would be >> a pain for a lot of other scenarios, or even for this case if you'd >> written a bytea of length other than 4 or 8 bytes.) > > As an implicit cast, obviously anything too large to fit in the > target data type should be an error. But the subject of endianness > becomes damn important. > > It's more complex than I original thought, but it still seems like it > can be done without creating idiocy in the way things are cast. I'll > think about it some more and try to come up with some more specific > use scenarios to see what behavior seems the most POLA to me. > SQL server does this: 0x10 is VARBINARY: 0x10 + 0x00 = 0x1000 There is an implicit cast from varbinary to int: 0x10 + 0 = 16 0xfffffff9 + 0 = -7 And there is silent truncation: 0xff00000000 + 0 = 0
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Thomas Kellerer
Date:
Bill Moran wrote on 22.05.2015 18:57: > Arrgh ... it's good that you're bringing this up, but you're making me > realize that there's more to figure out than I originally thought ... > My focus had been on it being used for BYTEA columns, but there _are_ > plenty of places in the code that do things like: If this is only about a more compact format for BYTEA, the following is valid in Postgres: select '\xff'::bytea It's not exactly what you are after, but maybe "good enough".
Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Thomas Kellerer
Date:
Tom Lane wrote on 21.05.2015 19:57: > One large concern about doing anything like this is whether future > versions of the SQL standard might blindside us with some > not-terribly-compatible interpretation of that syntax. If we do something > that is also in Oracle or DB2 or one of the other big boys, then we can > probably rely on the assumption that they'll block anything really > incompatible from becoming standardized ;-). The SQL standard already specifies the format for "binary strings": <binary string literal> ::= X <quote> [ <space>... ] [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote> [ { <separator> <quote> [ <space>... ] [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote> }... ] <hexit> ::= <digit> | A | B | C | D | E | F | a | b | c | d | e | f The data type for such a literal is somewhat "undefined": It is implementation-defined whether the declared type of a <binary string literal> is a fixed-length binary string type, a variable-length binary string type, or a binary large object string type But the above syntax seems to be only supported by H2, HSQLDB and Apache Derby.
Re: Re: Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes: > The SQL standard already specifies the format for "binary strings": > <binary string literal> ::= > X <quote> [ <space>... ] [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote> > [ { <separator> <quote> [ <space>... ] [ { <hexit> [ <space>... ] > <hexit> [ <space>... ] }... ] <quote> }... ] Yeah, and we do honor that: regression=# select x'1234abcdf'; ?column? -------------------------------------- 000100100011010010101011110011011111 (1 row) ... although I notice we don't allow spaces in the input, which seems like something to fix. (On further investigation, it looks like the optional spaces are new since SQL99, which probably explains why we have not got 'em.) > The data type for such a literal is somewhat "undefined": > It is implementation-defined whether the declared type of a <binary string literal> > is a fixed-length binary string type, a variable-length binary string type, > or a binary large object string type Hm, we think it's bit(N): regression=# select pg_typeof(x'1234abcdf'); pg_typeof ----------- bit (1 row) which is what the SQL standard *used* to say, before they unceremoniously threw the bit types under the bus and replaced them with "binary strings". I wonder how painful it would be to update that stuff to be compatible with more recent versions of the standard. Or whether anyone really cares --- people doing this in PG seem to be happy enough with bytea. regards, tom lane
Well, whether good or bad, my employer has nixed the idea of paying me to work on this, and I don't have personal time right now to do it, so it's not going to be addressed by me at this time. -- Bill <wmoran@potentialtech.com>