Thread: Unsigned integer types
Hi all I know this topic was discussed before, but there doesn't seem to be any conclusion. The lack of unsigned integer types is one of the biggest sources of grief in my daily work with pgsql. Before I go and start hacking, I'd like to discuss few points: 1. Is there a strong objection against merging this kind of patch? I can provide numerous reasons why using bigger int or numeric type just doesn't cut. 2. How/if should the behaviour of numeric literals change? The minimalistic solution is: it shouldn't, literals should be assumed signed by default. More complex solution could involve using C-style suffix ('123456u'). 3. How/if should comparing singed and unsigned types work? IMO they shouldn't be allowed and explicit cast should be required. Thanks in advance! Maciek
Maciej Gajewski wrote: > I know this topic was discussed before, but there doesn't seem to be > any conclusion. > > The lack of unsigned integer types is one of the biggest sources of > grief in my daily work with pgsql. > > Before I go and start hacking, I'd like to discuss few points: > > 1. Is there a strong objection against merging this kind of patch? > > I can provide numerous reasons why using bigger int or numeric type > just doesn't cut. It would be interesting to know these reasons. Yours, Laurenz Albe
Maciej Gajewski <maciej.gajewski0@gmail.com> writes: > The lack of unsigned integer types is one of the biggest sources of > grief in my daily work with pgsql. > Before I go and start hacking, I'd like to discuss few points: > 1. Is there a strong objection against merging this kind of patch? Basically, there is zero chance this will happen unless you can find a way of fitting them into the numeric promotion hierarchy that doesn't break a lot of existing applications. We have looked at this more than once, if memory serves, and failed to come up with a workable design that didn't seem to violate the POLA. > 2. How/if should the behaviour of numeric literals change? > The minimalistic solution is: it shouldn't, literals should be assumed > signed by default. More complex solution could involve using C-style > suffix ('123456u'). Well, if you don't do that, there is no need for you to merge anything: you can build unsigned types as an external extension if they aren't affecting the core parser's behavior. As long as it's external, you don't need to satisfy anybody else's idea of what reasonable behavior is ... regards, tom lane
The reasons are: performance, storage and frustration. I think the frustration comes from the fact that unsigned integers are universally available, except in PostgreSQL. I work with a really complex system, with many moving parts, and Postgres really is one of the components that causes the least trouble (compared to other opens-source and closed-sourced systems, which I shall leave unnamed), except for the unsigned integers. Let me give you few examples: 1. SMALLINT Probably the most popular unsigned short int on the planet: IP port number. I had to store some network traffic data in DB; I instinctively started to prototyping it like this: CREATE TABLE packets (addr INET, port SMALLINT, ... ); Of course it failed quickly and I had to bump the size to INTEGER. No real harm here, as the 2 bytes will probably go into some padding anyway, but somehow it feels wrong. 2. INTEGER I had to store a record with several uint32. I had to store an awful lot of them; hundreds GB of data per day. Roughly half of the record consists of uint32 fields. Increasing the data type to bigint would mean that I could store 3 instead of 4 days worth of data on available storage. Continuing with int4 meant that I would have to deal with the data in special way when in enters and leaves the DB. It's easy in C: just cast uint32_t to int32_t. But python code requires more complex changes. And the web backend too... It's suffering either way! Just imagine the conversation I had to have with my boss: "Either we'll increase budged for storage, or we need to touch every bit of the system". 3 .BIGINT There is no escape from bigint. Numeric (or TEXT!) is the only thing that can keep uint64, but when you have 10^9 and more records, and you need to do some arithmetic on it, numeric it's just too slow. We use uint64 all across our system as unique event identifier. It works fine, it's fast, and it's very convenient. Passing uint64 around, storing it, looking it up. We use it everywhere, including UI and log files. So once I decided to use BIGINT to store it, I had to guard all the inputs and outputs and make sure it is handled correctly. Or so I though. It turned out that some guys from different department are parsing some logs with perl parser and they store it in DB. They choose to store the uint64 id as TEXT. They probably tried BIGINT and failed and decided that - since they have low volume and they are not doing any arithmetics - to store it as TEXT. And now someone came up with an idea to join one table with another, bigint with text. I did it. Initially I wrote function that converted the text to numeric, then rotated it around 2^64 if necessary. It was too slow. Too slow for something that should be a simple reinterpretation of data. Eventually I ended up writing a C function, that first scanf( "%llu")'d the text into uint64_t, and then PG_RETURN_INT64-ed the uint64_t value. Works fast, but operations hate for increasing the complexity of DB deployment. --- I know some cynical people that love this kind of problems, they feel that the constant struggle is what keeps them employed :) But I'm ready to use my private time to solve it once and for all. I'm afraid that implementing uints as and extension would introduce some performance penalty (I may be wrong). I'm also afraid that with the extension I'd be left on my own maintaining it forever. While if this could go into the core product, it would live forever. As for the POLA violation: programmers experienced with statically typed languages shouldn't have problems dealing with all the issues surrounding signed/unsigned integers (like the ones described here: http://c-faq.com/expr/preservingrules.html). Others don't need to use them. Maciek On 27 May 2013 16:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maciej Gajewski <maciej.gajewski0@gmail.com> writes: >> The lack of unsigned integer types is one of the biggest sources of >> grief in my daily work with pgsql. >> Before I go and start hacking, I'd like to discuss few points: >> 1. Is there a strong objection against merging this kind of patch? > > Basically, there is zero chance this will happen unless you can find > a way of fitting them into the numeric promotion hierarchy that doesn't > break a lot of existing applications. We have looked at this more than > once, if memory serves, and failed to come up with a workable design > that didn't seem to violate the POLA. > >> 2. How/if should the behaviour of numeric literals change? > >> The minimalistic solution is: it shouldn't, literals should be assumed >> signed by default. More complex solution could involve using C-style >> suffix ('123456u'). > > Well, if you don't do that, there is no need for you to merge anything: > you can build unsigned types as an external extension if they aren't > affecting the core parser's behavior. As long as it's external, you > don't need to satisfy anybody else's idea of what reasonable behavior > is ... > > regards, tom lane
On 05/28/2013 05:17 AM, Maciej Gajewski wrote: > I'm afraid that implementing uints as and extension would introduce > some performance penalty (I may be wrong). You are. > I'm also afraid that with > the extension I'd be left on my own maintaining it forever. While if > this could go into the core product, it would live forever. This is an argument against ever doing anything as an extension. You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated: >> Basically, there is zero chance this will happen unless you can find >> a way of fitting them into the numeric promotion hierarchy that doesn't >> break a lot of existing applications. We have looked at this more than >> once, if memory serves, and failed to come up with a workable design >> that didn't seem to violate the POLA. >> cheers andrew
On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: > 2. INTEGER > > I had to store a record with several uint32. I had to store an awful > lot of them; hundreds GB of data per day. Roughly half of the record > consists of uint32 fields. > Increasing the data type to bigint would mean that I could store 3 > instead of 4 days worth of data on available storage. > Continuing with int4 meant that I would have to deal with the data in > special way when in enters and leaves the DB. It's easy in C: just > cast uint32_t to int32_t. But python code requires more complex > changes. And the web backend too... > > It's suffering either way! > > Just imagine the conversation I had to have with my boss: "Either > we'll increase budged for storage, or we need to touch every bit of > the system". Did you try 'oid' as an unsigned int4? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 5/28/13 4:07 PM, Bruce Momjian wrote: > On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: >> 2. INTEGER >> >> I had to store a record with several uint32. I had to store an awful >> lot of them; hundreds GB of data per day. Roughly half of the record >> consists of uint32 fields. >> Increasing the data type to bigint would mean that I could store 3 >> instead of 4 days worth of data on available storage. >> Continuing with int4 meant that I would have to deal with the data in >> special way when in enters and leaves the DB. It's easy in C: just >> cast uint32_t to int32_t. But python code requires more complex >> changes. And the web backend too... >> >> It's suffering either way! >> >> Just imagine the conversation I had to have with my boss: "Either >> we'll increase budged for storage, or we need to touch every bit of >> the system". > > Did you try 'oid' as an unsigned int4? Using an internal catalog type for user data seems like a horrible idea to me... I'll also add that Maciej hasn't explained why these types couldn't be an extension (in fact, I'm pretty sure there's alreadycode for this out there, though possibly not utilizing the extension framework). If you don't need implicit casting it should actually be pretty easy to do this externally, and I don't think maintenancewould be an issue (it's not like uint's change...). -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Maciej Gajewski wrote > I'm also afraid that with > the extension I'd be left on my own maintaining it forever. While if > this could go into the core product, it would live forever. Clarification from the gallery: are we talking an extension or a custom PostgreSQL build/fork? If it is an extension the stick it up on GitHub and let whomever finds it valuable help contribute to keeping it relevant. No use letting perfection stand in the way of usability. If the current solutions are too slow then exploring the extension aspect - even if it falls short - is worthwhile. At minimum you learn from the experience and maybe someone else (or even yourself) can build on that foundation. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Unsigned-integer-types-tp5756994p5757234.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote: > On 5/28/13 4:07 PM, Bruce Momjian wrote: > >On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote: > >>2. INTEGER > >> > >>I had to store a record with several uint32. I had to store an awful > >>lot of them; hundreds GB of data per day. Roughly half of the record > >>consists of uint32 fields. > >>Increasing the data type to bigint would mean that I could store 3 > >>instead of 4 days worth of data on available storage. > >>Continuing with int4 meant that I would have to deal with the data in > >>special way when in enters and leaves the DB. It's easy in C: just > >>cast uint32_t to int32_t. But python code requires more complex > >>changes. And the web backend too... > >> > >>It's suffering either way! > >> > >>Just imagine the conversation I had to have with my boss: "Either > >>we'll increase budged for storage, or we need to touch every bit of > >>the system". > > > >Did you try 'oid' as an unsigned int4? > > Using an internal catalog type for user data seems like a horrible idea to me... Uh, not sure if we can say oid is only an internal catalog type. It is certainly used for storing large object references. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 05/28/2013 07:00 PM, Bruce Momjian wrote: > On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote: >>> >>> Did you try 'oid' as an unsigned int4? >> Using an internal catalog type for user data seems like a horrible idea to me... > Uh, not sure if we can say oid is only an internal catalog type. It is > certainly used for storing large object references. > pg_largeobject has oids. I don't thing the fact that we use oids to store references to pg_largeobject should blind us to the fact that oid should be an opaque type. Using them as substitute unsigned ints seems like a horrible idea to me too. cheers andrew
I will implement it as an extension then. My feeling is that PostgreSQL extensions tend to fall into obscurity. As an ordinary user it took me really long time to find out that interesting features are available in form of extensions; they are certainly under-marketed. But this is a topic for separate discussion. > You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated: >> Basically, there is zero chance this will happen unless you can find >> a way of fitting them into the numeric promotion hierarchy that doesn't >> break a lot of existing applications. We have looked at this more than >> once, if memory serves, and failed to come up with a workable design >> that didn't seem to violate the POLA. >> I'm sorry, I thought my proposal was clear. I propose to not integrate the unsigned types into existing promotion hierarchy, and behave just like gcc would with -Werror: require explicit cast. Between them, the unsigned types would be automatically converted up (uint2 > uint4 > uint8). Maciek
On Wed, May 29, 2013 at 4:33 AM, Maciej Gajewski <maciej.gajewski0@gmail.com> wrote: > I propose to not integrate the unsigned types into existing promotion > hierarchy, and behave just like gcc would with -Werror: require > explicit cast. Between them, the unsigned types would be automatically > converted up (uint2 > uint4 > uint8). Seems pretty sensible to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/29/2013 11:33 AM, Maciej Gajewski wrote: > I will implement it as an extension then. > > My feeling is that PostgreSQL extensions tend to fall into obscurity. > As an ordinary user it took me really long time to find out that > interesting features are available in form of extensions; they are > certainly under-marketed. But this is a topic for separate discussion. > >> You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated: > >>> Basically, there is zero chance this will happen unless you can find >>> a way of fitting them into the numeric promotion hierarchy that doesn't >>> break a lot of existing applications. We have looked at this more than >>> once, if memory serves, and failed to come up with a workable design >>> that didn't seem to violate the POLA. >>> > I'm sorry, I thought my proposal was clear. > > I propose to not integrate the unsigned types into existing promotion > hierarchy, and behave just like gcc would with -Werror: require > explicit cast. Between them, the unsigned types would be automatically > converted up (uint2 > uint4 > uint8). +1 (And we could even put some possible sample cast sets in contrib for those who need automatic casts and are willing to do the required debugging ) -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
I agree that extensions are undermarketed. Although pgxn is a good step, I could not find it from "postgresql.org":-( > I propose to not integrate the unsigned types into existing promotion > hierarchy, and behave just like gcc would with -Werror: require > explicit cast. Between them, the unsigned types would be automatically > converted up (uint2 > uint4 > uint8). If you do it, having uint1 (1 byte) would be nice as well. -- Fabien.
On May 29, 2013, at 10:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > If you do it, having uint1 (1 byte) would be nice as well. There is a signed 1byte int on PGXN, FWIW: http://pgxn.org/extension/tinyint Best, David
>> If you do it, having uint1 (1 byte) would be nice as well. > > There is a signed 1byte int on PGXN, FWIW: > http://pgxn.org/extension/tinyint That's good, thanks for the pointer! However, it is a signed tinyint (-128..127 range), not an unsigned one. -- Fabien.