Thread: Is there any plan to add unsigned integer types?
MySQL already has unsigned INT type, and it has double the range of signed INT type. It's not just the bigger range that UINT type brings. If unsigned INT type exists, I wouldn't have to execute "create domain UINT" in every database. If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring more convenience to users.
On mån, 2011-09-26 at 19:41 +0900, crocket wrote: > MySQL already has unsigned INT type, and it has double the range of > signed INT type. > It's not just the bigger range that UINT type brings. > If unsigned INT type exists, I wouldn't have to execute "create domain > UINT" in every database. > > If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring > more convenience to users. > I believe there have been many discussions about this in the past, outlining the various issues that would come with this project. A first step would be to start implementing this in user space and see how much breaks.
On Mon, Sep 26, 2011 at 5:41 AM, crocket <crockabiscuit@gmail.com> wrote: > MySQL already has unsigned INT type, and it has double the range of > signed INT type. > It's not just the bigger range that UINT type brings. > If unsigned INT type exists, I wouldn't have to execute "create domain > UINT" in every database. > > If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring > more convenience to users. This comes up now and then. The problem is the benefit gained is not really worth the pain. In today's 64 bit world, choosing a 64 bit int nails the cases where you need the extra range and you have the ability to use constraints (if necessary, through a domain) to enforce correctness. On the downside, you have to add a lot of casts, overloads, etc. Figuring out the casting rules is non trivial and could lead to surprising behaviors...inferring the type of 'unknown' strings is bad enough as it is. TBH, what I'd greatly prefer to see is to have domains be finished up so that you don't have to carefully consider their use (for example, you can't make arrays out of them). Then an unsigned int could simply be: create domain uint as bigint check (value >= 0); merlin
On Mon, Sep 26, 2011 at 10:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Sep 26, 2011 at 5:41 AM, crocket <crockabiscuit@gmail.com> wrote: >> MySQL already has unsigned INT type, and it has double the range of >> signed INT type. >> It's not just the bigger range that UINT type brings. >> If unsigned INT type exists, I wouldn't have to execute "create domain >> UINT" in every database. >> >> If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring >> more convenience to users. > > This comes up now and then. The problem is the benefit gained is not > really worth the pain. In today's 64 bit world, choosing a 64 bit int > nails the cases where you need the extra range and you have the > ability to use constraints (if necessary, through a domain) to enforce > correctness. > > On the downside, you have to add a lot of casts, overloads, etc. > Figuring out the casting rules is non trivial and could lead to > surprising behaviors...inferring the type of 'unknown' strings is bad > enough as it is. > > TBH, what I'd greatly prefer to see is to have domains be finished up > so that you don't have to carefully consider their use (for example, > you can't make arrays out of them). Then an unsigned int could simply > be: > > create domain uint as bigint check (value >= 0); Even if we did that, there might still be cases where people would want unsigned integers as a means of reducing storage. 4 extra bytes may not seem like that much, but if you have billions of rows, it adds up - not just in terms of actual storage space, but also in terms of disk and memory bandwidth requirements when you want to do anything with that data. I have seen some recent data (which is not entirely conclusive) that suggests that memory bandwidth can be a huge problem for PostgreSQL performance on large boxes; and I think Greg Smith has made similar comments in the past (correct me if I'm wrong, Greg). I think, though, that if we choose to attack that problem in the first instance by adding support for unsigned integers, we're probably going to be only nibbling around the edges of the problem. Reducing alignment padding and adding block-level compression would benefit a much larger number of workloads. Those are not easy projects, but unfortunately, due to the constraints of our type system, neither is this. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 26, 2011 at 9:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 26, 2011 at 10:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> On Mon, Sep 26, 2011 at 5:41 AM, crocket <crockabiscuit@gmail.com> wrote: >>> MySQL already has unsigned INT type, and it has double the range of >>> signed INT type. >>> It's not just the bigger range that UINT type brings. >>> If unsigned INT type exists, I wouldn't have to execute "create domain >>> UINT" in every database. >>> >>> If "INT unsigned" and "SERIAL unsigned" exist, PostgreSQL would bring >>> more convenience to users. >> >> This comes up now and then. The problem is the benefit gained is not >> really worth the pain. In today's 64 bit world, choosing a 64 bit int >> nails the cases where you need the extra range and you have the >> ability to use constraints (if necessary, through a domain) to enforce >> correctness. >> >> On the downside, you have to add a lot of casts, overloads, etc. >> Figuring out the casting rules is non trivial and could lead to >> surprising behaviors...inferring the type of 'unknown' strings is bad >> enough as it is. >> >> TBH, what I'd greatly prefer to see is to have domains be finished up >> so that you don't have to carefully consider their use (for example, >> you can't make arrays out of them). Then an unsigned int could simply >> be: >> >> create domain uint as bigint check (value >= 0); > > Even if we did that, there might still be cases where people would > want unsigned integers as a means of reducing storage. 4 extra bytes > may not seem like that much, but if you have billions of rows, it adds > up - not just in terms of actual storage space, but also in terms of > disk and memory bandwidth requirements when you want to do anything > with that data. I have seen some recent data (which is not entirely > conclusive) that suggests that memory bandwidth can be a huge problem > for PostgreSQL performance on large boxes; and I think Greg Smith has > made similar comments in the past (correct me if I'm wrong, Greg). > > I think, though, that if we choose to attack that problem in the first > instance by adding support for unsigned integers, we're probably going > to be only nibbling around the edges of the problem. Reducing > alignment padding and adding block-level compression would benefit a > much larger number of workloads. Those are not easy projects, but > unfortunately, due to the constraints of our type system, neither is > this. :-( right -- exactly. most 'savings' in this vein are nothing but due to padding and other factors such that (at least today) there is no disadvantage to going to 64 bit in range constrained cases. also, I'd submit history has been unkind to hardware dependent optimization strategies in userland -- the engine should be dealing with this problem. better to define your data the proper way and let the assembly instruction counting gurus in -hackers worry about it :-). compression is an interesting topic: the guys over at tokudb are making some wild claims...i'm curious if they are real, and what the real tradeoffs are. merlin
> compression is an interesting topic: the guys over at tokudb are > making some wild claims...i'm curious if they are real, and what the > real tradeoffs are. I don't know how much of the performance they claim comes from compression and how much from the different indexing technique they use (see the my post here, where nobody answered... http://archives.postgresql.org/pgsql-general/2011-09/msg00615.php )