Thread: Unsigned integer types

Unsigned integer types

From
Maciej Gajewski
Date:
Hi all

I know this topic was discussed before, but there doesn't seem to be
any conclusion.

The lack of unsigned integer types is one of the biggest sources of
grief in my daily work with pgsql.

Before I go and start hacking, I'd like to discuss few points:

1. Is there a strong objection against merging this kind of patch?

I can provide numerous reasons why using bigger int or numeric type
just doesn't cut.

2. How/if should the behaviour of numeric literals change?

The minimalistic solution is: it shouldn't, literals should be assumed
signed by default. More complex solution could involve using C-style
suffix ('123456u').

3. How/if should comparing singed and unsigned types work?

IMO they shouldn't be allowed and explicit cast should be required.

Thanks in advance!

Maciek



Re: Unsigned integer types

From
Albe Laurenz
Date:
Maciej Gajewski wrote:
> I know this topic was discussed before, but there doesn't seem to be
> any conclusion.
> 
> The lack of unsigned integer types is one of the biggest sources of
> grief in my daily work with pgsql.
> 
> Before I go and start hacking, I'd like to discuss few points:
> 
> 1. Is there a strong objection against merging this kind of patch?
> 
> I can provide numerous reasons why using bigger int or numeric type
> just doesn't cut.

It would be interesting to know these reasons.

Yours,
Laurenz Albe

Re: Unsigned integer types

From
Tom Lane
Date:
Maciej Gajewski <maciej.gajewski0@gmail.com> writes:
> The lack of unsigned integer types is one of the biggest sources of
> grief in my daily work with pgsql.
> Before I go and start hacking, I'd like to discuss few points:
> 1. Is there a strong objection against merging this kind of patch?

Basically, there is zero chance this will happen unless you can find
a way of fitting them into the numeric promotion hierarchy that doesn't
break a lot of existing applications.  We have looked at this more than
once, if memory serves, and failed to come up with a workable design
that didn't seem to violate the POLA.

> 2. How/if should the behaviour of numeric literals change?

> The minimalistic solution is: it shouldn't, literals should be assumed
> signed by default. More complex solution could involve using C-style
> suffix ('123456u').

Well, if you don't do that, there is no need for you to merge anything:
you can build unsigned types as an external extension if they aren't
affecting the core parser's behavior.  As long as it's external, you
don't need to satisfy anybody else's idea of what reasonable behavior
is ...
        regards, tom lane



Re: Unsigned integer types

From
Maciej Gajewski
Date:
The reasons are: performance, storage and frustration.

I think the frustration comes from the fact that unsigned integers are
universally available, except in PostgreSQL. I work with a really
complex system, with many moving parts, and Postgres really is one of
the components that causes the least trouble (compared to other
opens-source and closed-sourced systems, which I shall leave unnamed),
except for the unsigned integers.

Let me give you few examples:

1. SMALLINT

Probably the most popular unsigned short int on the planet: IP port
number. I had to store some network traffic data in DB; I
instinctively started to prototyping it like this:

CREATE TABLE packets (addr INET, port SMALLINT, ... );

Of course it failed quickly and I had to bump the size to INTEGER. No
real harm here, as the 2 bytes will probably go into some padding
anyway, but somehow it feels wrong.

2. INTEGER

I had to store a record with several uint32. I had to store an awful
lot of them; hundreds GB of data per day. Roughly half of the record
consists of uint32 fields.
Increasing the data type to bigint would mean that I could store 3
instead of 4 days worth of data on available storage.
Continuing with int4 meant that I would have to deal with the data in
special way when in enters and leaves the DB. It's easy in C: just
cast uint32_t to int32_t. But python code requires more complex
changes. And the web backend too...

It's suffering either way!

Just imagine the conversation I had to have with my boss: "Either
we'll increase budged for storage, or we need to touch every bit of
the system".

3 .BIGINT

There is no escape from bigint. Numeric (or TEXT!) is the only thing
that can keep uint64, but when you have 10^9 and more records, and you
need to do some arithmetic on it, numeric it's just too slow.

We use uint64 all across our system as unique event identifier. It
works fine, it's fast, and it's very convenient. Passing uint64
around, storing it, looking it up. We use it everywhere, including UI
and log files. So once I decided to use BIGINT to store it, I had to
guard all the inputs and outputs and make sure it is handled
correctly. Or so I though.

It turned out that some guys from different department are parsing
some logs with perl parser and they store it in DB. They choose to
store the uint64 id as TEXT. They probably tried BIGINT and failed and
decided that - since they have low volume and they are not doing any
arithmetics - to store it as TEXT.

And now someone came up with an idea to join one table with another,
bigint with text. I did it. Initially I wrote function that converted
the text to numeric, then rotated it around 2^64 if necessary. It was
too slow. Too slow for something that should be a simple
reinterpretation of data.

Eventually I ended up writing a C function, that first scanf(
"%llu")'d the text into uint64_t, and then PG_RETURN_INT64-ed the
uint64_t value. Works fast, but operations hate for increasing the
complexity of DB deployment.

---

I know some cynical people that love this kind of problems, they feel
that the constant struggle is what keeps them employed :) But I'm
ready to use my private time to solve it once and for all.

I'm afraid that implementing uints as and extension would introduce
some performance penalty (I may be wrong). I'm also afraid that with
the extension I'd be left on my own maintaining it forever. While if
this could go into the core product, it would live forever.

As for the POLA violation: programmers experienced with statically
typed languages shouldn't have problems dealing with all the issues
surrounding signed/unsigned integers (like the ones described here:
http://c-faq.com/expr/preservingrules.html). Others don't need to use
them.

Maciek


On 27 May 2013 16:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maciej Gajewski <maciej.gajewski0@gmail.com> writes:
>> The lack of unsigned integer types is one of the biggest sources of
>> grief in my daily work with pgsql.
>> Before I go and start hacking, I'd like to discuss few points:
>> 1. Is there a strong objection against merging this kind of patch?
>
> Basically, there is zero chance this will happen unless you can find
> a way of fitting them into the numeric promotion hierarchy that doesn't
> break a lot of existing applications.  We have looked at this more than
> once, if memory serves, and failed to come up with a workable design
> that didn't seem to violate the POLA.
>
>> 2. How/if should the behaviour of numeric literals change?
>
>> The minimalistic solution is: it shouldn't, literals should be assumed
>> signed by default. More complex solution could involve using C-style
>> suffix ('123456u').
>
> Well, if you don't do that, there is no need for you to merge anything:
> you can build unsigned types as an external extension if they aren't
> affecting the core parser's behavior.  As long as it's external, you
> don't need to satisfy anybody else's idea of what reasonable behavior
> is ...
>
>                         regards, tom lane



Re: Unsigned integer types

From
Andrew Dunstan
Date:
On 05/28/2013 05:17 AM, Maciej Gajewski wrote:
> I'm afraid that implementing uints as and extension would introduce
> some performance penalty (I may be wrong).

You are.

> I'm also afraid that with
> the extension I'd be left on my own maintaining it forever. While if
> this could go into the core product, it would live forever.

This is an argument against ever doing anything as an extension.


You have not at all addressed the real problem with doing what you are 
asking for, the one that Tom Lane stated:

>> Basically, there is zero chance this will happen unless you can find
>> a way of fitting them into the numeric promotion hierarchy that doesn't
>> break a lot of existing applications.  We have looked at this more than
>> once, if memory serves, and failed to come up with a workable design
>> that didn't seem to violate the POLA.
>>


cheers

andrew



Re: Unsigned integer types

From
Bruce Momjian
Date:
On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
> 2. INTEGER
> 
> I had to store a record with several uint32. I had to store an awful
> lot of them; hundreds GB of data per day. Roughly half of the record
> consists of uint32 fields.
> Increasing the data type to bigint would mean that I could store 3
> instead of 4 days worth of data on available storage.
> Continuing with int4 meant that I would have to deal with the data in
> special way when in enters and leaves the DB. It's easy in C: just
> cast uint32_t to int32_t. But python code requires more complex
> changes. And the web backend too...
> 
> It's suffering either way!
> 
> Just imagine the conversation I had to have with my boss: "Either
> we'll increase budged for storage, or we need to touch every bit of
> the system".

Did you try 'oid' as an unsigned int4?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Unsigned integer types

From
Jim Nasby
Date:
On 5/28/13 4:07 PM, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
>> 2. INTEGER
>>
>> I had to store a record with several uint32. I had to store an awful
>> lot of them; hundreds GB of data per day. Roughly half of the record
>> consists of uint32 fields.
>> Increasing the data type to bigint would mean that I could store 3
>> instead of 4 days worth of data on available storage.
>> Continuing with int4 meant that I would have to deal with the data in
>> special way when in enters and leaves the DB. It's easy in C: just
>> cast uint32_t to int32_t. But python code requires more complex
>> changes. And the web backend too...
>>
>> It's suffering either way!
>>
>> Just imagine the conversation I had to have with my boss: "Either
>> we'll increase budged for storage, or we need to touch every bit of
>> the system".
>
> Did you try 'oid' as an unsigned int4?

Using an internal catalog type for user data seems like a horrible idea to me...

I'll also add that Maciej hasn't explained why these types couldn't be an extension (in fact, I'm pretty sure there's
alreadycode for this out there, though possibly not utilizing the extension framework).
 

If you don't need implicit casting it should actually be pretty easy to do this externally, and I don't think
maintenancewould be an issue (it's not like uint's change...).
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Unsigned integer types

From
David Johnston
Date:
Maciej Gajewski wrote
> I'm also afraid that with
> the extension I'd be left on my own maintaining it forever. While if
> this could go into the core product, it would live forever.

Clarification from the gallery: are we talking an extension or a custom
PostgreSQL build/fork?

If it is an extension the stick it up on GitHub and let whomever finds it
valuable help contribute to keeping it relevant.

No use letting perfection stand in the way of usability.  If the current
solutions are too slow then exploring the extension aspect - even if it
falls short - is worthwhile.  At minimum you learn from the experience and
maybe someone else (or even yourself) can build on that foundation.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Unsigned-integer-types-tp5756994p5757234.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Unsigned integer types

From
Bruce Momjian
Date:
On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:
> On 5/28/13 4:07 PM, Bruce Momjian wrote:
> >On Tue, May 28, 2013 at 11:17:42AM +0200, Maciej Gajewski wrote:
> >>2. INTEGER
> >>
> >>I had to store a record with several uint32. I had to store an awful
> >>lot of them; hundreds GB of data per day. Roughly half of the record
> >>consists of uint32 fields.
> >>Increasing the data type to bigint would mean that I could store 3
> >>instead of 4 days worth of data on available storage.
> >>Continuing with int4 meant that I would have to deal with the data in
> >>special way when in enters and leaves the DB. It's easy in C: just
> >>cast uint32_t to int32_t. But python code requires more complex
> >>changes. And the web backend too...
> >>
> >>It's suffering either way!
> >>
> >>Just imagine the conversation I had to have with my boss: "Either
> >>we'll increase budged for storage, or we need to touch every bit of
> >>the system".
> >
> >Did you try 'oid' as an unsigned int4?
> 
> Using an internal catalog type for user data seems like a horrible idea to me...

Uh, not sure if we can say oid is only an internal catalog type.  It is
certainly used for storing large object references.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Unsigned integer types

From
Andrew Dunstan
Date:
On 05/28/2013 07:00 PM, Bruce Momjian wrote:
> On Tue, May 28, 2013 at 05:57:41PM -0500, Jim Nasby wrote:
>>>
>>> Did you try 'oid' as an unsigned int4?
>> Using an internal catalog type for user data seems like a horrible idea to me...
> Uh, not sure if we can say oid is only an internal catalog type.  It is
> certainly used for storing large object references.
>

pg_largeobject has oids. I don't thing the fact that we use oids to 
store references to pg_largeobject should blind us to the fact that oid 
should be an opaque type. Using them as substitute unsigned ints seems 
like a horrible idea to me too.


cheers

andrew




Re: Unsigned integer types

From
Maciej Gajewski
Date:
I will implement it as an extension then.

My feeling is that PostgreSQL extensions tend to fall into obscurity.
As an ordinary user it took me really long time to find out that
interesting features are available in form of extensions; they are
certainly under-marketed. But this is a topic for separate discussion.

> You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated:


>> Basically, there is zero chance this will happen unless you can find
>> a way of fitting them into the numeric promotion hierarchy that doesn't
>> break a lot of existing applications.  We have looked at this more than
>> once, if memory serves, and failed to come up with a workable design
>> that didn't seem to violate the POLA.
>>

I'm sorry, I thought my proposal was clear.

I propose to not integrate the unsigned types into existing promotion
hierarchy, and behave just like gcc would with -Werror: require
explicit cast. Between them, the unsigned types would be automatically
converted up (uint2 > uint4 > uint8).


Maciek



Re: Unsigned integer types

From
Robert Haas
Date:
On Wed, May 29, 2013 at 4:33 AM, Maciej Gajewski
<maciej.gajewski0@gmail.com> wrote:
> I propose to not integrate the unsigned types into existing promotion
> hierarchy, and behave just like gcc would with -Werror: require
> explicit cast. Between them, the unsigned types would be automatically
> converted up (uint2 > uint4 > uint8).

Seems pretty sensible to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unsigned integer types

From
Hannu Krosing
Date:
On 05/29/2013 11:33 AM, Maciej Gajewski wrote:
> I will implement it as an extension then.
>
> My feeling is that PostgreSQL extensions tend to fall into obscurity.
> As an ordinary user it took me really long time to find out that
> interesting features are available in form of extensions; they are
> certainly under-marketed. But this is a topic for separate discussion.
>
>> You have not at all addressed the real problem with doing what you are asking for, the one that Tom Lane stated:
>
>>> Basically, there is zero chance this will happen unless you can find
>>> a way of fitting them into the numeric promotion hierarchy that doesn't
>>> break a lot of existing applications.  We have looked at this more than
>>> once, if memory serves, and failed to come up with a workable design
>>> that didn't seem to violate the POLA.
>>>
> I'm sorry, I thought my proposal was clear.
>
> I propose to not integrate the unsigned types into existing promotion
> hierarchy, and behave just like gcc would with -Werror: require
> explicit cast. Between them, the unsigned types would be automatically
> converted up (uint2 > uint4 > uint8).
+1

(And we could even put some possible sample cast sets in contrib for those
who need automatic casts and are willing to do the required debugging )

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: Unsigned integer types

From
Fabien COELHO
Date:
I agree that extensions are undermarketed. Although pgxn is a good step, 
I could not find it from "postgresql.org":-(

> I propose to not integrate the unsigned types into existing promotion
> hierarchy, and behave just like gcc would with -Werror: require
> explicit cast. Between them, the unsigned types would be automatically
> converted up (uint2 > uint4 > uint8).

If you do it, having uint1 (1 byte) would be nice as well.

-- 
Fabien.



Re: Unsigned integer types

From
"David E. Wheeler"
Date:
On May 29, 2013, at 10:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> If you do it, having uint1 (1 byte) would be nice as well.

There is a signed 1byte int on PGXN, FWIW:
 http://pgxn.org/extension/tinyint

Best,

David




Re: Unsigned integer types

From
Fabien COELHO
Date:
>> If you do it, having uint1 (1 byte) would be nice as well.
>
> There is a signed 1byte int on PGXN, FWIW:
>  http://pgxn.org/extension/tinyint

That's good, thanks for the pointer!

However, it is a signed tinyint (-128..127 range), not an unsigned one.

-- 
Fabien.