Thread: [PgFoundry] Unsigned Data Types [1 of 2]
Hello all,
I have attempted to send this email 3 times over the last 24 hours.
I am not sure what is blocking it, so I am going to break it up into two parts:
uint-base.tar.bz2 -- The core of the unsigned integer type.
uint-tests.tar.bz2 -- The regression tests.
I am suspecting a size limit problem, so I am including the uint-tests.tar.bz2 in a separate email.
I have attached version 2 of the Unsigned Data Types patch.
ChangeLog:
* Converted build system to use PGXS (more portable).
* Added an uninstall script.
* Miscellaneous code cleanups.
* Folded my unit testing into the PGXS regression test suite.
* Added support for HASH indexes.
* Added support for bit operations.
I will update the commit-fest wiki to point to this new patch (assuming this message gets through).
Thanks!
- Ryan
On Sun, Aug 31, 2008 at 3:35 PM, Ryan Bradetich <rbradetich@gmail.com> wrote:
> Hello all,
>

a few comments.

- i think you have to add some more comments in uint.c file and maybe a header indicating this is part of the postgresql project or that is intended to use with postgres or something of the like
- what is uint1? i know int, int2, int4, int8 so i think we should have uint, uint2, uint4 (maybe uint8?)

> uint-base.tar.bz2 -- The core of the unsigned integer type.

seems there is something wrong in the unlikely macro (i'm using GCC 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)

postgres=# select -256::uint1;
ERROR: uint1 out of range
STATEMENT: select -256::uint1;
ERROR: uint1 out of range
postgres=# select -255::uint1;
 ?column?
----------
     -255
(1 row)

postgres=# select -2::uint1;
 ?column?
----------
       -2
(1 row)

postgres=# select -5::uint1 + 30::uint1;
 ?column?
----------
       25
(1 row)

> uint-tests.tar.bz2 -- The regression tests.
>

here failed two regression tests but that is because the path

> * Converted build system to use PGXS (more portable).

the Makefile doesn't work here... i have installed postgres 8.3.3 from ubuntu package and the test env i compile manually (the uint module tried to install in the ubuntu location while it should in the env location)

attached a Makefile that fix that

i still have to make some more test...

--
regards,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. (593) 87171157
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> seems there is something wrong in the unlikely macro (i'm using GCC
> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)

> postgres=# select -256::uint1;
> ERROR: uint1 out of range

No, that's just because this is parsed as -(256::uint1)

regards, tom lane
On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> seems there is something wrong in the unlikely macro (i'm using GCC
>> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)
>
>> postgres=# select -256::uint1;
>> ERROR: uint1 out of range
>
> No, that's just because this is parsed as -(256::uint1)
>

actually, i thought that case is right but the -255::uint1 returning a negative number (aka -255) is what bothers me

--
regards,
Jaime Casanova
On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> seems there is something wrong in the unlikely macro (i'm using GCC
>> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)
>
>> postgres=# select -256::uint1;
>> ERROR: uint1 out of range
>
> No, that's just because this is parsed as -(256::uint1)
>

ah! ok, i see the point...

postgres=# select 256::uint1;
ERROR: uint1 out of range

but is right that way of parsing? so i get a negative number instead of an error?

--
regards,
Jaime Casanova

"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> postgres=# select -256::uint1;
>>> ERROR: uint1 out of range
>>
>> No, that's just because this is parsed as -(256::uint1)

> actually, i thought that case is right but the -255::uint1 returning a
> negative number (aka -255) is what bothers me

Well, again, that's -(255::uint1). I suppose uint1 hasn't got a negation operator (what would it do??), so probably the sequence of events is to form 255::uint1, then implicitly promote it to some signed type or other (most likely int4), then negate. Not much to be done about this unless you want to get rid of the implicit coercion to signed types, which would probably defeat most of the purpose.

Now, if (-255)::uint1 fails to throw error, that would be a bug IMHO. Casting any negative value to uint ought to fail, no?

regards, tom lane
On Sat, Sep 6, 2008 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Now, if (-255)::uint1 fails to throw error, that would be a bug IMHO.
> Casting any negative value to uint ought to fail, no?
>

then the patch is right but it seems to me like that is broking the law of less surprise i expected -2::uint1 to be equivalent to (-2)::uint1 that should be at least documented, no?

--
regards,
Jaime Casanova
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes: > then the patch is right but it seems to me like that is broking the > law of less surprise i expected -2::uint1 to be equivalent to > (-2)::uint1 that should be at least documented, no? See the precedence table here: http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE :: binds more tightly than -, and always has. regards, tom lane
On Sat, Sep 6, 2008 at 3:41 PM, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
>
> i still have to make some more test...
>

why i need the cast in this case? even if the cast is really necessary (the message seems really ugly)

contrib_regression=# select * from t1 where f1 > 35;
ERROR: unsupported type: 16486
contrib_regression=# select * from t1 where f1 > 35::uint4;
 f1
-----
 36
 37
 38

--
regards,
Jaime Casanova

"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> contrib_regression=# select * from t1 where f1 > 35;
> ERROR: unsupported type: 16486

That obviously isn't supposed to happen. Where's it coming from exactly?

regards, tom lane
On Sun, Sep 7, 2008 at 2:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> contrib_regression=# select * from t1 where f1 > 35;
>> ERROR: unsupported type: 16486
>
> That obviously isn't supposed to happen. Where's it coming from
> exactly?
>

convert_numeric_to_scalar() in src/backend/utils/adt/selfuncs.c

the problem seems to be that we are asking for each type of numeric and of course that doesn't know nothing about unsigned integers so its treating it as a non-numeric.

don't know what to suggest here? a column in pg_type that identifies it? a hook?

    switch (typid)
    {
        case BOOLOID:
            return (double) DatumGetBool(value);
        case INT2OID:
            return (double) DatumGetInt16(value);
        case INT4OID:
            return (double) DatumGetInt32(value);
        case INT8OID:
            return (double) DatumGetInt64(value);
        case FLOAT4OID:
            return (double) DatumGetFloat4(value);
        case FLOAT8OID:
            return (double) DatumGetFloat8(value);
        case NUMERICOID:
            /* Note: out-of-range values will be clamped to +-HUGE_VAL */
            return (double)
                DatumGetFloat8(DirectFunctionCall1(numeric_float8_no_overflow,
                                                   value));
        case OIDOID:
        case REGPROCOID:
        case REGPROCEDUREOID:
        case REGOPEROID:
        case REGOPERATOROID:
        case REGCLASSOID:
        case REGTYPEOID:
        case REGCONFIGOID:
        case REGDICTIONARYOID:
            /* we can treat OIDs as integers... */
            return (double) DatumGetObjectId(value);
    }

    /*
     * Can't get here unless someone tries to use scalarltsel/scalargtsel on
     * an operator with one numeric and one non-numeric operand.
     */
    elog(ERROR, "unsupported type: %u", typid);
    return 0;

--
Sincerely,
Jaime Casanova

"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Sun, Sep 7, 2008 at 2:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That obviously isn't supposed to happen. Where's it coming from
>> exactly?

> convert_numeric_to_scalar() in src/backend/utils/adt/selfuncs.c
> the problem seems to be that we are asking for each type of numeric
> and of course that doesn't know nothing about unsigned integers so its
> treating it as a non-numeric.

Ah. The scalarltsel/scalargtsel stuff has always been a bit bogus for cross-type comparisons; it assumes it will know both or neither of the two datatypes. So you can get away with using those functions for uint > uint (although they won't be very bright about it); but using them for uint > int fails outright.

If you read the comments around that stuff it leaves quite a lot to be desired, but I don't really have better ideas at the moment. The best near-term solution for the uint module is probably not to rely on scalarltsel/scalargtsel for uint comparisons, but to make its own selectivity functions that know the uint types plus whatever standard types you want to have comparisons with.

regards, tom lane
Hello Jamie and Tom.

Thank you very much for the feedback and reviews. I will attempt to answer all the questions I found in this thread in this one email. If I miss any questions, let me know and I will answer them :)

Jamie: Thanks for the feedback on missing comments. I will go back and add more comments to the code.

Jamie: Thanks for the patches. I have applied the Makefile patch to my local tree. I am still reviewing the regressions.diffs patch. I definitely see the issue now; I am still reviewing to understand whether your solution is the proper fix. I am reviewing the main PostgreSQL regression tests to understand how the COPY is handled.

Jamie: This patch is targeted as a PgFoundry module. I wanted it reviewed by the PostgreSQL community for two reasons:

1. To make sure the data type is correct (i.e. the bugs you and Tom identified, etc).
2. To go through the community review process so other people can have faith/trust that the module is correct.

After the community is happy with the uint data type, I will commit it to the PgFoundry repository.

Jamie: The uint1 is an unsigned 8-bit value. It is the unsigned variant of the "char" type. The uint8 type (which I did not provide support for in this patch) would be the unsigned variant of the int8 (64-bit) type.

Jamie and Tom: Tom, you were correct. The -255::uint1 is being promoted to the int4 data type.

Here is the sample C program I used to verify this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>
    #include <postgresql/libpq-fe.h>

    int main(void)
    {
        PGconn   *conn;
        PGresult *res;
        char      query[255];

        conn = PQconnectdb("host=127.0.0.1 dbname=test user=rbrad");
        assert(PQstatus(conn) == CONNECTION_OK);

        /* Run the expression and read the type OID of the result column. */
        res = PQexec(conn, "SELECT -255::uint1;");
        assert(PQresultStatus(res) == PGRES_TUPLES_OK);
        snprintf(query, sizeof(query), "SELECT %u::regtype", PQftype(res, 0));
        PQclear(res);

        /* Resolve that OID to a type name. */
        res = PQexec(conn, query);
        assert(PQresultStatus(res) == PGRES_TUPLES_OK);
        printf("Result Type: %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        PQfinish(conn);
        return 0;
    }

Output:

    Result Type: integer

On Sun, Sep 7, 2008 at 9:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah. The scalarltsel/scalargtsel stuff has always been a bit bogus for
> cross-type comparisons; it assumes it will know both or neither of the
> two datatypes. So you can get away with using those functions for
> uint > uint (although they won't be very bright about it); but using
> them for uint > int fails outright.

> If you read the comments around that stuff it leaves quite a lot to be
> desired, but I don't really have better ideas at the moment. The best
> near-term solution for the uint module is probably not to rely on
> scalarltsel/scalargtsel for uint comparisons, but to make its own
> selectivity functions that know the uint types plus whatever standard
> types you want to have comparisons with.

Ok. Looks like I need to review these functions and develop new functions specific for the unsigned type. I will work on this tomorrow night and submit an updated patch.

Thanks again for the feedback and reviews!

- Ryan
On Mon, Sep 8, 2008 at 1:14 AM, Ryan Bradetich <rbradetich@gmail.com> wrote:
>
>> If you read the comments around that stuff it leaves quite a lot to be
>> desired, but I don't really have better ideas at the moment. The best
>> near-term solution for the uint module is probably not to rely on
>> scalarltsel/scalargtsel for uint comparisons, but to make its own
>> selectivity functions that know the uint types plus whatever standard
>> types you want to have comparisons with.
>
> Ok. Looks like I need to review these functions and develop new functions
> specific for the unsigned type.
>

the same problem happens in joins, unions, hash, etc... so you have to look at those functions as well

PS: Jaime not Jamie :)

--
regards,
Jaime Casanova
Hello Jaime,

> the same problem happens in joins, unions, hash, etc... so you have to
> look at those functions as well

Great! Added to the list to check. I am planning to build regression tests for these types to catch these errors in the future.

Thanks again for your testing and review!

> PS: Jaime not Jamie :)

Sorry! I will spell your name correctly from now on!
Hello Jaime,

> why i need the cast in this case? even if the cast is really necessary
> (the message seems really ugly)
>
> contrib_regression=# select * from t1 where f1 > 35;
> ERROR: unsupported type: 16486
>
> contrib_regression=# select * from t1 where f1 > 35::uint4;
>  f1
> -----
>  36
>  37
>  38

Can you send me the test case that generates this error? My regression tests do not include a table t1, so I was not able to reproduce this error directly.

I was unable to reproduce this error by guessing. I tried the following tests:

contrib_regression=# create table t1 (f1 int4 not null);
CREATE TABLE
contrib_regression=# insert into t1 values (1), (5), (10), (20);
INSERT 0 4
contrib_regression=# select * from t1 where f1 > 7;
 f1
----
 10
 20
(2 rows)

contrib_regression=# drop table t1;
DROP TABLE
contrib_regression=# create table t1 (f1 uint4 not null);
CREATE TABLE
contrib_regression=# insert into t1 values (1), (5), (10), (20);
INSERT 0 4
contrib_regression=# select * from t1 where f1 > 7;
 f1
----
 10
 20
(2 rows)

contrib_regression=# drop table t1;
DROP TABLE
contrib_regression=# create table t1 (f1 numeric not null);
CREATE TABLE
contrib_regression=# insert into t1 values (1), (5), (10), (20);
INSERT 0 4
contrib_regression=# select * from t1 where f1 > 7;
 f1
----
 10
 20
(2 rows)

contrib_regression=# drop table t1;
DROP TABLE
contrib_regression=# create table t1 (f1 int4 primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
CREATE TABLE
contrib_regression=# insert into t1 select * from generate_series(1, 100000);
INSERT 0 100000
contrib_regression=# analyze t1;
ANALYZE
contrib_regression=# explain select f1 from t1 where f1 > 99998;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using t1_pkey on t1 (cost=0.00..8.43 rows=10 width=4)
   Index Cond: (f1 > 99998)
(2 rows)

contrib_regression=# select f1 from t1 where f1 > 99998;
   f1
--------
  99999
 100000
(2 rows)

My testing shows this is working correctly. I am very interested in your test case to help me figure out what I am missing!

Thanks!

- Ryan
On Mon, Sep 8, 2008 at 10:08 PM, Ryan Bradetich <rbradetich@gmail.com> wrote:
>
> Can you send me the test case that generates this error?
> My regression tests do not include a table t1 so I was not able
> to reproduce this error directly.
>

yeah! that table is mine! here are the scripts...

> contrib_regression=# select f1 from t1 where f1 > 99998;
>    f1
> --------
>   99999
>  100000
> (2 rows)
>
> My testing shows this is working correctly.
>

mmm... i rebuild my test env and it works for me this time... until i execute an analyze. I guess autovacuum executed an auto analyze last time...

--
regards,
Jaime Casanova
Hello Jaime,

Thank you for the test cases!

> mmm... i rebuild my test env and it works for me this time... until i
> execute an analyze. I guess autovacuum executed an auto analyze last
> time...

I am able to duplicate the error you saw in the uint_test2.sql.

I am assuming you are seeing this error in the uint_test1.sql:

ERROR: could not find hash function for hash operator 16524

I can bypass the error in uint_test1.sql by disabling the hash joins. I am going to dig in and figure out why the hashjoin operation is broken.

Just wanted to give you an update that I was able to reproduce the error with your test cases and I am working on a solution now.

Thanks!

- Ryan

"Ryan Bradetich" <rbradetich@gmail.com> writes:
> I am assuming you are seeing this error in the uint_test1.sql:
> ERROR: could not find hash function for hash operator 16524

> I can bypass the error in uint_test1.sql by disabling the hash joins.
> I am going to dig in and figure out why the hashjoin operation is broken.

Well, the cause of that one would've been marking an operator as HASHES without providing a hash opclass to back it up. IIRC the test case involved ">"? That shouldn't even be marked HASHES anyway ...

regards, tom lane
Hello Tom,

On Tue, Sep 9, 2008 at 5:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ryan Bradetich" <rbradetich@gmail.com> writes:
>> I am assuming you are seeing this error in the uint_test1.sql:
>> ERROR: could not find hash function for hash operator 16524
>> I can bypass the error in uint_test1.sql by disabling the hash joins.
>> I am going to dig in and figure out why the hashjoin operation is broken.
>
> Well, the cause of that one would've been marking an operator as HASHES
> without providing a hash opclass to back it up.

Actually I did provide a hash operator class in the patch:

    CREATE OPERATOR CLASS uint4_ops
    DEFAULT FOR TYPE uint4 USING HASH AS
        OPERATOR 1 =,
        FUNCTION 1 hashuint4(uint4);

This only provides the operator class for uint4 eq uint4. Jaime's test case was uint4 eq int4, which I did not have an operator class for. I was able to fix this test case by adding the int4 eq uint4 operator like this:

    CREATE OPERATOR CLASS uint4_ops
    DEFAULT FOR TYPE uint4 USING HASH FAMILY unsigned_integer_ops AS
        OPERATOR 1 =,
        FUNCTION 1 hashuint4(uint4);

    ALTER OPERATOR FAMILY unsigned_integer_ops USING HASH ADD
        OPERATOR 1 = (int4, uint4),
        FUNCTION 1 hashuint4_from_int4(int4);

I tested uint4 eq int4 and int4 eq uint4 and this one additional hash operator handles them both. [NOTE: The other solution was to cast foo to the uint4 data type.]

I am working on adding support for the int4 eq uint2 and int4 eq uint1 cases as well. I am running into an error when I add support for these hash operator classes that I am not quite ready to post about yet (I want to look a bit more first).

> IIRC the test case involved ">"? That shouldn't even be marked HASHES
> anyway ...

That error was in the uint_test2 test case Jaime provided. This test case looks like:

    drop table if exists t1_uint4;
    create table t1_uint4 (f1 uint4 primary key);
    insert into t1_uint4 select generate_series(1, 255);
    analyze t1_uint4;
    select * from t1_uint4, generate_series(1, 10) as foo where t1_uint4.f1 = foo;

Thanks,

- Ryan
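A cross-type hash join only works if values that compare equal also hash equally, whichever type they arrive as. The C sketch below illustrates that property and is not the patch's implementation: `sketch_mix32`, `sketch_hash_uint4`, and `sketch_hash_uint4_from_int4` are hypothetical names, and the mixer is the MurmurHash3 32-bit finalizer used as a stand-in for whatever hash the module really uses.

```c
#include <stdint.h>

/* Stand-in 32-bit mixer (MurmurHash3 finalizer); not PostgreSQL's hash. */
uint32_t sketch_mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Hash for the unsigned 32-bit type. */
uint32_t sketch_hash_uint4(uint32_t v)
{
    return sketch_mix32(v);
}

/* Hash for a signed 32-bit value that will be compared against the
 * unsigned type. A non-negative value converts to the same 32-bit
 * pattern as the equal unsigned value, so both hash identically.
 * A negative value can never equal an unsigned value, so any hash is
 * acceptable for it; the equality operator rejects the match later. */
uint32_t sketch_hash_uint4_from_int4(int32_t v)
{
    return sketch_mix32((uint32_t) v);
}
```

With this arrangement a probe for the signed value 42 lands in the same bucket as the unsigned value 42, which is the property the int4 versus uint4 join in the test case depends on.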
Hello Jaime, It is taking longer than I expected to implement the scalarltsel and scalargtsel functions for the unsigned integer data type. I am still working on this solution and hope to have an updated patch later this week (or over the weekend at the latest). Just wanted to keep you updated on my status. Thanks, - Ryan
On 9/15/08, Ryan Bradetich <rbradetich@gmail.com> wrote:
> Hello Jaime,
>
> I have the code and regression tests updated to solve the problems you initially
> discovered.

great, i will test during this week...

--
regards,
Jaime Casanova
Message Resend. I forgot to split the attachments so they did not make it through the list.

Patch 1 of 2 : Base uint type.
Patch 2 of 2 : Regression tests.

- Ryan

On Mon, Sep 15, 2008 at 8:13 AM, Ryan Bradetich <rbradetich@gmail.com> wrote:
> Hello Jaime,
>
> I have the code and regression tests updated to solve the problems you initially
> discovered. After code reading, stepping through with the debugger, and
> help from RhodiumToad on irc I was able to implement new restrict
> selective estimation functions for the uint4 vs int4 data types. The
> uint1 vs int4 and uint2 vs int4 data types did not require a custom
> restrict selective estimation function.
>
> Here is an updated base and tests tar packages with these changes in place. I
> still have better code comments on my TODO list. I wanted to get
> updated code out to see if there are other problems the unsigned data
> type fails to address properly.
>
> I will work on better commenting the code tonight and tomorrow.
>
> Thanks again for your review and testing!
>
> - Ryan
>
On Mon, Sep 15, 2008 at 9:45 PM, Ryan Bradetich <rbradetich@gmail.com> wrote:
>>
>> I have the code and regression tests updated to solve the problems you initially
>> discovered. After code reading, stepping through with the debugger, and
>> help from RhodiumToad on irc I was able to implement new restrict
>> selective estimation functions for the uint4 vs int4 data types. The
>> uint1 vs int4 and uint2 vs int4 data types did not require a custom
>> restrict selective estimation function.
>>

i'm still seeing the failures in the copy commands (the ones about the paths)

i'm not really sure if this matters.

contrib_regression=# select 256::int2::int4;
 int4
------
  256
(1 row)

contrib_regression=# select 256::uint2::int4;
 int4
------
  256
(1 row)

contrib_regression=# select 256::int2::uint4;
ERROR: cannot cast type smallint to uint4 at character 17
STATEMENT: select 256::int2::uint4;
ERROR: cannot cast type smallint to uint4
LINE 1: select 256::int2::uint4;

otherwise seems fine

--
regards,
Jaime Casanova
Hello Jaime,

> i'm still seeing the failures in the copy commands (the ones about the paths)

I just tested this on a different machine (to get it away from my development environment) and I was able to duplicate the failures. It looks like I need to update the expected/ files as well. I will get this fixed ASAP.

> i'm not really sure if this matters.
>
> contrib_regression=# select 256::int2::int4;
>  int4
> ------
>   256
> (1 row)
>
> contrib_regression=# select 256::uint2::int4;
>  int4
> ------
>   256
> (1 row)
>
> contrib_regression=# select 256::int2::uint4;
> ERROR: cannot cast type smallint to uint4 at character 17
> STATEMENT: select 256::int2::uint4;
> ERROR: cannot cast type smallint to uint4
> LINE 1: select 256::int2::uint4;

To keep this type fairly simple, I was not planning to add these casts. My intention was to handle just enough casting for the required ASSIGNMENT and IMPLICIT casts and to gracefully handle the int4 type, since naked numbers are implicitly cast to int4.

> otherwise seems fine

Thank you very much for your review! I am still working on adding comments to the uint.c file. I am hoping to have that completed tonight.

Tom: Have you had a chance to look over the RESTRICT selectivity functions I implemented to handle the cross-type problem? Is that what you had in mind?

Thanks!

- Ryan
Hello all, Just wanted to let everyone know I have committed this patch to the PgFoundry uint project. I have also updated the commit-fest wiki with this status. Thanks to everyone (especially Jaime) for the feedback and reviews. - Ryan