Thread: a problem with index and user-defined type
Hi all,

I wrote a user-defined type (UUID):

    typedef struct uuid {
        uint32 time_low;
        uint16 time_mid;
        uint16 time_hi_and_version;
        uint8  clock_seq_hi_and_reserved;
        uint8  clock_seq_low;
        uint8  node[6];
    } uuid;

and created all the btree index functions and operators, such as:

    CREATE OPERATOR CLASS uuid_btree_ops
        DEFAULT FOR TYPE uuid USING btree AS
            OPERATOR 1 < ,
            OPERATOR 2 <= ,
            OPERATOR 3 = ,
            OPERATOR 4 >= ,
            OPERATOR 5 > ,
            FUNCTION 1 uuid_cmp(uuid, uuid);

    create table test_uuid(id uuid primary key default uuid_time(), name char(40));

But this query:

    select * from test_uuid where id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid

doesn't use the index:

                              QUERY PLAN
    ---------------------------------------------------------------
     Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
       Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)

Why? For the source code, see the attachment.

MikeWang

---------------------------------------------------------------------
What is uuid?

uuid is a data type for PostgreSQL that implements unique IDs in cyberspace. It is based on the UUID URN namespace IETF draft (see doc/draft-mealling-uuid-urn-00.txt). pguuid currently supports NIL (0), time-based (1), name-based (3) and random-based (4) UUIDs. Its purpose is to provide a solution for data replication, merging, and distribution.

What is the use of uuid?

1. pguuid provides PostgreSQL with a data type, uuid, that can serve as a unique ID in cyberspace.
2. It provides the related operators for type uuid (e.g. =, <>, <, >, >=, <=).
3. It provides functions to generate time-based, name-based, random-based and nil UUIDs.
4. It provides functions to parse the uuid type.

License: BSD
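For readers following along: the btree support function listed as FUNCTION 1 is expected to return a negative, zero, or positive int4 depending on how its two arguments compare. The attachment is not reproduced here, so the following is only a minimal sketch of such a three-way comparison over the struct from the post. The name uuid_cmp_sketch and the CMP3 macro are hypothetical, and the field-by-field ordering is an assumption (the thread later suggests the real code used memcmp()).

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct in the post, spelled with <stdint.h> types. */
typedef struct uuid {
    uint32_t time_low;
    uint16_t time_mid;
    uint16_t time_hi_and_version;
    uint8_t  clock_seq_hi_and_reserved;
    uint8_t  clock_seq_low;
    uint8_t  node[6];
} uuid;

/* Three-way compare of two values: yields -1, 0, or 1. */
#define CMP3(a, b) (((a) > (b)) - ((a) < (b)))

/*
 * Hypothetical helper (not from the attachment): compares two UUIDs
 * field by field, most significant field first, and returns -1, 0, or 1.
 */
static int uuid_cmp_sketch(const uuid *a, const uuid *b)
{
    int c;

    if ((c = CMP3(a->time_low, b->time_low)) != 0)
        return c;
    if ((c = CMP3(a->time_mid, b->time_mid)) != 0)
        return c;
    if ((c = CMP3(a->time_hi_and_version, b->time_hi_and_version)) != 0)
        return c;
    if ((c = CMP3(a->clock_seq_hi_and_reserved, b->clock_seq_hi_and_reserved)) != 0)
        return c;
    if ((c = CMP3(a->clock_seq_low, b->clock_seq_low)) != 0)
        return c;

    /* memcmp() only promises a sign, so normalize it to -1/0/1. */
    c = memcmp(a->node, b->node, sizeof a->node);
    return CMP3(c, 0);
}
```

Whatever ordering is chosen, the =, <, <=, >= and > operators in the operator class need to agree with it, and the value has to reach the caller as a signed 4-byte integer.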
"Wang Mike" <itlist@msn.com> writes:
> but this query: select * from test_uuid where id =
> 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid doesn't use index
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
>    Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)
> why ??

The rows estimate looks pretty fishy --- I think you are getting the 0.5 default selectivity estimate for an operator that has no restriction estimator. Most likely you should have created the operator using eqsel and eqjoinsel as the restriction/join estimators.

regards, tom lane
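The numbers in the plan can be reproduced with simple arithmetic, which shows why rows=500 points to a missing estimator. The sketch below is not planner code. It assumes that a never-analyzed table of that era was taken to have 1000 rows in 10 pages, and that the cost parameters were at their defaults (1.0 per sequential page, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025). The function names are hypothetical.

```c
/* Assumed planner defaults of that era; treat these as assumptions. */
#define DEFAULT_REL_TUPLES   1000.0   /* rows assumed for an unanalyzed table  */
#define DEFAULT_REL_PAGES    10.0     /* pages assumed for an unanalyzed table */
#define SEQ_PAGE_COST        1.0
#define CPU_TUPLE_COST       0.01
#define CPU_OPERATOR_COST    0.0025
#define DEFAULT_SELECTIVITY  0.5      /* used when an operator has no RESTRICT */

/* Estimated output rows = table rows * selectivity of the filter. */
static double estimated_rows(double tuples, double selectivity)
{
    return tuples * selectivity;
}

/* Sequential scan cost: read every page, process every tuple,
 * and evaluate n_quals filter operators on every tuple. */
static double seqscan_total_cost(double pages, double tuples, int n_quals)
{
    return pages * SEQ_PAGE_COST
         + tuples * CPU_TUPLE_COST
         + tuples * CPU_OPERATOR_COST * n_quals;
}
```

Under these assumptions, estimated_rows(DEFAULT_REL_TUPLES, DEFAULT_SELECTIVITY) gives 500 and seqscan_total_cost(DEFAULT_REL_PAGES, DEFAULT_REL_TUPLES, 1) gives 22.50, matching the plan. With half the table expected to match, an index scan would never look cheaper than the sequential scan.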
Tom Lane wrote:
> "Wang Mike" <itlist@msn.com> writes:
>> but this query: select * from test_uuid where id =
>> 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid doesn't use index
>>                           QUERY PLAN
>> ---------------------------------------------------------------
>>  Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
>>    Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)
>> why ??
>
> The rows estimate looks pretty fishy --- I think you are getting the
> 0.5 default selectivity estimate for an operator that has no restriction
> estimator. Most likely you should have created the operator using eqsel
> and eqjoinsel as the restriction/join estimators.
>
> regards, tom lane

Hi Tom,

I'm trying to test it, but I'm not sure I understood you correctly. You mean we should create the operator using the eqsel/eqjoinsel estimators, right? But after we added those estimators like this:

    CREATE OPERATOR = (
        LEFTARG = uuid,
        RIGHTARG = uuid,
        COMMUTATOR = =,
        NEGATOR = <>,
        PROCEDURE = uuid_eq,
        RESTRICT = eqsel,
        JOIN = eqjoinsel
    );

the situation turned worse: now EXPLAIN shows the query using the index, but we can't select the matching row! Any hint about what we are doing wrong?

Thanks and regards,
Laser
Weiping He <laser@zhengmai.com.cn> writes:
> the situation turned worse: now the explain shows the query using the
> index, but we can't select out the match row! Any hint about what's
> wrong with us?

My bet: either your operators are broken or your operator class definition is wrong.

regards, tom lane
We found the problem:

We used the IMMUTABLE modifier in our CREATE FUNCTION definition. It is correct that our function returns the same value when given the same *data*, but our data are passed by reference, not by value, so sometimes we can't retrieve our data. Removing IMMUTABLE fixed the problem.

So it seems that making this clear in the docs would be a good help to function writers. We would submit a documentation patch later if necessary.

Thank you!

Regards,
Laser

Tom Lane wrote:
> Weiping He <laser@zhengmai.com.cn> writes:
>> the situation turned worse: now the explain shows the query using the
>> index, but we can't select out the match row! Any hint about what's
>> wrong with us?
>
> My bet: either your operators are broken or your operator class
> definition is wrong.
>
> regards, tom lane
Weiping He <laser@zhengmai.com.cn> writes:
> we found the problem:
> We used IMMUTABLE modifier in our CREATE FUNCTION definition,
> though it's correct for our function to return same value if input the
> same *data*,
> but our data are passed by reference, not by value, so, some times we can't
> retrieve out data. Remove IMMUTABLE fixed the problem.
> So, it seems to make it clear in docs would be a good help to function
> writers, would commit a documentation patch later if necessary.

I'm not sure what problem you're really describing, but it would be entirely wrong for the docs to claim that pass-by-reference datatypes shouldn't have immutable functions. float8 is pass-by-ref, for instance, but they don't come any more immutable than sqrt(x) ...

I'd suggest taking a closer look to understand what the problem really is. Trying to index on a non-immutable function makes no sense, which is why the system forbids it.

regards, tom lane
Tom Lane wrote:
> Weiping He <laser@zhengmai.com.cn> writes:
>> we found the problem:
>> We used IMMUTABLE modifier in our CREATE FUNCTION definition,
>> though it's correct for our function to return same value if input the
>> same *data*,
>> but our data are passed by reference, not by value, so, some times we can't
>> retrieve out data. Remove IMMUTABLE fixed the problem.
>> So, it seems to make it clear in docs would be a good help to function
>> writers, would commit a documentation patch later if necessary.
>
> I'm not sure what problem you're really describing, but it would be
> entirely wrong for the docs to claim that pass-by-reference datatypes
> shouldn't have immutable functions. float8 is pass-by-ref, for
> instance, but they don't come any more immutable than sqrt(x) ...
>
> I'd suggest taking a closer look to understand what the problem really
> is. Trying to index on a non-immutable function makes no sense, which
> is why the system forbids it.
>
> regards, tom lane

Sorry for not describing my problem clearly. I mean the functions that implement the operators, such as the comparison function for equality ('='), not building an index on a function. Here is the full version:

First we built a user type using the CREATE FUNCTION, CREATE TYPE, CREATE OPERATOR and CREATE OPERATOR CLASS commands, and of course we wrote the C functions needed for the operators, the type, etc. Then we tested whether our type (named UUID) was indexable, and found that the index was not used, but we didn't know why.

Later we asked here why the index was not used, and you pointed out that we should assign selectivity restriction functions to our operators, especially the '=' operator, for which we used 'eqsel' per your suggestion. But we found that although the index was now used, sometimes no data row was returned (and sometimes we could get the data row)!

Then we re-checked our definitions and thought that maybe we shouldn't use the IMMUTABLE keyword in the definition of the function that the '=' operator uses to implement the equality comparison. The wrong definition is:

    Datum
    uuid_eq(PG_FUNCTION_ARGS)
    {
        struct uuid *uptr1 = (struct uuid *) PG_GETARG_POINTER(0);
        struct uuid *uptr2 = (struct uuid *) PG_GETARG_POINTER(1);

        PG_RETURN_BOOL(uuidcmp(uptr1, uptr2) == 0);
    }

    CREATE OR REPLACE FUNCTION uuid_eq(uuid, uuid)
        RETURNS boolean
        IMMUTABLE STRICT
        AS '$libdir/uuid'
        LANGUAGE 'C';

    CREATE OPERATOR = (
        LEFTARG = uuid,
        RIGHTARG = uuid,
        COMMUTATOR = =,
        NEGATOR = <>,
        PROCEDURE = uuid_eq,
        RESTRICT = eqsel,
        JOIN = eqjoinsel
    );

Because the data type (UUID) is a struct, and uuid_eq() accepts two pointers to struct uuid values, we think that if it is declared IMMUTABLE, PostgreSQL will not run the function but return the cached value when it gets the same two pointers as input. But the pointers may be unchanged while the data they point to has changed, which would cause the weird symptom we found. Removing IMMUTABLE fixed the problem.

So we think the docs for CREATE FUNCTION should perhaps point out the difference between pass-by-reference and pass-by-value, which may help avoid this kind of error.

Thanks and regards,
Laser
Weiping He <laser@zhengmai.com.cn> writes:
> because the data type (UUID) is a struct,
> and the uuid_eq() function accept two pointer to the value of struct uuid,
> if make it IMMUTABLE, postgresql would think it should not try to run
> the function, but return the cached value instead when it get two same
> pointers input,

No, it will not. Your claim above is entirely wrong; the fact that the datatype is pass-by-reference doesn't affect anything (unless you've failed to declare the datatype that way, but if so I'd not think it would work at all).

regards, tom lane
Tom Lane wrote:
> Weiping He <laser@zhengmai.com.cn> writes:
>> because the data type (UUID) is a struct,
>> and the uuid_eq() function accept two pointer to the value of struct uuid,
>> if make it IMMUTABLE, postgresql would think it should not try to run
>> the function, but return the cached value instead when it get two same
>> pointers input,
>
> No, it will not. Your claim above is entirely wrong; the fact that the
> datatype is pass-by-reference doesn't affect anything (unless you've
> failed to declare the datatype that way, but if so I'd not think it
> would work at all).
>
> regards, tom lane

Yeah, you are right, it's our fault. We mistakenly used PG_RETURN_INT16() to return from our access method support function, which drops the sign of the memcmp() result, while we still declared the function as returning INTEGER in CREATE FUNCTION. That caused the error. It's fixed now, and the datatype and index run smoothly.

Thanks and regards,
Laser
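To illustrate how a 16-bit return can lose the sign when the caller expects a 32-bit integer, here is a simplified emulation. It does not use the PostgreSQL headers. It assumes that returning through the 16-bit macro keeps only the low 16 bits of the value in the Datum, and that the caller then reads the low 32 bits as a signed integer. All names below are stand-ins for illustration.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum: an unsigned, pointer-sized word. */
typedef uintptr_t Datum;

/* Emulates a 16-bit return: only the low 16 bits of the value survive. */
static Datum return_as_int16(int value)
{
    return (Datum) value & 0xffffu;
}

/* Emulates a 32-bit return: the low 32 bits of the value survive. */
static Datum return_as_int32(int value)
{
    return (Datum) value & 0xffffffffu;
}

/* Emulates the caller reading the result as a signed 32-bit integer.
 * (Conversion of out-of-range values wraps on common platforms.) */
static int32_t read_as_int32(Datum d)
{
    return (int32_t) (uint32_t) (d & 0xffffffffu);
}
```

Under these assumptions, read_as_int32(return_as_int16(-1)) yields 65535, a positive number, while read_as_int32(return_as_int32(-1)) yields -1. A comparison function with this bug reports "greater than" whenever it means "less than", while 0 and positive results pass through unchanged, which is consistent with lookups that only sometimes find the row.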