Thread: a problem with index and user define type

a problem with index and user define type

From
"Wang Mike"
Date:
Hi all:
   I write a use define type (UUID)

typedef struct uuid
{
    uint32 time_low;
    uint16 time_mid;
    uint16 time_hi_and_version;
    uint8 clock_seq_hi_and_reserved;
    uint8 clock_seq_low;
    uint8 node[6];
} uuid;


make all btree index function and operator, such as

CREATE OPERATOR CLASS uuid_btree_ops
DEFAULT FOR TYPE uuid USING btree
AS
        OPERATOR        1       < ,
        OPERATOR        2       <= ,
        OPERATOR        3       = ,
        OPERATOR        4       >= ,
        OPERATOR        5       > ,
        FUNCTION        1       uuid_cmp(uuid, uuid),


create table test_uuid(id uuid primary key default uuid_time(), name
char(40));

but  this query: select * from test_uuid where id =
'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid   dosn't use index

                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
   Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)


why ??


source code  see attachement

                                               MikeWang


---------------------------------------------------------------------
What is uuid?


    uuid is a kind of data type, provide for PostgreSQL to implement unique
id in cyberspace,
    it's based one UUID URN name space IETF draft (see
doc/draft-mealling-uuid-urn-00.txt),
    now, pguuid support NIL(0), Time-Base(1), Name-Base(3) and
Random-Base(4) type UUID.
    It's propuse is
provide a solution
    for data replication, merge, and distribute.


what is the use of uuid?


    1, pguuid provide PostgreSQL a data type: uuid, it can provide unique
id in
cyberspace.
    2, provide type uuid related operator (e.g. =, <>, <, >, >=, <=)
    3, provide functions to generate Time-base, Name-base, Random-base and
Nil-UUID.
    4, provide functions to parse uuid type.

license:
    BSD

_________________________________________________________________
与联机的朋友进行交流,请使用 MSN Messenger:  http://messenger.msn.com/cn

Attachment

Re: a problem with index and user define type

From
Tom Lane
Date:
"Wang Mike" <itlist@msn.com> writes:
> but  this query: select * from test_uuid where id = 
> 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid   dosn't use index

>                           QUERY PLAN
> ---------------------------------------------------------------
>  Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
>    Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)

> why ??

The rows estimate looks pretty fishy --- I think you are getting the
0.5 default selectivity estimate for an operator that has no restriction
estimator.  Most likely you should have created the operator using eqsel
and eqjoinsel as the restriction/join estimators.
        regards, tom lane


Re: a problem with index and user define type

From
Weiping He
Date:
Tom Lane wrote:

>"Wang Mike" <itlist@msn.com> writes:
>  
>
>>but  this query: select * from test_uuid where id = 
>>'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid   dosn't use index
>>    
>>
>
>  
>
>>                          QUERY PLAN
>>---------------------------------------------------------------
>> Seq Scan on test_uuid  (cost=0.00..22.50 rows=500 width=140)
>>   Filter: (id = 'df2b10aa-a31d-11d7-9867-0050babb6029'::uuid)
>>    
>>
>
>  
>
>>why ??
>>    
>>
>
>The rows estimate looks pretty fishy --- I think you are getting the
>0.5 default selectivity estimate for an operator that has no restriction
>estimator.  Most likely you should have created the operator using eqsel
>and eqjoinsel as the restriction/join estimators.
>
>            regards, tom lane
>
>  
>
Hi, Tom,
   I'm trying to test it, but don't know if I understood you correctly,   you mean we should  try to create the
operatorusing 
 
eqsel/eqjoinsel  estimators, right?   But after we added those estimators like this:

CREATE OPERATOR = (   LEFTARG = uuid,   RIGHTARG = uuid,   COMMUTATOR = =,   NEGATOR = <>,   PROCEDURE = uuid_eq,
RESTRICT= eqsel,   JOIN = eqjoinsel
 
);
   the situation trun worse: now the explain shows the query using the 
index,   the we can't select out the match row! Any hint about what's wrong 
with us?


Thanks and Reagards

Laser




Re: a problem with index and user define type

From
Tom Lane
Date:
Weiping He <laser@zhengmai.com.cn> writes:
>     the situation trun worse: now the explain shows the query using the 
> index,
>     the we can't select out the match row! Any hint about what's wrong 
> with us?

My bet: either your operators are broken or your operator class
definition is wrong.
        regards, tom lane


Re: a problem with index and user define type

From
Weiping He
Date:
we found the problem:
We used IMMUTABLE modifier in our CREATE FUNCTION definition,
though it's correct for our function to return same value if input the 
same *data*,
but our data are passed by reference, not by value, so, some times we can't
retrive out data. Remove IMMUTABLE fixed the problem.

So, it seems to make it clear in docs would be a good help to function 
writers,
would commit a documentation patch later if necessary.

Thank you!

Regards
Laser

Tom Lane wrote:

>Weiping He <laser@zhengmai.com.cn> writes:
>  
>
>>    the situation trun worse: now the explain shows the query using the 
>>index,
>>    the we can't select out the match row! Any hint about what's wrong 
>>with us?
>>    
>>
>
>My bet: either your operators are broken or your operator class
>definition is wrong.
>
>            regards, tom lane
>
>  
>




Re: a problem with index and user define type

From
Tom Lane
Date:
Weiping He <laser@zhengmai.com.cn> writes:
> we found the problem:
> We used IMMUTABLE modifier in our CREATE FUNCTION definition,
> though it's correct for our function to return same value if input the 
> same *data*,
> but our data are passed by reference, not by value, so, some times we can't
> retrive out data. Remove IMMUTABLE fixed the problem.

> So, it seems to make it clear in docs would be a good help to function 
> writers, would commit a documentation patch later if necessary.

I'm not sure what problem you're really describing, but it would be
entirely wrong for the docs to claim that pass-by-reference datatypes
shouldn't have immutable functions.  float8 is pass-by-ref, for
instance, but they don't come any more immutable than sqrt(x) ...

I'd suggest taking a closer look to understand what the problem really
is.  Trying to index on a non-immutable function makes no sense, which
is why the system forbids it.
        regards, tom lane


Re: a problem with index and user define type

From
Weiping He
Date:
Tom Lane wrote:

>Weiping He <laser@zhengmai.com.cn> writes:
>  
>
>>we found the problem:
>>We used IMMUTABLE modifier in our CREATE FUNCTION definition,
>>though it's correct for our function to return same value if input the 
>>same *data*,
>>but our data are passed by reference, not by value, so, some times we can't
>>retrive out data. Remove IMMUTABLE fixed the problem.
>>    
>>
>
>  
>
>>So, it seems to make it clear in docs would be a good help to function 
>>writers, would commit a documentation patch later if necessary.
>>    
>>
>
>I'm not sure what problem you're really describing, but it would be
>entirely wrong for the docs to claim that pass-by-reference datatypes
>shouldn't have immutable functions.  float8 is pass-by-ref, for
>instance, but they don't come any more immutable than sqrt(x) ...
>
>I'd suggest taking a closer look to understand what the problem really
>is.  Trying to index on a non-immutable function makes no sense, which
>is why the system forbids it.
>
>            regards, tom lane
>
>  
>
Sorry for didn't describe my problem clearly. I mean the function implement
the operator, like compare function for equal ('=') etc, not to build an 
index
on an function.

Here is full version:

First we build a user type using CREATE FUNCTION, CREATE TYPE, CREATE 
OPERATOR
and CREATE OPERATOR CLASS command, of course we wrote those C functions
needed for operator, type etc.

Then we try to test if our type (which is named UUID) could be 
indexable,  and found
it didn't use the index, but, we don't know why.

Later, we ask the question here why the index didn't get used, and you 
point out that we
should assign the selective restriction function for our operators, 
espically for '=' operator,
we use 'eqsel' per your suggestion. But found out that though the idnex 
got used, but sometimes
not data row return (and sometimes we could get the data row)!

Then we re-check our definition, and found out may be we shouldn't use 
IMMUTABLE
key word in the function definition used by the '=' operator to 
implement the equation compare,
the wrong definition is:

Datum uuid_eq(PG_FUNCTION_ARGS)
{       struct uuid *uptr1 = (struct uuid *) PG_GETARG_POINTER(0);       struct uuid *uptr2 = (struct uuid *)
PG_GETARG_POINTER(1);
       PG_RETURN_BOOL(uuidcmp(uptr1, uptr2) == 0);
}

CREATE OR REPLACE FUNCTION uuid_eq(uuid, uuid)
RETURNS boolean
IMMUTABLE
STRICT
AS '$libdir/uuid'
LANGUAGE 'C';

CREATE OPERATOR = (   LEFTARG = uuid,   RIGHTARG = uuid,   COMMUTATOR = =,   NEGATOR = <>,   PROCEDURE = uuid_eq,
RESTRICT= eqsel,   JOIN = eqjoinsel
 
);

because the data type (UUID) is a struct,
and the uuid_eq() function accept two pointer to the value of struct uuid,
if make it IMMUTABLE, postgresql would think it should not try to run
the function, but return the cached value instead when it get two same 
pointers input,
but, the pointers may be unchanged, the data pointers point to may have 
changed.
So it will cause the weird symptom we found. And removed IMMUTABLE fix the
problem. So we think may be the doc for CREATE FUNCTION should point out
the difference of passed by ref and passed by value. Thus may avoid this 
kind of
error.


Thanks and Regards

Laser



Re: a problem with index and user define type

From
Tom Lane
Date:
Weiping He <laser@zhengmai.com.cn> writes:
> because the data type (UUID) is a struct,
> and the uuid_eq() function accept two pointer to the value of struct uuid,
> if make it IMMUTABLE, postgresql would think it should not try to run
> the function, but return the cached value instead when it get two same 
> pointers input,

No, it will not.  Your claim above is entirely wrong; the fact that the
datatype is pass-by-reference doesn't affect anything (unless you've
failed to declare the datatype that way, but if so I'd not think it
would work at all).
        regards, tom lane


Re: a problem with index and user define type

From
Weiping He
Date:
Tom Lane wrote:

>Weiping He <laser@zhengmai.com.cn> writes:
>  
>
>>because the data type (UUID) is a struct,
>>and the uuid_eq() function accept two pointer to the value of struct uuid,
>>if make it IMMUTABLE, postgresql would think it should not try to run
>>the function, but return the cached value instead when it get two same 
>>pointers input,
>>    
>>
>
>No, it will not.  Your claim above is entirely wrong; the fact that the
>datatype is pass-by-reference doesn't affect anything (unless you've
>failed to declare the datatype that way, but if so I'd not think it
>would work at all).
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>  
>
yeah, you are right, it's out fault. We've mistakenly use PG_RETURN_INT16()
to return from our am support function, which prune the sign  
information from the memcmp(),
but we still declare the function to return INTEGER when CREATE 
FUNCTION. So the error,
it's fixed now, and the datatype and index run smoothly.

Thanks and Regards

Laser