Thread: Various breakages in new contrib/isn module

Various breakages in new contrib/isn module

From
Tom Lane
Date:
It occurred to me to run the regression tests type_sanity and opr_sanity
over the contrib/isn code.  (The easy way to do this is to copy the isn
install script into the regress/sql directory and then add it to
serial_schedule just before those two tests.)  It turned up a couple
of moderately serious problems:

* There are a whole bunch of "shell" operators created; tryselect oid::regoperator from pg_operator where oprcode = 0;
after loading isn.  I didn't track it down in detail, but it looked
like most or all of these come from dangling oprcom links, ie, there's
an operator that claims to have a commutator but you never supplied one.
This is very bad, because the planner *will* try to use those operators
given the right kind of query.

* There are hash opclasses for these datatypes but the corresponding
equality operators are not marked hashable.  This is not quite as bad,
but should be fixed.

Please submit a patch that fixes these.

Note to hackers: it might be worth trying the same thing with the other
contrib modules; I don't have time right now though.
        regards, tom lane


Re: Various breakages in new contrib/isn module

From
Tom Lane
Date:
I wrote:
> * There are a whole bunch of "shell" operators created; try
>     select oid::regoperator from pg_operator where oprcode = 0;
> after loading isn.

Actually, the problem with this module is not that it's got too few
operators, but that it's got too many.  Trying to have the cross-product
of all possible comparison operators for eight nominally different types
is not very sane.

If we were to make the casts to ean13 be IMPLICIT instead of assignment,
then we could have one set of comparison operators and one opclass for
ean13, and let the restricted types rely on those implicitly.  Is there
a good reason to disallow the implicit promotion?
        regards, tom lane


Re: Various breakages in new contrib/isn module

From
Tom Lane
Date:
I wrote:
> If we were to make the casts to ean13 be IMPLICIT instead of assignment,
> then we could have one set of comparison operators and one opclass for
> ean13, and let the restricted types rely on those implicitly.

I tried this and found out it's got one killer disadvantage: if one
writes something like
SELECT ... WHERE isbn_column = 'isbn-constant';

then the unknown-literal constant would get resolved as type EAN13
not type ISBN, which is not what we want.  The existing cases where
we rely on implicit promotion, such as varchar vs text, don't have a
problem here because there's no difference in the input syntax.

It might be possible to adjust the parser so that the unknown literal is
resolved as type ISBN and then promoted to type EAN13, but I'm not going
to mess with that sort of thing just before RC1.  There may be some
interactions with the domain-related resolution rule changes that Elein
would like to make, too.  Something to think about later.

It'd be worth doing something about this, because as contrib/isn now
stands it increases the number of operators named "=" (also "<=" etc) in
a standard installation by more than 50%.  That causes a measurable
performance degradation for parsing simple queries, whether or not they
involve any isn datatypes, because of the cost of resolving the meaning
of any use of "=" :-(.  And we'd really need more --- the module still
doesn't provide the complete set of cross-isn-type comparisons, meaning
we can't mark any of them as mergejoinable.  Ugh.
        regards, tom lane


Re: Various breakages in new contrib/isn module

From
"Jeremy Kronuz"
Date:
Hello,

Tom Lane wrote:
>I tried this and found out it's got one killer disadvantage: if one
>writes something like
>
>    SELECT ... WHERE isbn_column = 'isbn-constant';
>
>then the unknown-literal constant would get resolved as type EAN13
>not type ISBN, which is not what we want.  The existing cases where
>we rely on implicit promotion, such as varchar vs text, don't have a
>problem here because there's no difference in the input syntax.

Even as the unknown-literal constant is resolved as type EAN13 and not a 
ISBN, internally they both are the same type, so casting should be done 
almost immediately and without further conversions; does it really matter 
much if more than a function is called to do the casting? (an isbn_column is 
actually internally the same as a EAN13, thus comparisons are done really 
fast by comparing two 64 bits integers for any two types)

>It'd be worth doing something about this, because as contrib/isn now
>stands it increases the number of operators named "=" (also "<=" etc) in
>a standard installation by more than 50%.  That causes a measurable
>performance degradation for parsing simple queries, whether or not they
>involve any isn datatypes, because of the cost of resolving the meaning
>of any use of "=" :-(.  And we'd really need more --- the module still
>doesn't provide the complete set of cross-isn-type comparisons, meaning
>we can't mark any of them as mergejoinable.  Ugh.

For sure that's a bad thing... there are just too many operators and casting 
rules in the isn module, however I'm sure there should be a way to keep this 
functionality and still maintain performance; I mean, I think we should 
really find a way of making it work.

I also got some suggestions; one of them being to add even more casting 
features such as for being able to cast from ISBN or ISSN directly to EAN13. 
For example, the first one *should* work, or would be nice if it did:

=> select ean13(1421500205);
ERROR:  cannot cast ISBN to EAN13 for number: "1421500205"
=> select ean13(isbn13(1421500205));      ean13
-------------------
978-1-4215-0020-1
(1 row)

I've received some very possitive feedback from the users about the isn 
module:

"I just started playing around with your most excellent isn functions,
and I'm pretty impressed."

"The developers here LOOOOOVE [the isn module]! We're actually
changing our data strcutures, to use it as a primary key - now that we
can finally trust the data! It is INVALUABLE! Thanks again!"

...so I belive it's a really useful module, we must try to find a way to 
improve the way operators are handled or a way to reduce the number of them 
needed by the module to accomplish the same tasks.

I'll wait for your suggestions, I hope we can work this out for the better 
and find a good way to increase the performance, I'll try to help in any way 
I can.

Regards,
Kronuz.
"Fools rush in where fools have been before" - Unknown

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



Re: Various breakages in new contrib/isn module

From
Bruce Momjian
Date:
Added to TODO:

* Clean up casting in /contrib/isn
 http://archives.postgresql.org/pgsql-hackers/2006-11/msg00245.php


---------------------------------------------------------------------------

Tom Lane wrote:
> It occurred to me to run the regression tests type_sanity and opr_sanity
> over the contrib/isn code.  (The easy way to do this is to copy the isn
> install script into the regress/sql directory and then add it to
> serial_schedule just before those two tests.)  It turned up a couple
> of moderately serious problems:
> 
> * There are a whole bunch of "shell" operators created; try
>     select oid::regoperator from pg_operator where oprcode = 0;
> after loading isn.  I didn't track it down in detail, but it looked
> like most or all of these come from dangling oprcom links, ie, there's
> an operator that claims to have a commutator but you never supplied one.
> This is very bad, because the planner *will* try to use those operators
> given the right kind of query.
> 
> * There are hash opclasses for these datatypes but the corresponding
> equality operators are not marked hashable.  This is not quite as bad,
> but should be fixed.
> 
> Please submit a patch that fixes these.
> 
> Note to hackers: it might be worth trying the same thing with the other
> contrib modules; I don't have time right now though.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +