Thread: [patch,rfc] binary operators on integers

[patch,rfc] binary operators on integers

From
Marko Kreen
Date:
Well, I was interested in binary operators on integers
and as Peter suggested that I should look into it
myself, so I did it.

Choice of operators:
~ - not
& - and
# - xor  - I like it :)
| - or
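
For illustration only, here is a minimal Python sketch of what the four operators compute on integers. It is not part of the patch: the function name `bitops` is made up, and Python spells xor as '^' where the patch proposes '#'.

```python
def bitops(a, b):
    """Return the four proposed bitwise operations applied to Python ints."""
    return {
        "~a": ~a,        # NOT (two's complement, so ~a == -a - 1)
        "a & b": a & b,  # AND
        "a # b": a ^ b,  # XOR: '#' in the patch, '^' in Python
        "a | b": a | b,  # OR
    }


print(bitops(5, 3))
```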

Things I am unsure of:

1) Precedence.  I quite nonscientifically hacked in gram.y,
   and could not still make it understand expression '5 # ~1'
   nor the precedence between '&' and '|#'...

   At the moment all the gram.y changes could be dropped and
   it works ok, but without operator precedence.  Any hints?

2) Choice of oids.  I took 1890 - 1913.  Should I have taken
   directly from 1874 upwards, or somewhere else?

3) Choice of operators.  As I understand the '^' is taken,
   I wont get it.  Now, in gram.y I found that the '|' is
   used in weird situations and with weird precedence so
   maybe I should use something else for OR too?

4) Is anybody else interested? ;)


I would like to get comments/further hints on this...



-- 
marko


Re: [patch,rfc] binary operators on integers

From
Tom Lane
Date:
Marko Kreen <marko@l-t.ee> writes:
> 1) Precedence.  I quite nonscientifically hacked in gram.y,
>    and could not still make it understand expression '5 # ~1'
>    nor the precedence between '&' and '|#'...
>    At the moment all the gram.y changes could be dropped and
>    it works ok, but without operator precedence.  Any hints?

What you missed is that there's a close coupling between gram.y and
scan.l.  There are certain single-character operators that are returned
as individual-character tokens by scan.l, and these are exactly the ones
that gram.y wants to treat specially.  All else are folded into the
generic token "Op".  You'd need to twiddle the character type lists in
scan.l if you want to treat '~' '&' or '#' specially in gram.y.
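
The split described above can be sketched roughly as follows. This is a toy Python tokenizer, not the actual scan.l rules: the character sets `SELF_CHARS` and `OP_CHARS` and the token name `ICONST` are assumptions chosen for illustration, and only '|' being treated specially is taken from this thread.

```python
# Hypothetical character classes; the real lists live in scan.l.
SELF_CHARS = set("+-*/<>=|")           # returned as their own token type
OP_CHARS = set("~!@#^&|`?+-*/%<>=")    # characters that may form an operator


def tokenize(text):
    """Split text into (token_type, lexeme) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("ICONST", text[i:j]))
            i = j
        elif ch in OP_CHARS:
            j = i
            while j < len(text) and text[j] in OP_CHARS:
                j += 1
            op = text[i:j]
            if len(op) == 1 and op in SELF_CHARS:
                tokens.append((op, op))     # grammar can give it a precedence
            else:
                tokens.append(("Op", op))   # folded into the generic token
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


print(tokenize("5 | 1"))
print(tokenize("5 # ~1"))
```

In this sketch '#' and '~' both come back as the generic "Op", so a grammar built on these tokens has no way to give them a precedence of their own.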

However, I'm pretty dubious of the idea of changing the precedence
assigned to these operator names, because there's a real strong risk
of breaking existing applications if you do that --- worse, of breaking
them in a subtle, hard-to-detect way.  Even though I think '|' is
clearly given a bogus precedence, I doubt it's a good idea to change it.

> 3) Choice of operators.  As I understand the '^' is taken,
>    I wont get it.  Now, in gram.y I found that the '|' is
>    used in weird situations and with weird precedence so
>    maybe I should use something else for OR too?

Well, you *could* use '^' since there's no definition of it for integer
operands.  But that would mean that something like '4^2', which was
formerly implicitly coerced to float and interpreted as floating
power function, would suddenly mean something different.  Again a
serious risk of silently breaking applications.  This doesn't apply to
'|' though, since it has no numeric interpretation at all right now.
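
To make the risk concrete, here is a small Python sketch of the two readings of '4^2'. The function names are made up for illustration; the first mimics the coerce-to-float power interpretation described above, the second an integer xor.

```python
def as_float_power(a, b):
    """Coerce both operands to float and raise a to the power b."""
    return float(a) ** float(b)


def as_integer_xor(a, b):
    """Treat both operands as integers and xor their bits."""
    return a ^ b


print(as_float_power(4, 2))   # power reading
print(as_integer_xor(4, 2))   # xor reading
```

The same expression text yields 16.0 under one reading and 6 under the other, with no error to alert the application.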

> 4) Is anybody else interested? ;)

Dunno.  I think the bitstring datatype is probably a better choice,
since it's standard and this feature is not.
        regards, tom lane


Re: [patch,rfc] binary operators on integers

From
Marko Kreen
Date:
On Fri, Sep 22, 2000 at 12:26:45PM -0400, Tom Lane wrote:
> Marko Kreen <marko@l-t.ee> writes:
> > 1) Precedence.  I quite nonscientifically hacked in gram.y,
> >    and could not still make it understand expression '5 # ~1'
> >    nor the precedence between '&' and '|#'...
> >    At the moment all the gram.y changes could be dropped and
> >    it works ok, but without operator precedence.  Any hints?
> 
> What you missed is that there's a close coupling between gram.y and
> scan.l.  There are certain single-character operators that are returned
> as individual-character tokens by scan.l, and these are exactly the ones
> that gram.y wants to treat specially.  All else are folded into the
> generic token "Op".  You'd need to twiddle the character type lists in
> scan.l if you want to treat '~' '&' or '#' specially in gram.y.
> 
> However, I'm pretty dubious of the idea of changing the precedence
> assigned to these operator names, because there's a real strong risk
> of breaking existing applications if you do that --- worse, of breaking
> them in a subtle, hard-to-detect way.  Even though I think '|' is
> clearly given a bogus precedence, I doubt it's a good idea to change it.
> 
I guess I better drop it then...

One idea I had while looking at gram.y is that the precedence
should somehow be based on context, e.g. depending on which datatypes
the operator is used with.  Especially because one symbol has a
different meaning depending on the data.  Heh, but this would be complex...

> > 3) Choice of operators.  As I understand the '^' is taken,
> >    I wont get it.  Now, in gram.y I found that the '|' is
> >    used in weird situations and with weird precedence so
> >    maybe I should use something else for OR too?
> 
> Well, you *could* use '^' since there's no definition of it for integer
> operands.  But that would mean that something like '4^2', which was
> formerly implicitly coerced to float and interpreted as floating
> power function, would suddenly mean something different.  Again a
> serious risk of silently breaking applications.  This doesn't apply to
> '|' though, since it has no numeric interpretation at all right now.
> 
I am afraid of '^'.  Also, since the 'power' precedence would be used,
it would be very unintuitive; no precedence is better.  OTOH the
bit-string stuff uses '^' (with its precedence), so it would be nice
to be similar?

> > 4) Is anybody else interested? ;)
> 
> Dunno.  I think the bitstring datatype is probably a better choice,
> since it's standard and this feature is not.
> 
I looked at it and did not like it.  If it is from some standard
then it's nice for PostgreSQL to support it, but somehow I guess
that the binary ops on ints would be used more than the bit-string
stuff.  Mostly because it's something familiar from other languages.
But maybe it's just me...

I'll send a revised diff shortly

-- 
marko



Re: [patch,rfc] binary operators on integers

From
Peter Eisentraut
Date:
Well, what are we going to do with this?  I think we should take it.  
Since I encouraged him to write it, I'd volunteer to take care of it.

We might want to change the bitxor operator to # (or at least something
distinct from ^) as well, for consistency.

Marko Kreen writes:

> 
> Well, I was interested in binary operators on integers
> and as Peter suggested that I should look into it
> myself, so I did it.
> 
> Choice of operators:
> 
>  ~ - not
>  & - and
>  # - xor  - I like it :)
>  | - or
> 
> Things I am unsure of:
> 
> 1) Precedence.  I quite nonscientifically hacked in gram.y,
>    and could not still make it understand expression '5 # ~1'
>    nor the precedence between '&' and '|#'...
> 
>    At the moment all the gram.y changes could be dropped and
>    it works ok, but without operator precedence.  Any hints?
> 
> 2) Choice of oids.  I took 1890 - 1913.  Should I have taken
>    directly from 1874 upwards, or somewhere else?
> 
> 3) Choice of operators.  As I understand the '^' is taken,
>    I wont get it.  Now, in gram.y I found that the '|' is
>    used in weird situations and with weird precedence so
>    maybe I should use something else for OR too?
> 
> 4) Is anybody else interested? ;)
> 
> 
> I would like to get comments/further hints on this...

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)

From
Peter Eisentraut
Date:
Tom Lane writes:

> Even though I think '|' is clearly given a bogus precedence, I doubt
> it's a good idea to change it.

The only builtin '|' operator, besides the not-there-yet bitor, is some
arcane prefix operator for the "tinterval" type, which returns the start
of the interval.  This is all long dead so that would perhaps give us a
chance to change this before we add "or" operators.  That might weigh more
than the possibility of a few users having highly specialized '|'
operators that rely on this precedence.

The tinterval type has pretty interesting parsing rules, btw.:

peter=# select 'whatever you say'::tinterval;
                      ?column?
-----------------------------------------------------
 ["1935-12-23 09:42:00+01" "1974-04-16 17:52:52+01"]
(1 row)

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Even though I think '|' is clearly given a bogus precedence, I doubt
>> it's a good idea to change it.

> The only builtin '|' operator, besides the not-there-yet bitor, is some
> arcane prefix operator for the "tinterval" type, which returns the start
> of the interval.  This is all long dead so that would perhaps give us a
> chance to change this before we add "or" operators.  That might weigh more
> than the possibility of a few users having highly specialized '|'
> operators that rely on this precedence.

Well, that's a good point --- it isn't going to get any less painful to
fix it later.  Do we want to just remove the special treatment of '|'
and let it become one with the undifferentiated mass of Op, or do we
want to try to set up reasonable precedence for all the bitwise
operators (and if so, what should that be)?  The second choice has a
greater chance of breaking existing apps because it's changing more
operators ...

Thomas, any opinions here?
        regards, tom lane


Re: [patch,rfc] binary operators on integers

From
Marko Kreen
Date:
On Thu, Oct 12, 2000 at 09:34:05PM +0200, Peter Eisentraut wrote:
> Well, what are we going to do with this?  I think we should take it.  
> Since I encouraged him to write it, I'd volunteer to take care of it.

Nice :)

> We might want to change the bitxor operator to # (or at least something
> distinct from ^) as well, for consistency.

Note that I sent an updated patch to pgsql-patches, which
added the <<, >> operators and removed the gram.y stuff.  But there
I changed the xor operator to '^'.  So I can send an updated patch
where xor='#', in case that one was lost?  pg_operator.h was
cleaner there too.

-- 
marko



On Thu, Oct 12, 2000 at 04:18:05PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane writes:
> >> Even though I think '|' is clearly given a bogus precedence, I doubt
> >> it's a good idea to change it.
> 
> > The only builtin '|' operator, besides the not-there-yet bitor, is some
> > arcane prefix operator for the "tinterval" type, which returns the start
> > of the interval.  This is all long dead so that would perhaps give us a
> > chance to change this before we add "or" operators.  That might weigh more
> > than the possibility of a few users having highly specialized '|'
> > operators that rely on this precedence.
> 
> Well, that's a good point --- it isn't going to get any less painful to
> fix it later.  Do we want to just remove the special treatment of '|'
> and let it become one with the undifferentiated mass of Op, or do we
> want to try to set up reasonable precedence for all the bitwise
> operators (and if so, what should that be)?  The second choice has a
> greater chance of breaking existing apps because it's changing more
> operators ...
> 
For bitops it would be nice if '~' had a precedence equal to other
builtin unary operators, and '&' had higher precedence than '#' and '|'.
(C also has XOR higher than OR.)
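
As a small illustration of why this matters, the Python sketch below spells out the two possible groupings of `6 | 3 & 4` with explicit parentheses. The function names are made up; "flat" stands for all operators sharing one left-associative level, "tiered" for '&' binding tighter than '|'.

```python
def flat_grouping(a, b, c):
    """a | b & c when both operators share one left-associative level."""
    return (a | b) & c


def tiered_grouping(a, b, c):
    """a | b & c when '&' binds tighter than '|', as in C."""
    return a | (b & c)


print(flat_grouping(6, 3, 4))
print(tiered_grouping(6, 3, 4))
print(6 | 3 & 4)  # Python itself follows the C-style tiered reading
```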

About breaking existing apps - all those operators [~|#&] are
not actually in use (well, not in mainstream PostgreSQL).  Only
bitstring in 7.1 will start using them, and I guess it hopefully has
the same precedence needs :)  But yes, some outside add-on may use
them, or if in future those ops get used for something
else, then it will be messy...

Well, it is not for me to decide, but a Nice Thing would be:
(Looking at 'Lexical precedence' in docs)

[- unary minus]        '~' unary BITNOT

...

[+ - add sub]        & BITAND
[ IS ]

...

[(all other) ]        '#', '|'


Also note that bitstring uses '^' for xor, so it has slightly
weird rules and is inconsistent with this.

-- 
marko



Re: Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)

From
Thomas Lockhart
Date:
> Well, that's a good point --- it isn't going to get any less painful to
> fix it later.  Do we want to just remove the special treatment of '|'
> and let it become one with the undifferentiated mass of Op, or do we
> want to try to set up reasonable precedence for all the bitwise
> operators (and if so, what should that be)?  The second choice has a
> greater chance of breaking existing apps because it's changing more
> operators ...
> Thomas, any opinions here?

I'd like to see closer adherence to the "usual" operator precedence. But
I really *hate* having to explicitly call out each rule in the a_expr,
b_expr, and/or c_expr productions. Any way around this?
                         - Thomas


Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> I'd like to see closer adherence to the "usual" operator precedence. But
> I really *hate* having to explicitly call out each rule in the a_expr,
> b_expr, and/or c_expr productions. Any way around this?

It's not easy in yacc/bison, I don't believe.  Precedence of an operator
is converted to precedence of associated productions, so there's no way
to make it work without an explicit production for each operator token
that needs a particular precedence.

In any case, the only way to make things really significantly better
would be if the precedence of an operator could be specified in its
pg_operator entry.  That would be way cool, but (a) yacc can't do it,
(b) there's a fundamental circularity in the idea: you can't identify
an operator's pg_operator entry until you know its input data types,
which means you have to have already decided which subexpressions are
its inputs, and (c) the grammar phase of parsing cannot look at database
entries anyway because of transaction-abort issues.

Because of point (b) there is no chance of driving precedence lookup
from pg_operator anyway.  You can only drive precedence lookup from
the operator *name*, not the input datatypes.  This being so, I don't
see any huge advantage to having the precedence be specified in a
database table as opposed to hard-coding it in the grammar files.

One thing that might reduce the rule bloat a little bit is to have
just one symbolic token (like the existing Op) for each operator
precedence level, thus only one production per precedence level in
a_expr and friends.  Then the lexer would have to have a table to
look up operator names to see which symbolic token to return them
as.  Still don't get to go to the database, but at least setting a
particular operator name's precedence is a one-liner affair instead
of a matter of multiple rules.
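
A rough Python sketch of that last idea follows. It is not PostgreSQL code: the token names `OP_L1` and `OP_L2`, the tables, and the choice of levels are all assumptions for illustration. The lexer looks each operator name up in a table to pick a per-level token, and the parser handles each level generically instead of having one rule per operator.

```python
import re

# Hypothetical lexer table: operator name -> symbolic token for its level.
LEVEL_OF = {"|": "OP_L1", "#": "OP_L1", "&": "OP_L2"}
# Binding strength per symbolic token; higher binds tighter.
BINDING = {"OP_L1": 1, "OP_L2": 2}
APPLY = {
    "|": lambda a, b: a | b,
    "#": lambda a, b: a ^ b,   # '#' is xor in the proposed patch
    "&": lambda a, b: a & b,
}


def lex(text):
    """Return (token_type, value) pairs; other characters are ignored."""
    tokens = []
    for lexeme in re.findall(r"\d+|[|#&]", text):
        if lexeme.isdigit():
            tokens.append(("ICONST", int(lexeme)))
        else:
            tokens.append((LEVEL_OF[lexeme], lexeme))
    return tokens


def evaluate(text):
    """Evaluate 'int op int op ...' using one generic rule per level."""
    tokens = lex(text)
    pos = 0

    def parse(min_power):
        nonlocal pos
        _, left = tokens[pos]          # assumes an integer constant here
        pos += 1
        while pos < len(tokens):
            kind, name = tokens[pos]
            power = BINDING[kind]
            if power < min_power:
                break
            pos += 1
            right = parse(power + 1)   # left-associative within a level
            left = APPLY[name](left, right)
        return left

    return parse(1)


print(evaluate("6 | 3 & 4"))
print(evaluate("1 | 2 # 3"))
```

In this sketch, moving an operator to another precedence level means editing its single entry in `LEVEL_OF`; the parsing code stays untouched.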
        regards, tom lane