Thread: [patch,rfc] binary operators on integers
Well, I was interested in binary operators on integers, and as Peter suggested that I should look into it myself, I did.

Choice of operators:

  ~ - not
  & - and
  # - xor - I like it :)
  | - or

Things I am unsure of:

1) Precedence. I quite nonscientifically hacked in gram.y, and could not still make it understand expression '5 # ~1' nor the precedence between '&' and '|#'...

   At the moment all the gram.y changes could be dropped and it works ok, but without operator precedence. Any hints?

2) Choice of oids. I took 1890 - 1913. Should I have taken directly from 1874 upwards, or somewhere else?

3) Choice of operators. As I understand the '^' is taken, I won't get it. Now, in gram.y I found that the '|' is used in weird situations and with weird precedence so maybe I should use something else for OR too?

4) Is anybody else interested? ;)

I would like to get comments/further hints on this...

--
marko
Marko Kreen <marko@l-t.ee> writes:
> 1) Precedence. I quite nonscientifically hacked in gram.y,
>    and could not still make it understand expression '5 # ~1'
>    nor the precedence between '&' and '|#'...
>    At the moment all the gram.y changes could be dropped and
>    it works ok, but without operator precedence. Any hints?

What you missed is that there's a close coupling between gram.y and scan.l. There are certain single-character operators that are returned as individual-character tokens by scan.l, and these are exactly the ones that gram.y wants to treat specially. All else are folded into the generic token "Op". You'd need to twiddle the character type lists in scan.l if you want to treat '~' '&' or '#' specially in gram.y.

However, I'm pretty dubious of the idea of changing the precedence assigned to these operator names, because there's a real strong risk of breaking existing applications if you do that --- worse, of breaking them in a subtle, hard-to-detect way. Even though I think '|' is clearly given a bogus precedence, I doubt it's a good idea to change it.

> 3) Choice of operators. As I understand the '^' is taken,
>    I wont get it. Now, in gram.y I found that the '|' is
>    used in weird situations and with weird precedence so
>    maybe I should use something else for OR too?

Well, you *could* use '^' since there's no definition of it for integer operands. But that would mean that something like '4^2', which was formerly implicitly coerced to float and interpreted as floating power function, would suddenly mean something different. Again a serious risk of silently breaking applications. This doesn't apply to '|' though, since it has no numeric interpretation at all right now.

> 4) Is anybody else interested? ;)

Dunno. I think the bitstring datatype is probably a better choice, since it's standard and this feature is not.

regards, tom lane
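The scan.l/gram.y coupling described above can be pictured with a toy tokenizer. This is only a sketch in Python, not the real scanner: the contents of `SELF_CHARS` and `OP_CHARS` and the token name `ICONST` are assumptions made for illustration, and the actual character lists in scan.l may differ.

```python
# Toy illustration only; this is not the real scan.l.
# SELF_CHARS is a hypothetical stand-in for the characters that the
# scanner hands to the grammar as individual single-character tokens.
SELF_CHARS = set("+-*/^|")
# OP_CHARS is a hypothetical set of characters allowed in operator names.
OP_CHARS = set("+-*/^|~&#<>=!@%")


def tokenize(text):
    """Split text into (token_type, token_text) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("ICONST", text[i:j]))
            i = j
        elif ch in OP_CHARS:
            j = i
            while j < len(text) and text[j] in OP_CHARS:
                j += 1
            run = text[i:j]
            if len(run) == 1 and run in SELF_CHARS:
                # The grammar can attach a precedence to this token.
                tokens.append((run, run))
            else:
                # Everything else is folded into the generic "Op" token.
                tokens.append(("Op", run))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


print(tokenize("5 # ~1"))
print(tokenize("4 | 2"))
```

With these assumed sets, '#' and '~' both come back as the generic `Op`, so a grammar could not give them separate precedence, while '|' arrives as its own token.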
On Fri, Sep 22, 2000 at 12:26:45PM -0400, Tom Lane wrote:
> Marko Kreen <marko@l-t.ee> writes:
> > 1) Precedence. I quite nonscientifically hacked in gram.y,
> >    and could not still make it understand expression '5 # ~1'
> >    nor the precedence between '&' and '|#'...
> >    At the moment all the gram.y changes could be dropped and
> >    it works ok, but without operator precedence. Any hints?
>
> What you missed is that there's a close coupling between gram.y and
> scan.l. There are certain single-character operators that are returned
> as individual-character tokens by scan.l, and these are exactly the ones
> that gram.y wants to treat specially. All else are folded into the
> generic token "Op". You'd need to twiddle the character type lists in
> scan.l if you want to treat '~' '&' or '#' specially in gram.y.
>
> However, I'm pretty dubious of the idea of changing the precedence
> assigned to these operator names, because there's a real strong risk
> of breaking existing applications if you do that --- worse, of breaking
> them in a subtle, hard-to-detect way. Even though I think '|' is
> clearly given a bogus precedence, I doubt it's a good idea to change it.

I guess I'd better drop it then...

One idea I had while looking at gram.y is that the precedence should somehow be based on context, e.g. depending on which datatypes the operator is used with. Especially because one symbol has different meanings depending on the data. Heh, but this would be complex...

> > 3) Choice of operators. As I understand the '^' is taken,
> >    I wont get it. Now, in gram.y I found that the '|' is
> >    used in weird situations and with weird precedence so
> >    maybe I should use something else for OR too?
>
> Well, you *could* use '^' since there's no definition of it for integer
> operands. But that would mean that something like '4^2', which was
> formerly implicitly coerced to float and interpreted as floating
> power function, would suddenly mean something different. Again a
> serious risk of silently breaking applications. This doesn't apply to
> '|' though, since it has no numeric interpretation at all right now.

I am afraid of '^'. Also, as the 'power' precedence would be used, it would be very unintuitive; no precedence is better. OTOH the bit-string stuff uses '^' (with its precedence), so it would be nice to be similar?

> > 4) Is anybody else interested? ;)
>
> Dunno. I think the bitstring datatype is probably a better choice,
> since it's standard and this feature is not.

I looked at it and did not like it. If it is from some standard then it's nice for PostgreSQL to support it, but somehow I guess that the binary ops on ints would be used more than the bit-string stuff. Mostly because it's something familiar from other languages. But maybe it's just me...

I'll send a revised diff shortly.

--
marko
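The two worries about '^' raised here (a silent change of meaning, and power-style precedence applied to xor) can be made concrete with a small sketch. Python is used only because it spells power and xor differently (`**` and `^`); the variable names are illustrative and nothing here is PostgreSQL behaviour.

```python
# What '4^2' has meant so far: coerce to float, then floating-point power.
as_power = 4.0 ** 2          # 16.0

# What '4^2' would mean if '^' became integer xor.
as_xor = 4 ^ 2               # 0b100 xor 0b010 = 0b110 = 6

# Precedence: a power-style '^' binds tighter than '*',
# whereas a C-style xor binds looser than '*'.
power_style = 2 * (3 ^ 1)    # xor first, then multiply: 2 * 2 = 4
c_style = (2 * 3) ^ 1        # multiply first, then xor: 6 ^ 1 = 7

print(as_power, as_xor, power_style, c_style)
```

The same source text yields 16.0 or 6 depending on which operator is chosen, and 4 or 7 depending on which precedence the xor inherits, which is the kind of silent breakage the messages above are concerned about.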
Well, what are we going to do with this? I think we should take it. Since I encouraged him to write it, I'd volunteer to take care of it.

We might want to change the bitxor operator to # (or at least something distinct from ^) as well, for consistency.

Marko Kreen writes:
> Well, I was interested in binary operators on integers
> and as Peter suggested that I should look into it
> myself, so I did it.
>
> Choice of operators:
>
>   ~ - not
>   & - and
>   # - xor - I like it :)
>   | - or
>
> Things I am unsure of:
>
> 1) Precedence. I quite nonscientifically hacked in gram.y,
>    and could not still make it understand expression '5 # ~1'
>    nor the precedence between '&' and '|#'...
>
>    At the moment all the gram.y changes could be dropped and
>    it works ok, but without operator precedence. Any hints?
>
> 2) Choice of oids. I took 1890 - 1913. Should I have taken
>    directly from 1874 upwards, or somewhere else?
>
> 3) Choice of operators. As I understand the '^' is taken,
>    I wont get it. Now, in gram.y I found that the '|' is
>    used in weird situations and with weird precedence so
>    maybe I should use something else for OR too?
>
> 4) Is anybody else interested? ;)
>
> I would like to get comments/further hints on this...

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)
From: Peter Eisentraut
Tom Lane writes:
> Even though I think '|' is clearly given a bogus precedence, I doubt
> it's a good idea to change it.

The only builtin '|' operator, besides the not-there-yet bitor, is some arcane prefix operator for the "tinterval" type, which returns the start of the interval. This is all long dead so that would perhaps give us a chance to change this before we add "or" operators. That might weigh more than the possibility of a few users having highly specialized '|' operators that rely on this precedence.

The tinterval type has pretty interesting parsing rules, btw.:

peter=# select 'whatever you say'::tinterval;
                      ?column?
-----------------------------------------------------
 ["1935-12-23 09:42:00+01" "1974-04-16 17:52:52+01"]
(1 row)

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
Re: Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)
From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Even though I think '|' is clearly given a bogus precedence, I doubt
>> it's a good idea to change it.

> The only builtin '|' operator, besides the not-there-yet bitor, is some
> arcane prefix operator for the "tinterval" type, which returns the start
> of the interval. This is all long dead so that would perhaps give us a
> chance to change this before we add "or" operators. That might weigh more
> than the possibility of a few users having highly specialized '|'
> operators that rely on this precedence.

Well, that's a good point --- it isn't going to get any less painful to fix it later. Do we want to just remove the special treatment of '|' and let it become one with the undifferentiated mass of Op, or do we want to try to set up reasonable precedence for all the bitwise operators (and if so, what should that be)? The second choice has a greater chance of breaking existing apps because it's changing more operators ...

Thomas, any opinions here?

regards, tom lane
On Thu, Oct 12, 2000 at 09:34:05PM +0200, Peter Eisentraut wrote:
> Well, what are we going to do with this? I think we should take it.
> Since I encouraged him to write it, I'd volunteer to take care of it.

Nice :)

> We might want to change the bitxor operator to # (or at least something
> distinct from ^) as well, for consistency.

Note that I sent an updated patch to pgsql-patches, which added the << and >> operators and removed the gram.y stuff. But there I changed the xor operator to '^'. So shall I send an updated patch where xor='#', in case that one was lost? pg_operator.h was cleaner there too.

--
marko
Re: Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)
From: Marko Kreen
On Thu, Oct 12, 2000 at 04:18:05PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane writes:
> >> Even though I think '|' is clearly given a bogus precedence, I doubt
> >> it's a good idea to change it.
>
> > The only builtin '|' operator, besides the not-there-yet bitor, is some
> > arcane prefix operator for the "tinterval" type, which returns the start
> > of the interval. This is all long dead so that would perhaps give us a
> > chance to change this before we add "or" operators. That might weigh more
> > than the possibility of a few users having highly specialized '|'
> > operators that rely on this precedence.
>
> Well, that's a good point --- it isn't going to get any less painful to
> fix it later. Do we want to just remove the special treatment of '|'
> and let it become one with the undifferentiated mass of Op, or do we
> want to try to set up reasonable precedence for all the bitwise
> operators (and if so, what should that be)? The second choice has a
> greater chance of breaking existing apps because it's changing more
> operators ...

For bitops it would be nice if '~' had a precedence equal to the other builtin unary operators, and '&' had higher precedence than '#' and '|'. (C also has XOR higher than OR.)

About breaking existing apps: all those operators [~|#&] are not actually in use (well, in PostgreSQL mainstream). Only bitstring in 7.1 will start using them, and I guess it hopefully has the same precedence needs :)

But yes, some outside add-on may use them, or if in the future those ops are used for something else, then it will be messy...

Well, it is not for me to decide, but a Nice Thing would be:

(Looking at 'Lexical precedence' in docs)

  [- unary minus]
  '~'  unary BITNOT
  ...
  [+ - add sub]
  &  BITAND
  [ IS ]
  ...
  [(all other) ]
  '#', '|'

Also note that bitstring uses '^' for xor, so it has slightly weird rules and is inconsistent with this.

--
marko
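To see what is at stake between leaving these operators in the flat "Op" level and giving '&' a higher level than '#' and '|', here is a minimal sketch of a precedence-climbing evaluator in Python. The numeric levels in `FLAT` and `PROPOSED` are assumptions chosen only to reproduce the relative order described above; they are not values taken from gram.y.

```python
import operator

# Integer semantics for the three binary operators under discussion.
BINOPS = {"&": operator.and_, "|": operator.or_, "#": operator.xor}

# Hypothetical precedence tables (higher number binds tighter).
FLAT = {"&": 1, "#": 1, "|": 1}       # all operators in one level
PROPOSED = {"&": 2, "#": 1, "|": 1}   # '&' above '#' and '|'


def evaluate(tokens, prec):
    """Evaluate [int, op, int, op, ...] left-associatively using prec."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        # Consume operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and prec[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(prec[op] + 1)
            left = BINOPS[op](left, right)
        return left

    return parse(1)


expr = [1, "|", 2, "&", 4]
print(evaluate(expr, FLAT))      # (1 | 2) & 4
print(evaluate(expr, PROPOSED))  # 1 | (2 & 4)
```

Under the flat table the expression is grouped strictly left to right and yields 0; under the proposed table '&' is applied first and the result is 1, which matches what C-style precedence gives for `1 | 2 & 4`.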
Re: Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)
From: Thomas Lockhart
> Well, that's a good point --- it isn't going to get any less painful to
> fix it later. Do we want to just remove the special treatment of '|'
> and let it become one with the undifferentiated mass of Op, or do we
> want to try to set up reasonable precedence for all the bitwise
> operators (and if so, what should that be)? The second choice has a
> greater chance of breaking existing apps because it's changing more
> operators ...
> Thomas, any opinions here?

I'd like to see closer adherence to the "usual" operator precedence. But I really *hate* having to explicitly call out each rule in the a_expr, b_expr, and/or c_expr productions. Any way around this?

- Thomas
Re: Precedence of '|' operator (was Re: [patch,rfc] binary operators on integers)
From: Tom Lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> I'd like to see closer adherence to the "usual" operator precedence. But
> I really *hate* having to explicitly call out each rule in the a_expr,
> b_expr, and/or c_expr productions. Any way around this?

It's not easy in yacc/bison, I don't believe. Precedence of an operator is converted to precedence of associated productions, so there's no way to make it work without an explicit production for each operator token that needs a particular precedence.

In any case, the only way to make things really significantly better would be if the precedence of an operator could be specified in its pg_operator entry. That would be way cool, but

(a) yacc can't do it,

(b) there's a fundamental circularity in the idea: you can't identify an operator's pg_operator entry until you know its input data types, which means you have to have already decided which subexpressions are its inputs, and

(c) the grammar phase of parsing cannot look at database entries anyway because of transaction-abort issues.

Because of point (b) there is no chance of driving precedence lookup from pg_operator anyway. You can only drive precedence lookup from the operator *name*, not the input datatypes. This being so, I don't see any huge advantage to having the precedence be specified in a database table as opposed to hard-coding it in the grammar files.

One thing that might reduce the rule bloat a little bit is to have just one symbolic token (like the existing Op) for each operator precedence level, thus only one production per precedence level in a_expr and friends. Then the lexer would have to have a table to look up operator names to see which symbolic token to return them as. Still don't get to go to the database, but at least setting a particular operator name's precedence is a one-liner affair instead of a matter of multiple rules.

regards, tom lane
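The table-driven lexer idea in the last paragraph could look roughly like the following sketch. It is written in Python rather than lex, and the token names `Op_AND` and `Op_OR` as well as the table contents are hypothetical; the only name taken from the thread is the existing generic token `Op`.

```python
# Hypothetical lookup table: operator name -> symbolic token for its
# precedence level. The grammar would then need only one production per
# symbolic token rather than one per operator name.
OP_LEVEL = {
    "&": "Op_AND",   # assumed higher-precedence level
    "#": "Op_OR",    # assumed lower-precedence level
    "|": "Op_OR",
}


def classify(op_name):
    """Return the symbolic token for an operator name.

    The lookup uses the operator *name* only; no datatypes and no
    database access are involved. Unknown names fall back to "Op".
    """
    return OP_LEVEL.get(op_name, "Op")


print(classify("&"))    # Op_AND
print(classify("|"))    # Op_OR
print(classify("@@"))   # Op
```

In such a scheme, moving an operator name to a different precedence level means changing a single entry in `OP_LEVEL`, which is the "one-liner affair" the message describes.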