Thread: extend pgbench expressions with functions
This patch extends pgbench expressions with functions. Currently only one function, "abs", is added. The point is rather to bootstrap the infrastructure for other functions (such as hash, random variants...) to be added later. -- Fabien.
> This patch extends pgbench expressions with functions. Currently only one
> "abs" function is added. The point is rather to bootstrap the infrastructure
> for other functions (such as hash, random variants...) to be added later.

Here is a small v2 update:
- with better error messages on non-existent functions
- minimal documentation

-- Fabien.
> Here is a small v2 update: v3, just a rebase. -- Fabien.
On Sun, Mar 29, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Here is a small v2 update: > > v3, just a rebase. Thanks for working on this. I see it's already registered in the 2015-06 CF, and will have a look at it when we get there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>> v3, just a rebase. > > Thanks for working on this. I see it's already registered in the > 2015-06 CF, and will have a look at when we get there. v4, rebase (after pgbench moved to src) and use of the recently added syntax_error function. -- Fabien.
> v4, rebase (after pgbench moved to src) and use of the recently added > syntax_error function. v5 which adds a forgotten header declaration for a new function, so as to avoid a warning. -- Fabien.
>> v4, rebase (after pgbench moved to src) and use of the recently added >> syntax_error function. > > v5 which adds a forgotten header declaration for a new function, so as to > avoid a warning. While playing around with adding other functions I noticed that with the parsing rules I set, unknown functions were detected quite late in the process, so that the error messages were imprecise. v6 fixes this by checking function names earlier. -- Fabien.
v7: rebase after pgindent run for 9.6. -- Fabien.
On 03/06/2015 11:41 AM, Fabien COELHO wrote:
> This patch extends pgbench expressions with functions. Currently only one
> "abs" function is added. The point is rather to bootstrap the
> infrastructure for other functions (such as hash, random variants...) to
> be added later.

I think it would actually be good to add at least some of those other functions in the initial patch. The infrastructure that this patch adds only supports functions with a single argument, so it won't get us very far.

Also, will we need non-integer (e.g. string, numeric, whatever) arguments for the functions? How about other datatypes for variables in general? Perhaps not, or if we do that can be a separate patch, but it's something to keep in mind. The pgbench script language is evolving into a full-blown Turing-complete programming language...

As an initial list of functions, I'd suggest:

abs(x)
min(x, y, ...)
max(x, y, ...)
random_uniform(min, max)
random_gaussian(min, max, threshold)
random_exponential(min, max, threshold)

Would that be enough to specify e.g. the

As soon as we add more functions, the way they are documented needs to be reworked too; we'll need to add a table in the manual to list them.

- Heikki
Hello Heikki,

>> This patch extends pgbench expressions with functions. Currently only one
>> "abs" function is added. The point is rather to bootstrap the
>> infrastructure for other functions (such as hash, random variants...) to
>> be added later.
>
> I think it would actually be good to add at least some of those other
> functions in the initial patch.

Hmmm, sure. I wanted some feedback on the "how" before doing that, hence the infrastructure patch submitted with just one function. Obviously I can expand, but before that, any opinion on the "how"? For instance I decided against having individual functions recognized by the lexer, and to really have an infrastructure for storing, checking and adding them without lex/yacc.

> The infrastructure that this patch adds only supports functions with a
> single argument, so it won't get us very far.
> Also, will we need non-integer (e.g. string, numeric, whatever)
> arguments for the functions?

Maybe float *constants* for random exponential & gaussian.

> How about other datatypes for variables in general?

The point is not to develop another full language in pgbench. People can do things with PL/pgSQL & server-side if it must be really advanced; the point is really to facilitate pgbench "simple" scripts.

> Perhaps not, or if we do that can be a separate patch, but it's
> something to keep in mind. The pgbench script language is evolving into
> a full-blown Turing-complete programming language...

The point is *NOT* to do that. For Turing, basically you would need while or recursion & condition. Currently there is no such thing and no plan for such a thing, and I do not think it is desirable.

> As an initial list of functions, I'd suggest:
>
> abs(x)

This is the one included in the patch.

> min(x, y, ...)
> max(x, y, ...)

Hmm, varargs...

> random_uniform(min, max)

Ok, probably just "random"?

> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)

Hmm, threshold is a float.

> As soon as we add more functions, the way they are documented needs to
> be reworked too; we'll need to add a table in the manual to list them.

Yep.

-- Fabien.
Hello Heikki,

> As an initial list of functions, I'd suggest:
>
> abs(x)
> min(x, y, ...)
> max(x, y, ...)
> random_uniform(min, max)
> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)
>
> Would that be enough to specify e.g. the
>
> As soon as we add more functions, the way they are documented needs to be
> reworked too; we'll need to add a table in the manual to list them.

Here is a v8 with "abs", "min", "max", "random", "gaussian" and "exponential".

There is no expression typing, but expressions are evaluated and cast to the expected type depending on what is needed (variables expect an int value, function arguments are usually int but for thresholds...). There is an evalDouble function to evaluate double arguments for gaussian & exponential thresholds. There is a special "int" function which evaluates its argument as a double and casts the result to an int, for testing purposes.

Variables are only int values, and functions only return ints. There is no real doc, WIP... Also included is a set of SQL scripts I used for minimal tests.

-- Fabien.
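For illustration only (the actual attached scripts are not reproduced here), a test script using the v8 names and the signatures described above might look like:

\set n abs(-17)
\set lo min(:n, 5, 3)
\set hi max(:n, :lo + 10)
\set aid random(1, 100000 * :scale)
\set bid gaussian(1, 100000, 2.5)
\set cid exponential(1, 100000, 3.0)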
Hello Heikki,

>> As soon as we add more functions, the way they are documented needs to be
>> reworked too; we'll need to add a table in the manual to list them.
>
> Here is a v8 with "abs", "min", "max", "random", "gaussian" and
> "exponential".
>
> [...]
>
> There is no real doc, WIP...

Here is a v9 with a doc. The implicit typing of expressions is improved.

I also added two debug functions which allow showing intermediate integer (idebug) or double (ddebug) values:

\set i idebug(random(1, 10))

will print the random value and assign it to i.

I updated the default scripts, which seems to speed up meta-command evaluations. The initial version does less than 2 million evals per second:

sh> cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts

sh> ./pgbench -T 3 -P 1 -f before.sql
[...]
tps = 1899004.793098 (excluding connections establishing)

The updated version does more than 3 million evals per second:

sh> cat after.sql
\set aid random(1, 100000 * :scale)

sh> ./pgbench -T 3 -P 1 -f after.sql
[...]
tps = 3143168.813690 (excluding connections establishing)

Suggestion: a possibility would be to remove the \setrandom stuff altogether, as the functionality is available through \set, maybe with an error message to advise about using \set with one of the random functions. That would remove about 200 ugly locs...

Another option would be to translate the setrandom stuff into a \set expression; that would maybe save 100 locs for the eval, but keep and expand a little the "manual" parser part.

-- Fabien.
On Sun, Jul 26, 2015 at 6:37 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Heikki, > >>> As soon as we add more functions, the way they are documented needs to be >>> reworked too; we'll need to add a table in the manual to list them. >> >> >> Here is a v8 with "abs", "min", "max", "random", "gaussian" et >> "exponential". >> >> [...] >> >> There is no real doc, WIP... > > > Here is a v9 with a doc. The implicit typing of expressions is improved. > > I also added two debug functions which allow to show intermediate integer > (idebug) or double (ddebug). > > \set i idebug(random(1, 10)) > > will print the random value and assign it to i. > > > I updated the defaut scripts, which seems to speed up meta command > evaluations. The initial version does less than 2 million evals per second: > > sh> cat before.sql > \set naccounts 100000 * :scale > \setrandom aid 1 :naccounts > > sh> ./pgbench -T 3 -P 1 -f before.sql > [...] > tps = 1899004.793098 (excluding connections establishing) > > > The updated version does more than 3 million evals per second: > > sh> cat after.sql > \set aid random(1, 100000 * :scale) > > sh> ./pgbench -T 3 -P 1 -f after.sql > [...] > tps = 3143168.813690 (excluding connections establishing) > > > Suggestion: > > A possibility would be to remove altogether the \setrandom stuff as the > functionality is available through \set, maybe with an error message to > advise about using \set with one of the random functions. That would remove > about 200 ugly locs... > > Another option would be to translate the setrandom stuff into a \set > expression, that would maybe save 100 locs for the eval, but keep and expand > a little the "manual" parser part. I have moved this patch to the next CF. -- Michael
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hello. I installed CentOS 6.7 on VirtualBox and cloned postgres from GitHub. Then I applied the patch with the following command:

$ patch -p1 < pgbench-expr-abs-9.patch

To make sure that the patch applies to postgres, I wrote two script files:

$ cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts

$ cat after.sql
\set aid random(1, 100000 * :scale)

Using the script files in pgbench, I ran:

$ ./pgbench -T 3 -P 1 -f before.sql
$ ./pgbench -T 3 -P 1 -f after.sql

These ran well. I tested the functions mentioned in the patch (e.g. abs, ddebug, double, exporand, idebug, int, gaussrand, min, max, random, sqrt); they also executed without error. Finally I reviewed the code in the patch, but did not find any errors.
Hi, I had a look at this; it gives amazing performance. (I myself haven't tried it yet :-p) But anyway the new syntax looks far smarter than preparing arbitrary metacommands.

michael> I have moved this patch to the next CF.

> > Hello Heikki,
> >
> >>> As soon as we add more functions, the way they are documented needs to be
> >>> reworked too; we'll need to add a table in the manual to list them.
> >>
> >> Here is a v8 with "abs", "min", "max", "random", "gaussian" and
> >> "exponential".
> >>
> >> [...]
> >>
> >> There is no real doc, WIP...

From a quick glance at the patch, I have two simple comments. Would you consider these comments?

1. evalInt and evalDouble are mutually calling on a single expr node. It looks simple and seems to work, but could easily be broken to fall into infinite calls and finally (in a moment) exceed the stack depth. I think it is better not to do so. For example, reuniting evalDouble and evalInt into one evaluateExpr() which can return both double and int results would do so. The interface would be something like the following, or others. Some individual functions might be better split out.

static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
                                int *iret, double *dret);

or

typedef struct EvalRetVal {
    bool  is_double;
    union val {
        int    *ival;
        double *dval;
    };
} EvalRetVal;
static bool evaluateExpr(..., EvalRetVal *ret);

2. You combined abs/sqrt/ddebug and then make a branch for them. It'd be better to split them even if eval*() looks to be duplicated.

regards,

> > Here is a v9 with a doc. The implicit typing of expressions is improved.
> >
> > I also added two debug functions which allow showing intermediate integer
> > (idebug) or double (ddebug) values:
> >
> > \set i idebug(random(1, 10))
> >
> > will print the random value and assign it to i.
> >
> > I updated the default scripts, which seems to speed up meta-command
> > evaluations. The initial version does less than 2 million evals per second:
> >
> > sh> cat before.sql
> > \set naccounts 100000 * :scale
> > \setrandom aid 1 :naccounts
> >
> > sh> ./pgbench -T 3 -P 1 -f before.sql
> > [...]
> > tps = 1899004.793098 (excluding connections establishing)
> >
> > The updated version does more than 3 million evals per second:
> >
> > sh> cat after.sql
> > \set aid random(1, 100000 * :scale)
> >
> > sh> ./pgbench -T 3 -P 1 -f after.sql
> > [...]
> > tps = 3143168.813690 (excluding connections establishing)
> >
> > Suggestion:
> >
> > A possibility would be to remove the \setrandom stuff altogether, as the
> > functionality is available through \set, maybe with an error message to
> > advise about using \set with one of the random functions. That would remove
> > about 200 ugly locs...
> >
> > Another option would be to translate the setrandom stuff into a \set
> > expression; that would maybe save 100 locs for the eval, but keep and expand
> > a little the "manual" parser part.
>
> I have moved this patch to the next CF.
> --
> Michael

-- Kyotaro Horiguchi NTT Open Source Software Center
Hello Kyotaro-san,

> 1. evalInt and evalDouble are mutually calling on a single expr node.

Indeed.

> It looks simple and seems to work, but could easily be broken
> to fall into infinite calls and finally (in a moment) exceed
> the stack depth.

The recursion is on the tree-structure of the expression, which is evaluated depth-first. As the tree must be finite, and there is no such thing as recursive functions in the expression tree, this should never happen. If it happened, it would be a bug in the eval functions, probably a forgotten "case", and could be fixed there easily.

> I think it is better not to do so. For example, reuniting
> evalDouble and evalInt into one evaluateExpr() which can return
> both double and int results would do so. The interface would be
> something like the following, or others. Some individual
> functions might be better split out.
>
> static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
>                                 int *iret, double *dret);

That would not work as simply as that: this signature does not say whether a double or an int is expected, and without this information you would not know what to do on some operators (the typing is explicit and descendant in the current implementation).

Even if you add this information, or you use the returned type information to do the typing dynamically, that would mean testing the result types and implementing the necessary casts depending on this information for every function within the generic eval, which would result in repetitive and ugly code for every function and operator: for instance for '+', you would have to implement 4 cases explicitly, i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast right and double add", and finally "double add". Then you have other operators to consider... Although some sharing can be done, this would end in lengthy & ugly code.

A possible alternative could be to explicitly and statically type all operations, adding the necessary casts explicitly, distinguishing operators, but that would mean more non-obvious code to "compile" the parsed expressions; I'm not sure that pgbench deserves this.

The two eval functions and the descendant typing allow to hide/ignore these issues because each eval function handles just one type, and type boundaries are explicitly handled by calling the other function, so there is no reason to test and cast everything all the time.

So I think that it would not be an improvement to try to go the way you suggest.

> 2. You combined abs/sqrt/ddebug and then make a branch for
> them. It'd be better to split them even if eval*() looks to be
> duplicated.

Hm, why not. Here is a v10 where I did that when it did not add too much code repetition (eg I kept the shared branches for MIN & MAX and for the 3 RANDOM functions so as to avoid large copy pastes).

-- Fabien.
> [...]
> Here is a v10 where I did that when it did not add too much code repetition
> (eg I kept the shared branches for MIN & MAX and for the 3 RANDOM functions
> so as to avoid large copy pastes).

Ooops, forgotten attachment. -- Fabien.
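To make the descending-typing scheme being discussed concrete, here is a minimal self-contained sketch of the two mutually recursive evaluators; the node names, the int-typed '+', and the explicit failure on unknown tags are illustrative assumptions, not the patch's actual code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>

typedef enum { NODE_INT_CONST, NODE_DBL_CONST, NODE_ADD, NODE_SQRT } NodeType;

typedef struct Expr
{
    NodeType    type;
    int64_t     ival;           /* NODE_INT_CONST */
    double      dval;           /* NODE_DBL_CONST */
    struct Expr *left;          /* operands, where applicable */
    struct Expr *right;
} Expr;

static bool evalDouble(const Expr *e, double *ret);

/* evaluate an expression in an integer context */
static bool
evalInt(const Expr *e, int64_t *ret)
{
    switch (e->type)
    {
        case NODE_INT_CONST:
            *ret = e->ival;
            return true;
        case NODE_ADD:          /* int-typed operator in this sketch */
        {
            int64_t     l, r;

            if (!evalInt(e->left, &l) || !evalInt(e->right, &r))
                return false;
            *ret = l + r;
            return true;
        }
        case NODE_DBL_CONST:
        case NODE_SQRT:         /* double-valued node: evaluate, then cast down */
        {
            double      d;

            if (!evalDouble(e, &d))
                return false;
            *ret = (int64_t) d;
            return true;
        }
        default:                /* fail explicitly: never cross-call on unknown tags */
            fprintf(stderr, "unexpected node type: %d\n", (int) e->type);
            return false;
    }
}

/* evaluate an expression in a double context */
static bool
evalDouble(const Expr *e, double *ret)
{
    switch (e->type)
    {
        case NODE_DBL_CONST:
            *ret = e->dval;
            return true;
        case NODE_SQRT:
        {
            double      arg;

            if (!evalDouble(e->left, &arg))
                return false;
            *ret = sqrt(arg);
            return true;
        }
        case NODE_INT_CONST:
        case NODE_ADD:          /* int-valued node: evaluate, then promote */
        {
            int64_t     i;

            if (!evalInt(e, &i))
                return false;
            *ret = (double) i;
            return true;
        }
        default:
            fprintf(stderr, "unexpected node type: %d\n", (int) e->type);
            return false;
    }
}

Each evaluator handles only the cases of its own type and delegates the others, so no per-operator conversion matrix is needed; the explicit default avoids the infinite cross-recursion concern raised just above.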
Hello, At Tue, 15 Sep 2015 19:26:51 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1509151844080.6503@sto>

>> 1. evalInt and evalDouble are mutually calling on a single expr node.
>
> Indeed.
>
>> It looks simple and seems to work, but could easily be broken
>> to fall into infinite calls and finally (in a moment) exceed
>> the stack depth.
>
> The recursion is on the tree-structure of the expression, which is
> evaluated depth-first. As the tree must be finite, and there is no
> such thing as recursive functions in the expression tree, this should
> never happen. If it happened, it would be a bug in the eval functions,
> probably a forgotten "case", and could be fixed there easily.

My description must have been obscure. Indeed the call tree is finite for *sane* expression nodes. But it makes infinite calls for a value of expr->etype unknown by both evalDouble and evalInt. This is what I intended to express by the words 'easily broken'. AFAICS most switches for nodes in the core would stop and emit a message like the following,

| default:
|     elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

So I think the following code would do well.

| evalDouble()
| {
|     if (IS_INT_EXPR_NODE(expr->etype))
|     {
|         if (!evalInt(...))
|             return false;
|         *retval = (double) ival;
|         return true;
|     }
|
|     switch (expr->etype)
|     {
|         case ENODE_DOUBLE_CONSTANT:
|         ...
|         default:
|             elog(ERROR, "unrecognized expr type: %d", (int) expr->etype);

I suppose there's a big difference between a SEGV after running through the stack bottom and the server's return with ERROR for the same bug. But IS_INT_EXPR_NODE() would be redundant with the switch-case.. Do you think that I'm taking the difference too seriously?

However this is needless discussion if the following change is acceptable for you.

>> I think it is better not to do so. For example, reuniting
>> evalDouble and evalInt into one evaluateExpr() which can return
>> both double and int results would do so. The interface would be
>> something like the following, or others. Some individual
>> functions might be better split out.
>>
>> static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
>>                                 int *iret, double *dret);
>
> That would not work as simply as that: this signature does not say
> whether a double or an int is expected, and without this information
> you would not know what to do on some operators (the typing is
> explicit and descendant in the current implementation).
>
> Even if you add this information, or you use the returned type
> information to do the typing dynamically, that would mean testing the
> result types and implementing the necessary casts depending on this
> information for every function within the generic eval, which would
> result in repetitive and ugly code for every function and operator:
> for instance for '+', you would have to implement 4 cases explicitly,
> i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast
> right and double add", and finally "double add". Then you have other
> operators to consider... Although some sharing can be done, this would
> end in lengthy & ugly code.

Yes, I also think it is ugly, as I anticipated:(.

By the way, the complexity comes from separating integer and double. If there is no serious reason to separate them, handling all values as double makes things far simpler. Could you let me know the reason why it strictly separates integer and double?
| evaluateExpr(..., double *retval)
| {
|     switch (expr->etype)
|     {
|         case ENODE_CONSTANT:
|             *retval = expr->u.constant.val;
|             return true;
|         case ENODE_OPERATOR:
|             /* Nothing to worry about */
|         case ENODE_FUNCTION:
|         {
|             switch (func)
|             {
|                 case PGBENCH_ABS/SQRT/DEBUG:
|                     /* Nothing to worry about, too */
|                 /* case PGBENCH_DOUBLE is useless ever after */
|                 case PGBENCH_INT:
|                     double arg;
|                     if (!evaluateExpr(..., &arg))
|                         return false;
|                     *retval = floor(arg);

I don't see any problem in possible errors of floating point calculations for this purpose. Is there any? Anyway, the \set meta command finally converts the result to an integer regardless of the real value, so the following conversion does the same as your current code.

in doCustom()

> else if (pg_strcasecmp(argv[0], "set") == 0)
> {
>     char res[64];
>     PgBenchExpr *expr = commands[st->state]->expr;
>     int64 result;
>
- 	if (!evalInt(thread, st, expr, &result))
+ 	if (!evaluateExpr(thread, st, expr, &dresult))
+ 		result = (int64) dresult;

This should make exprparse.y simpler.

> A possible alternative could be to explicitly and statically type all
> operations, adding the necessary casts explicitly, distinguishing
> operators, but that would mean more non-obvious code to "compile" the
> parsed expressions; I'm not sure that pgbench deserves this.
>
> The two eval functions and the descendant typing allow to hide/ignore
> these issues because each eval function handles just one type, and type
> boundaries are explicitly handled by calling the other function, so
> there is no reason to test and cast everything all the time.
>
> So I think that it would not be an improvement to try to go the way
> you suggest.
>
>> 2. You combined abs/sqrt/ddebug and then make a branch for
>> them. It'd be better to split them even if eval*() looks to be
>> duplicated.
>
> Hm, why not.

Thanks.

> Here is a v10 where I did that when it did not add too much code
> repetition (eg I kept the shared branches for MIN & MAX and for the 3
> RANDOM functions so as to avoid large copy pastes).

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center
Hello Kyotaro-san,

> My description must have been obscure. Indeed the call tree is
> finite for *sane* expression nodes. But it makes infinite calls for
> a value of expr->etype unknown by both evalDouble and
> evalInt.

Such an issue would be detected if the function is actually tested; hopefully this should be the case... :-)

However I agree that relying implicitly on the "default" case is not very good practice, so I updated the code in the attached v11 to fail explicitly on such errors.

I also attached a small test script, which exercises most (all?) functions:

./pgbench -f functions.sql -t 1

> [...]
> By the way, the complexity comes from separating integer and
> double. If there is no serious reason to separate them, handling
> all values as double makes things far simpler.

Yep, but no.

> Could you let me know the reason why it strictly separates integer and
> double? I don't see any problem in possible errors of floating point
> calculations for this purpose. Is there any?

Indeed it would make things simpler, but it would break large integers as the int64 -> double -> int64 casts would result in approximations. The integer type is the important one here because it is used for primary keys, and you do not want a key to be approximated in any way, so the int64 type must be fully and exactly supported.

-- Fabien.
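A standalone snippet (not from the patch) makes the approximation concrete, assuming a 64-bit int and IEEE-754 double, whose mantissa has only 53 bits:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    int64_t     key = 9007199254740993LL;   /* 2^53 + 1 */
    double      d = (double) key;           /* not representable: rounds to 2^53 */
    int64_t     back = (int64_t) d;

    printf("%lld -> %lld (%s)\n",
           (long long) key, (long long) back,
           key == back ? "exact" : "approximated");
    return 0;
}

This prints "9007199254740993 -> 9007199254740992 (approximated)": a primary key silently off by one.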
On Thu, Sep 17, 2015 at 10:58 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > By the way, the complexity comes from separating integer and > double. If there is no serios reason to separate them, handling > all values as double makes things far simpler. -1. double is an inexact type, whereas integer is an exact type. The typical way to handle this sort of thing is to define a struct whose first member is a type field and whose second field is a union of all the types you need to care about. Then that gets passed around everywhere. This patch should be designed in such a way that if we eventually end up with functions that have 10 different return types instead of 2 different return types, we don't need to add 8 more parameters to any functions. Instead, those still return PgBench_Value (or whatever we call it) which is the aforementioned struct, but there are more options for what that can contain. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
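In code, the struct Robert describes might look like the following sketch; the names follow his wording (PgBench_Value, and the PGBT_* tags used later in the thread), but the exact layout is an assumption:

#include <stdint.h>

typedef enum
{
    PGBT_INTEGER,
    PGBT_DOUBLE
    /* more return types would be added here, not as new parameters */
} PgBenchValueType;

typedef struct
{
    PgBenchValueType type;      /* tag saying which union member is valid */
    union
    {
        int64_t     ival;
        double      dval;
    }           u;
} PgBench_Value;

Every function and operator then takes and returns this one struct, so adding a datatype adds tags and cases but never new parameters.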
> -1. double is an inexact type, whereas integer is an exact type.

Sure. I already argued on that very line.

> The typical way to handle this sort of thing is to define a struct
> whose first member is a type field and whose second field is a union
> of all the types you need to care about.

Yep.

> Then that gets passed around everywhere. This patch should be designed
> in such a way that if we eventually end up with functions that have 10
> different return types instead of 2 different return types, we don't
> need to add 8 more parameters to any functions. Instead, those still
> return PgBench_Value (or whatever we call it) which is the
> aforementioned struct, but there are more options for what that can
> contain.

I just put the double type as a proof of concept, but for pgbench only integers really matter.

What you suggest would work, but it would also result in ugly and lengthy code, as I argued in another mail, because you have to decide for overloaded operators and functions which actual typed operator must be called, and then perform the necessary type conversions depending on the actual type of the operands. The implicit descendant typing used in the patch hides this, and is more than enough for pgbench, IMO.

If this is a blocker, I would rather remove the support for doubles than write verbose and inelegant code.

-- Fabien.
On Fri, Sep 18, 2015 at 10:21 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Hello Kyotaro-san, My description must have been obscure. Indeed the call tree is
finite for *sane* expression nodes. But it makes infinite calls for
a value of expr->etype unknown by both evalDouble and
evalInt.
Such an issue would be detected if the function is actually tested; hopefully this should be the case... :-)
However I agree that relying implicitly on the "default" case is not very good practice, so I updated the code in the attached v11 to fail explicitly on such errors.
I also attached a small test script, which exercises most (all?) functions:
./pgbench -f functions.sql -t 1
A short review from me:
1. Patch applies cleanly on current HEAD.
2. It compiles without errors or warnings.
3. The attached test case can be executed w/o symptoms of any problem and it produces meaningful results.
Should we not allow for functions taking 0 arguments? Since we're already into some math here, how about pi()? ;-)
I understand requiring at least 1 arg simplifies the code a bit, but right now it reports a syntax error for "random()", while it correctly reports an unexpected number of arguments for "random(1,2,3)". We would need another check for min() and max(), which expect >= 1 arguments, but it's easy to add.
I would also argue that we should rename "random" to "rand" here to avoid confusion with the familiar SQL function "random()" that doesn't take arguments.
--
Alex
Here is a v12 which implements the suggestions below. > Should we not allow for functions taking 0 arguments? Since we're already > into some math here, how about pi()? ;-) Hmmm, why not. > I understand requiring at least 1 arg simplifies the code a bit, but right > now it reports syntax error for "random()", while it correctly reports > unexpected number of arguments for "random(1,2,3)". We would need another > check for min() and max() which expect >=1 arguments, but it's easy to add. Indeed, I had to add a special check. > I would also argue that we should rename "random" to "rand" here to avoid > confusion with the familiar SQL function "random()" that doesn't take > arguments. Why not, as it is also consistent with exporand() & gaussrand(). -- Fabien.
On Fri, Oct 30, 2015 at 1:01 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v12 which implements the suggestions below.

Sorry it's taken me so long to get around to looking at this. Some thoughts on an initial read-through:

1. I think there should really be two patches here, the first adding functions, and the second adding doubles. Those seem like separate changes. And offhand, the double stuff looks a lot less useful than the function call syntax.

2. ddebug and idebug seem like a lousy idea to me. If we really need to be able to print stuff from pgbench, which I kind of doubt, then maybe we should have a string data type and a print() function, or maybe a string data type and a toplevel \echo command.

3. I'm perplexed by why you've replaced evaluateExpr() with evalInt() and evalDouble(). That doesn't seem similar to what I've seen in other expression-evaluation engines. Perhaps I could find out by reading the comments, but actually not, because this entire patch seems to add only one comment:

+ /* reset column count for this scan */
+ yycol = 0;

While I'm not a fan of excessive commenting, I think a little more explanation here would be good.

4. The table of functions in pgbench.sgml seems to leave something to be desired. We added a pretty detailed write-up on the Gaussian and exponential options to \setrandom, but exporand() has only this description:

+ <row>
+  <entry><literal><function>exporand(<replaceable>i</>, <replaceable>j</>, <replaceable>t</>)</></></>
+  <entry>integer</>
+  <entry>exponentially distributed random integer in the bounds, see below</>
+  <entry><literal>exporand(1, 10, 3.0)</></>
+  <entry>int between <literal>1</> and <literal>10</></>
+ </row>

That's not very helpful. Without looking at the example, there's no way to guess what i and j mean, and even with looking at the example, there's no way to guess what t means. If, as I'm guessing, exporand() and gaussrand() behave like \setrandom with the exponential and/or Gaussian options, then the documentation for one of those things should contain all of the detailed information and the documentation for the other should refer to it. More than likely, exporand() and gaussrand() should get the detailed explanation, and \setrandom should be documented as a deprecated alternative to \set ... {gauss,expo,}rand(...)

5. I liked Heikki's proposed function names random_gaussian(min, max, threshold) and random_exponential(min, max, threshold) better than the ones you've picked here. I think random() would be OK instead of his suggestion of random_uniform(), though.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, sorry for the silence.

At Fri, 18 Sep 2015 20:35:48 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.10.1509182019100.27223@sto>

>> -1. double is an inexact type, whereas integer is an exact type.
>
> Sure. I already argued on that very line.

Agreed.

>> The typical way to handle this sort of thing is to define a struct
>> whose first member is a type field and whose second field is a union
>> of all the types you need to care about.
>
> Yep.
>
>> Then that gets passed around everywhere. This patch should be
>> designed in such a way that if we eventually end up with functions
>> that have 10 different return types instead of 2 different return
>> types, we don't need to add 8 more parameters to any functions.
>> Instead, those still return PgBench_Value (or whatever we call it)
>> which is the aforementioned struct, but there are more options for
>> what that can contain.
>
> I just put the double type as a proof of concept, but for pgbench only
> integers really matter.
>
> What you suggest would work, but it would also result in ugly and
> lengthy code, as I argued in another mail, because you have to decide
> for overloaded operators and functions which actual typed operator
> must be called, and then perform the necessary type conversions
> depending on the actual type of the operands. The implicit descendant
> typing used in the patch hides this, and is more than enough for
> pgbench, IMO.

I also agree with this.

> If this is a blocker, I would rather remove the support for doubles
> than write verbose and inelegant code.

I understood the situation and agree with the current shape of the code. I no longer object to the calling-alternatively code. But I'd like to see the abbreviated discussion in a comment on the function.

regards,

-- Kyotaro Horiguchi NTT Open Source Software Center
Hello Robert,

> 1. I think there should really be two patches here, the first adding
> functions, and the second adding doubles. Those seem like separate
> changes. And offhand, the double stuff looks a lot less useful than
> the function call syntax.

I first submitted the infrastructure part, but I was asked to show how more functions could be included, especially random variants. As random gaussian/exponential functions require a double argument, there must be some support for double values. Now, it could certainly be split in two patches, but this is rather artificial, IMO.

> 2. ddebug and idebug seem like a lousy idea to me.

It was really useful to me for debugging and testing.

> If we really need to be able to print stuff from pgbench, which I kind
> of doubt, then maybe we should have a string data type and a print()
> function, or maybe a string data type and a toplevel \echo command.

The *debug functions allow intercepting the value computed within an expression. If you rely on variables and some echo (which does not exist), this means that there should be double variables as well, which is not currently the case, and which I do not see as useful for the kind of scripts written for pgbench. Adding the string type is more work, and I do not see a good use case for it. So the *debug functions are really just a lightweight solution for debugging type-related issues in expressions. I can drop them if this is a blocker, but they are really useful for testing a script quickly.

> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
> and evalDouble().

As explained above in the thread (I think), the reason is that having one overloaded expression evaluation which handles type conversions would produce pretty heavy code, and the two functions with the descending typing allow much smaller code with the same effect.

The issue is that with two types all functions must handle argument type conversion explicitly. For instance for "random_gaussian(int, int, double)", it may be called with any combination of 3 int/double arguments, each one must be tested and possibly converted to the target type before calling the actual function. For overloaded operators or functions (arithmetics, abs...) there is also the decision about which operator is called and then what conversions are necessary. With the descending typing and the two functions' cross recursion, all these explicit tests and conversions disappear because the function evaluation calls evalInt or evalDouble depending on the expected types. Basically, the code is significantly shorter and more elegant with this option.

> That doesn't seem similar to what I've seen in other
> expression-evaluation engines.

Probably. This is because I chose a descending typing to simplify the implementation. Changing this would bring no real practical benefit from the usage point of view, but would add significantly more verbose and ugly code to test and handle type conversions everywhere, so I'm not keen to do that.

> Perhaps I could find out by reading the comments, but actually not,
> because this entire patch seems to add only one comment:
>
> + /* reset column count for this scan */

There are a few others, really:-)

> While I'm not a fan of excessive commenting, I think a little more
> explanation here would be good.

I can certainly add more comments to the code, especially around the eval cross-recursion functions.

> 4. The table of functions in pgbench.sgml seems to leave something to
> be desired.
> We added a pretty detailed write-up on the Gaussian and
> exponential options to \setrandom, but exporand() has only this
> description:

Yep. The idea was *not* to replicate the (painful) explanations about random functions, but that they should be shared between the function and the \set variants.

> + <row>
> +  <entry><literal><function>exporand(<replaceable>i</>,
> <replaceable>j</>, <replaceable>t</>)</></></>
> +  <entry>integer</>
> +  <entry>exponentially distributed random integer in the bounds,
> see below</>
> +  <entry><literal>exporand(1, 10, 3.0)</></>
> +  <entry>int between <literal>1</> and <literal>10</></>
> + </row>
>
> That's not very helpful.

The table explanation must be kept short for the table format...

> Without looking at the example, there's no way to guess what i and j
> mean, and even with looking at the example, there's no way to guess what
> t means. If, as I'm guessing, exporand() and gaussrand() behave like
> \setrandom with the exponential and/or Gaussian options, then the
> documentation for one of those things should contain all of the detailed
> information and the documentation for the other should refer to it.

Indeed, that was the idea, but it seems that I forgot the pointer:-)

> More than likely, exporand() and gaussrand() should get the detailed
> explanation, and \setrandom should be documented as a deprecated
> alternative to \set ... {gauss,expo,}rand(...)

Ok, the description can be moved to the function part and the \set version can reference the other.

> 5. I liked Heikki's proposed function names random_gaussian(min, max,
> threshold) and random_exponential(min, max, threshold) better than the
> ones you've picked here. I think random() would be OK instead of his
> suggestion of random_uniform(), though.

Ok.

I'll submit an updated version of the patch, which addresses points 4 & 5 and documents 3, and wait for feedback on the explanations I gave before doing anything about 1 & 2, as I think that the implied changes are not desirable. I'm not keen at all on changing the cross-recursion implementation (3); this would just be pretty ugly code without actual user benefit.

-- Fabien.
Hello Kyotaro-san, > [...] I understood the situation and agreed for current shape of the > code. I no longer object the calling-alternatively code. But I'd like > see the abbreviated discussion in the comment on the function. Indeed, this design choice deserves much better explanations. -- Fabien.
Fabien COELHO wrote: > For instance for "random_gaussian(int, int, double)", it may be called with > any combination of 3 int/double arguments, each one must be tested and > possibly converted to the target type before calling the actual function. > For overloaded operators or functions (arithmetics, abs...) there is also > the decision about which operator is called and then what conversions are > necessary. I think we should make this as simple as possible -- in particular I don't see a lot of value in overloading here. I'd rather have it throw an error and force you to write the double with a ".0" when the given value is an integer so that it is parsed as a double, rather than accepting ambiguous input. Conversely, if you pass a double to the integer options, raise an error. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,

>> For instance for "random_gaussian(int, int, double)", it may be called with
>> any combination of 3 int/double arguments, each one must be tested and
>> possibly converted to the target type before calling the actual function.
>> For overloaded operators or functions (arithmetics, abs...) there is also
>> the decision about which operator is called and then what conversions are
>> necessary.
>
> I think we should make this as simple as possible -- in particular I
> don't see a lot of value in overloading here. I'd rather have it throw
> an error and force you to write the double with a ".0" when the given
> value is an integer so that it is parsed as a double, rather than
> accepting ambiguous input. Conversely, if you pass a double to the
> integer options, raise an error.

I agree that the overloading and other stuff is not a decisive and very useful feature of the patch, but I think that the descendant-typing two-function recursive evaluation is a good compromise, as it works without much ado or ambiguity, and from my point of view the code stays quite simple with this approach. So although the benefit (of having overloaded operators and handling two-type expressions) is indeed quite limited in practice, the cost in the code is low.

Being more restrictive or strict as you suggest would not remove much code, and would remove some convenience. Also, removing this feature would induce inconsistencies: integer arguments would allow general expressions, but only double constants would be ok for double arguments... So I would rather keep it as it is.

-- Fabien.
Hello Robert, Here is a v13 and a small test script.

- change names to random(), random_exponential() and random_gaussian(). I find them too long, but if the committer wants that I cannot help it:-)

- more comments, especially about the expression evaluation & type system.

- improved documentation, in particular to include suggestions by Tomas Vondra about clarifying explanations about the gaussian & exponential random generators, and clear references from \setrandom to the \set expressions.

- still just one patch, because removing double would mean removing the 2 exponential & gaussian random functions which require a double argument. Note that I started with one small patch for adding the infrastructure, but then Heikki requested more functions including double type stuff to illustrate the point, then Robert asked to break it back; going forward and backward is tiring...

- still "lousy" *debug functions, because I found them useful for debugging and testing, really. It is easy to remove them, but I would advise against doing that as it would make debugging an expression much less straightforward.

-- Fabien.
Hello again,

The v14 also removes references to the word "threshold" for gaussian and exponential random generation in the source code (comments and variable names), as it has no clear meaning, replacing it with param/parameter depending on the context, as discussed in another thread started by Tomas Vondra. Sorry for the noise.

> Here is a v13 and a small test script.
>
> - change names to random(), random_exponential() and random_gaussian().
> I find them too long, but if the committer wants that I cannot help
> it:-)
>
> - more comments, especially about the expression evaluation &
> type system.
>
> - improved documentation, in particular to include suggestions by Tomas
> Vondra about clarifying explanations about the gaussian &
> exponential random generators, and clear references from \setrandom
> to the \set expressions.
>
> - still just one patch, because removing double would mean removing the 2
> exponential & gaussian random functions which require a double
> argument.
>
> Note that I started with one small patch for adding the infrastructure,
> but then Heikki requested more functions including double type stuff to
> illustrate the point, then Robert asked to break it back; going forward
> and backward is tiring...
>
> - still "lousy" *debug functions, because I found them useful for
> debugging and testing, really. It is easy to remove them, but I would
> advise against doing that as it would make debugging an expression
> much less straightforward.

-- Fabien.
On Wed, Nov 4, 2015 at 4:06 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

>> 1. I think there should really be two patches here, the first adding
>> functions, and the second adding doubles. Those seem like separate
>> changes. And offhand, the double stuff looks a lot less useful than
>> the function call syntax.
>
> I first submitted the infrastructure part, but I was asked to show how more
> functions could be included, especially random variants. As random
> gaussian/exponential functions require a double argument, there must be some
> support for double values.

Ugh, OK.

>> 2. ddebug and idebug seem like a lousy idea to me.
>
> It was really useful to me for debugging and testing.

That doesn't mean it belongs in the final patch.

>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>> and evalDouble().
>
> Basically, the code is significantly shorter and more elegant with this option.

I find that pretty hard to swallow. If the backend took this approach, we'd have a separate evaluate function for every datatype, which would make it completely impossible to support the creation of new datatypes. And I find this code to be quite difficult to follow.

What I think we should have is a type like PgBenchValue that represents a value which may be an integer or a double or whatever else we want to support, but not an expression - specifically a value. Then when you invoke a function or an operator it gets passed one or more PgBenchValue objects and can do whatever it wants with those, storing the result into an output PgBenchValue. So add might look like this:

void
add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
{
    if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_INTEGER;
        result->u.ival = x->u.ival + y->u.ival;
        return;
    }
    if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.dval;
        return;
    }
    /* cross-type cases, if desired, go here */
    /* if no case applies, somehow report an error */
}

The way you have it, the logic for every function and operator exists in two copies: one in the integer function, and the other in the double function. As soon as we add another data type - strings, dates, whatever - we'd need to have cases for all of these things in those functions as well, and every time we add a function, we have to update all the case statements for every datatype's evaluation function. That's just not a scalable model.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert,

>>> 2. ddebug and idebug seem like a lousy idea to me.
>>
>> It was really useful to me for debugging and testing.
>
> That doesn't mean it belongs in the final patch.

I think it is useful when debugging a script, not just for debugging the evaluation code itself.

>>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>>> and evalDouble().
>>
>> Basically, the code is significantly shorter and more elegant with this option.
>
> I find that pretty hard to swallow.

Hmmm, maybe with some more explanations?

> If the backend took this approach,

Sure, I would never suggest doing anything like that in the backend!

> we'd have a separate evaluate function for every datatype, which would
> make it completely impossible to support the creation of new datatypes.

In the backend the implementation is generic about types (one can add a type dynamically to the system), which is another abstraction level; it does not compare in any way.

> And I find this code to be quite difficult to follow.

It is really the function *you* wrote, and there is just one version for int and one for double.

> What I think we should have is a type like PgBenchValue that represents
> a value which may be an integer or a double or whatever else we want to
> support, but not an expression - specifically a value. Then when you
> invoke a function or an operator it gets passed one or more PgBenchValue
> objects and can do whatever it wants with those, storing the result into
> an output PgBenchValue. So add might look like this:

Sure, I do know what it looks like, and I want to avoid it, because this is just a lot of ugly code which is useless for pgbench's purpose.

> void
> add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
> {
>     if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
>     {
>         result->type = PGBT_INTEGER;
>         result->u.ival = x->u.ival + y->u.ival;
>         return;
>     }
>     if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
>     {
>         result->type = PGBT_DOUBLE;
>         result->u.dval = x->u.dval + y->u.dval;
>         return;
>     }
>     /* cross-type cases, if desired, go here */

Why reject 1 + 2.0? So the cross-type cases are really required for user sanity, which adds:

    if (x->type == PGBT_DOUBLE && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.ival;
        return;
    }
    if (x->type == PGBT_INTEGER && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.ival + y->u.dval;
        return;
    }

> }

For the '+' overloaded operator with conversions there are 4 cases (2 types ** 2 arguments) to handle. For all 5 binary operators (+ - * / %), that makes 20 cases to handle. Then for every function, you have to deal with type conversion as well, each function times number of types ** number of arguments. That is 2 cases for abs, 4 cases for random, 8 cases for each random_exp*, 8 for random_gaus*, and so on. Some thinking would be required for n-ary functions (min & max).

Basically, it can be done, no technical issue, it is just a matter of writing a *lot* of repetitive code, hundreds of lines of it. As I think it does not bring any value for pgbench's purpose, I used the other approach, which reduces the code size by avoiding the combinatorial "cross-type" conversions.

> The way you have it, the logic for every function and operator exists
> in two copies: one in the integer function, and the other in the
> double function.

Yep, but only 2 cases to handle, instead of 4 cases in the above example, that is the point.
> As soon as we add another data type - strings, dates, whatever - we'd
> need to have cases for all of these things in those functions as well,
> and every time we add a function, we have to update all the case
> statements for every datatype's evaluation function. That's just not a
> scalable model.

On the contrary, it is reasonably scalable: with the eval-per-type approach, for every added type one should implement one eval function which handles all functions and operators, each eval function roughly the same size as the current ones.

With the above approach, the overloaded add function which handles 2 operands with 3 types ('post' + 'gres' -> 'postgres') would have to deal with 3**2 = 9 type combinations instead of 4, so basically it would more than double the code size. If you add dates on top of that, 4**2 = 16 cases just for one operator. No difficulty there, just a lot of lines...

Basically the code size complexity of the above approach is:

  #functions * (#types ** #args)

while for the approach in the submitted patch it is:

  #types * #functions

Now I obviously agree that the approach is not generic and should not be used in other contexts; it is just that it is simple/short and it fits the bill for pgbench. Note that I do not really envision adding more types for pgbench scripts. The double type is just a side effect of the non-uniform randoms. What I plan is to add a few functions, especially a pseudo-random permutation and/or a parametric hash, that would allow running more realistic tests with non-uniform distributions.

-- Fabien.
On Thu, Nov 5, 2015 at 3:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> For the '+' overloaded operator with conversions there are 4 cases (2
> types ** 2 arguments) to handle. For all 5 binary operators (+ - * / %),
> that makes 20 cases to handle. Then for every function, you have to deal
> with type conversion as well, each function times number of types **
> number of arguments. That is 2 cases for abs, 4 cases for random, 8 cases
> for each random_exp*, 8 for random_gaus*, and so on. Some thinking would be
> required for n-ary functions (min & max).
>
> Basically, it can be done, no technical issue, it is just a matter of
> writing a *lot* of repetitive code, hundreds of lines of it. As I think it
> does not bring any value for pgbench's purpose, I used the other approach,
> which reduces the code size by avoiding the combinatorial "cross-type"
> conversions.

Those can be avoided in other ways. For example:

    if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_INTEGER;
        result->u.ival = x->u.ival + y->u.ival;
    }
    else
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = coerceToDouble(x) + coerceToDouble(y);
    }

coerceToDouble can be re-used for every arithmetic operator and can throw an error if the input type is not coercible. Yet, we still return an exact integer answer when possible.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
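One possible shape for that helper, reusing the tagged-union value sketched earlier in the thread (the stderr/exit error handling is a placeholder, not pgbench's actual reporting):

#include <stdio.h>
#include <stdlib.h>

static double
coerceToDouble(PgBenchValue *v)
{
    switch (v->type)
    {
        case PGBT_INTEGER:
            return (double) v->u.ival;
        case PGBT_DOUBLE:
            return v->u.dval;
        default:                /* not coercible: abort the script run */
            fprintf(stderr, "unexpected value type: %d\n", (int) v->type);
            exit(1);
    }
}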
> Those can be avoided in other ways.

Ok, ok, I surrender:-)

Here is a v15 which hides conversion and assignment details in macros and factors out the type testing of overloaded operators, so that the code expansion is minimal (basically the operator evaluation is duplicated for int & double, but the rest is written once). The evaluation cost is probably slightly higher than the previous version because of the many hidden type tests.

Note that variables are only ints stored as text. Another patch may try to propagate the value structure to variables, but then it changes the query expansion code; it is more or less orthogonal to adding functions. Moreover double variables would not be really useful anyway.

-- Fabien.
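The kind of helpers that can keep that expansion small might look like this sketch; the INT()/DOUBLE() names match what the v16 discussion below mentions, while SET_INT/SET_DOUBLE and coerceToInt are assumptions:

/* assuming the PgBenchValue tagged union discussed above */
#define INT(v)      coerceToInt(&(v))       /* coerceToInt is hypothetical here */
#define DOUBLE(v)   coerceToDouble(&(v))
#define SET_INT(pv, x)    do { (pv)->type = PGBT_INTEGER; (pv)->u.ival = (x); } while (0)
#define SET_DOUBLE(pv, x) do { (pv)->type = PGBT_DOUBLE;  (pv)->u.dval = (x); } while (0)

/* an overloaded binary '+' then needs a single shared type test: */
if (lval.type == PGBT_INTEGER && rval.type == PGBT_INTEGER)
    SET_INT(&res, INT(lval) + INT(rval));
else
    SET_DOUBLE(&res, DOUBLE(lval) + DOUBLE(rval));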
On Fri, Nov 6, 2015 at 5:00 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Those can be avoided in other ways. For example:
>
> Ok, ok, I surrender:-)
>
> Here is a v15 which hides conversion and assignment details in macros and
> factors out the type testing of overloaded operators, so that the code
> expansion is minimal (basically the operator evaluation is duplicated for
> int & double, but the rest is written once). The evaluation cost is probably
> slightly higher than the previous version because of the many hidden type
> tests.
>
> Note that variables are only ints stored as text. Another patch may try to
> propagate the value structure to variables, but then it changes the query
> expansion code; it is more or less orthogonal to adding functions. Moreover
> double variables would not be really useful anyway.

OK, comments on this version:

1. It's really not appropriate to fold the documentation changes raised on the other thread into this patch. I'm not going to commit something where the commit message is a laundry list of unrelated changes. Please separate out the documentation changes as a separate patch. Let's do that first, and then make this patch apply on top of those changes. The related changes in getGaussianRand etc. should also be part of that patch, not this one.

2. Please reduce the churn in the pgbench output example. Most of the lines that you've changed don't actually need to be changed.

3. I think we should not have ENODE_INTEGER_CONSTANT and ENODE_DOUBLE_CONSTANT. We should just have ENODE_CONSTANT, and it should store the same datatype we use to represent values in the evaluator.

4. The way you've defined the value type is not great. int64_t isn't used anywhere in our source base. Don't start here. I think the is_none, is_int, is_double naming is significantly inferior to what I suggested, and unlike what we do through the rest of our code. Similarly, the coercion functions are not very descriptively named, and I don't see why we'd want those to be macros rather than static functions.

5. The declaration of PgBenchExprList has a cuddled opening brace, which is not PostgreSQL style.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert,

> 1. It's really not appropriate to fold the documentation changes
> raised on the other thread into this patch. I'm not going to commit
> something where the commit message is a laundry list of unrelated
> changes. Please separate out the documentation changes as a separate
> patch. Let's do that first, and then make this patch apply on top of
> those changes. The related changes in getGaussianRand etc. should
> also be part of that patch, not this one.

Hmmm. Attached is a two-part v16.

> 2. Please reduce the churn in the pgbench output example. Most of the
> lines that you've changed don't actually need to be changed.

I did a real run to get consistent figures, especially as now more information is shown. I did some computations to try to generate something consistent without changing too many lines, but just taking a real output would make more sense.

> 3. I think we should not have ENODE_INTEGER_CONSTANT and
> ENODE_DOUBLE_CONSTANT. We should just have ENODE_CONSTANT, and it
> should store the same datatype we use to represent values in the
> evaluator.

Why not. This induces a two-level tag structure, which I do not find that great, but it simplifies the evaluator code by having one less case.

> 4. The way you've defined the value type is not great. int64_t isn't
> used anywhere in our source base. Don't start here.

Indeed. I really meant "int64", which is used elsewhere in pgbench.

> I think the is_none, is_int, is_double naming is significantly inferior
> to what I suggested, and unlike what we do through the rest of our code.

Hmmm. I think that it is rather a matter of taste, and PGBT_DOUBLE does not strike me as intrinsically superior, although possibly more in style. I changed value_t and value_type_t to PgBenchValue and ~Type to blend in, even if I find it on the heavy side.

> Similarly, the coercion functions are not very descriptively named,

I do not understand. "INT(value)" and "DOUBLE(value)" both look pretty explicit to me... The point of choosing short names is to avoid newlines, to stay (or try to stay) under 80 columns, especially as PG indentation rules tend to move things quickly to the right in case constructs, which are used in the evaluation function. I use explicitly named functions, but short macros in the evaluator.

> and I don't see why we'd want those to be macros rather than static
> functions.

Can be functions, sure. Switched.

> 5. The declaration of PgBenchExprList has a cuddled opening brace,
> which is not PostgreSQL style.

Indeed. There were other instances.

-- Fabien.
On Fri, Nov 6, 2015 at 3:44 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> 1. It's really not appropriate to fold the documentation changes >> raised on the other thread into this patch. I'm not going to commit >> something where the commit message is a laundry list of unrelated >> changes. Please separate out the documentation changes as a separate >> patch. Let's do that first, and then make this patch apply on top of >> those changes. The related changes in getGaussianRand etc. should >> also be part of that patch, not this one. > > Hmmm. Attached is a two-part v16. Thanks. Part 1 looks, on the whole, fine to me, although I think the changes to use less whitespace and remove decimal places in the documentation are going in the wrong direction. That is: - About 67% of values are drawn from the middle <literal>1.0 / threshold</> + About 67% of values are drawn from the middle <literal>1/param</>, I would say 1.0 / param, just as we used to say 1.0 / threshold. Any reason why not? That's easier to read IMHO and makes it more clear that it's integer division. I'm copying Tomas Vondra on this email since he was the one who kicked off the other thread where this was previously being discussed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>> 1. It's really not appropriate to fold the documentation changes >>> raised on the other thread into this patch. I'm not going to commit >>> something where the commit message is a laundry list of unrelated >>> changes. Please separate out the documentation changes as a separate >>> patch. Let's do that first, and then make this patch apply on top of >>> those changes. The related changes in getGaussianRand etc. should >>> also be part of that patch, not this one. >> >> Hmmm. Attached is a two-part v16. > > Thanks. Part 1 looks, on the whole, fine to me, although I think the > changes to use less whitespace and remove decimal places in the > documentation are going in the wrong direction. That is: > > - About 67% of values are drawn from the middle <literal>1.0 / threshold</> > + About 67% of values are drawn from the middle <literal>1/param</>, > > I would say 1.0 / param, just as we used to say 1.0 / threshold. Any > reason why not? For the 1.0 -> 1, this is because in the example afterwards I set param to 2.0 and I wanted it clear where the one half was coming from, and ISTM that the 2.0 stands out more with "1 / 2.0" than with "1.0 / 2.0". For the spaces, this is because with just "1/" the space seemed less necessary for clarity, but it seemed necessary with "1.0 /". Now it is easy to backtrack. > That's easier to read IMHO and makes it more clear that it's integer > division. It is not. One benefit of "1.0" is that it makes it clear that this is a double division. > I'm copying Tomas Vondra on this email since he was the one who kicked > off the other thread where this was previously being discussed. Fine. -- Fabien.
>> Thanks. Part 1 looks, on the whole, fine to me, although I think the >> changes to use less whitespace and remove decimal places in the >> documentation are going in the wrong direction. That is: >> >> - About 67% of values are drawn from the middle <literal>1.0 / >> threshold</> >> + About 67% of values are drawn from the middle <literal>1/param</>, >> >> I would say 1.0 / param, just as we used to say 1.0 / threshold. Any >> reason why not? > > For the 1.0 -> 1, this is because in the example afterwards I set param to 2.0 > and I wanted it clear where the one half was coming from, and ISTM that the > 2.0 stands out more with "1 / 2.0" than with "1.0 / 2.0". > > For the spaces, this is because with just "1/" the space seemed less > necessary for clarity, but it seemed necessary with "1.0 /". > > Now it is easy to backtrack. After looking at the generated html version, I find that the "1/param" and "2/param" formulas are very simple and pretty easy to read, and they would not really be enhanced by additional spacing. ISTM that adaptive spacing (no spacing for level 1 operations, some for higher level) is a good approach for readability, ie: f(i) - f(i+1) ^ no spacing here ^ some spacing here So I would suggest keeping the submitted version, unless this is a blocker. -- Fabien.
On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > After looking at the generated html version, I find that the "1/param" and > "2/param" formulas are very simple and pretty easy to read, and they would > not really be enhanced by additional spacing. > > ISTM that adaptive spacing (no spacing for level 1 operations, some for > higher level) is a good approach for readability, ie: > > f(i) - f(i+1) > ^ no spacing here > ^ some spacing here > > So I would suggest keeping the submitted version, unless this is a blocker. Well, I think with the ".0" version it looks more like floating-point math, and I like the extra white-space. But I'm happy to hear other opinions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 9, 2015 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> After looking at the generated html version, I find that the "1/param" and >> "2/param" formulas are very simple and pretty easy to read, and they would >> not really be enhanced by additional spacing. >> >> ISTM that adaptive spacing (no spacing for level 1 operations, some for >> higher level) is a good approach for readability, ie: >> >> f(i) - f(i+1) >> ^ no spacing here >> ^ some spacing here >> >> So I would suggest keeping the submitted version, unless this is a blocker. > > Well, I think with the ".0" version it looks more like floating-point > math, and I like the extra white-space. But I'm happy to hear other > opinions. - defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</> + defined as <literal>(max+min)/2</>, with This thing reminds me a bit of the little TeX I know: when writing things like "\sqrt{1-e^2}", spaces would be printed in the converted html, and as that's floating-point arithmetic, we should have a .0 as well. So I would agree on both points with Robert. I have looked for now at the first patch and finished with the attached while looking at it. Perhaps a committer could look already at that? I am still looking at the 2nd patch in more detail... -- Michael
Attachment
On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Nov 9, 2015 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >>> After looking at the generated html version, I find that the "1/param" and >>> "2/param" formulas are very simple and pretty easy to read, and they would >>> not really be enhanced by additional spacing. >>> >>> ISTM that adaptive spacing (no spacing for level 1 operations, some for >>> higher level) is a good approach for readability, ie: >>> >>> f(i) - f(i+1) >>> ^ no spacing here >>> ^ some spacing here >>> >>> So I would suggest keeping the submitted version, unless this is a blocker. >> >> Well, I think with the ".0" version it looks more like floating-point >> math, and I like the extra white-space. But I'm happy to hear other >> opinions. > > - defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</> > + defined as <literal>(max+min)/2</>, with > This thing reminds me a bit of the little TeX I know: when writing > things like "\sqrt{1-e^2}", spaces would be printed in the converted > html, and as that's floating-point arithmetic, we should have a .0 as well. > So I would agree on both points with Robert. > > I have looked for now at the first patch and finished with the > attached while looking at it. Perhaps a committer could look already > at that? It looks fine to me except that I think we should spell out "param" as "parameter" throughout, instead of abbreviating. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >> I have looked for now at the first patch and finished with the >> attached while looking at it. Perhaps a committer could look already >> at that? > > It looks fine to me except that I think we should spell out "param" as > "parameter" throughout, instead of abbreviating. Fine for me. I have updated the first patch as attached (still looking at the second). -- Michael
Attachment
> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> It looks fine to me except that I think we should spell out "param" as >> "parameter" throughout, instead of abbreviating. > > Fine for me. I have updated the first patch as attached (still looking > at the second). This doc update (threshold -> parameter), in both the sgml and C code, looks ok to me. -- Fabien.
On Wed, Dec 16, 2015 at 3:07 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> >> wrote: >>> >>> It looks fine to me except that I think we should spell out "param" as >>> "parameter" throughout, instead of abbreviating. >> >> >> Fine for me. I have updated the first patch as attached (still looking >> at the second). > > > This doc update (threshold -> parameter), in both the sgml and C code, looks ok to me. So I am having a look at the second patch... + double constants such as <literal>3.14156</>, You meant perhaps 3.14159 :) + <entry><literal><function>max(<replaceable>i</>, <replaceable>...</>)</></></> + <entry>integer</> Such function declarations are better with optional arguments listed within brackets. + <row> + <entry><literal><function>double(<replaceable>i</>)</></></> + <entry>double</> + <entry>cast to double</> + <entry><literal>double(5432)</></> + <entry><literal>5432.0</></> + </row> + <row> + <entry><literal><function>int(<replaceable>x</>)</></></> + <entry>integer</> + <entry>cast to int</> + <entry><literal>int(5.4 + 3.8)</></> + <entry><literal>9</></> + </row> If there are cast functions, why doesn't this patch introduce the concept of the data type for a variable defined in a script? Both concepts are linked, and for now as far as I can see the allocation of a \set variable is actually always an integer. As a consequence, sqrt behavior is a bit surprising, for example this script: \set aid sqrt(2.0) SELECT :aid; returns that: SELECT 1; Shouldn't a user expect 1.414 instead? Fabien, am I missing a trick? It seems to me that this patch would gain in clarity by focusing a bit more on the infrastructure first and removing a couple of things that are not that mandatory for now... So the following things are not necessary as of now: - cast functions from/to int/double, because a result variable of a \set does not include the idea that a result variable can be something else than an integer. At least no option is given to the user to be able to make a direct use of a double value. - functions that return a double number: sqrt, pi - min and max have value because they allow a user to specify the expression once as a variable instead of writing it perhaps multiple times in SQL, still is it enough to justify having them as a set of functions available by default? I am not really sure either. Really, I like this patch, and I think that it is great to be able to use a set of functions and methods within a pgbench script because this clearly can bring more clarity for a developer, still it seems that this patch is trying to do too many things at the same time: 1) Add infrastructure to support function calls and refactor the existing functions to use it. This is the FUNCTION stuff in the expression scanner. 2) Add the concept of double return type, this could be an extension of \set with a data type, or a new command like \setdouble. This should include as well the casting routines I guess. This is the DOUBLE stuff in the expression scanner. 3) Introduce new functions, as needed. 1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd like to have.. sqrt for example has value if a variable can be set as a double type. In conclusion, for this CF the patch doing the documentation fixes is "Ready for committer", the second patch introducing the function infrastructure should focus more on its core structure and should be marked as "Returned with feedback". Opinions are welcome. Regards, -- Michael
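The surprise Michael describes follows directly from how \set stores its result. A hedged C sketch of that final step, reusing the names from the sketches above (the wrapper function and its signature are hypothetical; the real patch differs in detail):

static bool
runSetCommand(CState *st, PgBenchExpr *expr, char *res, size_t reslen)
{
	PgBenchValue result;

	if (!evaluateExpr(st, expr, &result))
		return false;			/* expression error: abort this transaction */

	/*
	 * Variables are stored as text and are always integers, so a double
	 * result is truncated here: sqrt(2.0) evaluates to 1.414..., but the
	 * variable receives "1".
	 */
	snprintf(res, reslen, INT64_FORMAT, coerceToInt(&result));
	return true;
}

INT64_FORMAT is PostgreSQL's portable printf format for int64; everything downstream of the evaluation sees only the coerced integer text.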
Michael Paquier wrote: > On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > >> I have looked for now at the first patch and finished with the > >> attached while looking at it. Perhaps a committer could look already > >> at that? > > > > It looks fine to me except that I think we should spell out "param" as > > "parameter" throughout, instead of abbreviating. > > Fine for me. I have updated the first patch as attached (still looking > at the second). hm, so this is to backpatch, not merely for master, yes? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 17, 2015 at 1:42 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > In conclusion, for this CF the patch doing the documentation fixes is > "Ready for committer", the second patch introducing the function > infrastructure should focus more on its core structure and should be > marked as "Returned with feedback". Opinions are welcome. I forgot to attach the second patch, which I rebased on top of the first one as modified with the documentation changes. -- Michael
Attachment
On Thu, Dec 17, 2015 at 1:54 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: >> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >> >> I have looked for now at the first patch and finished with the >> >> attached while looking at it. Perhaps a committer could look already >> >> at that? >> > >> > It looks fine to me except that I think we should spell out "param" as >> > "parameter" throughout, instead of abbreviating. >> >> Fine for me. I have updated the first patch as attached (still looking >> at the second). > > hm, so this is to backpatch, not merely for master, yes? I haven't thought about that as it is a cosmetic patch.. But yes that's harmless to backpatch to 9.5, and it would actually be good to get a consistent code base with master I guess. -- Michael
Hello Michael, Thanks for your remarks. > + double constants such as <literal>3.14156</>, > You meant perhaps 3.14159 :) Indeed! > + <entry><literal><function>max(<replaceable>i</>, > <replaceable>...</>)</></></> > + <entry>integer</> > Such function declarations are better with optional arguments listed > within brackets. Why not. I did it that way because this is the standard C syntax for varargs. > + <row> > + <entry><literal><function>double(<replaceable>i</>)</></></> > + <entry>double</> > + <entry>cast to double</> > + <entry><literal>double(5432)</></> > + <entry><literal>5432.0</></> > + </row> > + <row> > + <entry><literal><function>int(<replaceable>x</>)</></></> > + <entry>integer</> > + <entry>cast to int</> > + <entry><literal>int(5.4 + 3.8)</></> > + <entry><literal>9</></> > + </row> > If there are cast functions, why doesn't this patch introduce the > concept of the data type for a variable defined in a script? Because this would be a pain and this is really useless as far as pgbench scripts are concerned; they really just need integers. > Both concepts are linked, and for now as far as I can see the allocation of a > \set variable is actually always an integer. Yes. > As a consequence, sqrt behavior is a bit surprising, for example this script: > \set aid sqrt(2.0) > SELECT :aid; > returns that: > SELECT 1; > Shouldn't a user expect 1.414 instead? Fabien, am I missing a trick? There is no trick. There are only integer variables in pgbench, so 1.4 is rounded to 1 by the assignment. > It seems to me that this patch would gain in clarity by focusing a bit > more on the infrastructure first and removing a couple of things that > are not that mandatory for now... This is more or less what I did in the beginning, and then someone said "please show a complete infra with various examples to show that the infra is really extensible". Now you say the reverse. This is *tiring* and is not a good use of the little time I can give. > So the following things are not necessary as of now: > - cast functions from/to int/double, because a result variable of a > \set does not include the idea that a result variable can be something > else than an integer. At least no option is given to the user to be > able to make a direct use of a double value. > - functions that return a double number: sqrt, pi > - min and max have value because they allow a user to specify the > expression once as a variable instead of writing it perhaps multiple > times in SQL, still is it enough to justify having them as a set of > functions available by default? I am not really sure either. AFAICR this is because I was *ASKED* to show an infra which would deal with various cases: varargs, double/int functions/operators, overloading, so I added some examples in each category, hence the current list of functions in the patch. > Really, I like this patch, and I think that it is great to be able to > use a set of functions and methods within a pgbench script because > this clearly can bring more clarity for a developer, still it seems > that this patch is trying to do too many things at the same time: > 1) Add infrastructure to support function calls and refactor the > existing functions to use it. This is the FUNCTION stuff in the > expression scanner. > 2) Add the concept of double return type, this could be an extension > of \set with a data type, or a new command like \setdouble. This > should include as well the casting routines I guess. This is the > DOUBLE stuff in the expression scanner. 
> 3) Introduce new functions, as needed. Yep. I started with (1), and was *ASKED* to do the others. Adding double variables is pretty useless in my opinion, and potentially leads to issues in various places, so I'm not in a hurry to do that. > 1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd > like to have.. sqrt for example has value if a variable can be set > as a double type. Sure. This is just an example of a double function so that if someone wants to add another one they can just update where it makes sense. Maybe log/exp would make more sense for pgbench. > In conclusion, for this CF the patch doing the documentation fixes is > "Ready for committer", the second patch introducing the function > infrastructure should focus more on its core structure and should be > marked as "Returned with feedback". Opinions are welcome. My opinion is that to do and undo work because of random thoughts on the list is tiring. What I can do is: (1) fix the doc and bugs if any, obviously. (2a) remove double stuff, just keep integer functions. I would rather keep min/max, though. (2b) keep the patch with both int & double as is. What I will *NOT* do is to add double variables without a convincing use case. -- Fabien.
On Thu, Dec 17, 2015 at 6:02 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Really, I like this patch, and I think that it is great to be able to >> use a set of functions and methods within a pgbench script because >> this clearly can bring more clarity for a developer, still it seems >> that this patch is trying to do too many things at the same time: >> 1) Add infrastructure to support function calls and refactor the >> existing functions to use it. This is the FUNCTION stuff in the >> expression scanner. >> 2) Add the concept of double return type, this could be an extension >> of \set with a data type, or a new command like \setdouble. This >> should include as well the casting routines I guess. This is the >> DOUBLE stuff in the expression scanner. >> 3) Introduce new functions, as needed. > > > Yep. I started with (1), and was *ASKED* to do the others. That's the beginning of the thread... N developers have (N+1) opinions as a wise guy watching this mailing list said once. >> 1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd >> like to have.. sqrt for example has value if a variable can be set >> as a double type. > > > Sure. This is just an example of a double function so that if someone wants > to add another one they can just update where it makes sense. Maybe log/exp > would make more sense for pgbench. Perhaps.. >> In conclusion, for this CF the patch doing the documentation fixes is >> "Ready for committer", the second patch introducing the function >> infrastructure should focus more on its core structure and should be >> marked as "Returned with feedback". Opinions are welcome. > > > My opinion is that to do and undo work because of random thoughts on the > list is tiring. > > What I can do is: > > (1) fix the doc and bugs if any, obviously. Thanks. > (2a) remove double stuff, just keep integer functions. > I would rather keep min/max, though. (2a) sounds like a fine plan to get something committable. We could keep min/max/abs, and remove sqrt/pi. What's actually the use case for debug? I cannot wrap my mind around one? > (2b) keep the patch with both int & double as is. Functions returning double are not that useful... pgbench can live without them. > What I will *NOT* do is to add double variables without a convincing use > case. I am not actually meaning that you should do it. I scrutinized the thread and this patch and thought that there could perhaps be some use cases for double return values after seeing the cast functions. It is not my intention to piss you off, I like this patch a lot, and you have already proved that if someone wanted to get those additional features, well the infrastructure being put in place would allow doing that :) If the second patch gets in a simplified shape I'll look at it again, then let's move on with a committer's opinion. Regards, -- Michael
>> (2a) remove double stuff, just keep integer functions. >> I would rather keep min/max, though. > > (2a) sounds like a fine plan to get something committable. We could keep > min/max/abs, and remove sqrt/pi. What's actually the use case for debug? > I cannot wrap my mind around one? It was definitely useful to debug the double/int type stuff within expressions when writing a non-trivial pgbench script. It is probably less interesting if there are only integers. >> (2b) keep the patch with both int & double as is. > > Functions returning double are not that useful... pgbench can live > without them. Sure. -- Fabien.
On Thu, Dec 17, 2015 at 10:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >>> (2a) remove double stuff, just keep integer functions. >>> I would rather keep min/max, though. >> >> >> (2a) sounds like a fine plan to get something committable. We could keep >> min/max/abs, and remove sqrt/pi. What's actually the use case for debug? I >> cannot wrap my mind around one? > > > It was definitely useful to debug the double/int type stuff within > expressions when writing a non-trivial pgbench script. It is probably less > interesting if there are only integers. OK for me. Let's leave without it as well if you think it is not that useful. It could still be introduced later on. -- Michael
On Fri, Dec 18, 2015 at 9:08 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Dec 17, 2015 at 10:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> >>>> (2a) remove double stuff, just keep integer functions. >>>> I would rather keep min/max, though. >>> >>> >>> (2a) sounds like a fine plan to get something committable. We could keep >>> min/max/abs, and remove sqrt/pi. What's actually the use case for debug? I >>> cannot wrap my mind around one? >> >> >> It was definitely useful to debug the double/int type stuff within >> expressions when writing a non-trivial pgbench script. It is probably less >> interesting if there are only integers. > > OK for me. Let's leave without it as well if you think it is not that > useful. It could still be introduced later on. s/leave/live. This morning is a bit hard... -- Michael
On Wed, Dec 16, 2015 at 12:54 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >>> I have looked for now at the first patch and finished with the >>> attached while looking at it. Perhaps a committer could look already >>> at that? >> >> It looks fine to me except that I think we should spell out "param" as >> "parameter" throughout, instead of abbreviating. > > Fine for me. I have updated the first patch as attached (still looking > at the second). Committed. I was on the fence about whether to back-patch this considering how close we are to release, but ended up deciding to go for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Michael, > It was definitely useful to debug the double/int type stuff within > expressions when writing a non-trivial pgbench script. It is probably less > interesting if there are only integers. After looking again at the code, I remembered why doubles are useful: they are needed for random exponential & gaussian because the last parameter is a double. I do not care about the sqrt, but doubles must be allowed to keep that, and the randoms are definitely useful for a pgbench script. Now the patch may just keep double constants, but it would look awkward, and the doc must explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4. So I'm less keen on removing double expressions, because it removes a key feature. If it is a blocker I'll go for just the constant, but this looks to me like a stupid compromise. -- Fabien.
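For instance, with double constants allowed in expressions, a script can pass a non-integer threshold directly (a hypothetical one-line example in the pgbench script notation used throughout this thread; the signature follows the random_exponential(min, max, threshold) form discussed earlier):

\set aid random_exponential(1, 100000, 2.5)

Here 2.5 is a double constant feeding the threshold parameter, while the value finally assigned to :aid is still an integer.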
On Sat, Dec 19, 2015 at 6:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> It was definitely useful to debug the double/int type stuff within >> expressions when writing a non-trivial pgbench script. It is probably less >> interesting if there are only integers. > > > After looking again at the code, I remembered why doubles are useful: they > are needed for random exponential & gaussian because the last parameter is a > double. > > I do not care about the sqrt, but doubles must be allowed to keep that, and > the randoms are definitely useful for a pgbench script. Now the patch may > just keep double constants, but it would look awkward, and the doc must > explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4. > > So I'm less keen on removing double expressions, because it removes a key > feature. If it is a blocker I'll go for just the constant, but this looks to > me like a stupid compromise. Hm, say that you do that in a script:
\set aid double(1.4)
\set bid random_gaussian(1, 10, :aid)
Then what is passed as the third argument in random_gaussian is 1, and not 1.4, no? If all allocations within a variable are unconditionally integers, why is it useful to make the cast function double() user-visible? Now, by looking at the code, I agree that you would need to keep things like DOUBLE and coerceToDouble(); PGBENCH_RANDOM_GAUSSIAN and its other friend are directly using them. I am just doubting that it is actually necessary to make that visible at user-level if they have no direct use.. -- Michael
>> After looking again at the code, I remembered why doubles are useful: they >> are needed for random exponential & gaussian because the last parameter is a >> double. >> >> I do not care about the sqrt, but doubles must be allowed to keep that, and >> the randoms are definitely useful for a pgbench script. Now the patch may >> just keep double constants, but it would look awkward, and the doc must >> explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4. >> >> So I'm less keen on removing double expressions, because it removes a key >> feature. If it is a blocker I'll go for just the constant, but this looks to >> me like a stupid compromise. > > Hm, say that you do that in a script: > \set aid double(1.4) > \set bid random_gaussian(1, 10, :aid) > Then what is passed as the third argument in > random_gaussian is 1, and not 1.4, no? Indeed. Maybe pgbench should just generate an error when a variable is assigned a double, so that the user must explicitly add an int() cast. > If all allocations within a variable are unconditionally integers, why > is it useful to make the cast function double() user-visible? I'm not sure whether we are talking about the same thing:
- there is a "double" type managed within expressions, but not variables
- there is a double() function, which takes an int and casts to double
I understood that you were suggesting to remove all "double" expressions, but now it seems to be just about the double() function. > Now, by looking at the code, I agree that you would need > to keep things like DOUBLE and coerceToDouble(); > PGBENCH_RANDOM_GAUSSIAN and its other friend are directly using them. Yep. > I am just doubting that it is actually necessary to make that visible at > user-level if they have no direct use.. If there are both ints and doubles, then being able to cast makes sense, so I just put both functions without deeper thinking. So I would suggest to generate an error when a double expression is assigned to a variable, so as to avoid any surprise. If both types are kept, I would like to keep the debug functions, which is really just a debug tool to have a look at what is going on within expressions. -- Fabien.
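A hedged C sketch of the check Fabien suggests here, as a fragment at the end of \set evaluation (argv[1] as the variable name and the exact error-handling convention are assumptions based on the surrounding thread, not the actual patch):

	/* refuse a double result instead of silently truncating it */
	if (result.type == PGBT_DOUBLE)
	{
		fprintf(stderr,
				"\\set: double result for variable \"%s\"; "
				"add an explicit int(...) cast\n", argv[1]);
		st->ecnt++;				/* count the error, abort this transaction */
		return true;
	}

The point is to make the truncation visible: the user writes \set aid int(sqrt(2.0)) and gets 1 deliberately, rather than by surprise.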
On Sat, Dec 19, 2015 at 10:32 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >>> After looking again at the code, I remembered why doubles are useful: >>> they >>> are needed for random exponential & gaussian because the last parameter >>> is a >>> double. >>> >>> I do not care about the sqrt, but doubles must be allowed to keep that, >>> and >>> the randoms are definitely useful for a pgbench script. Now the patch may >>> just keep double constants, but it would look awkward, and the doc must >>> explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4. >>> >>> So I'm less keen on removing double expressions, because it removes a key >>> feature. If it is a blocker I'll go for just the constant, but this looks >>> to >>> me like a stupid compromise. >> >> >> Hm, say that you do that in a script: >> \set aid double(1.4) >> \set bid random_gaussian(1, 10, :aid) >> Then what is passed as the third argument in >> random_gaussian is 1, and not 1.4, no? > > > Indeed. > > Maybe pgbench should just generate an error when a variable is assigned a > double, so that the user must explicitly add an int() cast. I would honestly just remove this whole int() and double() business from what is available for the user, and clearly mention in the documentation the return type and the types of the arguments of each function. And that's already what your patch is doing. >> If all allocations within a variable are unconditionally integers, why is >> it useful to make the cast function double() user-visible? > > I'm not sure whether we are talking about the same thing: > - there is a "double" type managed within expressions, but not variables > - there is a double() function, which takes an int and casts to double > > I understood that you were suggesting to remove all "double" expressions, > but now it seems to be just about the double() function. There is indeed a misunderstanding here: I meant from the start the removal of only the "double" function. It would be nice to keep as user-visible only things that have some meaning. >> I am just doubting that it is actually necessary to make that visible at >> user-level if they have no direct use.. > > If there are both ints and doubles, then being able to cast makes sense, so I > just put both functions without deeper thinking. > So I would suggest to generate an error when a double expression is > assigned to a variable, so as to avoid any surprise. > If both types are kept, I would like to keep the debug functions, which is > really just a debug tool to have a look at what is going on within expressions. Well, if there were doubles as return results really allocated as doubles in variables, having both would make sense. And honestly something like sqrt that returns an integer when allocated in a variable is really surprising.. And as you mentioned upthread there is no real meaning to having double variable types that can be allocated. (Moving this patch to the next CF btw) -- Michael
Hello Michael, >> I'm not sure whether we are talking about the same thing: >> - there is a "double" type managed within expressions, but not variables >> - there is a double() function, which takes an int and casts to double >> >> I understood that you were suggesting to remove all "double" expressions, >> but now it seems to be just about the double() function. > > There is indeed a misunderstanding here: I meant from the start the > removal of only the "double" function. Ok. I clearly misunderstood everything... > It would be nice to keep as user-visible only things that have some > meaning. Why not. The purpose of some of these functions was to show how the function infrastructure extends, as was asked on the thread. The really useful ones in the bunch are the arithmetic operators and random generators. > [...] Well, if there were doubles as return results really allocated as > doubles in variables, having both would make sense. And honestly > something like sqrt that returns an integer when allocated in a variable > is really surprising.. And as you mentioned upthread there is no real > meaning to having double variable types that can be allocated. So you would just like to remove double double(int) and double sqrt(double) from the patch, and basically that is all? int int(double)?? debug??? (hmmm, useful for debugging a non-trivial expression). -- Fabien.
On Tue, Dec 22, 2015 at 12:16 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > So you would just like to remove double double(int) and double sqrt(double) > from the patch, and basically that is all? int int(double)?? > debug??? (hmmm, useful for debugging a non-trivial expression). OK, so I am finally back on this item, and after a close look at the code I am throwing away my concerns. - if (!evaluateExpr(st, expr, &result)) + if (!evaluateExpr(thread, st, expr, &result)) { st->ecnt++; return true; } - sprintf(res, INT64_FORMAT, result); + sprintf(res, INT64_FORMAT, INT(result)); Based on that, all the final results of a \set command will have an integer format. Still, after going through this patch, allowing double as a return type for nested function calls (first time "nested" is written on this thread) is actually really useful, and that's what makes sense for this feature. I am not sure why I haven't thought about that before as well... So, even if for example the final result of a variable is an integer, it is possible to do very fancy things like that:
\set aid debug(random_exponential(1, 100, pi()))
\set bid debug(random_exponential(101, 200, pi()))
\set cid debug(random_gaussian(:aid, :bid, double(:aid * pi())))
SELECT :cid;
That's actually where things like sqrt() and pi() gain a lot in power by working directly on the integers returned by the random functions. Something that bothered me though while testing: the debug() function is useful, but it becomes difficult to see its results efficiently when many outputs are stacking, so I think that it would be useful to be able to pass a string prefix to have the possibility to annotate a debug entry, say "debug('foo', 5.2)" outputs: debug(script=0,command=1): note: foo, int/double X I guess that it would be useful then to allow as well concatenation of strings using "+" with a new PgBenchExprType as ENODE_STRING, but perhaps I am asking too much. Thoughts are welcome regarding that, it does not seem mandatory as of now as this patch is already doing much. We could remove some of the functions in the first shot of this patch to simplify it a bit, but that does not look like a win as their footprint on the code is low. I haven't noticed at a quick glance any kind of memory leaks but this deserves a closer look with valgrind for example, still the patch looks in good shape to me. And more comments for example in pgbench.h would be welcome to explain the code more. I am fine to take a final look at that before handing it to a committer though. Does that sound fine as a plan, Fabien? -- Michael
Hello Michaël, > Based on that, all the final results of a \set command will have an > integer format. Still, after going through this patch, allowing double > as a return type for nested function calls (first time "nested" is > written on this thread) is actually really useful, and that's what > makes sense for this feature. Yep. > [...] Something that bothered me though while testing: the debug() > function is useful, but it becomes difficult to see its results > efficiently when many outputs are stacking, so I think that it would be > useful to be able to pass a string prefix to have the possibility to > annotate a debug entry, say "debug('foo', 5.2)" outputs: > debug(script=0,command=1): note: foo, int/double X I guess that it would > be useful then to allow as well concatenation of strings using "+" with > a new PgBenchExprType as ENODE_STRING, but perhaps I am asking too much. I think that the answer is yes:-) Obviously all this is possible in the grand scale of things, but this is also significant work for a tiny benefit, if any. I'd rather see "debug" as a simple tool for debugging a script with "pgbench -t 1" (i.e. one or a few transactions), so that the trace length is not an issue. Once the script is satisfactory, remove the debug and run it for real. Note that it does not make much sense to run with "debug" calls for many transactions because of the performance impact. > Thoughts are welcome regarding that, it does not seem mandatory Good. > as of now as this patch is already doing much. Yep. > We could remove some of the functions in the first shot of this patch > to simplify it a bit, but that does not look like a win as their > footprint on the code is low. That is one good thing about this function infrastructure, adding a new function has a small impact. Some functions are there more for the demo of how to do it than useful as such (eg sqrt()), but that is not a bad thing. > I haven't noticed at a quick glance any kind of memory leaks but this > deserves a closer look with valgrind for example, still the patch > looks in good shape to me. ISTM that there is no allocation in the evaluation part; all is done on the stack, so there is no risk of a leak there. If there is a leak in the parser we really don't care: this is run once when reading a script, and then after a run pgbench exits anyway. > And more comments for example in pgbench.h would be welcome to explain > the code more. Ok. I'm generally in favor of comments. > I am fine to take a final look at that before handing it to a committer > though. Does that sound fine as a plan, Fabien? I understand that you propose to add these comments? I'm fine with that!:-) If I misunderstood, tell me:-) -- Fabien.
On Thu, Jan 7, 2016 at 7:00 PM, Fabien COELHO wrote: > Obviously all this is possible in the grand scale of things, but this is > also significant work for a tiny benefit, if any. I'd rather see "debug" as a > simple tool for debugging a script with "pgbench -t 1" (i.e. one or a few > transactions), so that the trace length is not an issue. > Once the script is satisfactory, remove the debug and run it for real. > Note that it does not make much sense to run with "debug" calls for many > transactions because of the performance impact. Yep. That's exactly what I did during my tests. >> And more comments for example in pgbench.h would be welcome to explain >> the code more. > > Ok. I'm generally in favor of comments. OK, I added some where I thought it was appropriate. >> I am fine to take a final look at that before handing it to a committer >> though. Does that sound fine as a plan, Fabien? > > I understand that you propose to add these comments? Yep. I did as attached. > I'm fine with that!:-) > If I misunderstood, tell me:-) I think you don't, but I found other issues: 1) When passing a negative parameter to the gaussian and exponential functions an assertion is triggered: Assertion failed: (parameter > 0.0), function getExponentialRand, file pgbench.c, line 494. Abort trap: 6 (core dumped) An explicit error is better. 2) When passing a value between 0 and 2, pgbench will loop infinitely. For example: \set cid debug(random_gaussian(-100, 100, 0)) In both cases we just lack sanity checks with PGBENCH_RANDOM, PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that those checks would be better if moved into getExponentialRand & co with the assertions removed. I would personally have those functions return a boolean status and have the result in a pointer as a function argument. Another thing itching me is that ENODE_OPERATOR is actually useless now that there is a proper function layer. Removing it and replacing it with a set of functions would be more appropriate and make the code simpler, at the cost of more functions and changing the parser so that an operator found is transformed into a function expression. I am attaching a patch where I tweaked the docs a bit and added some comments for clarity. Patch is switched to "waiting on author" for the time being. Regards, -- Michael
Attachment
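A hedged C sketch of the refactoring Michael proposes for points 1) and 2): validate the parameter inside the random helper and report failure through the return value. The wrapper name is hypothetical; getGaussianRand and MIN_GAUSSIAN_PARAM are the names that appear in the patch, and TState is pgbench's per-thread state:

static bool
checkedGaussianRand(TState *thread, int64 min, int64 max,
					double parameter, int64 *result)
{
	/*
	 * The gaussian transformation needs parameter >= MIN_GAUSSIAN_PARAM
	 * (2.0): below that, the internal rejection loop can spin forever,
	 * which is the infinite loop reported above.
	 */
	if (parameter < MIN_GAUSSIAN_PARAM)
		return false;			/* caller prints an error and aborts */

	*result = getGaussianRand(thread, min, max, parameter);
	return true;
}

The same pattern would apply to the exponential variant, whose parameter must be strictly positive.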
Hello Michaël, > 1) When passing a negative parameter to the gaussian and exponential > functions an assertion is triggered: > Assertion failed: (parameter > 0.0), function getExponentialRand, file > pgbench.c, line 494. > Abort trap: 6 (core dumped) > An explicit error is better. Ok for assert -> error. > 2) When passing a value between 0 and 2, pgbench will loop > infinitely. For example: > \set cid debug(random_gaussian(-100, 100, 0)) > In both cases we just lack sanity checks with PGBENCH_RANDOM, > PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that > those checks would be better if moved into getExponentialRand & co > with the assertions removed. I would personally have those functions > return a boolean status and have the result in a pointer as a function > argument. ISTM that if pgbench is to be stopped, the simplest option is just to abort with a nicer error message from the get*Rand function; there is no need to change the function signature and transfer the error management upwards. > Another thing itching me is that ENODE_OPERATOR is actually useless > now that there is a proper function layer. Removing it and replacing > it with a set of functions would be more appropriate and make the code > simpler, at the cost of more functions and changing the parser so that > an operator found is transformed into a function expression. Hmmm. Why not, although the operator management is slightly more efficient (eg always 2 operands...). > I am attaching a patch where I tweaked the docs a bit and added some > comments for clarity. Patch is switched to "waiting on author" for the > time being. Ok. I'm hesitating about removing the operator management, especially if I'm told to put it back afterwards. -- Fabien.
On Tue, Jan 12, 2016 at 5:52 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> 2) When passing a value between 0 and 2, pgbench will loop >> infinitely. For example: >> \set cid debug(random_gaussian(-100, 100, 0)) >> In both cases we just lack sanity checks with PGBENCH_RANDOM, >> PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that >> those checks would be better if moved into getExponentialRand & co >> with the assertions removed. I would personally have those functions >> return a boolean status and have the result in a pointer as a function >> argument. > > > ISTM that if pgbench is to be stopped, the simplest option is just to abort > with a nicer error message from the get*Rand function; there is no need to > change the function signature and transfer the error management upwards. That's fine to me, as long as the solution is elegant. >> I am attaching a patch where I tweaked the docs a bit and added some >> comments for clarity. Patch is switched to "waiting on author" for the >> time being. > > Ok. I'm hesitating about removing the operator management, especially if I'm > told to put it back afterwards. I can understand that; things like that happen all the time here, and that's not a straightforward patch that we have here. I am sure that additional opinions here would be good to have before taking one decision or another. With the current status quo, let's just do what you think is best. -- Michael
Hello Michaël, >> ISTM that if pgbench is to be stopped, the simplest option is just to abort >> with a nicer error message from the get*Rand function; there is no need to >> change the function signature and transfer the error management upwards. > > That's fine to me, as long as the solution is elegant. Hmmm, this is subjective:-) I've decided to stay with the current behavior (\setrandom), that is to abort the current transaction on errors but not to abort pgbench itself. The check is done before calling the functions, and I left an assert in the functions just to be sure. It is going to loop on errors, but this is what it already does anyway. >> Ok. I'm hesitating about removing the operator management, especially if I'm >> told to put it back afterwards. > > I can understand that; things like that happen all the time here, and > that's not a straightforward patch that we have here. I am sure that > additional opinions here would be good to have before taking one > decision or another. With the current status quo, let's just do what > you think is best. I left the operators alone and just added function management next to them. I'll merge operators as functions only if it is a blocker. I have assumed that your v4 is really v17, and this is v18... -- Fabien.
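To make the chosen behavior concrete, here is a hedged C sketch of the call-site check in the evaluator, the alternative to the wrapper sketched earlier (the case label and MIN_GAUSSIAN_PARAM follow the thread; imin, imax, param and retval are placeholder names, and details may differ from the actual v18):

		case PGBENCH_RANDOM_GAUSSIAN:
			/* validate before calling; the Assert stays inside the helper */
			if (param < MIN_GAUSSIAN_PARAM)
			{
				fprintf(stderr,
						"random gaussian parameter must be at least %f "
						"(got %f)\n", MIN_GAUSSIAN_PARAM, param);
				return false;	/* abort the current transaction only */
			}
			retval->type = PGBT_INT;
			retval->u.ival = getGaussianRand(thread, imin, imax, param);
			break;

Failing the expression bumps the client's error count rather than killing pgbench, matching the existing \setrandom behavior.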
On Wed, Jan 13, 2016 at 10:28 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Michaël, > >>> ISTM that if pgbench is to be stopped, the simplest option is just to >>> abort >>> with a nicer error message from the get*Rand function; there is no need >>> to >>> change the function signature and transfer the error management upwards. >> >> >> That's fine to me, as long as the solution is elegant. > > > Hmmm, this is subjective:-) And dependent on personal opinions and views. > I've decided to stay with the current behavior (\setrandom), that is to > abort the current transaction on errors but not to abort pgbench itself. The > check is done before calling the functions, and I left an assert in the > functions just to be sure. It is going to loop on errors, but this is > what it already does anyway. OK, yes I see now I missed that during my last review. This has the disadvantage of doubling the amount of code dedicated to parameter checks though :( But based on the feedback perhaps other folks would feel that it would be actually worth simply dropping the existing \setrandom command. I won't object to that personally, such pgbench features are something for hackers and devs mainly, so I guess that we could survive without a deprecation period here. >>> Ok. I'm hesitating about removing the operator management, especially if >>> I'm >>> told to put it back afterwards. >> >> >> I can understand that; things like that happen all the time here, and >> that's not a straightforward patch that we have here. I am sure that >> additional opinions here would be good to have before taking one >> decision or another. With the current status quo, let's just do what >> you think is best. > > > I left the operators alone and just added function management next to them. > I'll merge operators as functions only if it is a blocker. I think that's a blocker, but I am not the only one here and not a committer. - fprintf(stderr, "gaussian parameter must be at least %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]); + fprintf(stderr, + "random gaussian parameter must be greater than %f " + "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter); This looks like useless noise to me. Why change those messages? - if (parameter <= 0.0) + if (parameter < 0.0) { This bit is false, and triggers an assertion failure when the exponential parameter is 0. fprintf(stderr, - "exponential parameter must be greater than zero (not \"%s\")\n", - argv[5]); + "random exponential parameter must be greater than 0.0 " + "(got %f)\n", parameter); st->ecnt++; - return true; + return false; This diff is noise as well, and should be removed. + /* + * Note: this section could be removed, as the same functionnality + * is available through \set xxx random_gaussian(...) + */ I think that you are right to do that. That's no fun to break existing scripts, even if people doing that with pgbench are surely experienced hackers. } - case ENODE_VARIABLE: Such diffs are noise as well. int() should be strengthened regarding bounds. For example: \set cid debug(int(int(9223372036854775807) + double(9223372036854775807)))
debug(script=0,command=1): int -9223372036854775808 -- Michael
Hello Michaël, >> Hmmm, this is subjective:-) > > And dependent on personal opinions and views. > >> I've decided to stay with the current behavior (\setrandom), that is to >> abort the current transaction on errors but not to abort pgbench itself. The >> check is done before calling the functions, and I left an assert in the >> functions just to be sure. It is going to loop on errors, but this is >> what it already does anyway. > > OK, yes I see now I missed that during my last review. This has the > disadvantage of doubling the amount of code dedicated to parameter > checks though :( Yep, I noticed that obviously, but I envision that the pretty inelegant "\setrandom" code could go away soon, so this is really just temporary. > But based on the feedback perhaps other folks would feel that it would > be actually worth simply dropping the existing \setrandom command. Yep, exactly my thoughts. I did not do it because there are two ways: actually remove it and be done, or build an equivalent \set at parse time, which would just remove the execution part but keep some ugly stuff in parsing. I would be in favor of just dropping it. > I won't object to that personally, such pgbench features are something > for hackers and devs mainly, so I guess that we could survive without a > deprecation period here. Yep. I can remove it, but I would like a clear go/vote before doing that. >>> I can understand that; things like that happen all the time here, and >>> that's not a straightforward patch that we have here. I am sure that >>> additional opinions here would be good to have before taking one >>> decision or another. With the current status quo, let's just do what >>> you think is best. >> >> >> I left the operators alone and just added function management next to them. >> I'll merge operators as functions only if it is a blocker. > > I think that's a blocker, but I am not the only one here and not a committer. Ok, I can remove that easily anyway. > - fprintf(stderr, "gaussian parameter must be at least > %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]); > + fprintf(stderr, > + "random gaussian parameter must be greater than %f " > + "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter); > This looks like useless noise to me. Why change those messages? Because the message was not saying that it was about random, and I think that it is the important part. > - if (parameter <= 0.0) > + if (parameter < 0.0) > { > > This bit is false, and triggers an assertion failure when the > exponential parameter is 0. Oops:-( Sorry. > fprintf(stderr, > - "exponential parameter must be greater than > zero (not \"%s\")\n", > - argv[5]); > + "random exponential parameter must be greater than 0.0 " > + "(got %f)\n", parameter); > st->ecnt++; > - return true; > + return false; > This diff is noise as well, and should be removed. Ok, I can put "zero" and "not" back, but same remark as above: why not tell that it is about random? This information is missing. > + /* > + * Note: this section could be removed, as the same > functionnality > + * is available through \set xxx random_gaussian(...) > + */ > I think that you are right to do that. That's no fun to break existing > scripts, even if people doing that with pgbench are surely experienced > hackers. Ok, but I would like a clear go or vote before doing this. > - > case ENODE_VARIABLE: > Such diffs are noise as well. Yep. > int() should be strengthened regarding bounds. 
> For example: > \set cid debug(int(int(9223372036854775807) + > double(9223372036854775807))) > debug(script=0,command=1): int -9223372036854775808 Hmmm. You mean just to check the double -> int conversion for overflow, as in: SELECT (9223372036854775807::INT8 + 9223372036854775807::DOUBLE PRECISION)::INT8; Ok. -- Fabien.
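A hedged C sketch of the agreed bounds check when coercing a double to an int (PG_INT64_MIN and PG_INT64_MAX are PostgreSQL's int64 limits; the function name and error path are assumptions for illustration):

#include <math.h>

static bool
coerceToIntChecked(const PgBenchValue *pval, int64 *ival)
{
	if (pval->type == PGBT_INT)
		*ival = pval->u.ival;
	else
	{
		double		d = pval->u.dval;

		/*
		 * (double) PG_INT64_MAX rounds up to 2^63, which is already out
		 * of range, hence >= rather than >; NaN must be rejected too.
		 */
		if (isnan(d) || d < (double) PG_INT64_MIN || d >= (double) PG_INT64_MAX)
			return false;		/* caller reports "out of range" */
		*ival = (int64) d;
	}
	return true;
}

With this in place, int(double(9223372036854775807)) fails cleanly instead of wrapping around to the most negative value.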
On Thu, Jan 14, 2016 at 5:54 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > I wrote: >> - fprintf(stderr, "gaussian parameter must be at least >> %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]); >> + fprintf(stderr, >> + "random gaussian parameter must be greater >> than %f " >> + "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter); >> This looks like useless noise to me. Why change those messages? > > > Because the message was not saying that it was about random, and I think > that it is the important part. >> fprintf(stderr, >> - "exponential parameter must be greater than >> zero (not \"%s\")\n", >> - argv[5]); >> + "random exponential parameter must be greater than >> 0.0 " >> + "(got %f)\n", parameter); >> st->ecnt++; >> - return true; >> + return false; >> This diff is noise as well, and should be removed. > > Ok, I can put "zero" and "not" back, but same remark as above: why not tell > that it is about random? This information is missing. Those things should be a separate patch then, committed separately as they provide more verbose messages. >> + /* >> + * Note: this section could be removed, as the same >> functionnality >> + * is available through \set xxx random_gaussian(...) >> + */ >> I think that you are right to do that. That's no fun to break existing >> scripts, even if people doing that with pgbench are surely experienced >> hackers. > > Ok, but I would like a clear go or vote before doing this. For now, I am seeing opinions on those matters from nobody other than me and you, and we are going in the same direction. If you think that there is a possibility that somebody has a different opinion on those matters, and it is likely so, let's keep the patch as-is then and wait for more input: it is easier to remove code than to add it back. I am not sure what a committer would say, and it surely would be a waste of time to just move back and forth for everybody. >> int() should be strengthened regarding bounds. For example: >> \set cid debug(int(int(9223372036854775807) + >> double(9223372036854775807))) >> debug(script=0,command=1): int -9223372036854775808 > > Hmmm. You mean just to check the double -> int conversion for overflow, > as in: > > SELECT (9223372036854775807::INT8 + > 9223372036854775807::DOUBLE PRECISION)::INT8; > > Ok. Yes, that's what I mean. The job running into that should definitely fail with a proper out-of-bounds error message. -- Michael
Hello Michaël, Here is a v19: - avoid noisy changes - abort on double->int overflow - implement operators as functions There is still \setrandom, which I can remove easily with a green light. -- Fabien.
On Fri, Jan 15, 2016 at 11:53 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Here is a v19: > - avoid noisy changes > - abort on double->int overflow > - implement operators as functions > > There is still \setrandom, which I can remove easily with a green light. (I am not sure why *$%"# gmail broke the thread in my last email) Thanks for the new patch and for replacing the operator stuff with functions. + <entry>uniformly-distributed random integer in <literal>[lb,ub]</></> Nitpick: when defining an interval like that, you may want to add a space after the comma. For example seg.sgml does that. It would be good to be consistent even here. And actually you wrote [ub, lb] in two places; this should have been reversed. + /* beware that the list is reverse in make_func */ s/reverse/reversed/? } +#ifdef DEBUG Some noise. With this example: \set cid debug(sqrt(-1)) I get that: debug(script=0,command=1): double nan An error would be more logical, no? You want to emulate with complex numbers instead? The basic operator functions also do not check for integer overflows. Those three ones are just overflowing: \set cid debug(9223372036854775807 + 1) \set cid debug(-9223372036854775808 - 1) \set cid debug(9223372036854775807 * 9223372036854775807) debug(script=0,command=1): int -9223372036854775807 debug(script=0,command=2): int 9223372036854775807 debug(script=0,command=3): int 1 And this one generates a core dump: \set cid debug(-9223372036854775808 / -1) Floating point exception: 8 (core dumped) A more general comment: what about splitting all the execution functions into a separate file exprexec.c? evaluateExpr (renamed as execExpr) is the root function, but then we have a set of static sub-functions for each node, like execExprFunc, execExprVar, execExprConst, etc? This way we would save a bit of tab-indentation, this patch making the new code lines become larger than 80 characters because of all the switch/case stuff that gets more complicated. -- Michael
Hello Michaël,

> + <entry>uniformly-distributed random integer in <literal>[lb,ub]</></>
> Nitpick: when defining an interval like that, you may want to add a
> space after the comma.

Why not.

> + /* beware that the list is reverse in make_func */
> s/reverse/reversed/?

Indeed.

> +
> #ifdef DEBUG
> Some noise.

Ok.

> With this example:
> \set cid debug(sqrt(-1))
> I get that:
> debug(script=0,command=1): double nan
> An error would be more logical, no?

If "sqrt(-1)" as a double is NaN for the computer, I'm fine with that. It makes the code simpler to just let the math library do its stuff and not bother.

> Or do you want to emulate complex numbers instead?

Nope.

> The basic operator functions also do not check for integer overflows.

This is a feature. I think that they should not check for overflow: as in C, this is just int64_t arithmetic "as is". Moreover, it would be a new feature to add such a check if desirable, so it would belong to another patch; it is not related to adding functions. The addition already overflows in the current code. Finally, I can think of a good reason to use overflows deliberately, so I think that would argue against such a change.

> These three just overflow:
> \set cid debug(9223372036854775807 + 1)
> \set cid debug(-9223372036854775808 - 1)
> \set cid debug(9223372036854775807 * 9223372036854775807)
> debug(script=0,command=1): int -9223372036854775808
> debug(script=0,command=2): int 9223372036854775807
> debug(script=0,command=3): int 1

All these results are fine from my point of view.

> And this one generates a core dump:
> \set cid debug(-9223372036854775808 / -1)
> Floating point exception: 8 (core dumped)

This one is funny, but it is a fact of int64_t life: you cannot divide INT64_MIN by -1 because the result cannot be represented as an int64_t. This is probably hardcoded in the processor. I do not think it is worth doing anything about it for pgbench.

> A more general comment: what about splitting all the execution
> functions into a separate file, exprexec.c? evaluateExpr (renamed as
> execExpr) would be the root function, and then we would have a set of
> static sub-functions for each node, like execExprFunc, execExprVar,
> execExprConst, etc.

I do not see a strong case for renaming. The function part could be split because of the indentation, though.

> This way we would save a bit of tab indentation; this patch makes the new
> code lines grow beyond 80 characters because of all the switch/case stuff
> that gets more complicated.

I agree that the code is pretty ugly, but this is partly due to postgres indentation rules for switch, which are *NOT* reasonable, IMO.

I put the function evaluation in a function in the attached version.
-- Fabien.
On Sun, Jan 17, 2016 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> With this example:
>> \set cid debug(sqrt(-1))
>> I get that:
>> debug(script=0,command=1): double nan
>> An error would be more logical, no?
>
> If "sqrt(-1)" as a double is NaN for the computer, I'm fine with that. It
> makes the code simpler to just let the math library do its stuff and not
> bother.

Hm. OK.

>> The basic operator functions also do not check for integer overflows.
>
> This is a feature. I think that they should not check for overflow: as in C,
> this is just int64_t arithmetic "as is".

(int64_t is not an available type on Windows btw.)

> Moreover, it would be a new feature to add such a check if desirable, so it
> would belong to another patch; it is not related to adding functions.
> The addition already overflows in the current code.
>
> Finally, I can think of a good reason to use overflows deliberately, so I
> think that would argue against such a change.

Could you show an example? I am curious about that.

>> These three just overflow:
>> \set cid debug(9223372036854775807 + 1)
>> \set cid debug(-9223372036854775808 - 1)
>> \set cid debug(9223372036854775807 * 9223372036854775807)
>> debug(script=0,command=1): int -9223372036854775808
>> debug(script=0,command=2): int 9223372036854775807
>> debug(script=0,command=3): int 1
>
> All these results are fine from my point of view.

On HEAD we are getting similar strange results, so I am fine to withdraw, but that's still very strange to me. The first case generates -9223372036854775808, the second one computes 9223372036854775807, and the third one generates 1. Or we could make the error checks even more consistent in back-branches; perhaps that's indeed not worth it for a client though.

>> And this one generates a core dump:
>> \set cid debug(-9223372036854775808 / -1)
>> Floating point exception: 8 (core dumped)
>
> This one is funny, but it is a fact of int64_t life: you cannot divide
> INT64_MIN by -1 because the result cannot be represented as an int64_t.
> This is probably hardcoded in the processor. I do not think it is worth
> doing anything about it for pgbench.

This one, on the contrary, generates an error on HEAD, while your patch causes a crash:
value "9223372036854775808" is out of range for type bigint
That's a regression, no? I am uncomfortable with leaving such holes in the code, even if that's a client application.

>> A more general comment: what about splitting all the execution
>> functions into a separate file, exprexec.c? evaluateExpr (renamed as
>> execExpr) would be the root function, and then we would have a set of
>> static sub-functions for each node, like execExprFunc, execExprVar,
>> execExprConst, etc.
>
> I do not see a strong case for renaming. The function part could be split
> because of the indentation, though.

The split makes sense to me given that we have function parsing, execution and the main code paths in different files. That's not far from the similar split the backend does, actually.

>> This way we would save a bit of tab indentation; this patch makes the new
>> code lines grow beyond 80 characters because of all the switch/case stuff
>> that gets more complicated.
>
> I agree that the code is pretty ugly, but this is partly due to postgres
> indentation rules for switch, which are *NOT* reasonable, IMO.
>
> I put the function evaluation in a function in the attached version.

Thanks, this makes the code a bit clearer.
-- Michael
On Mon, Jan 18, 2016 at 10:07 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sun, Jan 17, 2016 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> I put the function evaluation in a function in the attached version.
>
> Thanks, this makes the code a bit clearer.

OK, so I had an extra look at this patch and I am marking it as ready for committer. A couple of things to be aware of, as the result of this thread with the patch in its current state:

- The patch keeps \setrandom. Fabien and I agree on removing it outright, though it is kept in the patch because, per Fabien's concerns, it is easier to remove existing code than to add it back.
- INT64_MIN / -1 throws a core dump, and errors out on HEAD. I think this should be fixed; Fabien does not.
- There are not many overflow checks for the existing int64 operators and functions. HEAD doesn't do much, and this patch makes the situation a bit worse, even if there are a couple of checks, for int() for example. We do not do any checks for sqrt(-N) (N > 0) for example.
- It may be more interesting to have the function execution code in a separate file for clarity. Not mandatory though.

Except for those comments, all the other issues have been addressed. I think this is a great patch, and it greatly improves the extensibility of pgbench.
-- Michael
>>> The basic operator functions also do not check for integer overflows.
>>
>> This is a feature. I think that they should not check for overflow: as in C,
>> this is just int64_t arithmetic "as is".
>
> (int64_t is not an available type on Windows btw.)

Possibly. I really meant "64-bit signed integers", whatever the name; "int64_t" is the standard C99 name.

>> Finally, I can think of a good reason to use overflows deliberately, so I
>> think that would argue against such a change.
>
> Could you show an example? I am curious about that.

The one I can think of is the use of SUM to aggregate hashes when computing a hash of a table: if SUM overflowed and stopped, this usage would break. Now there could be a case for having overflow detection on SUM, but that would be another SUM, preferably with a distinct name.

>>> \set cid debug(9223372036854775807 * 9223372036854775807)
>>> debug(script=0,command=3): int 1
>> All these results are fine from my point of view.
>
> On HEAD we are getting similar strange results,

Yep, this is not new.

> so I am fine to withdraw, but that's still very strange to me.

Arithmetic modulo 2**64 is pretty strange, I can agree with that :-)

> The first case generates -9223372036854775808, the second one
> computes 9223372036854775807, and the third one generates 1.

These should be the "real" results modulo 2**64, if I'm not mistaken.

> Or we could make the error checks even more consistent in back-branches;
> perhaps that's indeed not worth it for a client though.

Yep, that would be another patch.

>>> And this one generates a core dump:
>>> \set cid debug(-9223372036854775808 / -1)
>>> Floating point exception: 8 (core dumped)
>>
>> This one is funny, but it is a fact of int64_t life: you cannot divide
>> INT64_MIN by -1 because the result cannot be represented as an int64_t.
>> This is probably hardcoded in the processor. I do not think it is worth
>> doing anything about it for pgbench.
>
> This one, on the contrary, generates an error on HEAD, while your
> patch causes a crash:
> value "9223372036854775808" is out of range for type bigint
> That's a regression, no?

Hmmm, yes, somehow, but just for this one value. If you try the next value down:

pgbench 9.4.5: value "-9223372036854775809" is out of range for type bigint

I guess that the implementation before 9.5 converted "-9223372036854775808" as a whole, which is INT64_MIN, so it was fine. Now it is parsed as operator uminus applied to "9223372036854775808", which is not fine because that would be INT64_MAX + 1, which cannot be represented. I would prefer to just neglect this as a very small (1/2**64) feature change rather than a meaningful regression, especially as the coding effort to fix it is significant and the value of handling it differently is nought.

> I am uncomfortable with leaving such holes in the code, even
> if that's a client application.

This is a 1/2**128 probability case which stops pgbench. It is obviously possible to add a check to catch it and then generate an error message, but I would rather just ignore it and let pgbench stop on that.
-- Fabien.
> OK, so I had an extra look at this patch and I am marking it as ready
> for committer.

Ok.

> - INT64_MIN / -1 throws a core dump, and errors out on HEAD. I think this
> should be fixed; Fabien does not.

Yep. Another point about this one is that it is not related to this patch about functions.
-- Fabien.
>> OK, so I had an extra look at this patch and I am marking it as ready
>> for committer.
>
> Ok.

Attached is a rebase after recent changes in pgbench code & doc.
-- Fabien.
On Sat, Jan 16, 2016 at 1:10 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> All these results are fine from my point of view.
>
>> And this one generates a core dump:
>> \set cid debug(-9223372036854775808 / -1)
>> Floating point exception: 8 (core dumped)

That does not seem acceptable to me. I don't want pgbench to die a horrible death if a floating point exception occurs any more than I want the server to do the same. I want the error to be properly caught and reported.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jan 27, 2016 at 3:39 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Attached is a rebase after recent changes in pgbench code & doc.

+/* use short names in the evaluator */
+#define INT(v) coerceToInt(&v)
+#define DOUBLE(v) coerceToDouble(&v)
+#define SET_INT(pv, ival) setIntValue(pv, ival)
+#define SET_DOUBLE(pv, dval) setDoubleValue(pv, dval)

I don't like this at all. It seems to me that this really obscures the code. The few extra characters are a small price to pay for not having to go look up the macro definition to understand what the code is doing.

The third hunk in pgbench.c unnecessarily deletes a blank line.

 /*
  * inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
+ * Assert((1.0 - cut) != 0.0);
  */
- Assert((1.0 - cut) != 0.0);
  rand = -log(cut + (1.0 - cut) * uniform) / parameter;
+

Moving the Assert() into the comment seems like a bad plan. If the Assert is true, it shouldn't be commented out. If it's not, it shouldn't be there at all.

Commit e41beea0ddb74ef975f08b917a354ec33cb60830, which you wrote, went to some trouble to display good context for error messages. What you have here seems like a huge step backwards:

+ fprintf(stderr, "double to int overflow for %f\n", dval);
+ exit(1);

So, we're just going to give up on all of that error context reporting that we added back then? That would be sad.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Robert,

>>> And this one generates a core dump:
>>> \set cid debug(-9223372036854775808 / -1)
>>> Floating point exception: 8 (core dumped)
>
> That does not seem acceptable to me. I don't want pgbench to die a
> horrible death if a floating point exception occurs any more than I want
> the server to do the same. I want the error to be properly caught and
> reported.

Please note that this issue/bug/feature/whatever already exists, just for these two values (a 1/2**128 probability); it is not related to this patch about functions.

sh> pgbench --version
pgbench (PostgreSQL) 9.5.0

sh> cat div.sql
\set i -9223372036854775807
\set i :i - 1
\set i :i / -1

sh> pgbench -f div.sql
starting vacuum...end.
Floating point exception (core dumped)

I do not think that it is really worth fixing, but I will not prevent anyone from fixing it.
-- Fabien.
Hello Robert,

>> Attached is a rebase after recent changes in pgbench code & doc.
>
> +/* use short names in the evaluator */
> +#define INT(v) coerceToInt(&v)
> +#define DOUBLE(v) coerceToDouble(&v)
> +#define SET_INT(pv, ival) setIntValue(pv, ival)
> +#define SET_DOUBLE(pv, dval) setDoubleValue(pv, dval)
>
> I don't like this at all. It seems to me that this really obscures
> the code. The few extra characters are a small price to pay for not
> having to go look up the macro definition to understand what the code
> is doing.

Hmmm. Postgres indentation rules for "switch" are peculiar, to say the least, and make it hard to write code that stays under 80 columns. The coerceToInt function name looks pretty long (I would rather have toInt/toDbl/setInt/setDbl) but I was "told" to use that, so I'm trying to find a tradeoff with a macro. Obviously I can substitute, and have rather long lines that I personally find much uglier.

> The third hunk in pgbench.c unnecessarily deletes a blank line.

Yep, that is possible.

> Moving the Assert() into the comment seems like a bad plan. If the
> Assert is true, it shouldn't be commented out. If it's not, it
> shouldn't be there at all.

I put this assertion in when I initially wrote this code, but I think that it is proven, so I moved it into a comment, just as a reminder for someone who might touch anything here that this must hold.

> Commit e41beea0ddb74ef975f08b917a354ec33cb60830, which you wrote, went
> to some trouble to display good context for error messages. What you
> have here seems like a huge step backwards:
>
> + fprintf(stderr, "double to int overflow for %f\n", dval);
> + exit(1);
>
> So, we're just going to give up on all of that error context reporting
> that we added back then? That would be sad.

Well, I'm a lazy programmer, so I'm trying to measure the benefit. IMO there is no benefit in managing this case better, especially as the various solutions I thought of were either ugly and/or had a significant impact on the code. Note that in the best case the error would be detected and reported and that client stopped, while the other clients go on... But then, if you started a bench and some clients die while it is running, your results are probably meaningless, so my opinion is that you are better off with an exit than with some message that you may miss and performance results computed with far fewer clients than you asked for. Pgbench is a bench tool, not a production tool.
-- Fabien.
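For the record, a minimal self-contained sketch of the coercion under discussion; the PgBenchValue layout follows the excerpts quoted later in this thread, but the exact bounds test is an assumption, not the patch's code:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { PGBT_INT, PGBT_DOUBLE } PgBenchValueType;

typedef struct PgBenchValue
{
    PgBenchValueType type;
    union { int64_t ival; double dval; } u;
} PgBenchValue;

static int64_t
coerceToInt(PgBenchValue *pval)
{
    if (pval->type == PGBT_DOUBLE)
    {
        double dval = pval->u.dval;

        /* (double) INT64_MAX rounds up to 2^63 exactly, hence >=, not > */
        if (isnan(dval) || dval < (double) INT64_MIN || dval >= (double) INT64_MAX)
        {
            fprintf(stderr, "double to int overflow for %f\n", dval);
            exit(1);
        }
        return (int64_t) dval;
    }
    return pval->u.ival;
}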
On Thu, Jan 28, 2016 at 7:07 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I do not think that it is really worth fixing, but I will not prevent
> anyone from fixing it.

I still think it is worth fixing. Well, if there is consensus to address this one, and optionally the other integer overflows, even in back branches, I'll write a patch and let's call that a deal. This is not a problem from my side.
-- Michael
On Wed, Jan 27, 2016 at 5:43 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Pgbench is a bench tool, not a production tool.

I don't really see how that's relevant. When I run a program and it dies after causing the operating system to send it a fatal signal, I say to myself "self, that program has a bug". Other people may have different reactions, but that's mine.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Robert,

>> Pgbench is a bench tool, not a production tool.
>
> I don't really see how that's relevant.

My point is that I do not see any added value in pgbench continuing to execute a performance bench when some clients have died due to script errors: from a benchmarking perspective it is more useful to stop the whole bench and report the issue, so the user can fix their script, than to keep going with the remaining clients. So I'm arguing that exiting, with an error message, is better than handling user errors.

> When I run a program and it dies after causing the operating system to
> send it a fatal signal, I say to myself "self, that program has a bug".
> Other people may have different reactions, but that's mine.

I was talking about an exit call generated on a float-to-int conversion error when there is an error in the user script. The bug is in the user script, and this is clearly reported by pgbench. However, your argument may be relevant for avoiding a fatal signal such as the one generated by INT64_MIN / -1, because there the error message is terse, so how to fix the issue is not clear to the user.
-- Fabien.
>> I do not think that it is really worth fixing, but I will not prevent
>> anyone from fixing it.
>
> I still think it is worth fixing. Well, if there is consensus to address
> this one, and optionally the other integer overflows, even in back
> branches, I'll write a patch and let's call that a deal. This is not a
> problem from my side.

My point is just about the cost-benefit of fixing a low-probability issue that you can only encounter if you are looking for it, and not with any reasonable bench script. Now, adding a test somewhere might just help close the subject and move on to more useful things, so that would also be a win.

/* these would raise an arithmetic error */
if (lval == INT64_MIN && rval == -1)
{
    fprintf(stderr, "cannot divide or modulo INT64_MIN by -1\n");
    return false;
}

This may be backpatched to old supported versions.
-- Fabien.
On Thu, Jan 28, 2016 at 1:36 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> So I'm arguing that exiting, with an error message, is better than
> handling user errors.

I'm not objecting to exiting with an error message, but I think letting ourselves be killed by a signal is no good.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>> So I'm arguing that exiting, with an error message, is better than
>> handling user errors.
>
> I'm not objecting to exiting with an error message, but I think
> letting ourselves be killed by a signal is no good.

Ok, I understand this point for this purpose.
-- Fabien.
<Oops, resent because I again put the wrong From address, sorry>

v22 compared to previous:
- remove the short macros (although IMO it is a code degradation)
- try not to remove/add blank lines
- let some assert "as is"
- still exit on float to int overflow, see arguments in other mails
- check for INT64_MIN / -1 (although I think it is useless)
-- Fabien.
Hello Michaël,

v23 attached, which does not change the message but does the other fixes.

> + if (coerceToInt(&lval) == INT64_MIN && coerceToInt(&rval) == -1)
> + {
> + fprintf(stderr, "cannot divide INT64_MIN by -1\n");
> + return false;
> + }
> Bike-shedding: "bigint out of range" to match what is done in the backend?

ISTM that it is clearer for the user to say that the issue is with the division? Otherwise the message does not help much. Well, not that it would be printed often...

> +/*
> + * Recursive evaluation of an expression in a pgbench script
> + * using the current state of variables.
> + * Returns whether the evaluation was ok,
> + * the value itself is returned through the retval pointer.
> + */
> Could you reformat this comment?

I can.

> fprintf(stderr,
> - "exponential parameter must be greater than zero (not \"%s\")\n",
> - argv[5]);
> + "exponential parameter must be greater than zero (not \"%s\")\n",
> + argv[5]);
> This is some unnecessary diff noise.

Indeed.

> + setIntValue(retval, getGaussianRand(thread, arg1, arg2, dparam));
> Some portions of the code are using tabs instead of spaces between
> function arguments.

Indeed.

> I would as well suggest fixing first the (INT64_MIN / -1) crash on HEAD
> and back-branches with something like the patch attached, inspired from
> int8.c.

I think it is overkill, but do as you feel. Note that it must also handle modulo, but the code you suggest cannot be used for that.

#include <stdint.h>
int main(int argc, char* argv[])
{
    int64_t l = INT64_MIN;
    int64_t r = -1;
    int64_t d = l % r;
    return 0;
}
// => Floating point exception (core dumped)

ISTM that the second condition can be simplified, as there is no possible issue if lval is positive?

if (lval < 0 && *resval < 0) { ... }
-- Fabien.
On Fri, Jan 29, 2016 at 3:40 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> I would as well suggest fixing first the (INT64_MIN / -1) crash on HEAD
>> and back-branches with something like the patch attached, inspired from
>> int8.c.
>
> I think it is overkill, but do as you feel.

Perhaps we could have Robert decide on this one first? That's a bug after all that had better be backpatched.

> Note that it must also handle modulo, but the code you suggest cannot be
> used for that.
>
> #include <stdint.h>
> int main(int argc, char* argv[])
> {
>     int64_t l = INT64_MIN;
>     int64_t r = -1;
>     int64_t d = l % r;
>     return 0;
> }
> // => Floating point exception (core dumped)
Right, forgot this one, we just need to check if rval is -1 here, and return 0 as result. I am updating the fix as attached.
Michael
Hi Michaël,

>> I think it is overkill, but do as you feel.
>
> Perhaps we could have Robert decide on this one first? That's a bug after
> all that had better be backpatched.

Fine with me.

> [modulo...] Right, forgot this one, we just need to check if rval is -1
> here, and return 0 as result. I am updating the fix as attached.

This looks to me like it works. I still feel that the condition should be simplified, probably with:

if (lval < 0 && *resval <= 0) ...

(instead of < in my previous suggestion, in case some processors return 0 on -INT64_MIN). Also, a comment is needed to explain why such a bizarre condition is used/needed for just the INT64_MIN case.
-- Fabien.
On Fri, Jan 29, 2016 at 6:05 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> (instead of < in my previous suggestion, in case some processors return 0
> on -INT64_MIN). Also, a comment is needed to explain why such a bizarre
> condition is used/needed for just the INT64_MIN case.

The last patch I sent has this bit:

+ /*
+  * Some machines throw a floating-point exception
+  * for INT64_MIN % -1, the correct answer being
+  * zero in any case.
+  */

How would you reformulate that à-la-Fabien?
-- Michael
>> Also, a comment is needed to explain why such a bizarre
>> condition is used/needed for just the INT64_MIN case.
>
> The last patch I sent has this bit:
> + /*
> +  * Some machines throw a floating-point exception
> +  * for INT64_MIN % -1, the correct answer being
> +  * zero in any case.
> +  */
> How would you reformulate that à-la-Fabien?

This one, about modulo, is fine. I was referring to this other one, in the division case:

+ /* overflow check (needed for INT64_MIN) */
+ if (lval != 0 && (*retval < 0 == lval < 0))

Why not use "if (lval == INT64_MIN)" instead of this complicated condition? If it is really needed for some reason, I think that a comment could help.
-- Fabien.
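For illustration, the simpler guard Fabien is asking about could read as follows; a sketch only, reusing the message wording from the patch rather than the backend's:

/* only INT64_MIN / -1 can overflow an int64 division */
if (rval == -1 && lval == INT64_MIN)
{
    fprintf(stderr, "cannot divide INT64_MIN by -1\n");
    return false;
}
*retval = lval / rval;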
On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> + /* overflow check (needed for INT64_MIN) */
> + if (lval != 0 && (*retval < 0 == lval < 0))
>
> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
> If it is really needed for some reason, I think that a comment could help.

Checking for PG_INT64_MIN only would be fine as well, so let's do so. I honestly thought that we had better check that the result and the left argument are not of the same sign, but well.
-- Michael
On Fri, Jan 29, 2016 at 5:16 AM, Fabien COELHO <fabien.coelho@mines-paristech.fr> wrote:
> v22 compared to previous:
Thanks for the new patch!
> - remove the short macros (although IMO it is a code degradation)
FWIW, I like this suggestion from Robert.
> - check for INT64_MIN / -1 (although I think it is useless)
> - try not to remove/add blank lines
> - let some assert "as is"
> - still exit on float to int overflow, see arguments in other mails
+make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+ return make_func(find_func(operator),
+ /* beware that the list is reversed in make_func */
+ make_elist(rexpr, make_elist(lexpr, NULL)));
+}
I think that this should take the function ID as argument, aka PGBENCH_ADD or similar, instead of calling find_func() with the operator's name. This saves a couple of instructions.
+ * - nargs: number of arguments (-1 is a special value for min & max)
My fault perhaps, it may be better to mention here that -1 means that min and max need at least one argument, and that the number of arguments is not fixed.
+ case PGBENCH_MOD:
+ if (coerceToInt(&rval) == 0)
{
fprintf(stderr, "division by zero\n");
return false;
}
- *retval = lval % rval;
+ if (coerceToInt(&lval) == INT64_MIN && coerceToInt(&rval) == -1)
+ {
+ fprintf(stderr, "cannot divide INT64_MIN by -1\n");
+ return false;
+ }
For mod() there is no need to have an error, returning 0 is fine. You can actually do it unconditionally when rval == -1.
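A sketch of that unconditional special case, reusing the names from the excerpt above; an illustration only, not the actual patch:

case PGBENCH_MOD:
    if (coerceToInt(&rval) == 0)
    {
        fprintf(stderr, "division by zero\n");
        return false;
    }
    /* any int64 modulo -1 is 0, and this also avoids the INT64_MIN % -1 trap */
    if (coerceToInt(&rval) == -1)
        setIntValue(retval, 0);
    else
        setIntValue(retval, coerceToInt(&lval) % coerceToInt(&rval));
    break;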
+ setDoubleValue(retval, d < 0.0? -d: d);
Nitpick: lack of spaces around the question mark.
+typedef enum
+{
+ PGBT_NONE,
+ PGBT_INT,
+ PGBT_DOUBLE
+} PgBenchValueType;
+ if (retval->type == PGBT_INT)
+ fprintf(stderr, "int " INT64_FORMAT "\n", retval->u.ival);
+ else if (retval->type == PGBT_DOUBLE)
+ fprintf(stderr, "double %f\n", retval->u.dval);
+ else
+ fprintf(stderr, "none\n");
Another remaining item: should support for \setrandom be dropped? As the patch is presented this remains intact.
--
Michael
Hello Michaël,

>> - remove the short macros (although IMO it is a code degradation)
>
> FWIW, I like this suggestion from Robert.

I'm not especially fond of macros; my main reservation is that, because of the length of the function names, this necessarily creates lines longer than 80 columns, or awkward and repeated newlines within expressions, which are both ugly.

> +make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
> +{
> + return make_func(find_func(operator),
> + /* beware that the list is reversed in make_func */
> + make_elist(rexpr, make_elist(lexpr, NULL)));
> +}
> I think that this should take the function ID as argument, aka PGBENCH_ADD
> or similar, instead of calling find_func() with the operator's name. This
> saves a couple of instructions.

Not that simple: the number is the index in the array of functions, not the enum value; they are currently off by one, and some day there may be the same function with different names for some reason, so I do not think it would be a good idea to enforce the order so that they are identical. Also, this minor search is only executed when parsing the script.

> + * - nargs: number of arguments (-1 is a special value for min & max)
> My fault perhaps, it may be better to mention here that -1 means that min
> and max need at least one argument, and that the number of arguments is
> not fixed.

Ok for a better comment.

> For mod() there is no need to have an error, returning 0 is fine. You can
> actually do it unconditionally when rval == -1.

Ok.

> + setDoubleValue(retval, d < 0.0? -d: d);
> Nitpick: lack of spaces around the question mark.

Ok.

> NONE is used nowhere, but I think that you could use it for an assertion
> check here: [...] just replace the "none" message by Assert(type !=
> PGBT_NONE) for example.

I've added a use of the macro.

> Another remaining item: should support for \setrandom be dropped? As the
> patch is presented this remains intact.

As you know, my opinion is "yes", but I have not received a clear go on that from a committer, and I'm not motivated by removing and then re-adding code to the patch. If nothing clear is said, I'll do a patch just for that once functions are added, and submit it to the next CF.
-- Fabien.
On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> + /* overflow check (needed for INT64_MIN) */
>> + if (lval != 0 && (*retval < 0 == lval < 0))
>>
>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>> If it is really needed for some reason, I think that a comment could help.
>
> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
> I honestly thought that we had better check that the result and the left
> argument are not of the same sign, but well.

Committed and back-patched to 9.5. Doesn't apply further back.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Yet another rebase, so as to propagate the same special-case checks for / and %.
-- Fabien.
On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> + /* overflow check (needed for INT64_MIN) */
>>> + if (lval != 0 && (*retval < 0 == lval < 0))
>>>
>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>> If it is really needed for some reason, I think that a comment could help.
>>
>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>> I honestly thought that we had better check that the result and the left
>> argument are not of the same sign, but well.
>
> Committed and back-patched to 9.5. Doesn't apply further back.

OK, here are patches for 9.1~9.4. The main differences are that in 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1 that's int32. In the latter case pgbench blows up the same way with:

\set i -2147483648
\set i :i / -1
select :i;

In those patches INT32_MIN/INT64_MIN need to be explicitly set as well at the top of pgbench.c. I thing that's fine.
-- Michael
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

Oh, gosh, I should have said more clearly that I didn't really see a need to fix this all the way back. But I guess we could.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 2, 2016 at 1:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>>>> + /* overflow check (needed for INT64_MIN) */
>>>>> + if (lval != 0 && (*retval < 0 == lval < 0))
>>>>>
>>>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>>>> If it is really needed for some reason, I think that a comment could help.
>>>>
>>>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>>>> I honestly thought that we had better check that the result and the left
>>>> argument are not of the same sign, but well.
>>>
>>> Committed and back-patched to 9.5. Doesn't apply further back.
>>
>> OK, here are patches for 9.1~9.4. The main differences are that in
>> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
>> that's int32. In the latter case pgbench blows up the same way with:
>> \set i -2147483648
>> \set i :i / -1
>> select :i;
>> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
>> at the top of pgbench.c. I thing that's fine.
>
> Oh, gosh, I should have said more clearly that I didn't really see a
> need to fix this all the way back. But I guess we could.

And now there are patches. Well, nobody has complained about that until now except me... So we could live without patching back-branches, but it don't think it hurts much to fix those holes.
--
Michael
On Tue, Feb 2, 2016 at 1:35 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> And now there are patches. Well, nobody has complained about that until
> now except me... So we could live without patching back-branches, but it
> don't think it hurts much to fix those holes.

Meh, s/it don't/I don't/
-- Michael
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

I thing so too. Committed.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 3, 2016 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> OK, here are patches for 9.1~9.4. The main differences are that in
>> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
>> that's int32. In the latter case pgbench blows up the same way with:
>> \set i -2147483648
>> \set i :i / -1
>> select :i;
>> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
>> at the top of pgbench.c. I thing that's fine.
>
> I thing so too. Committed.

Thanks for thinging so.
-- Michael
Fabien COELHO wrote:
> Hello Michaël,
>
> v23 attached, which does not change the message but does the other fixes.

This doesn't apply anymore -- please rebase and submit to the next CF. I closed it here as returned with feedback. Thanks!
-- Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello,

>> v23 attached, which does not change the message but does the other fixes.
>
> This doesn't apply anymore

Indeed, but the latest version was really v25.

> -- please rebase and submit to the next CF.

I already provided it as v25 on Feb 1st.

> I closed it here as returned with feedback.

I turned it to "moved to next CF" as the patch is already in the queue.
-- Fabien.
Fabien COELHO wrote:
> Hello,
>
>>> v23 attached, which does not change the message but does the other fixes.
>>
>> This doesn't apply anymore
>
> Indeed, but the latest version was really v25.
>
>> -- please rebase and submit to the next CF.
>
> I already provided it as v25 on Feb 1st.
>
>> I closed it here as returned with feedback.
>
> I turned it to "moved to next CF" as the patch is already in the queue.

Ah, I didn't see v25 anywhere. What you did should be fine, thanks.
-- Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 9, 2016 at 5:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Fabien COELHO wrote:
>> I turned it to "moved to next CF" as the patch is already in the queue.
>
> Ah, I didn't see v25 anywhere. What you did should be fine, thanks.

I just had another look at this patch.

+ <replaceable>parameter</>% of the time.
Nitpick: double space here.

+ switch (func) {
[...]
+ }
+ default:
+ return false;
}
In evalFunc(), the default case in the switch for the operator functions should never be reached. Adding for example an Assert(0) is something to consider.

PGBT_NONE and PGBENCH_NONE are used nowhere. Why not remove them, or replace the code paths where they would be used by assertions, to trigger errors for future developments?

Other than that, the patch looks in pretty good shape to me.
-- Michael
Hello Michaël,

> + <replaceable>parameter</>% of the time.
> Nitpick: double space here.

Ok.

> + default:
> + return false;
> }
> In evalFunc(), the default case in the switch for the operator functions
> should never be reached. Adding for example an Assert(0) is something to
> consider.

Ok for an Assert plus a comment.

> PGBT_NONE and PGBENCH_NONE are used nowhere. Why not remove them

Ok.

v26 attached implements these changes.
-- Fabien.
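With the agreed change, the unreachable default case in evalFunc()'s operator switch would read something like this (a sketch; the exact placement is an assumption):

    default:
        /* cannot happen: function names are already checked at parse time */
        Assert(0);
        return false;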
On Thu, Feb 11, 2016 at 12:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> v26 attached implements these changes.

+ /* the argument list has been built in reverse order, it is fixed here */
+ expr->u.function.args = reverse_elist(args);

Hm. I may be missing something, but why is that necessary? This is basically doing a double reversal to put all the arguments in the correct order when parsing the function arguments.
-- Michael
Hello Michaël,

> + /* the argument list has been built in reverse order, it is fixed here */
> + expr->u.function.args = reverse_elist(args);
> Hm. I may be missing something, but why is that necessary? This is
> basically doing a double reversal to put all the arguments in the
> correct order when parsing the function arguments.

This is because the expression list is parsed left to right, and the list is built as a stack to avoid looking for the last argument when appending the next expression; at the end of parsing the list is therefore in reverse order, so it is reversed once to make it right. This way the complexity is kept at O(n). If this is too much, I can switch to O(n**2) by appending each expression at the end of the list.
-- Fabien.
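To make the O(n) claim concrete, the single reversal described here is the classic in-place pointer flip; a sketch, where PgBenchExprLink is an assumed name for the patch's argument-list cell:

typedef struct PgBenchExprLink
{
    struct PgBenchExpr *expr;          /* the argument expression itself */
    struct PgBenchExprLink *next;
} PgBenchExprLink;

static PgBenchExprLink *
reverse_elist(PgBenchExprLink *list)
{
    PgBenchExprLink *prev = NULL;

    while (list != NULL)
    {
        PgBenchExprLink *next = list->next;

        list->next = prev;             /* pop from the old chain, push onto the new */
        prev = list;
        list = next;
    }
    return prev;
}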
On Fri, Feb 12, 2016 at 2:41 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Hello Michaël,
>
>> + /* the argument list has been built in reverse order, it is fixed here */
>> + expr->u.function.args = reverse_elist(args);
>> Hm. I may be missing something, but why is that necessary? This is
>> basically doing a double reversal to put all the arguments in the
>> correct order when parsing the function arguments.
>
> This is because the expression list is parsed left to right, and the list is
> built as a stack to avoid looking for the last argument when appending the
> next expression; at the end of parsing the list is therefore in reverse
> order, so it is reversed once to make it right. This way the complexity is
> kept at O(n).
>
> If this is too much, I can switch to O(n**2) by appending each expression
> at the end of the list.

(this one has been mentioned by Alvaro offlist)
Using a pointer to the tail of the list would make the code simple, and save a couple of lines. Another thing that could be considered is to move list.c and pg_list.h into src/common and reuse them. There are other frontend utilities that emulate the same kind of facilities; have a look for example at the other copycats in pg_dump and pg_basebackup.
-- Michael
Hello Michaël,

> Using a pointer to the tail of the list would make the code simple,
> and save a couple of lines.

I did that, see v27 attached. Note that it does not save any lines: the reverse function is removed, but then you need another type to keep the head & tail (the link type is not enough), and you have to manage that stuff in the code. Whether it is "simpler" is debatable. It probably costs a few more tests when executed. However, it really saves having to answer the question "why is the list reversed?", which is definitely a win from my point of view :-)

> Another thing that could be considered is also to move list.c [...]

I think that this option is too much bother for the small internal purpose at hand.
-- Fabien.
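For comparison, a sketch of the v27 approach under the same naming assumptions as above: a small head/tail holder makes each append O(1) with no final reversal, at the price of one extra type and one test per append.

typedef struct PgBenchExprList
{
    PgBenchExprLink *head;
    PgBenchExprLink *tail;
} PgBenchExprList;

static void
append_elist(PgBenchExprList *list, PgBenchExprLink *cell)
{
    cell->next = NULL;
    if (list->tail == NULL)
        list->head = cell;             /* first element of an empty list */
    else
        list->tail->next = cell;
    list->tail = cell;
}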
On Fri, Feb 12, 2016 at 5:06 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I think that this option is too much bother for the small internal purpose
> at hand.

Yeah, I'm really frustrated with this whole thread. There's been a huge amount of discussion on this patch, but I don't really feel like it's converging towards something I could actually commit.

For example, I just realized that this patch allows values to be either a double or an integer and extends the operators to handle double values. But variables can still only be integers. Please name any programming language that has such a restriction. The only ones I can think of are command shells, and those at least flatten everything to a string rather than an integer, so that you can store the value without loss of precision - just with loss of type-safety. I think designing this in this way is quite short-sighted. I don't think variables should be explicitly typed, but they should be able to store a value of any type that expression evaluation can generate.

Also, as I said back in November, there are really two completely separate enhancements in here. One of them is to support a new data type (doubles) and the other is to support functions. Those should really be separate patches. If neither of you is willing to split this patch, I'm not willing to commit it. Going over half of this patch at a time and getting it committed is going to be a lot of work for me, but I'm willing to do it. I'm not willing to go over the whole thing at once - that's going to take more time than I have, and produce what will in my opinion be an inferior commit history. If somebody else is willing to commit the whole thing as one patch, I'm not going to object, but I won't do it myself.

I would not worry too much about the list thing at this point. I'm sure something simple is fine for that. I actually think the patch is now in decent shape as far as code-cleanliness is concerned, but I'm not sure we've really looked hard enough at the design. For example, I find implementing operators as functions in disguise not to be one of PostgreSQL's most awesome design decisions, and here we are copying that into pgbench for, well, no benefit that I can see, really. Maybe it's a good idea and maybe it's a bad idea, but how do we know? That stuff deserves more discussion. Code cleanup is good, so I do think it's good that a lot of effort has been put in there, but I don't think more code cleanup is what's going to get this patch over the finish line.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello Robert,

> For example, I just realized that this patch allows values to be
> either a double or an integer and extends the operators to handle
> double values. But variables can still only be integers.

Indeed.

> [...] at least flatten everything to a string rather than an integer, so
> that you can store the value without loss of precision - just with loss
> of type-safety. I think designing this in this way is quite short-sighted.

Note that I'm not responsible for this design, which is preexisting. Extending variables to be able to store doubles could also be done in another patch.

> I don't think variables should be explicitly typed, but they should be
> able to store a value of any type that expression evaluation can
> generate.

Doubles are not really needed that much; they are just there to provide parameters to the random_* functions, otherwise they are useless as far as pgbench is really concerned.

> Also, as I said back in November, there are really two completely
> separate enhancements in here. One of them is to support a new data
> type (doubles) and the other is to support functions.

Yep. The first part is precisely the patch I initially submitted 5 CFs ago. Then I was asked to put more things in it to show that it can indeed handle another type. Then I'm told "you should not have done that". What can I say?

> Those should really be separate patches.

They could.

> [...] I find implementing operators as functions in disguise not to be
> one of PostgreSQL's most awesome design decisions, and here we are
> copying that into pgbench for, well, no benefit that I can see, really.

Well, I did that initially, then I was asked to implement operators as functions. It probably saves some lines, so it is not too bad, even if the benefit is limited.

> Maybe it's a good idea and maybe it's a bad idea, but how do we know?

This is just pgbench, a tool for testing performance by running dummy transactions, not a production thing, so I think that it really does not matter. There are no user-visible changes wrt operators.

> [...] If neither of you is willing to split this patch, I'm not willing
> to commit it.

If I'm reading you correctly, you would consider committing it:
- if the function & double stuff are separated?
- for the double part, if variables can be double?
-- Fabien.
On Sat, Feb 13, 2016 at 4:37 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> For example, I just realized that this patch allows values to be
>> either a double or an integer and extends the operators to handle
>> double values. But variables can still only be integers.
>
> Indeed.

That's exactly the first impression I got about this patch when sending [1].
[1]: http://www.postgresql.org/message-id/CAB7nPqRHVFvnrAHaMarOniedeC90pf3-Wn+U5ccB4reD-HtJnw@mail.gmail.com
By focusing only on integers the patch would largely gain in simplicity. Being able to have double values is actually useful only for nested function calls, nothing else.

>> I don't think variables should be explicitly typed, but they should be
>> able to store a value of any type that expression evaluation can generate.
>
> Doubles are not really needed that much; they are just there to provide
> parameters to the random_* functions, otherwise they are useless as far as
> pgbench is really concerned.

+1.

>> Also, as I said back in November, there are really two completely
>> separate enhancements in here. One of them is to support a new data
>> type (doubles) and the other is to support functions.
>
> Yep. The first part is precisely the patch I initially submitted 5 CFs ago.
> Then I was asked to put more things in it to show that it can indeed handle
> another type. Then I'm told "you should not have done that". What can I say?

I think that you did your job here by answering the comments of all the reviewers that had a look at this patch. That's not an easy task.

>> [...] I find implementing operators as functions in disguise not to be
>> one of PostgreSQL's most awesome design decisions, and here we are copying
>> that into pgbench for, well, no benefit that I can see, really.
>
> Well, I did that initially, then I was asked to implement operators as
> functions. It probably saves some lines, so it is not too bad, even if the
> benefit is limited.

Put the blame on me for this one then. I suggested this idea because Postgres does the same, and because it simplifies slightly the union structure in charge of holding the parsed structures, making the code a bit more readable IMO.

>> [...] If neither of you is willing to split this patch, I'm not willing
>> to commit it.
>
> If I'm reading you correctly, you would consider committing it:
>
> - if the function & double stuff are separated?
> - for the double part, if variables can be double?

I just double-checked and could not see a clear use case mentioned in this thread for double return types, so I would suggest focusing on the integer portion with min(), max(), abs(), debug() and the existing functions refactored. That's what your first versions did. If someone wishes to implement double types, this someone could do it; the infrastructure that this patch puts in place has already proved that it can be easily extended.
-- Michael
Hello Michaël,

>> - if the function & double stuff are separated?
>> - for the double part, if variables can be double?
>
> I just double-checked and could not see a clear use case mentioned in
> this thread for double return types,

Alas there is one: the non-uniform random functions, which take a double parameter. Once the random functions are there, the \setrandom horror code could be removed, which would be a real benefit, IMO :-) So I see a good case for having some support for doubles.

> so I would suggest focusing on the integer portion with min(), max(),
> abs(), debug() and the existing functions refactored. That's what your
> first versions did. If someone wishes to implement double types,
> this someone could do it; the infrastructure that this patch puts in
> place has already proved that it can be easily extended.

Adding doubles is not too big a deal. I just stopped at variables because I could not see any realistic use for double variables; my idea was to postpone that until it is actually needed, "never" being the most probable course. Now, if this is a blocker for the committer, then I will probably make the effort, whatever I think of the usefulness of the feature.
-- Fabien.
> But variables can still only be integers.

Version 28 attached has double variables, although typing is based on guessing, because a variable's value is just a string. Any comment?

> [...] two separate enhancements in here. One of them is to support a
> new data type (doubles) and the other is to support functions.

The two features are highly intermixed, so they can only be dependent patches: first add the function infrastructure, with probably some support for doubles although it would not be used, then add doubles & their functions.

A real pain is the documentation, because it means writing documentation with only integer functions, then overwriting it with doubles. This is dumb work, really, for the sake of "a cleaner git history", a beauty no one will ever contemplate...
-- Fabien.
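Since a variable's value is kept as a string, the "guessing" presumably amounts to checking whether the text parses as a double; one possible reading, offered purely as an assumption about v28, not its actual code:

#include <stdlib.h>
#include <string.h>

/* hypothetical helper: does this variable's string value look like a double? */
static int
looks_like_double(const char *str)
{
    char *end;

    (void) strtod(str, &end);

    /* fully parses as a number and carries a '.' or an exponent marker */
    return end != str && *end == '\0' && strpbrk(str, ".eE") != NULL;
}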
On Sun, Feb 14, 2016 at 12:37 AM, Fabien COELHO wrote:
> The two features are highly intermixed, so they can only be dependent
> patches: first add the function infrastructure, with probably some support
> for doubles although it would not be used, then add doubles & their
> functions.
>
> A real pain is the documentation, because it means writing documentation
> with only integer functions, then overwriting it with doubles. This is dumb
> work, really, for the sake of "a cleaner git history", a beauty no one
> will ever contemplate...

FWIW, I care a lot about splitting patches as much as possible where it makes sense, so as to keep a clean history. So I would be fine to do a portion of the legwork and extract from this patch something smaller that adds only functions as a first step, with the minimum set of functions I mentioned upthread. Robert, Alvaro, Fabien, does that sound fine to you?
-- Michael
On Sat, Feb 13, 2016 at 6:19 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> FWIW, I care a lot about splitting patches as much as possible where
> it makes sense, so as to keep a clean history. So I would be fine to do a
> portion of the legwork and extract from this patch something smaller
> that adds only functions as a first step, with the minimum set of
> functions I mentioned upthread. Robert, Alvaro, Fabien, does that
> sound fine to you?

I'd be delighted. I would really like to get this feature in, but I'm not going to do it if it requires an unreasonable amount of work on my part - and what you propose would help a lot.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 13, 2016 at 10:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> A real pain is the documentation, because it means writing documentation
> with only integer functions, then overwriting it with doubles. This is dumb
> work, really, for the sake of "a cleaner git history", a beauty no one
> will ever contemplate...

You know, you make comments like this pretty much every time anybody suggests that you should change anything in any patch you write. It doesn't matter whether the change is suggested by Heikki, or by Michael, or by Andres, or by me. You pretty much always come back and say something that amounts to "changing the patch I already wrote is a waste of time". That gets a little disheartening after a while. This community's procedure is that patches have to be reviewed and reach consensus in order to get committed, and in my opinion that is generally not "dumb" but rather something that enhances the end result. I value your contributions to the community and I hope you will continue making them, but I don't like it when people call my ideas dumb.
-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Feb 14, 2016 at 10:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Feb 13, 2016 at 6:19 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> FWIW, I care a lot about splitting patches as much as possible where
>> it makes sense, so as to keep a clean history. So I would be fine to do a
>> portion of the legwork and extract from this patch something smaller
>> that adds only functions as a first step, with the minimum set of
>> functions I mentioned upthread. Robert, Alvaro, Fabien, does that
>> sound fine to you?
>
> I'd be delighted. I would really like to get this feature in, but I'm
> not going to do it if it requires an unreasonable amount of work on my
> part - and what you propose would help a lot.

OK, I'll see about producing a patch then for this basic infrastructure, with the rest built on top of it as a secondary patch.
-- Michael
Hello Robert, > You know, you make comments like this pretty much every time anybody > suggests that you should change anything in any patch you write. Well, not every time... I'm tired of doing and undoing things: I do a limited A because I'm cautious not to spend too much time that would go down the drain. Then I'm told "do A+B" to show that A is worthwhile. Then I'm told "you should not have done B with A, but submit A for the infrastructure and then B for the additional features", which was precisely my initial intent... So it is really this going forward and backward imposed by the process that makes me write these remarks. > [...] I don't like it when people call my ideas dumb. Your idea is not dumb, I'm sorry if I have implied that, and I apologise. I like a clean history as much as everyone else. Having to unmix features and documentation is still "dumb work", even if the purpose is not dumb, and I stand by this word for that part. -- Fabien.
> So I would be fine to do a portion of the legwork and extract from this > patch something smaller that adds only functions as a first step, with > the minimum set of functions I mentioned upthread. Robert, Alvaro, > Fabien, does that sound fine to you? Thanks, but this is my c*, I have a few hours of travelling this evening, I'll do it then. I'll be happy if you do the review of the resulting split. -- Fabien.
On Sun, Feb 14, 2016 at 4:42 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> So I would be fine to do a portion of the legwork and extract from this >> patch something smaller that adds only functions as a first step, with the >> minimum set of functions I mentioned upthread. Robert, Alvaro, Fabien, does >> that sound fine to you? > > > Thanks, but this is my c*, I have a few hours of travelling this evening, > I'll do it then. > > I'll be happy if you do the review of the resulting split. OK, I am fine with this scenario as well. I have luckily done nothing yet. -- Michael
Hello Michaël, >> I'll be happy if you do the review of the resulting split. > > OK, I am fine with this scenario as well. I have luckily done nothing yet. Here is a 3-part v29: a) add support for integer functions in pgbench, including turning operators into functions, as well as some minimal infrastructure for additional types (this allows minimizing the diffs with the next patch, which adds doubles). b) add support for doubles, including setting double variables. Note that variables are not explicitly typed. Add some double-related functions, the most interesting of which for pgbench are the random ones. c) remove \setrandom support (as, thanks to part b, \set x random(...) does the same). Prints an error pointing to the replacement if \setrandom is used in a pgbench script. -- Fabien.
On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Here is a 3-part v29: > > a) add support for integer functions in pgbench, including turning > operators into functions, as well as some minimal infrastructure for > additional types (this allows minimizing the diffs with the next > patch, which adds doubles). > > b) add support for doubles, including setting double variables. > Note that variables are not explicitly typed. Add some > double-related functions, the most interesting of which for pgbench > are the random ones. > > c) remove \setrandom support (as, thanks to part b, \set x random(...) does > the same). Prints an error pointing to the replacement if \setrandom is > used in a pgbench script. Thanks, I'll review these as soon as I can get to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Feb 15, 2016 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Here is a 3-part v29: >> >> a) add support for integer functions in pgbench, including turning >> operators into functions, as well as some minimal infrastructure for >> additional types (this allows minimizing the diffs with the next >> patch, which adds doubles). >> >> b) add support for doubles, including setting double variables. >> Note that variables are not explicitly typed. Add some >> double-related functions, the most interesting of which for pgbench >> are the random ones. >> >> c) remove \setrandom support (as, thanks to part b, \set x random(...) does >> the same). Prints an error pointing to the replacement if \setrandom is >> used in a pgbench script. > > Thanks, I'll review these as soon as I can get to it. I got around to looking at (a) in this set. + if ((PGBENCH_FUNCTIONS[fnumber].nargs >= 0 && + PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args)) || + /* check at least one arg for min & max */ + (PGBENCH_FUNCTIONS[fnumber].nargs == -1 && + elist_length(args) == 0)) + expr_yyerror_more("unexpected number of arguments", + PGBENCH_FUNCTIONS[fnumber].fname); We could split that into two parts, each one with a more precise error message: - For functions that expect at least one argument: "at least one argument was expected, none found". - For functions that expect N arguments: "N arguments expected, but M found". + "\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n" "SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n" }}; -/* Function prototypes */ This hunk is noise. + * Recursive evaluation of int or double expressions + * + * Note that currently only integer variables are available, with values + * stored as text. This comment is incorrect; we only care about integers in this patch. Taking patch 1 as a completely independent thing, there is no need to introduce PgBenchValueType yet. Similar remark for setIntValue and coerceToInt. They are useful afterwards when introducing double types to be able to handle double input parameters for the gaussian and other functions. - /* - * INT64_MIN / -1 is problematic, since the result - * can't be represented on a two's-complement machine. - * Some machines produce INT64_MIN, some produce zero, - * some throw an exception. We can dodge the problem - * by recognizing that division by -1 is the same as - * negation. - */ - if (rval == -1) + if (coerceToInt(&rval) == -1) { - *retval = -lval; - - /* overflow check (needed for INT64_MIN) */ - if (lval == PG_INT64_MIN) - { - fprintf(stderr, "bigint out of range\n"); - return false; - } + setIntValue(retval, 0); + return true; } (INT64_MIN / -1) should error. (INT64_MIN % -1) should result in 0. This hunk is missing the division handling. -- Michael
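A minimal sketch of the two-branch check suggested above, reusing the names from the quoted diff (PGBENCH_FUNCTIONS, elist_length and expr_yyerror_more come from the patch; the exact message wording is illustrative):

    int nargs = elist_length(args);

    /* variadic functions (min, max): nargs == -1 means "one or more" */
    if (PGBENCH_FUNCTIONS[fnumber].nargs == -1 && nargs == 0)
        expr_yyerror_more("at least one argument expected, none found",
                          PGBENCH_FUNCTIONS[fnumber].fname);
    /* fixed-arity functions: the argument count must match exactly */
    else if (PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
             PGBENCH_FUNCTIONS[fnumber].nargs != nargs)
        expr_yyerror_more("unexpected number of arguments",
                          PGBENCH_FUNCTIONS[fnumber].fname);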
Hello Michaël, > + * Recursive evaluation of int or double expressions > + * > + * Note that currently only integer variables are available, with values > + * stored as text. > This comment is incorrect; we only care about integers in this patch. Indeed! > Taking patch 1 as a completely independent thing, there is no need to > introduce PgBenchValueType yet. Similar remark for setIntValue and > coerceToInt. They are useful afterwards when introducing double types to > be able to handle double input parameters for the gaussian and other > functions. Yes. This is exactly the pain I'm trying to avoid: creating a different implementation for the first patch, which is just overridden when the second part is applied... So I'm trying to compromise, having a several-part patch *but* having the infrastructure ready for the second patch, which adds the double type. Note that the first patch without the second is a loss of time for everyone, as nearly the only useful functions are the random ones, which require a double argument; so it does not make sense to apply the first one if the second one is not to be applied, I think. > [...] > (INT64_MIN / -1) should error. > (INT64_MIN % -1) should result in 0. > This is missing the division handling. Oops, indeed, I totally messed up when merging the handling of / and %. :-( I have found another issue in the (a) patch: the internal scripts were using the future random functions, which do not exist yet as they are in patch (b). Here is a three-part v30, which still includes the infrastructure for future types in the (a) patch, see my argumentation above. -- Fabien.
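For reference, a sketch of the intended / and % semantics once unmerged, modeled on the backend's int8 overflow handling (the switch labels and surrounding helpers are illustrative, not the actual patch hunks; lval and rval are int64 operands):

    case '/':
        if (rval == 0)
        {
            fprintf(stderr, "division by zero\n");
            return false;
        }
        /* INT64_MIN / -1 cannot be represented; treat division by -1 as negation */
        if (rval == -1)
        {
            if (lval == PG_INT64_MIN)
            {
                fprintf(stderr, "bigint out of range\n");
                return false;
            }
            setIntValue(retval, -lval);
        }
        else
            setIntValue(retval, lval / rval);
        return true;

    case '%':
        if (rval == 0)
        {
            fprintf(stderr, "division by zero\n");
            return false;
        }
        /* INT64_MIN % -1 can trap if computed directly; the result is 0 */
        if (rval == -1)
            setIntValue(retval, 0);
        else
            setIntValue(retval, lval % rval);
        return true;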
On Mon, Feb 15, 2016 at 4:58 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Indeed! > >> Taking patch 1 as a completely independent thing, there is no need to >> introduce PgBenchValueType yet. Similar remark for setIntValue and >> coerceToInt. They are useful afterwards when introducing double types to be >> able to handle double input parameters for the gaussian and other functions. > > Yes. This is exactly the pain I'm trying to avoid: creating a different > implementation for the first patch, which is just overridden when the second > part is applied... Splitting a patch means breaking it into independently committable sub-patches, not just throwing each line of the diff into a different pile as best you can. I'm with Michael: that part doesn't belong in this patch. If we want to have an infrastructure refactoring patch that just replaces every relevant use of int64 with PgBenchValue, a union supporting only integer values, then we can do that first and have a later patch introduce double as a separate change. But we can't have things that are logically part of patch #2 just tossed in with patch #1. I was in the middle of ripping that out of the patch when I realized that evalFunc() is pretty badly designed. What you've done is made it the job of each particular function to evaluate its arguments. I don't think that's a good plan. I think that when we discover that we've got a function, we should evaluate all of the arguments that were passed to it using common code that is shared across all types of functions and operators. Then, the evaluated arguments should be passed to the function-specific code, which can do as it likes with them. This way, you have less code that is specific to particular operations and more common code, which is generally a good thing. Every expression evaluation engine that I've ever heard of works this way - see, for example, the PostgreSQL backend. I experimented with trying to do this and ran into a problem: where exactly would you store the evaluated arguments when you don't know how many of them there will be? And even if you did know how many of them there will be, wouldn't that mean that evalFunc or evaluateExpr would have to palloc a buffer of the correct size for each invocation? That's far more heavyweight than the current implementation, and minimizing CPU usage inside pgbench is a concern. It would be interesting to do some pgbench runs with this patch, or the final patch, and see what effect it has on the TPS numbers, if any, and I think we should. But the first concern is to minimize any negative impact, so let's talk about how to do that. I think we should limit the number of arguments to a function to, say, 16, so that an array of int64s or PgBenchValues long enough to hold an entire argument list can be stack-allocated. The backend's limit is higher, but the only reason we need a value higher than 2 here is because you've chosen to introduce variable-argument functions; and I think 16 is enough for any practical purpose. Then, I think we should also change the parse tree representation so that the parser transforms the linked list into an array stored within the PgBenchExpr, letting you access the expression for argument i as expr->u.arg[i]. Then we can write this as a loop that evaluates each expression in an array of expressions and stores the result in an array of values. That seems like it would be both cleaner and faster than what you've got here right now. 
It's also more similar to what you did with the function name itself: the most trivial thing the parser could do is store a pointer to the function or operator name, but that would be slow, so instead it looks up the function and stores a constant. I also went over your documentation changes. I think you inserted the new table smack dab in the middle of a section, in a place where it doesn't really belong. The version attached makes it into its own <refsect2>, puts it in a new section a bit further down so that it doesn't break up the flow, and has a few other tweaks that I think are improvements. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
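A rough sketch of the evaluate-then-dispatch scheme Robert describes above (MAX_FARGS, evalStandardFunc and the exact signatures are assumptions about the eventual shape of the code, not the committed version):

    #define MAX_FARGS 16

    static bool
    evalFunc(CState *st, PgBenchFunction func,
             PgBenchExprLink *args, PgBenchValue *retval)
    {
        int              nargs = 0;
        PgBenchValue     vargs[MAX_FARGS];  /* stack-allocated, no palloc */
        PgBenchExprLink *l;

        /* common code: evaluate every argument before dispatching */
        for (l = args; l != NULL; l = l->next)
        {
            if (nargs >= MAX_FARGS)
            {
                fprintf(stderr, "too many function arguments (maximum is %d)\n",
                        MAX_FARGS);
                return false;
            }
            if (!evaluateExpr(st, l->expr, &vargs[nargs++]))
                return false;
        }

        /* function-specific code only ever sees evaluated values */
        return evalStandardFunc(st, func, nargs, vargs, retval);
    }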
On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I experimented with trying to do this and ran into a problem: where > exactly would you store the evaluated arguments when you don't know > how many of them there will be? And even if you did know how many of > them there will be, wouldn't that mean that evalFunc or evaluateExpr > would have to palloc a buffer of the correct size for each invocation? > That's far more heavyweight than the current implementation, and > minimizing CPU usage inside pgbench is a concern. It would be > interesting to do some pgbench runs with this patch, or the final > patch, and see what effect it has on the TPS numbers, if any, and I > think we should. But the first concern is to minimize any negative > impact, so let's talk about how to do that. Good point. One simple idea here would be to use a custom pgbench script that has no SQL commands and just calculates the values of some parameters to measure the impact without depending on the backend, with a fixed number of transactions. -- Michael
On Tue, Feb 16, 2016 at 1:55 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I experimented with trying to do this and ran into a problem: where >> exactly would you store the evaluated arguments when you don't know >> how many of them there will be? And even if you did know how many of >> them there will be, wouldn't that mean that evalFunc or evaluateExpr >> would have to palloc a buffer of the correct size for each invocation? >> That's far more heavyweight than the current implementation, and >> minimizing CPU usage inside pgbench is a concern. It would be >> interesting to do some pgbench runs with this patch, or the final >> patch, and see what effect it has on the TPS numbers, if any, and I >> think we should. But the first concern is to minimize any negative >> impact, so let's talk about how to do that. > > Good point. One simple idea here would be to use a custom pgbench > script that has no SQL commands and just calculates the values of some > parameters to measure the impact without depending on the backend, > with a fixed number of transactions. Sure, we could do that. But whether it materially changes pgbench -S results, say, is a lot more important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > [...] But we can't have things that are logically part of patch #2 just > tossed in with patch #1. So you want integer functions without type in patch #1. > I was in the middle of ripping that out of the patch when I realized > that evalFunc() is pretty badly designed. Probably; we are just at v30. :-) > What you've done is made it the job of each particular function to > evaluate its arguments. Yep. I did that for the multiple issues you raise below, and some you did not yet foresee: handling a variable number of arguments (0, 1, 2, 3, n), avoiding dynamic structures or arbitrary limitations, checking for a valid number of arguments, and also the fact that the evaluation call was not too horrible (2 lines per evaluation, factored out by groups of functions [operators, min/max, randoms, ...], so it is not fully repeated). There are 5 sub-expression evaluations in the function, totalling 10 LOC. TEN. > I don't think that's a good plan. The one you suggest does not strike me as intrinsically better: it is a trade between some code ugliness (5 repeated evals = 10 lines, a little more with the double functions, probably 20 lines) and other uglinesses (a limit on the number of args or dynamic allocation, an array length to manage and test somehow, list-to-array conversion code, things that will mean far more than the few lines of repeated code under discussion). So I think that it is just a choice between two plans, really, the better of which is debatable. > I experimented with trying to do this and ran into a problem: Yep. There are other problems, all of them solvable obviously, but which mean that a lot more than 10 lines will be added to avoid the 5*2-line repetition. My opinion is that the submitted version is simple and fine for the purpose, and the plan you suggest replaces 5*2 repetitions with a lot of code and complexity, which is not desirable and should be avoided. However, for obvious reasons the committer's opinion prevails. :-) After considering the various points I raised above, could you confirm that you do still require the implementation of this array approach before I spend time doing such a thing? > I also went over your documentation changes. Thanks, this looks better. -- Fabien.
Hello Robert, >> Good point. One simple idea here would be to use a custom pgbench >> script that has no SQL commands and just calculates the values of some >> parameters to measure the impact without depending on the backend, >> with a fixed number of transactions. > > Sure, we could do that. But whether it materially changes pgbench -S > results, say, is a lot more important. Indeed. Several runs on my laptop: ~ 400000-540000 tps with master using: \set naccounts 100000 * :scale \setrandom aid 1 :naccounts ~ 430000-530000 tps with full function patch using: \set naccounts 100000 * :scale \setrandom aid 1 :naccounts ~ 730000-890000 tps with full function patch using: \set aid random(1, 100000 * :scale) The performance is pretty similar on the same script. The real pain is variable management; avoiding some of it is a win. However, as you suggest, the tps impact even with -M prepared -S is nought, because the internal scripting time in pgbench is much smaller than the time spent on actual connecting and querying. -- Fabien.
On Tue, Feb 16, 2016 at 3:53 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > My opinion is that the submitted version is simple and fine for the purpose, > and the plan you suggest replaces 5*2 repetitions with a lot of code and > complexity, which is not desirable and should be avoided. > > However, for obvious reasons the committer's opinion prevails. :-) You're welcome to solicit other opinions. I'm not unwilling to give way if there's a chorus of support for the way you've done it. But to me it sounds like you're saying that it doesn't really matter whether the system is scalable and maintainable because we only have 5 functions right now, which is a design philosophy with which I just don't agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 16, 2016 at 5:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >>> Good point. One simple idea here would be to use a custom pgbench >>> script that has no SQL commands and just calculates the values of some >>> parameters to measure the impact without depending on the backend, >>> with a fixed number of transactions. >> >> Sure, we could do that. But whether it materially changes pgbench -S >> results, say, is a lot more important. > > > Indeed. Several runs on my laptop: > > ~ 400000-540000 tps with master using: > \set naccounts 100000 * :scale > \setrandom aid 1 :naccounts > > ~ 430000-530000 tps with full function patch using: > \set naccounts 100000 * :scale > \setrandom aid 1 :naccounts > > ~ 730000-890000 tps with full function patch using: > \set aid random(1, 100000 * :scale) > > The performance is pretty similar on the same script. The real pain is > variable management; avoiding some of it is a win. Wow, that's pretty nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, >> However, for obvious reasons the committer opinion prevails:-) > > You're welcome to solicit other opinions. I'm not unwilling to give > way if there's a chorus of support for the way you've done it. Hmmm. I do not expect much chorus on such a minor subject:-) > But to me it sounds like you're saying that it doesn't really matter > whether the system is scalable and maintainable because we only have 5 > functions right now, which is a design philosophy with which I just > don't agree. The design does not aim at scalability but at simplicity, otherwise I would have done things quite differently: with many functions the "switch" based selection does not scale anyway. Anyway, attached are two patches, one for each approach. The array (second patch) is not too bad if one agrees with a maximum number of arguments, and also as I did not change the list structure coming from the parser, so it does not need to manage the number of arguments in the function structure. The code for min/max is also simpler when dealing with an array instead of a linked list. I do not like much array references in the code, so I tried to avoid them by using lval/rval scalars for operator operands. Please choose the one you prefer, and I'll adapt the remaining stuff to your choice. -- Fabien.
On Tue, Feb 16, 2016 at 11:12 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Robert, > >>> However, for obvious reasons the committer's opinion prevails. :-) >> >> >> You're welcome to solicit other opinions. I'm not unwilling to give >> way if there's a chorus of support for the way you've done it. > > > Hmmm. I do not expect much chorus on such a minor subject. :-) > >> But to me it sounds like you're saying that it doesn't really matter >> whether the system is scalable and maintainable because we only have 5 >> functions right now, which is a design philosophy with which I just don't >> agree. > > > The design does not aim at scalability but at simplicity, otherwise I would > have done things quite differently: with many functions the "switch"-based > selection does not scale anyway. > > Anyway, attached are two patches, one for each approach. > > The array (second patch) is not too bad if one agrees with a maximum number > of arguments; also, as I did not change the list structure coming from > the parser, it does not need to manage the number of arguments in the > function structure. The code for min/max is also simpler when dealing with > an array instead of a linked list. I do not much like array references in > the code, so I tried to avoid them by using lval/rval scalars for operator > operands. > > Please choose the one you prefer, and I'll adapt the remaining stuff to your > choice. For those two patches and HEAD, it is possible to do a direct performance comparison for simple operator functions, as those are being switched to behave as functions. So doing the following for 50M "transactions" on my laptop: \set aid 1 + 1 pgbench -f addition.sql -t 50000000 I have the following: HEAD: 3.5~3.7M TPS list method: 3.6~3.7M TPS array method: 3.4~3.5M TPS So all approaches have comparable performance. Btw, patch 2 is returning a warning for me: pgbench.c:1060:16: warning: comparison of constant -9223372036854775808 with expression of type 'int' is always false [-Wtautological-constant-out-of-range-compare] if (lval == PG_INT64_MIN) It is trying to compare a 32-bit integer with an int64 value; evalFunc needs an int64. -- Michael
Hello Michaël, > \set aid 1 + 1 > pgbench -f addition.sql -t 50000000 > > I have the following: > HEAD: 3.5~3.7M TPS > list method: 3.6~3.7M TPS > array method: 3.4~3.5M TPS > So all approaches have comparable performance. Yep, the execution trace is pretty similar in all cases, maybe with a little more work for the array method, although I'm surprised that the difference is discernible. > Btw, patch 2 is returning a warning for me: > It is trying to compare a 32-bit integer with an int64 value; evalFunc > needs an int64. Indeed. My gcc 4.8.4 with -Wall does not show the warning, too bad. Attached is the fixed patch for the array method. -- Fabien
On Wed, Feb 17, 2016 at 5:22 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> \set aid 1 + 1 >> pgbench -f addition.sql -t 50000000 >> >> I have the following: >> HEAD: 3.5~3.7M TPS >> list method: 3.6~3.7M TPS >> array method: 3.4~3.5M TPS >> So all approaches have comparable performance. > > Yep, the execution trace is pretty similar in all cases, maybe with a little > more work for the array method, although I'm surprised that the difference is > discernible. Nah, that's mostly noise I think. My conclusion is that all approaches are rather equivalent for the operators. -- Michael
On Wed, Feb 17, 2016 at 3:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Indeed. My gcc 4.8.4 with --Wall does not show the warning, too bad. > > Attached is the fixed patch for the array method. Committed with a few tweaks, including running pgindent over some of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 2, 2016 at 3:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Feb 17, 2016 at 3:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Indeed. My gcc 4.8.4 with --Wall does not show the warning, too bad. >> >> Attached is the fixed patch for the array method. > > Committed with a few tweaks, including running pgindent over some of it. Thanks. So the first set of functions is in, and the operators are executed as functions as well. Fabien, are you planning to send rebased versions of the rest? By that I mean the switch of the existing subcommands into equivalent functions and the handling of double values as parameters for those functions. -- Michael
> Committed with a few tweaks, including running pgindent over some of it. Thanks! -- Fabien.
>>> Attached is the fixed patch for the array method. >> >> Committed with a few tweaks, including running pgindent over some of it. > > Thanks. So the first set of functions is in, and the operators are > executed as functions as well. Fabien, are you planning to send > rebased versions of the rest? By that I mean the switch of the > existing subcommands into equivalent functions and the handling of > double values as parameters for those functions. Here they are: - 32-b: add double functions, including double variables - 32-c: remove \setrandom support (advice to use \set + random instead) -- Fabien.
> - 32-b: add double functions, including double variables > - 32-c: remove \setrandom support (advice to use \set + random instead) Here is a rebased version after Tom's updates, 33-b & 33-c. I also extended the floating-point syntax to accept signed exponents, and changed the regexp style to match Tom's changes. -- Fabien.
On Mon, Mar 7, 2016 at 4:58 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> - 32-b: add double functions, including double variables >> - 32-c: remove \setrandom support (advice to use \set + random instead) > > Here is a rebased version after Tom's updates, 33-b & 33-c. I also extended > the floating-point syntax to accept signed exponents, and changed the > regexp style to match Tom's changes. Having a look at 33-b, this looks pretty good now, but: // comments are not allowed. I'd just remove the two you have. It makes no sense to exit(1) and then return 0, so don't do that. I might write this code as: if (pval->type == PGBT_INT) return pval->u.ival; Assert(pval->type == PGBT_DOUBLE); /* do double stuff */ This would get rid of the internal-error case here altogether in favor of testing it via an assertion. I think that coerceToInt() should not exit(1) when an overflow occurs; instead, I think the function should be declared as bool coerceToInt(PgBenchValue *pval, int64 *result), and the error return should be propagated back to the caller, which can then return false as we do for other error cases. I think that some of the places you've used coerceToInt() are not appropriate. Like, for example: + if (coerceToInt(rval) == 0) + { + fprintf(stderr, "division by zero\n"); + return false; + } Now, if rval is out of range of an integer, that is going to overflow while trying to see whether it should divide by zero. Please work a little harder here and in similar cases. Maybe add a helper function checkIntegerEquality(PgBenchValue *, int64). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
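A sketch of the requested signature change, extending the fragment Robert quotes above (the overflow bounds check and the message wording are illustrative):

    static bool
    coerceToInt(PgBenchValue *pval, int64 *result)
    {
        if (pval->type == PGBT_INT)
        {
            *result = pval->u.ival;
            return true;
        }

        Assert(pval->type == PGBT_DOUBLE);
        if (pval->u.dval < (double) PG_INT64_MIN ||
            pval->u.dval > (double) PG_INT64_MAX)
        {
            fprintf(stderr, "double to int overflow for %f\n", pval->u.dval);
            return false;       /* the caller propagates the failure */
        }
        *result = (int64) pval->u.dval;
        return true;
    }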
Hello Robert, > Having a look at 33-b, this looks pretty good now, but: > > // comments are not allowed. I'd just remove the two you have. Oops, C99 did not make it everywhere yet! > It makes no sense to exit(1) and then return 0, so don't do that. [...] > This would get rid of the internal-error case here altogether in favor > of testing it via an assertion. Ok. Fine with me. > I think that coerceToInt() should not exit(1) when an overflow occurs; > instead, I think the function should be declared as bool > coerceToInt(PgBenchValue *pval, int64 *result), and the error return > should be propagated back to the caller, which can then return false > as we do for other error cases. This is the pain (ugly repeated code) that I wish to avoid, because then we cannot write a simple addition for doing an addition, but have to do several ugly tests instead. Ok, elegance is probably not a sufficient argument against doing that. Moreover, I do not see any condition in which doing what you suggest makes much sense from the user perspective: if there is an internal error in the bench code it seems much more logical to ask the user for a sensible bench script, because I would not know how to interpret tps on a bench with internal failures in the client code anyway. For me, exiting immediately on internal script errors is the right action. If this is a blocker I'll do them, but I'm convinced it is not what should be done. > I think that some of the places you've used coerceToInt() are not > appropriate. Like, for example: > > + if (coerceToInt(rval) == 0) > + { > + fprintf(stderr, "division by zero\n"); > + return false; > + } > > Now, if rval is out of range of an integer, that is going to overflow > while trying to see whether it should divide by zero. Please work a > little harder here and in similar cases. Ok. > Maybe add a helper function > checkIntegerEquality(PgBenchValue *, int64). Maybe. -- Fabien.
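And a sketch of the checkIntegerEquality helper mentioned above, which tests for zero without coercing first (the name is Robert's suggestion; the body is purely illustrative):

    /* true iff the value equals the given integer, without coercion */
    static bool
    checkIntegerEquality(const PgBenchValue *pval, int64 ival)
    {
        if (pval->type == PGBT_INT)
            return pval->u.ival == ival;
        Assert(pval->type == PGBT_DOUBLE);
        return pval->u.dval == (double) ival;
    }

    /* in the division code, instead of coerceToInt(rval) == 0: */
    if (checkIntegerEquality(rval, 0))
    {
        fprintf(stderr, "division by zero\n");
        return false;
    }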
On Tue, Mar 8, 2016 at 11:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > If this is a blocker I'll do them, but I'm convinced it is not what should be > done. Well, I think it's a blocker. Exiting within deeply-nested code instead of propagating the error upward does not strike me as a good practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert. Here is a v34 b & c. > // comments are not allowed. I'd just remove the two you have. Back to the eighties! > It makes no sense to exit(1) and then return 0, so don't do that. I > might write this code as: > This would get rid of the internal-error case here altogether in favor > of testing it via an assertion. I've put assertions instead of exits in some places. > I think that coerceToInt() should not exit(1) when an overflow occurs; I think that it should, because the only sane option for the user is to fix the script and relaunch the bench: counting errors has no added value for the user. The attached version does some error handling instead, too bad. > Now, if rval is out of range of an integer, that is going to overflow > while trying to see whether it should divide by zero. I could not find a place where there was such a potential issue. If the value is zero, it cannot overflow when cast to int. If it is not zero but it overflows, then it is an overflow, so it should overflow. Maybe I misunderstood your point. -- Fabien.
On Tue, Mar 8, 2016 at 3:52 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > [ new patch and assorted color commentary ] This is not acceptable: + /* guess double type (n for "inf", "-inf" and "nan") */ + if (strchr(var, '.') != NULL || strchr(var, 'n') != NULL) + { + double dv; + sscanf(var, "%lf", &dv); + setDoubleValue(retval, dv); + } + else + setIntValue(retval, strtoint64(var)); That totally breaks the error handling which someone carefully established here. + PgBenchValue *varg = & vargs[0]; Extra space. + if (!coerceToDouble(st, & vargs[0], &dval)) + return false; Another extra space. - int nargs = 0; - int64 iargs[MAX_FARGS]; - PgBenchExprLink *l = args; + int nargs = 0; + PgBenchValue vargs[MAX_FARGS]; + PgBenchExprLink *l = args; Completely unnecessary reindentation of the first and third lines. + setIntValue(retval, i < 0? -i: i); Not project style. Please fix the whitespace damage git diff --check complains about, and check for other places where you haven't followed project style. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, Here is a v35 b & c. > This is not acceptable: > > + /* guess double type (n for "inf", > "-inf" and "nan") */ > + if (strchr(var, '.') != NULL || > strchr(var, 'n') != NULL) > + { > + double dv; > + sscanf(var, "%lf", &dv); > + setDoubleValue(retval, dv); > + } > + else > + setIntValue(retval, strtoint64(var)); > > That totally breaks the error handling which someone carefully established here. I'm not sure what is "not acceptable" or how it "totally breaks the error handling" in the above code. I assumed that you want to check that sscanf can read what sprintf generated when handling "\set"; I'd guess that libc would be broken if it could not. I've made pgbench check that it is not. If it was something else, you have to spell it out for me. > Extra space. [...] Another extra space. Indeed. > - int nargs = 0; > - int64 iargs[MAX_FARGS]; > - PgBenchExprLink *l = args; > + int nargs = 0; > + PgBenchValue vargs[MAX_FARGS]; > + PgBenchExprLink *l = args; > > Completely unnecessary reindentation of the first and third lines. It just looked better to me. > + setIntValue(retval, i < 0? -i: i); > > Not project style. Indeed. > Please fix the whitespace damage git diff --check complains about, "git diff --check" does not seem to complain on the full v35-b. > and check for other places where you haven't followed project style. I've found some more instances of "& ref". -- Fabien.
On Wed, Mar 9, 2016 at 3:09 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > I'm not sure what is "not acceptable" or how it "totally breaks the error > handling" in the above code. > > I assumed that you want to check that sscanf can read what sprintf generated > when handling "\set"; I'd guess that libc would be broken if it could not. > I've made pgbench check that it is not. If the variable contains something that is not an integer, the existing code will end up here: fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str); With your patch, you get different behavior depending on exactly how the input is malformed. I have a strong suspicion you're going to tell me that this is another one of those cases where it's OK to make the error handling worse than it is today, but I'm tired of arguing that point. I'm not going to commit it this way, and frankly, neither is anyone else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > [...] With your patch, you get different behavior depending on exactly > how the input is malformed. I understand that you require only one possible error message on malformed input, instead of failing when converting to double if the input looked like a double (there was a '.' clue in it, but that is not proof), or something else if it was assumed to be an int. So I'm going to assume that you do not like the type guessing. > I'm not going to commit it this way, and frankly, neither is anyone > else. Hmmm. Probably my unconscious self is trying to reach 42. Here is a v36 which inspects the string very carefully to decide whether it is an int or a double. You may, or may not, find it to your taste, I can't say. -- Fabien.
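A sketch of what inspecting the string "very carefully" can amount to: classify it with strtoll/strtod and accept a type only if the whole string parses cleanly, so every malformed input falls through to one single error (the function name and the PgBenchValue plumbing are hypothetical):

    #include <errno.h>
    #include <stdlib.h>

    /* set retval from a variable's textual value, or fail with one message */
    static bool
    makeVariableNumeric(const char *str, PgBenchValue *retval)
    {
        char   *endptr;
        int64   iv;
        double  dv;

        errno = 0;
        iv = strtoll(str, &endptr, 10);
        if (endptr != str && *endptr == '\0' && errno == 0)
        {
            setIntValue(retval, iv);
            return true;
        }

        errno = 0;
        dv = strtod(str, &endptr);
        if (endptr != str && *endptr == '\0' && errno == 0)
        {
            setDoubleValue(retval, dv);
            return true;
        }

        fprintf(stderr, "malformed variable value: \"%s\"\n", str);
        return false;
    }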
> Here is a v36 which inspects the string very carefully to decide whether it is > an int or a double. You may, or may not, find it to your taste, I can't say. Here is a v37, which is mostly a rebase after recent changes. Also, I noticed that I was double-counting errors in the previous version, so I fixed it. -- Fabien.
v38 is a simple rebase, trying to keep up-to-date with Tom's work. -- Fabien.
> v38 is a simple rebase, trying to keep up-to-date with Tom's work. v39 is yet another rebase: 42 is in sight! -- Fabien.
On Mon, Mar 21, 2016 at 6:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> v38 is a simple rebase, trying to keep up-to-date with Tom's work. > > > v39 is yet another rebase: 42 is in sight! What's up with v42? Is that your personal record? -- Michael
Bonjour Michaël, >> v39 is yet another rebase: 42 is in sight! > > What's up with v42? Is that your personal record? It is just a (geek) joke, see: https://en.wikipedia.org/wiki/42_%28number%29#Hitchhiker.27s_Guide_to_the_Galaxy It is the answer to the "Ultimate Question of Life, the Universe and Everything", computed by the supercomputer "Deep Thought" over 7.5 million years. -- Fabien.
v40 is yet another rebase. -- Fabien.
On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > v40 is yet another rebase. Thanks. Committed after removing an unnecessary parameter from the coerceToXXX() functions. I guess the question here is whether we want the part-c patch, which removes the historical \setrandom syntax. That's technically no longer needed since we now can use random() as a function directly inside \set commands, but we might want it anyway for backward compatibility. Anybody have an opinion on that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> v40 is yet another rebase. > > Thanks. Committed after removing an unnecessary parameter from the > coerceToXXX() functions. > > I guess the question here is whether we want the part-c patch, which > removes the historical \setrandom syntax. That's technically no > longer needed since we now can use random() as a function directly > inside \set commands, but we might want it anyway for backward > compatibility. > > Anybody have an opinion on that? +1 for nuking it. That's not worth the trouble maintaining it. -- Michael
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> v40 is yet another rebase. > > > > Thanks. Committed after removing an unnecessary parameter from the > > coerceToXXX() functions. > > > > I guess the question here is whether we want the part-c patch, which > > removes the historical \setrandom syntax. That's technically no > > longer needed since we now can use random() as a function directly > > inside \set commands, but we might want it anyway for backward > > compatibility. > > > > Anybody have an opinion on that? > > +1 for nuking it. That's not worth the trouble maintaining it. If we don't nuke it, it'll never die. See also: pg_shadow Thanks! Stephen
>> I guess the question here is whether we want the part-c patch, which >> removes the historical \setrandom syntax. That's technically no >> longer needed since we now can use random() as a function directly >> inside \set commands, but we might want it anyway for backward >> compatibility. This patch is indeed a proposal. >> Anybody have an opinion on that? > > +1 for nuking it. That's not worth the trouble maintaining it. I share Michaël's opinion. Some arguments for removing it: - \setrandom is syntactically inhomogeneous in the overall syntax, and is now redundant - switching to the \set syntax is pretty easy, see the attached script and the rewrite example below - custom scripts are short, and they are used by few but advanced users, for whom updating would not be an issue - the parsing & execution code is lengthy and repetitive... -- Fabien.
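For illustration, the mechanical rewrite such a conversion script performs, using the function names introduced by part b (the gaussian and exponential forms just gain their threshold as a third argument):

    -- old syntax (removed by part c):
    \setrandom aid 1 :naccounts
    \setrandom aid 1 :naccounts gaussian 2.5

    -- replacements:
    \set aid random(1, :naccounts)
    \set aid random_gaussian(1, :naccounts, 2.5)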
On Mon, Mar 28, 2016 at 9:36 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Michael Paquier (michael.paquier@gmail.com) wrote: >> On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> >> v40 is yet another rebase. >> > >> > Thanks. Committed after removing an unnecessary parameter from the >> > coerceToXXX() functions. >> > >> > I guess the question here is whether we want the part-c patch, which >> > removes the historical \setrandom syntax. That's technically no >> > longer needed since we now can use random() as a function directly >> > inside \set commands, but we might want it anyway for backward >> > compatibility. >> > >> > Anybody have an opinion on that? >> >> +1 for nuking it. That's not worth the trouble maintaining it. > > If we don't nuke it, it'll never die. > > See also: pg_shadow Hearing no objections, BOOM. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, >> If we don't nuke it, it'll never die. > > Hearing no objections, BOOM. FIZZ! :-) Thanks for the commits, and apologies for the portability bugs. -- Fabien.
On Wed, Mar 30, 2016 at 1:23 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Robert, > >>> If we don't nuke it, it'll never die. >> >> >> Hearing no objections, BOOM. > > > FIZZ! :-) > > Thanks for the commits, and apology for the portability bugs. Thanks for the additions, Fabien. This has been a long trip. -- Michael