Thread: extend pgbench expressions with functions

extend pgbench expressions with functions

From
Fabien COELHO
Date:
This patch extends pgbench expression with functions. Currently only one 
"abs" function is added. The point is rather to bootstrap the 
infrastructure for other functions (such as hash, random variants...) to 
be added later.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> This patch extends pgbench expression with functions. Currently only one 
> "abs" function is added. The point is rather to bootstrap the infrastructure 
> for other functions (such as hash, random variants...) to be added later.

Here is a small v2 update:
- better error messages on non-existing functions
- a minimal documentation

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> Here is a small v2 update:

v3, just a rebase.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sun, Mar 29, 2015 at 2:20 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Here is a small v2 update:
>
> v3, just a rebase.

Thanks for working on this.  I see it's already registered in the
2015-06 CF, and will have a look at it when we get there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> v3, just a rebase.
>
> Thanks for working on this.  I see it's already registered in the
> 2015-06 CF, and will have a look at when we get there.

v4, rebase (after pgbench moved to src) and use of the recently added 
syntax_error function.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> v4, rebase (after pgbench moved to src) and use of the recently added 
> syntax_error function.

v5 which adds a forgotten header declaration for a new function, so as to 
avoid a warning.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> v4, rebase (after pgbench moved to src) and use of the recently added 
>> syntax_error function.
>
> v5 which adds a forgotten header declaration for a new function, so as to 
> avoid a warning.

While playing around with adding other functions I noticed that, with the 
parsing rules I set up, unknown functions were detected quite late in the 
process, so the error messages were imprecise.

v6 fixes this by checking function names earlier.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
v7: rebase after pgindent run for 9.6.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Heikki Linnakangas
Date:
On 03/06/2015 11:41 AM, Fabien COELHO wrote:
> This patch extends pgbench expression with functions. Currently only one
> "abs" function is added. The point is rather to bootstrap the
> infrastructure for other functions (such as hash, random variants...) to
> be added later.

I think it would actually be good to add at least some of those other 
functions in the initial patch. The infrastructure that this patch adds 
only supports functions with a single argument, so it won't get us very 
far. Also, will we need non-integer (e.g. string, numeric, whatever) 
arguments for the functions? How about other datatypes for variables in 
general? Perhaps not, or if we do that can be a separate patch, but it's 
something to keep in mind. The pgbench script language is evolving into 
a full-blown Turing-complete programming language...

As an initial list of functions, I'd suggest:

abs(x)
min(x, y, ...)
max(x, y, ...)
random_uniform(min, max)
random_gaussian(min, max, threshold)
random_exponential(min, max, threshold)

Would that be enough to specify e.g. the

As soon as we add more functions, the way they are documented needs to 
be reworked too; we'll need to add a table in the manual to list them.

- Heikki




Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Heikki,

>> This patch extends pgbench expression with functions. Currently only one
>> "abs" function is added. The point is rather to bootstrap the
>> infrastructure for other functions (such as hash, random variants...) to
>> be added later.
>
> I think it would actually be good to add at least some of those other 
> functions in the initial patch.

Hmmm, sure. I wanted some feedback on the "how" before doing that, hence 
the infrastructure patch submitted with just one function. Obviously I can 
expand, but before that, any opinion on the "how"?

For instance I decided against having individual functions recognized by 
the lexer, and to really have an infrastructure for storing, checking and 
adding them without lex/yacc.

> The infrastructure that this patch adds only supports functions with a 
> single argument, so it won't get us very far.

> Also, will we need non-integer (e.g. string, numeric, whatever) 
> arguments for the functions?

Maybe float *constants* for random exponential & gaussian.

> How about other datatypes for variables in general?

The point is not to develop another full language in pgbench. People can 
do really advanced things server-side with PL/pgSQL; the point here is 
really to facilitate "simple" pgbench scripts.

> Perhaps not, or if we do that can be a separate patch, but it's 
> something to keep in mind. The pgbench script language is evolving into 
> a full-blown Turing-complete programming language...

The point is *NOT* to do that. For Turing-completeness you would basically 
need a while loop or recursion plus conditionals. Currently there is no 
such thing, there is no plan for such a thing, and I do not think it is 
desirable.

> As an initial list of functions, I'd suggest:
>
> abs(x)

This is the one included in the patch.

> min(x, y, ...)
> max(x, y, ...)

Hmm, varargs...

> random_uniform(min, max)

Ok. probably just "random"?

> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)

Hmm, threshold is a float.

> As soon as we add more functions, the way they are documented needs to 
> be reworked too; we'll need to add a table in the manual to list them.

Yep.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Heikki,

> As an initial list of functions, I'd suggest:
>
> abs(x)
> min(x, y, ...)
> max(x, y, ...)
> random_uniform(min, max)
> random_gaussian(min, max, threshold)
> random_exponential(min, max, threshold)
>
> Would that be enough to specify e.g. the
>
> As soon as we add more functions, the way they are documented needs to be 
> reworked too; we'll need to add a table in the manual to list them.

Here is a v8 with "abs", "min", "max", "random", "gaussian" and
"exponential".

There is no expression typing, but expressions are evaluated and cast to 
the expected type depending on what is needed (variables expect an int 
value; function arguments are usually int, except for thresholds...).

There is an evalDouble function to evaluate double arguments for 
gaussian & exponential thresholds.

There is a special "int" function which evaluates its argument as a double 
and casts the result to an int, for testing purposes.

Variables are only int values, and functions only return ints.

There is no real doc, WIP...

Also included is a set of SQL snippets I used for minimal tests.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Heikki,

>> As soon as we add more functions, the way they are documented needs to be 
>> reworked too; we'll need to add a table in the manual to list them.
>
> Here is a v8 with "abs", "min", "max", "random", "gaussian" and
> "exponential".
>
> [...]
>
> There is no real doc, WIP...

Here is a v9 with a doc. The implicit typing of expressions is improved.

I also added two debug functions which show an intermediate integer 
(idebug) or double (ddebug) value:
  \set i idebug(random(1, 10))

will print the random value and assign it to i.


I updated the default scripts, which seems to speed up meta-command 
evaluations. The initial version does less than 2 million evals per 
second:
  sh> cat before.sql
  \set naccounts 100000 * :scale
  \setrandom aid 1 :naccounts

  sh> ./pgbench -T 3 -P 1 -f before.sql
  [...]
  tps = 1899004.793098 (excluding connections establishing)


The updated version does more than 3 million evals per second:
  sh> cat after.sql
  \set aid random(1, 100000 * :scale)

  sh> ./pgbench -T 3 -P 1 -f after.sql
  [...]
  tps = 3143168.813690 (excluding connections establishing)


Suggestion:

A possibility would be to remove the \setrandom stuff altogether, as the 
functionality is available through \set, maybe with an error message 
advising to use \set with one of the random functions instead. That would 
remove about 200 ugly locs...

Another option would be to translate the \setrandom stuff into a \set 
expression; that would maybe save 100 locs for the eval, but would keep, 
and expand a little, the "manual" parser part.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sun, Jul 26, 2015 at 6:37 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Heikki,
>
>>> As soon as we add more functions, the way they are documented needs to be
>>> reworked too; we'll need to add a table in the manual to list them.
>>
>>
>> Here is a v8 with "abs", "min", "max", "random", "gaussian" et
>> "exponential".
>>
>> [...]
>>
>> There is no real doc, WIP...
>
>
> Here is a v9 with a doc. The implicit typing of expressions is improved.
>
> I also added two debug functions which allow to show intermediate integer
> (idebug) or double (ddebug).
>
>   \set i idebug(random(1, 10))
>
> will print the random value and assign it to i.
>
>
> I updated the defaut scripts, which seems to speed up meta command
> evaluations. The initial version does less than 2 million evals per second:
>
>   sh> cat before.sql
>   \set naccounts 100000 * :scale
>   \setrandom aid 1 :naccounts
>
>   sh> ./pgbench -T 3 -P 1 -f before.sql
>   [...]
>   tps = 1899004.793098 (excluding connections establishing)
>
>
> The updated version does more than 3 million evals per second:
>
>   sh> cat after.sql
>   \set aid random(1, 100000 * :scale)
>
>   sh> ./pgbench -T 3 -P 1 -f after.sql
>   [...]
>   tps = 3143168.813690 (excluding connections establishing)
>
>
> Suggestion:
>
> A possibility would be to remove altogether the \setrandom stuff as the
> functionality is available through \set, maybe with an error message to
> advise about using \set with one of the random functions. That would remove
> about 200 ugly locs...
>
> Another option would be to translate the setrandom stuff into a \set
> expression, that would maybe save 100 locs for the eval, but keep and expand
> a little the "manual" parser part.

I have moved this patch to the next CF.
-- 
Michael



Re: extend pgbench expressions with functions

From
BeomYong Lee
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

Hello.

I installed CentOS 6.7 in VirtualBox and cloned postgres from GitHub.
Then I applied the patch with the following command:
$ patch -p1 < pgbench-expr-abs-9.patch

To make sure that the patch applies to postgres, I wrote two script files.
The script files are:
$ cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
$ cat after.sql
\set aid random(1, 100000 * :scale)

Using the script files with pgbench, I ran:
$ ./pgbench -T 3 -P 1 -f before.sql
$ ./pgbench -T 3 -P 1 -f after.sql

These ran well. I tested the functions (e.g. abs, ddebug, double, exporand,
idebug, int, gaussrand, min, max, random, sqrt) mentioned in the patch;
they also executed without error.

Finally, I reviewed the code in the patch, but did not find any errors.



Re: extend pgbench expressions with functions

From
BeomYong Lee
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hello.

I installed CentOS 6.7 in VirtualBox and cloned postgres from GitHub.
Then I applied the patch with the following command:
$ patch -p1 < pgbench-expr-abs-9.patch

To make sure that the patch applies to postgres, I wrote two script files.
The script files are:
$ cat before.sql
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
$ cat after.sql
\set aid random(1, 100000 * :scale)

Using the script files with pgbench, I ran:
$ ./pgbench -T 3 -P 1 -f before.sql
$ ./pgbench -T 3 -P 1 -f after.sql

These ran well. I tested the functions (e.g. abs, ddebug, double, exporand,
idebug, int, gaussrand, min, max, random, sqrt) mentioned in the patch;
they also executed without error.

Finally, I reviewed the code in the patch, but did not find any errors.



Re: extend pgbench expressions with functions

From
Kyotaro HORIGUCHI
Date:
Hi, I had a look at this; it gives amazing performance. (I
haven't tried it myself yet :-p) But anyway the new syntax looks
far smarter than preparing arbitrary metacommands.

michael> I have moved this patch to the next CF.

> > Hello Heikki,
> >
> >>> As soon as we add more functions, the way they are documented needs to be
> >>> reworked too; we'll need to add a table in the manual to list them.
> >>
> >>
> >> Here is a v8 with "abs", "min", "max", "random", "gaussian" et
> >> "exponential".
> >>
> >> [...]
> >>
> >> There is no real doc, WIP...

From a quick glance at the patch, I have two simple comments.
Would you consider them?

1. evalInt and evalDouble are mutually calling on a single expr node. It
looks simple and seems to work, but could easily be broken to fall into an
infinite call and finally (in a moment) exceed the stack depth.

I think it is better not to do so. For example, reuniting evalDouble and
evalInt into one evaluateExpr() which can return both double and int
results would do so. The interface would be something like the following,
or others. Some individual functions might be better split out.

 static ExprRetType evaluateExpr(TState, CState, PgBenchExpr,
                                 int *iret, double *dret);
 or
 typedef struct EvalRetVal {
   bool is_double;
   union val {
     int    *ival;
     double *dval;
   } u;
 } EvalRetVal;
 static bool evaluateExpr(..., EvalRetVal *ret);

2. You combined abs/sqrt/ddebug and then made a branch for them. It'd be
better to split them even if the eval*() code looks duplicated.


regards,


> > Here is a v9 with a doc. The implicit typing of expressions is improved.
> >
> > I also added two debug functions which allow to show intermediate integer
> > (idebug) or double (ddebug).
> >
> >   \set i idebug(random(1, 10))
> >
> > will print the random value and assign it to i.
> >
> >
> > I updated the defaut scripts, which seems to speed up meta command
> > evaluations. The initial version does less than 2 million evals per second:
> >
> >   sh> cat before.sql
> >   \set naccounts 100000 * :scale
> >   \setrandom aid 1 :naccounts
> >
> >   sh> ./pgbench -T 3 -P 1 -f before.sql
> >   [...]
> >   tps = 1899004.793098 (excluding connections establishing)
> >
> >
> > The updated version does more than 3 million evals per second:
> >
> >   sh> cat after.sql
> >   \set aid random(1, 100000 * :scale)
> >
> >   sh> ./pgbench -T 3 -P 1 -f after.sql
> >   [...]
> >   tps = 3143168.813690 (excluding connections establishing)
> >
> >
> > Suggestion:
> >
> > A possibility would be to remove altogether the \setrandom stuff as the
> > functionality is available through \set, maybe with an error message to
> > advise about using \set with one of the random functions. That would remove
> > about 200 ugly locs...
> >
> > Another option would be to translate the setrandom stuff into a \set
> > expression, that would maybe save 100 locs for the eval, but keep and expand
> > a little the "manual" parser part.
> 
> I have moved this patch to the next CF.
> -- 
> Michael

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Kyotaro-san,

> 1.evalInt and evalDouble are mutually calling on single expr
> node.

Indeed.

> It looks simple and seems to work but could easily broken
> to fall into infinite call and finally (in a moment) exceeds
> stack depth.

The recursion is on the tree-structure of the expression, which is 
evaluated depth-first. As the tree must be finite, and there is no such 
thing as recursive functions in the expression tree, this should never 
happen. If it happened, it would be a bug in the eval functions, probably 
a forgotten "case", and could be fixed there easily.

> I think it is better not to do so. For example, reunioning
> evalDouble and evalInt into one evalulateExpr() which can return
> both double and int result would do so. The interface would be
> something like the follwings or others. Some individual
> functions might be better to be split out.
>
>  static ExprRetType evaluateExpr(TState,CState,PgBenchExpr,
>                                  int *iret, double *dret);

That would not work as simply as that: this signature does not say whether 
a double or an int is expected, and without this information you would not 
know what to do on some operators (the typing is explicit and descendant 
in the current implementation).

Even if you add this information, or you use the returned type information 
to do the typing dynamically, that would mean testing the result types and 
implementing the necessary casts depending on this information for every 
function within the generic eval, which would result in repetitive and 
ugly code for every function and operator: for instance for '+', you would 
have to implement 4 cases explicitely, i+i, f+i, i+f, f+f as "int add", 
"cast left and double add", "cast right and double add", and finally 
"double add". Then you have other operators to consider... Although some 
sharing can be done, this would end in lengthy & ugly code.

A possible alternative could be to explicitely and statically type all 
operations, adding the necessary casts explicitely, distinguishing 
operators, but the would mean more non-obvious code to "compile" the 
parsed expressions, I'm not sure that pgbench deserves this.

The two eval functions and the descendant typing allow to hide/ignore 
these issues because each eval function handles just one type, type 
boundaries are explicitely handled by calling the other function, so there 
is no reason to test and cast everything all the time.

So I think it would not be an improvement to go the way you suggest.
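The descendant-typing scheme described here can be sketched in miniature. This is a hypothetical, pared-down model (the node names, fields, and the lone ADD operator are illustrative, not the patch's actual code): each evaluator handles only its own type and crosses the type boundary by delegating to the other, so no per-operator cast matrix is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down expression node; the real patch's types differ. */
typedef enum { NODE_INT_CONST, NODE_DOUBLE_CONST, NODE_ADD } NodeType;

typedef struct Expr
{
	NodeType	etype;
	int64_t		ival;			/* NODE_INT_CONST */
	double		dval;			/* NODE_DOUBLE_CONST */
	struct Expr *left;			/* NODE_ADD */
	struct Expr *right;
} Expr;

static bool evalDouble(const Expr *e, double *retval);

/* Evaluate in integer context; a double-valued subtree is cast down. */
static bool
evalInt(const Expr *e, int64_t *retval)
{
	switch (e->etype)
	{
		case NODE_INT_CONST:
			*retval = e->ival;
			return true;
		case NODE_ADD:
			{
				int64_t		l, r;

				if (!evalInt(e->left, &l) || !evalInt(e->right, &r))
					return false;
				*retval = l + r;
				return true;
			}
		default:				/* type boundary: delegate to evalDouble */
			{
				double		d;

				if (!evalDouble(e, &d))
					return false;
				*retval = (int64_t) d;
				return true;
			}
	}
}

/* Evaluate in double context; an int-valued subtree is cast up. */
static bool
evalDouble(const Expr *e, double *retval)
{
	switch (e->etype)
	{
		case NODE_DOUBLE_CONST:
			*retval = e->dval;
			return true;
		default:				/* type boundary: delegate to evalInt */
			{
				int64_t		i;

				if (!evalInt(e, &i))
					return false;
				*retval = (double) i;
				return true;
			}
	}
}
```

The recursion terminates because each evaluator handles its own node kinds directly and only crosses the boundary for nodes of the other type.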

> 2. You combined abs/sqrt/ddebug and then make a branch for
>  them. It'd be better to split them even if eval*() looks to be
>  duplicate.

Hm, why not.

Here is a v10 where I did that when it did not add too much code 
repetition (e.g. I kept the shared branches for MIN & MAX and for the 3 
RANDOM functions so as to avoid large copy-pastes).

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> [...]
> Here is a v10 where I did that when it did not add too much code repetition 
> (eg I kept the shared branches for MIN & MAX and for the 3 RANDOM functions 
> so as to avoid large copy pastes).

Ooops, forgotten attachment.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Kyotaro HORIGUCHI
Date:
Hello, 

At Tue, 15 Sep 2015 19:26:51 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in
<alpine.DEB.2.10.1509151844080.6503@sto>
> > 1.evalInt and evalDouble are mutually calling on single expr
> > node.
> 
> Indeed.
> 
> > It looks simple and seems to work but could easily broken
> > to fall into infinite call and finally (in a moment) exceeds
> > stack depth.
> 
> The recursion is on the tree-structure of the expression, which is
> evaluated depth-first. As the tree must be finite, and there is no
> such thing as recursive functions in the expression tree, this should
> never happen. If it happened, it would be a bug in the eval functions,
> probably a forgotten "case", and could be fixed there easily.

My description must have been obscure. Indeed the call tree is
finite for a *sane* expression node. But it makes an infinite call for
a value of expr->etype unknown by both evalDouble and
evalInt. This is what I intended to express by the words 'easily
broken'. AFAICS most of the switches over nodes in the core would
stop and emit a message like the following,

| default:
|   elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

So I think the following code would do well.

| evalDouble()
| {
|    if (IS_INT_EXPR_NODE(expr->etype))
|    {
|       if (!evalInt(...))
|          return false;
|       *retval = (double) ival;
|       return true;
|    }     
| 
|    switch (expr->etype)
|    {
|      case ENODE_DOUBLE_CONSTANT:
|      ...
|      default:
|         elog(ERROR, "unrecognized expr type: %d", (int) expr->etype);

I suppose there's a big difference between a SEGV after running through
the stack bottom and the server returning an ERROR for the same
bug. But IS_INT_EXPR_NODE() would be redundant with the
switch-case..

Do you think I'm taking the difference too seriously? However,
this is needless discussion if the following change is acceptable
to you.

> > I think it is better not to do so. For example, reunioning
> > evalDouble and evalInt into one evalulateExpr() which can return
> > both double and int result would do so. The interface would be
> > something like the follwings or others. Some individual
> > functions might be better to be split out.
> >
> >  static ExprRetType evaluateExpr(TState,CState,PgBenchExpr,
> >                                  int *iret, double *dret);
> 
> That would not work as simply as that: this signature does not say
> whether a double or an int is expected, and without this information
> you would not know what to do on some operators (the typing is
> explicit and descendant in the current implementation).
> Even if you add this information, or you use the returned type
> information to do the typing dynamically, that would mean testing the
> result types and implementing the necessary casts depending on this
> information for every function within the generic eval, which would
> result in repetitive and ugly code for every function and operator:
> for instance for '+', you would have to implement 4 cases explicitely,
> i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast
> right and double add", and finally "double add". Then you have other
> operators to consider... Although some sharing can be done, this would
> end in lengthy & ugly code.

Yes, I also think it is ugly, as I anticipated :(.

By the way, the complexity comes from separating integer and
double. If there is no serious reason to separate them, handling
all values as double makes things far simpler.

Could you let me know the reason why it strictly separates
integer and double?

|   evaluateExpr(..., double *retval)
|   {
|      switch (expr->etype)
|      {
|        case ENODE_CONSTANT:
|          *retval = expr->u.constant.val;
|          return true;
|        case ENODE_OPERATOR
|          /* Nothing to worry about */
|        case ENODE_FUNCTION:
|        {
|            switch (func)
|            { 
|               case PGBENCH_ABS/SQRT/DEBUG:
|                 /* Nothing to worry about, too */
|               /* case PGBENCH_DOUBLE is useless ever after */
|               case PGBENCH_INT:
|                 double arg;
|                 if (!evaluateExpr(..., &arg))
|                    return false;
|                 *retval = floor(arg);

I see no problem with possible floating-point rounding errors for this
purpose. Is there any?
Anyway, the set meta command finally converts the result to an integer
regardless of the real value, so the following conversion does the
same as your current code.

in doCustom()
> else if (pg_strcasecmp(argv[0], "set") == 0)
> {
>     char        res[64];
>     PgBenchExpr *expr = commands[st->state]->expr;
>     int64        result;
> 
-     if (!evalInt(thread, st, expr, &result))
+     if (!evaluateExpr(thread, st, expr, &dresult))
+           result = (int64)dresult;

This should make exprparse.y simpler.

> A possible alternative could be to explicitely and statically type all
> operations, adding the necessary casts explicitely, distinguishing
> operators, but the would mean more non-obvious code to "compile" the
> parsed expressions, I'm not sure that pgbench deserves this.
> The two eval functions and the descendant typing allow to hide/ignore
> these issues because each eval function handles just one type, type
> boundaries are explicitely handled by calling the other function, so
> there is no reason to test and cast everything all the time.
> 
> So I thing that it would not be an improvement to try to go the way
> you suggest.



> > 2. You combined abs/sqrt/ddebug and then make a branch for
> >  them. It'd be better to split them even if eval*() looks to be
> >  duplicate.
> 
> Hm, why not.

Thanks.

> Here is a v10 where I did that when it did not add too much code
> repetition (eg I kept the shared branches for MIN & MAX and for the 3
> RANDOM functions so as to avoid large copy pastes).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Kyotaro-san,

> My description should have been obscure. Indeed the call tree is
> finite for *sane* expression node. But it makes infinit call for
> a value of expr->etype unknown by both evalDouble and
> evalInt.

Such an issue would be detected if the function is actually tested; 
hopefully this should be the case... :-)

However I agree that relying implicitly on the "default" case is not very 
good practice, so I updated the code in the attached v11 to fail 
explicitly on such errors.

I also attached a small test script, which exercises most (all?) 
functions:
  ./pgbench -f functions.sql -t 1

> [...]
> By the way, the complexity comes from separating integer and
> double. If there is no serios reason to separate them, handling
> all values as double makes things far simpler.

Yep, but no.

> Could you let me know the reason why it strictly separates integer and 
> double? I don't see no problem in possible errors of floating point 
> calculations for this purpose. Is there any?

Indeed it would make things simpler, but it would break large integers as 
the int64 -> double -> int64 casts would result in approximations. The 
integer type is the important one here because it is used for primary 
keys, and you do not want a key to be approximated in any way, so the 
int64 type must be fully and exactly supported.
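The approximation Fabien mentions is easy to demonstrate: an IEEE 754 double has a 53-bit significand, so int64 values above 2^53 do not survive a round trip through double. A minimal sketch, independent of the patch itself (round_trip is an illustrative helper, not pgbench code):

```c
#include <stdint.h>

/* Round-trip an int64 through double, as an all-double evaluator would.
 * For |v| > 2^53 the double cannot represent every integer, so distinct
 * values can collapse onto the same result. */
static int64_t
round_trip(int64_t v)
{
	return (int64_t) (double) v;
}
```

For example, 2^53 + 1 does not round-trip, so two distinct primary keys would collide — exactly what an exact int64 path avoids.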

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Thu, Sep 17, 2015 at 10:58 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> By the way, the complexity comes from separating integer and
> double. If there is no serios reason to separate them, handling
> all values as double makes things far simpler.

-1.  double is an inexact type, whereas integer is an exact type.

The typical way to handle this sort of thing is to define a struct
whose first member is a type field and whose second field is a union
of all the types you need to care about.  Then that gets passed around
everywhere.  This patch should be designed in such a way that if we
eventually end up with functions that have 10 different return types
instead of 2 different return types, we don't need to add 8 more
parameters to any functions.  Instead, those still return
PgBench_Value (or whatever we call it) which is the aforementioned
struct, but there are more options for what that can contain.
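Robert's suggestion is the classic tagged-union pattern. A minimal sketch (the names PgBench_Value, PGBT_INT, and coerce_to_int are illustrative here, not the committed API) might look like:

```c
#include <stdint.h>

/* Hypothetical value type: a tag plus a union of the payloads.
 * Adding a new return type later means a new enum member and a new
 * union arm, not new out-parameters on every evaluator function. */
typedef enum
{
	PGBT_INT,
	PGBT_DOUBLE
	/* more types can be added here later */
} PgBenchValueType;

typedef struct
{
	PgBenchValueType type;
	union
	{
		int64_t		ival;
		double		dval;
	}			u;
} PgBench_Value;

/* Coerce any value to int64, truncating doubles toward zero. */
static int64_t
coerce_to_int(const PgBench_Value *v)
{
	return (v->type == PGBT_INT) ? v->u.ival : (int64_t) v->u.dval;
}
```

The coercion logic is then written once per target type rather than once per operator/operand-type combination.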

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> -1.  double is an inexact type, whereas integer is an exact type.

Sure. I already argued along that very line.

> The typical way to handle this sort of thing is to define a struct
> whose first member is a type field and whose second field is a union
> of all the types you need to care about.

Yep.

> Then that gets passed around everywhere.  This patch should be designed 
> in such a way that if we eventually end up with functions that have 10 
> different return types instead of 2 different return types, we don't 
> need to add 8 more parameters to any functions.  Instead, those still 
> return PgBench_Value (or whatever we call it) which is the 
> aforementioned struct, but there are more options for what that can 
> contain.

I just put the double type in as a proof of concept, but for pgbench only 
integers really matter.

What you suggest would work, but it would also result in ugly and lengthy 
code, as I argued in another mail, because for overloaded operators and 
functions you have to decide which actual typed operator must be called, 
and then perform the necessary type conversions depending on the actual 
types of the operands. The implicit descendant typing used in the patch 
hides this, and is more than enough for pgbench, IMO.

If this is a blocker, I would rather remove the support for doubles than 
write verbose and inelegant code.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
"Shulgin, Oleksandr"
Date:
On Fri, Sep 18, 2015 at 10:21 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> Hello Kyotaro-san,
>
>> My description should have been obscure. Indeed the call tree is
>> finite for *sane* expression node. But it makes infinit call for
>> a value of expr->etype unknown by both evalDouble and
>> evalInt.
>
> Such issue would be detected if the function is actually tested, hopefully this should be the case... :-)
>
> However I agree that relying implicitely on the "default" case is not very good practice, so I updated the code in the attached v11 to fail explicitely on such errors.
>
> I also attached a small test script, which exercises most (all?) functions:
>
>   ./pgbench -f functions.sql -t 1

A short review from me:

1. Patch applies cleanly on current HEAD.
2. It compiles without errors or warnings.
3. The attached test case can be executed w/o symptoms of any problem and it produces meaningful results.

Should we not allow for functions taking 0 arguments?  Since we're already into some math here, how about pi()? ;-)

I understand requiring at least 1 arg simplifies the code a bit, but right now it reports syntax error for "random()", while it correctly reports unexpected number of arguments for "random(1,2,3)".  We would need another check for min() and max() which expect >=1 arguments, but it's easy to add.

I would also argue that we should rename "random" to "rand" here to avoid confusion with the familiar SQL function "random()" that doesn't take arguments.

--
Alex

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Here is a v12 which implements the suggestions below.

> Should we not allow for functions taking 0 arguments?  Since we're already
> into some math here, how about pi()? ;-)

Hmmm, why not.

> I understand requiring at least 1 arg simplifies the code a bit, but right
> now it reports syntax error for "random()", while it correctly reports
> unexpected number of arguments for "random(1,2,3)".  We would need another
> check for min() and max() which expect >=1 arguments, but it's easy to add.

Indeed, I had to add a special check.

> I would also argue that we should rename "random" to "rand" here to avoid
> confusion with the familiar SQL function "random()" that doesn't take
> arguments.

Why not, as it is also consistent with exporand() & gaussrand().

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Fri, Oct 30, 2015 at 1:01 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v12 which implements the suggestions below.

Sorry it's taken me so long to get around to looking at this.

Some thoughts on an initial read-through:

1. I think there should really be two patches here, the first adding
functions, and the second adding doubles.  Those seem like separate
changes.  And offhand, the double stuff looks a lot less useful than
the function call syntax.

2. ddebug and idebug seem like a lousy idea to me.  If we really need
to be able to print stuff from pgbench, which I kind of doubt, then
maybe we should have a string data type and a print() function, or
maybe a string data type and a toplevel \echo command.

3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
and evalDouble().  That doesn't seem similar to what I've seen in
other expression-evaluation engines.  Perhaps I could find out by
reading the comments, but actually not, because this entire patch
seems to add only one comment:

+       /* reset column count for this scan */
+       yycol = 0;

While I'm not a fan of excessive commenting, I think a little more
explanation here would be good.

4. The table of functions in pgbench.sgml seems to leave something to
be desired.  We added a pretty detailed write-up on the Gaussian and
exponential options to \setrandom, but exporand() has only this
description:

+      <row>
+       <entry><literal><function>exporand(<replaceable>i</>,
<replaceable>j</>, <replaceable>t</>)</></></>
+       <entry>integer</>
+       <entry>exponentially distributed random integer in the bounds,
see below</>
+       <entry><literal>exporand(1, 10, 3.0)</></>
+       <entry>int between <literal>1</> and <literal>10</></>
+      </row>

That's not very helpful.  Without looking at the example, there's no
way to guess what i and j mean, and even with looking at the example,
there's no way to guess what t means.  If, as I'm guessing, exporand()
and gaussrand() behave like \setrandom with the exponential and/or
Gaussian options, then the documentation for one of those things
should contain all of the detailed information and the documentation
for the other should refer to it.  More than likely, exporand() and
gaussrand() should get the detailed explanation, and \setrandom should
be documented as a deprecated alternative to \set ...
{gauss,expo,}rand(...)

5. I liked Heikki's proposed function names random_gaussian(min, max,
threshold) and random_exponential(min, max, threshold) better than the
ones you've picked here.  I think random() would be OK instead of his
suggestion of random_uniform(), though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Kyotaro HORIGUCHI
Date:
Hello, sorry for the silence.

At Fri, 18 Sep 2015 20:35:48 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in
<alpine.DEB.2.10.1509182019100.27223@sto>
> > -1.  double is an inexact type, whereas integer is an exact type.
> 
> Sure. I already argue on that very line.

Agreed.

> > The typical way to handle this sort of thing is to define a struct
> > whose first member is a type field and whose second field is a union
> > of all the types you need to care about.
> 
> Yep.
> 
> > Then that gets passed around everywhere.  This patch should be
> > designed in such a way that if we eventually end up with functions
> > that have 10 different return types instead of 2 different return
> > types, we don't need to add 8 more parameters to any functions.
> > Instead, those still return PgBench_Value (or whatever we call it)
> > which is the aforementioned struct, but there are more options for
> > what that can contain.
> 
> I just put the double type as a proof of concept, but for pgbench only
> integers really matters.
> 
> What you suggest would work, but it would also result in ugly and
> lengthy code, as I argued in another mail, because for overloaded
> operators and functions you have to decide which actual typed operator
> must be called, and then perform the necessary type conversions
> depending on the actual type of the operands. The implicit descending
> typing used in the patch hides this, and is more than enough for
> pgbench, IMO.

I also agree with this.

> If this is a blocker, I would rather remove the support for doubles
> than write verbose and inelegant code.

I understood the situation and agree with the current shape of the
code. I no longer object to the mutually-calling evaluation code, but
I'd like to see the gist of this discussion in a comment on the
relevant function.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> 1. I think there should really be two patches here, the first adding
> functions, and the second adding doubles.  Those seem like separate
> changes.  And offhand, the double stuff looks a lot less useful than
> the function call syntax.

I first submitted the infrastructure part, but I was asked to show how 
more functions could be included, especially random variants. As random 
gaussian/exponential functions require a double argument, there must be 
some support for double values.

Now, it could certainly be split into two patches, but that seems rather 
artificial, IMO.

> 2. ddebug and idebug seem like a lousy idea to me.

It was really useful to me for debugging and testing.

> If we really need to be able to print stuff from pgbench, which I kind 
> of doubt, then maybe we should have a string data type and a print() 
> function, or maybe a string data type and a toplevel \echo command.

The *debug functions allow intercepting the value computed within an 
expression. Relying on variables plus some echo command (which does not 
exist) would require double variables as well, which are not currently 
supported, and which I do not see as useful for the kind of script 
written for pgbench. Adding a string type is more work, and I do not see 
a good use case for it.

So the *debug functions are really just a lightweight solution for 
debugging type-related issues in expressions. I can drop them if this is 
a blocker, but they are really useful for quickly testing a script.

> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
> and evalDouble().

As explained above in the thread (I think), the reason is that a single 
overloaded expression evaluator which handles type conversion would 
produce pretty heavy code, while the two functions with descending 
typing achieve the same effect with much less code.

The issue is that with two types, every function must otherwise handle 
argument type conversion explicitly.

For instance for "random_gaussian(int, int, double)", it may be called 
with any combination of 3 int/double arguments, each one must be tested 
and possibly converted to the target type before calling the actual 
function. For overloaded operators or functions (arithmetics, abs...) 
there is also the decision about which operator is called and then what 
conversions are necessary.

With descending typing and the cross-recursion between the two 
functions, all these explicit tests and conversions disappear, because 
the function evaluation calls evalInt or evalDouble depending on the 
expected type.

Basically, the code is significantly shorter and more elegant with this option.

> That doesn't seem similar to what I've seen in other 
> expression-evaluation engines.

Probably. This is because I chose descending typing to simplify the 
implementation. Changing this would bring no real practical benefit from 
the usage point of view, but would add significantly more verbose and 
ugly code to test and handle type conversions everywhere, so I'm not 
keen to do that.

> Perhaps I could find out by reading the comments, but actually not, 
> because this entire patch seems to add only one comment:
>
> +       /* reset column count for this scan */

There are a few others, really:-)

> While I'm not a fan of excessive commenting, I think a little more
> explanation here would be good.

I can certainly add more comments to the code, especially around the eval 
cross recursion functions.

> 4. The table of functions in pgbench.sgml seems to leave something to
> be desired.  We added a pretty detailed write-up on the Gaussian and
> exponential options to \setrandom, but exporand() has only this
> description:

Yep. The idea was *not* to replicate the (painful) explanations about 
the random functions; they should be shared between the function and 
the \set variants.

> +      <row>
> +       <entry><literal><function>exporand(<replaceable>i</>,
> <replaceable>j</>, <replaceable>t</>)</></></>
> +       <entry>integer</>
> +       <entry>exponentially distributed random integer in the bounds,
> see below</>
> +       <entry><literal>exporand(1, 10, 3.0)</></>
> +       <entry>int between <literal>1</> and <literal>10</></>
> +      </row>
>
> That's not very helpful.

The table explanation must be kept short for the table format...

> Without looking at the example, there's no way to guess what i and j 
> mean, and even with looking at the example, there's no way to guess what 
t means.  If, as I'm guessing, exporand() and gaussrand() behave like
> \setrandom with the exponential and/or Gaussian options, then the 
> documentation for one of those things should contain all of the detailed 
> information and the documentation for the other should refer to it.

Indeed, that was the idea, but it seems that I forgot the pointer:-)

> More than likely, exporand() and gaussrand() should get the detailed 
> explanation, and \setrandom should be documented as a deprecated
> alternative to \set ... {gauss,expo,}rand(...)

Ok, the description can be moved to the function part and the \set 
version reference the other.

> 5. I liked Heikki's proposed function names random_gaussian(min, max,
> threshold) and random_exponential(min, max, threshold) better than the
> ones you've picked here.  I think random() would be OK instead of his
> suggestion of random_uniform(), though.

Ok.

I'll submit an updated version of the patch, which addresses points 4 & 5 
and documents 3, and wait for feedback on the explanations I gave before 
doing anything for about 1 & 2, as I think that the implied changes are 
not desirable. I'm not keen at all on changing the cross recursion 
implementation (3), this would just be pretty ugly code without actual 
user benefit.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Kyotaro-san,

> [...] I understood the situation and agree with the current shape of 
> the code. I no longer object to the mutually-calling evaluation code, 
> but I'd like to see the gist of this discussion in a comment on the 
> relevant function.

Indeed, this design choice deserves much better explanations.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Alvaro Herrera
Date:
Fabien COELHO wrote:

> For instance for "random_gaussian(int, int, double)", it may be called with
> any combination of 3 int/double arguments, each one must be tested and
> possibly converted to the target type before calling the actual function.
> For overloaded operators or functions (arithmetics, abs...) there is also
> the decision about which operator is called and then what conversions are
> necessary.

I think we should make this as simple as possible -- in particular I
don't see a lot of value in overloading here.  I'd rather have it throw
an error and force you to write the double with a ".0" when the given
value is an integer so that it is parsed as a double, rather than
accepting ambiguous input.  Conversely, if you pass a double to the
integer options, raise an error.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Alvaro,

>> For instance for "random_gaussian(int, int, double)", it may be called with
>> any combination of 3 int/double arguments, each one must be tested and
>> possibly converted to the target type before calling the actual function.
>> For overloaded operators or functions (arithmetics, abs...) there is also
>> the decision about which operator is called and then what conversions are
>> necessary.
>
> I think we should make this as simple as possible -- in particular I
> don't see a lot of value in overloading here.  I'd rather have it throw
> an error and force you to write the double with a ".0" when the given
> value is an integer so that it is parsed as a double, rather than
> accepting ambiguous input.  Conversely, if you pass a double to the
> integer options, raise an error.

I agree that the overloading and other stuff is not a decisive and very 
useful feature of the patch, but I think that the descending-typing 
two-function recursive evaluation is a good compromise, as it works 
without much ado or ambiguity, and from my point of view the code stays 
quite simple with this approach.

So although the benefit (of having overloaded operators and handling two 
types expressions) is indeed in practice quite limited, the cost in the 
code is low.

Being more restrictive or strict as you suggest would not remove much 
code, and would remove some convenience. Also, removing this feature would 
introduce inconsistencies: integer arguments would allow general 
expressions, but only double constants would be accepted for double 
arguments...

So I would rather keep it as it is.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

Here is a v13 and a small test script.
 - change names to random(), random_exponential() and random_gaussian().
   I find them too long, but if the committer wants that I cannot help
   it:-)

 - more comments, especially about the expression evaluation & type
   system.

 - improved documentation, in particular to include suggestions by Tomas
   Vondra about clarifying the explanations of the gaussian &
   exponential random generators, and clear references from \setrandom
   to the \set expressions.

 - still just one patch, because removing double would mean removing the
   2 exponential & gaussian random functions which require a double
   argument.

   Note that I started with one small patch for adding the
   infrastructure, but then Heikki requested more functions including
   double type stuff to illustrate the point, then Robert asked to break
   it back apart; going forward and backward is tiring...

 - still "lousy" *debug functions, because I found them useful for
   debugging and testing, really. They are easy to remove, but I would
   advise against doing that, as it would make debugging an expression
   much less straightforward.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello again,

The v14 also removes references to the word "threshold" for gaussian and 
exponential random generation in the source code (comments and variable 
names), as it has no clear meaning, replacing it with param/parameter 
depending on the context, as discussed in another thread started by Tomas 
Vondra.

Sorry for the noise.

> Here is a v13 and a small test script.
>
> - change names to random(), random_exponential() and random_gaussian()
>   I find them too long, but if the committer want that I cannot help
>   it:-)
>
> - more comments, especially about the expression evaluation &
>   type system.
>
> - improved documentation, in particular to include suggestions by Tomas
>   Vondra about clarifying explanations about the gaussian &
>   exponential random generators, and clear references from \setrandom
>   to the \set expressions.
>
> - still just one patch, because removing double would mean removing the 2
>   exponential & gaussian random functions which require a double
>   argument.
>
>   Note that I started with one small patch for adding the infrastructure,
>   but then Heikki requested more functions including double type stuff to
>   illustrate the point, then Robert asks to break it back, going forward
>   and backward is tiring...
>
> - still "lousy" *debug functions, because I found them useful for
>   debugging and testing, really. It is easy to remove them, but I would
>   advise against doing that as it would make debugging an expression
>   much less straightforward.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Nov 4, 2015 at 4:06 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> 1. I think there should really be two patches here, the first adding
>> functions, and the second adding doubles.  Those seem like separate
>> changes.  And offhand, the double stuff looks a lot less useful than
>> the function call syntax.
>
> I first submitted the infrastructure part, but I was asked to show how more
> functions could be included, especially random variants. As random
> gaussian/exponential functions require a double argument, there must be some
> support for double values.

Ugh, OK.

>> 2. ddebug and idebug seem like a lousy idea to me.
>
> It was really useful to me for debugging and testing.

That doesn't mean it belongs in the final patch.

>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>> and evalDouble().
>
> Basically, the code is significantly shorter and elegant with this option.

I find that pretty hard to swallow.  If the backend took this
approach, we'd have a separate evaluate function for every datatype,
which would make it completely impossible to support the creation of
new datatypes.  And I find this code to be quite difficult to follow.

What I think we should have is a type like PgBenchValue that
represents a value which may be an integer or a double or whatever
else we want to support, but not an expression - specifically a value.
Then when you invoke a function or an operator it gets passed one or
more PgBenchValue objects and can do whatever it wants with those,
storing the result into an output PgBenchValue.  So add might look
like this:

void
add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
{
    if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_INTEGER;
        result->u.ival = x->u.ival + y->u.ival;
        return;
    }
    if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.dval;
        return;
    }
    /* cross-type cases, if desired, go here */
    /* if no case applies, somehow report an error */
}

The way you have it, the logic for every function and operator exists
in two copies: one in the integer function, and the other in the
double function.  As soon as we add another data type - strings,
dates, whatever - we'd need to have cases for all of these things in
those functions as well, and every time we add a function, we have to
update all the case statements for every datatype's evaluation
function.  That's just not a scalable model.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>>> 2. ddebug and idebug seem like a lousy idea to me.
>>
>> It was really useful to me for debugging and testing.
>
> That doesn't mean it belongs in the final patch.

I think it is useful when debugging a script, not just for debugging the 
evaluation code itself.

>>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>>> and evalDouble().
>>
>> Basically, the code is significantly shorter and elegant with this option.
>
> I find that pretty hard to swallow.

Hmmm, maybe with some more explanations?

> If the backend took this approach,

Sure, I would never suggest to do anything like that in the backend!

> we'd have a separate evaluate function for every datatype, which would 
> make it completely impossible to support the creation of new datatypes.

In the backend, the implementation is generic about types (one can add a 
type dynamically to the system), which is another abstraction level 
entirely; it does not compare in any way.

> And I find this code to be quite difficult to follow.

It is really the function *you* wrote, and there is just one version for
int and one for double.

> What I think we should have is a type like PgBenchValue that represents 
> a value which may be an integer or a double or whatever else we want to 
> support, but not an expression - specifically a value. Then when you 
> invoke a function or an operator it gets passed one or more PgBenchValue 
> objects and can do whatever it wants with those, storing the result into 
> an output PgBenchValue. So add might look like this:

Sure, I do know what it looks like, and I want to avoid it, because this 
is just a lot of ugly code which is useless for pgbench's purposes.

> void
> add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
> {
>    if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
>    {
>        result->type = PGBT_INTEGER;
>        result->u.ival = x->u.ival + y->u.ival;
>        return;
>    }
>    if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
>    {
>        result->type = PGBT_DOUBLE;
>        result->u.dval = x->u.dval + y->u.dval;
>        return;
>   }
>    /* cross-type cases, if desired, go here */

Why reject 1 + 2.0? The cross-type cases are really required for user 
sanity, which adds:

    if (x->type == PGBT_DOUBLE && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.ival;
        return;
    }
    if (x->type == PGBT_INTEGER && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.ival + y->u.dval;
        return;
    }
> }

For the overloaded '+' operator with conversions there are 4 cases 
(2 types ** 2 arguments) to handle; over all 5 binary operators 
(+ - * / %) that makes 20 cases. Then every function has to deal with 
type conversion as well: (number of types ** number of arguments) cases 
per function. That is 2 cases for abs, 4 cases for random, 8 cases for 
each random_exp*, 8 for random_gaus*, and so on. Some thought would be 
required for n-ary functions (min & max).

Basically, it can be done with no technical issue; it is just a matter of 
writing a *lot* of repetitive code, hundreds of lines of it. As I think 
it does not bring any value for pgbench's purposes, I used the other 
approach, which reduces the code size by avoiding the combinatorial 
"cross-type" conversions.

> The way you have it, the logic for every function and operator exists
> in two copies: one in the integer function, and the other in the
> double function.

Yep, but only 2 cases to handle, instead of 4 cases in the above example, 
that is the point.

> As soon as we add another data type - strings, dates, whatever - we'd 
> need to have cases for all of these things in those functions as well,
> and every time we add a function, we have to update all the case 
> statements for every datatype's evaluation function.  That's just not a 
> scalable model.

On the contrary, it is reasonably scalable:

With the eval-per-type approach, for every added type one implements one 
eval function which handles all functions and operators, each eval 
function roughly the same size as the current ones.

With the above approach, an overloaded add function handling 2 operands 
over 3 types ('post' + 'gres' -> 'postgres') would have to deal with 
3**2 = 9 type combinations instead of 4, so basically it would more than 
double the code size. If you add dates on top of that, 4**2 = 16 cases 
just for one operator. No difficulty there, just a lot of lines...

Basically the code size of the above approach grows as:
  #functions * (#types ** #args)

while for the approach in the submitted patch it is:
  #types * #functions

Now I obviously agree that the approach is not generic and should not be 
used in other contexts, it is just that it is simple/short and it fits the 
bill for pgbench.

Note that I do not really envision adding more types for pgbench scripts. 
The double type is just a side effect of the non-uniform randoms. What I 
plan is to add a few functions, especially a pseudo-random permutation 
and/or a parametric hash, that would allow running more realistic tests 
with non-uniform distributions.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Thu, Nov 5, 2015 at 3:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> For the '+' overload operator with conversions there are 4 cases (2
> arguments ** 2 types) to handle. For all 5 binary operators (+ - * / %).
> that makes 20 cases to handle. Then for every function, you have to deal
> with type conversion as well, each function times number of arguments **
> number of types. That is 2 cases for abs, 4 cases for random, 8 cases for
> each random_exp*, 8 for random_gaus*, and so on. Some thinking would be
> required for n-ary functions (min & max).
>
> Basically, it can be done, no technical issue, it is just a matter of
> writing a *lot* of repetitive code, hundreds of lines of it. As I think it
> does not bring any value for pgbench purpose, I used the other approach
> which reduces the code size by avoiding the combinatorial "cross-type"
> conversions.

Those can be avoided in other ways.  For example:
  if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
  {
      result->type = PGBT_INTEGER;
      result->u.ival = x->u.ival + y->u.ival;
  }
  else
  {
      result->type = PGBT_DOUBLE;
      result->u.dval = coerceToDouble(x) + coerceToDouble(y);
  }
 

coerceToDouble can be re-used for every arithmetic operator and can throw
an error if the input type is not coercible.  Yet, we still return an
exact integer answer when possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> Those can be avoided in other ways.  For example:

Ok, ok, I surrender:-)

Here is a v15 which hides conversions and assignment details in macros and 
factors out type testing of overloaded operators so that the code 
expansion is minimal (basically the operator evaluation is duplicated for 
int & double, but the rest is written once). The evaluation cost is 
probably slightly higher than the previous version because of the many 
hidden type tests.

Note that variables are only ints stored as text. Another patch may try 
to propagate the value structure to variables, but that changes the query 
expansion code, so it is more or less orthogonal to adding functions. 
Moreover, double variables would not be really useful anyway.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Fri, Nov 6, 2015 at 5:00 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Those can be avoided in other ways.  For example:
>
> Ok, ok, I surrender:-)
>
> Here is a v15 which hides conversions and assignment details in macros and
> factors out type testing of overloaded operators so that the code expansion
> is minimal (basically the operator evaluation is duplicated for int &
> double, but the rest is written once). The evaluation cost is probably
> slightly higher than the previous version because of the many hidden type
> tests.
>
> Note that variables are only int stored as text. Another patch may try to
> propagate the value structure for variables, but then it changes the query
> expansion code, it is more or less orthogonal to add functions. Moreover
> double variables would not be really useful anyway.

OK, comments on this version:

1. It's really not appropriate to fold the documentation changes
raised on the other thread into this patch.  I'm not going to commit
something where the commit message is a laundry list of unrelated
changes.  Please separate out the documentation changes as a separate
patch.  Let's do that first, and then make this patch apply on top of
those changes.  The related changes in getGaussianRand etc. should
also be part of that patch, not this one.

2. Please reduce the churn in the pgbench output example.  Most of the
lines that you've changed don't actually need to be changed.

3. I think we should not have ENODE_INTEGER_CONSTANT and
ENODE_DOUBLE_CONSTANT.  We should just have ENODE_CONSTANT, and it
should store the same datatype we use to represent values in the
evaluator.

4. The way you've defined the value type is not great.  int64_t isn't
used anywhere in our source base.  Don't start here.  I think the
is_none, is_int, is_double naming is significantly inferior to what I
suggested, and unlike what we do through the rest of our code.
Similarly, the coercion functions are not very descriptively named, and
I don't see why we'd want those to be macros rather than static
functions.

5. The declaration of PgBenchExprList has a cuddled opening brace,
which is not PostgreSQL style.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> 1. It's really not appropriate to fold the documentation changes
> raised on the other thread into this patch.  I'm not going to commit
> something where the commit message is a laundry list of unrelated
> changes.  Please separate out the documentation changes as a separate
> patch.  Let's do that first, and then make this patch apply on top of
> those changes.  The related changes in getGaussianRand etc. should
> also be part of that patch, not this one.

Hmmm. Attached is a two-part v16.

> 2. Please reduce the churn in the pgbench output example.  Most of the
> lines that you've changed don't actually need to be changed.

I did a real run to get consistent figures, especially as more 
information is now shown. I did some computations to try to generate 
something consistent without changing too many lines, but just taking a 
real output would make more sense.

> 3. I think we should not have ENODE_INTEGER_CONSTANT and
> ENODE_DOUBLE_CONSTANT.  We should just have ENODE_CONSTANT, and it
> should store the same datatype we use to represent values in the
> evaluator.

Why not. This induces a two-level tag structure, which I do not find that 
great, but it simplifies the evaluator code by removing one case.

> 4. The way you've defined the value type is not great.  int64_t isn't
> used anywhere in our source base.  Don't start here.

Indeed. I really meant "int64" which is used elsewhere in pgbench.

> I think the is_none, is_int, is_double naming is significantly inferior 
> to what I suggested, and unlike what we do through the rest of our code.

Hmmm. I think that it is rather a matter of taste, and PGBT_DOUBLE does 
not strike me as intrinsically superior, although it is possibly more in 
keeping with the house style.

I changed value_t and value_type_t to PgBenchValue and ~Type to blend in, 
even if I find it on the heavy side.

> Similarly, the coercion functions are not very descriptively named,

I do not understand. "INT(value)" and "DOUBLE(value)" both look pretty 
explicit to me...

The point of choosing short names is to avoid newlines, so as to stay 
(or try to stay) under 80 columns, especially as pg indentation rules 
tend to move things quickly to the right in case constructs, which are 
used in the evaluation function.

I used explicitly named functions, but short macros in the evaluator.

> and I don't see why we'd want those to be macros rather than static 
> functions.

Can be functions, sure. Switched.

> 5. The declaration of PgBenchExprList has a cuddled opening brace,
> which is not PostgreSQL style.

Indeed. There were other instances.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Fri, Nov 6, 2015 at 3:44 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> 1. It's really not appropriate to fold the documentation changes
>> raised on the other thread into this patch.  I'm not going to commit
>> something where the commit message is a laundry list of unrelated
>> changes.  Please separate out the documentation changes as a separate
>> patch.  Let's do that first, and then make this patch apply on top of
>> those changes.  The related changes in getGaussianRand etc. should
>> also be part of that patch, not this one.
>
> Hmmm. Attached is a two-part v16.

Thanks.  Part 1 looks, on the whole, fine to me, although I think the
changes to use less whitespace and removing decimal places in the
documentation are going in the wrong direction.  That is:

-      About 67% of values are drawn from the middle <literal>1.0 / threshold</>
+      About 67% of values are drawn from the middle <literal>1/param</>,

I would say 1.0 / param, just as we used to say 1.0 / threshold.  Any
reason why not?  That's easier to read IMHO and makes it more clear
that it's integer division.

I'm copying Tomas Vondra on this email since he was the one who kicked
off the other thread where this was previously being discussed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>>> 1. It's really not appropriate to fold the documentation changes
>>> raised on the other thread into this patch.  I'm not going to commit
>>> something where the commit message is a laundry list of unrelated
>>> changes.  Please separate out the documentation changes as a separate
>>> patch.  Let's do that first, and then make this patch apply on top of
>>> those changes.  The related changes in getGaussianRand etc. should
>>> also be part of that patch, not this one.
>>
>> Hmmm. Attached is a two-part v16.
>
> Thanks.  Part 1 looks, on the whole, fine to me, although I think the
> changes to use less whitespace and removing decimal places in the
> documentation are going in the wrong direction.  That is:
>
> -      About 67% of values are drawn from the middle <literal>1.0 / threshold</>
> +      About 67% of values are drawn from the middle <literal>1/param</>,
>
> I would say 1.0 / param, just as we used to say 1.0 / threshold.  Any
> reason why not?

For the 1.0 -> 1, this is because in the example afterwards I set param to 
2.0 and I wanted it clear where the one half was coming from, and ISTM 
that the 2.0 stands out more with "1 / 2.0" than with "1.0 / 2.0".

For the spaces, this is because with just "1/" the space seemed less 
necessary for clarity, but it seemed necessary with "1.0 /".

Now it is easy to backtrack.

> That's easier to read IMHO and makes it more clear that it's integer 
> division.

It is not. One benefit of "1.0" is that it makes it clear that this is a 
double division.

> I'm copying Tomas Vondra on this email since he was the one who kicked
> off the other thread where this was previously being discussed.

Fine.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> Thanks.  Part 1 looks, on the whole, fine to me, although I think the
>> changes to use less whitespace and removing decimal places in the
>> documentation are going in the wrong direction.  That is:
>> 
>> -      About 67% of values are drawn from the middle <literal>1.0 / 
>> threshold</>
>> +      About 67% of values are drawn from the middle <literal>1/param</>,
>> 
>> I would say 1.0 / param, just as we used to say 1.0 / threshold.  Any
>> reason why not?
>
> For the 1.0 -> 1, this is because in the example afterwards I set param to 
> 2.0 and I wanted it clear where the one half was coming from, and ISTM that 
> the 2.0 stands out more with "1 / 2.0" than with "1.0 / 2.0".
>
> For the spaces, this is because with just "1/" the space seemed less 
> necessary for clarity, but it seemed necessary with "1.0 /"
>
> Now it is easy to backtrack.

After looking at the generated html version, I find that the "1/param" and 
"2/param" formula are very simple and pretty easy to read, and they would 
not be really enhanced with additional spacing.

ISTM that adaptative spacing (no spacing for level 1 operations, some for 
higher level) is a good approach for readability, ie:

   f(i) - f(i+1)
             ^ no spacing here
        ^ some spacing here

So I would suggest to keep the submitted version, unless this is a 
blocker.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> After looking at the generated html version, I find that the "1/param" and
> "2/param" formula are very simple and pretty easy to read, and they would
> not be really enhanced with additional spacing.
>
> ISTM that adaptative spacing (no spacing for level 1 operations, some for
> higher level) is a good approach for readability, ie:
>
>    f(i) - f(i+1)
>              ^ no spacing here
>         ^ some spacing here
>
> So I would suggest to keep the submitted version, unless this is a blocker.

Well, I think with the ".0" version it looks more like floating-point
math, and I like the extra white-space.  But I'm happy to hear other
opinions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Mon, Nov 9, 2015 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> After looking at the generated html version, I find that the "1/param" and
>> "2/param" formula are very simple and pretty easy to read, and they would
>> not be really enhanced with additional spacing.
>>
>> ISTM that adaptative spacing (no spacing for level 1 operations, some for
>> higher level) is a good approach for readability, ie:
>>
>>    f(i) - f(i+1)
>>              ^ no spacing here
>>         ^ some spacing here
>>
>> So I would suggest to keep the submitted version, unless this is a blocker.
>
> Well, I think with the ".0" version it looks more like floating-point
> math, and I like the extra white-space.  But I'm happy to hear other
> opinions.

-      defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
+      defined as <literal>(max+min)/2</>, with
This thing reminds me a bit of the little of TeX I know, when writing
things like "\sqrt{1-e^2}" spaces would be printed in the converted
html, and as that's floating arithmetic, we should have as well a .0.
So I would agree on both points with Robert.

I have looked for now at the first patch and finished with the
attached while looking at it. Perhaps a committer could look already
at that?
I am still looking at the 2nd patch in more details...
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Nov 9, 2015 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Nov 7, 2015 at 2:45 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> After looking at the generated html version, I find that the "1/param" and
>>> "2/param" formula are very simple and pretty easy to read, and they would
>>> not be really enhanced with additional spacing.
>>>
>>> ISTM that adaptative spacing (no spacing for level 1 operations, some for
>>> higher level) is a good approach for readability, ie:
>>>
>>>    f(i) - f(i+1)
>>>              ^ no spacing here
>>>         ^ some spacing here
>>>
>>> So I would suggest to keep the submitted version, unless this is a blocker.
>>
>> Well, I think with the ".0" version it looks more like floating-point
>> math, and I like the extra white-space.  But I'm happy to hear other
>> opinions.
>
> -      defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</>
> +      defined as <literal>(max+min)/2</>, with
> This thing reminds me a bit of the little of TeX I know, when writing
> things like "\sqrt{1-e^2}" spaces would be printed in the converted
> html, and as that's floating arithmetic, we should have as well a .0.
> So I would agree on both points with Robert.
>
> I have looked for now at the first patch and finished with the
> attached while looking at it. Perhaps a committer could look already
> at that?

It looks fine to me except that I think we should spell out "param" as
"parameter" throughout, instead of abbreviating.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> I have looked for now at the first patch and finished with the
>> attached while looking at it. Perhaps a committer could look already
>> at that?
>
> It looks fine to me except that I think we should spell out "param" as
> "parameter" throughout, instead of abbreviating.

Fine for me. I have updated the first patch as attached (still looking
at the second).
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It looks fine to me except that I think we should spell out "param" as
>> "parameter" throughout, instead of abbreviating.
>
> Fine for me. I have updated the first patch as attached (still looking
> at the second).

This doc update threshold -> parameter & sgml and C code looks ok to me.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Dec 16, 2015 at 3:07 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>>
>>> It looks fine to me except that I think we should spell out "param" as
>>> "parameter" throughout, instead of abbreviating.
>>
>>
>> Fine for me. I have updated the first patch as attached (still looking
>> at the second).
>
>
> This doc update threshold -> parameter & sgml and C code looks ok to me.

So I am having a look at the second patch...

+      double constants such as <literal>3.14156</>,
You meant perhaps 3.14159 :)

+       <entry><literal><function>max(<replaceable>i</>,
<replaceable>...</>)</></></>
+       <entry>integer</>
Such function declarations are better with optional arguments listed
within brackets.

+      <row>
+       <entry><literal><function>double(<replaceable>i</>)</></></>
+       <entry>double</>
+       <entry>cast to double</>
+       <entry><literal>double(5432)</></>
+       <entry><literal>5432.0</></>
+      </row>
+      <row>
+       <entry><literal><function>int(<replaceable>x</>)</></></>
+       <entry>integer</>
+       <entry>cast to int</>
+       <entry><literal>int(5.4 + 3.8)</></>
+       <entry><literal>9</></>
+      </row>
If there are cast functions, why doesn't this patch introduce the
concept of the data type for a variable defined in a script? Both
concepts are linked, and for now as I can see the allocation of a \set
variable is actually always an integer.

In consequence, sqrt behavior is a bit surprising, for example this script:
\set aid sqrt(2.0)
SELECT :aid;
returns that:
SELECT 1;
Shouldn't a user expect 1.414 instead? Fabien, am I missing a trick?

It seems to me that this patch would gain in clarity by focusing a bit
more on the infrastructure first and removing a couple of things that
are not that mandatory for now... So the following things are not
necessary as of now:
- cast functions from/to int/double, because a result variable of a
\set does not include the idea that a result variable can be something
else than an integer. At least no option is given to the user to be
able to make a direct use of a double value.
- functions that return a double number: sqrt, pi
- min and max have value because they allow a user to specify the
expression once as a variable instead of writing it perhaps multiple
times in SQL, still is it enough to justify having them as a set of
functions available by default? I am not really sure either.

Really, I like this patch, and I think that it is great to be able to
use a set of functions and methods within a pgbench script because
this clearly can bring more clarity for a developer, still it seems
that this patch is trying to do too many things at the same time:
1) Add infrastructure to support function calls and refactor the
existing functions to use it. This is the FUNCTION stuff in the
expression scanner.
2) Add the concept of double return type, this could be an extension
of \set with a data type, or a new command like \setdouble. This
should include as well the casting routines I guess. This is the
DOUBLE stuff in the expression scanner.
3) Introduce new functions, as needed.

1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd
like to have.. sqrt has for example value if a variable can be set
double as a double type.

In conclusion, for this CF the patch doing the documentation fixes is
"Ready for committer", the second patch introducing the function
infrastructure should focus more on its core structure and should be
marked as "Returned with feedback". Opinions are welcome.
Regards,
-- 
Michael



Re: extend pgbench expressions with functions

From
Alvaro Herrera
Date:
Michael Paquier wrote:
> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> >> I have looked for now at the first patch and finished with the
> >> attached while looking at it. Perhaps a committer could look already
> >> at that?
> >
> > It looks fine to me except that I think we should spell out "param" as
> > "parameter" throughout, instead of abbreviating.
> 
> Fine for me. I have updated the first patch as attached (still looking
> at the second).

hm, so this is to backpatch, not merely for master, yes?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Dec 17, 2015 at 1:42 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> In conclusion, for this CF the patch doing the documentation fixes is
> "Ready for committer", the second patch introducing the function
> infrastructure should focus more on its core structure and should be
> marked as "Returned with feedback". Opinions are welcome.

I forgot to attach the second patch that I rebased using the 1st one
modified with the documentation changes.
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Dec 17, 2015 at 1:54 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Michael Paquier wrote:
>> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> >> I have looked for now at the first patch and finished with the
>> >> attached while looking at it. Perhaps a committer could look already
>> >> at that?
>> >
>> > It looks fine to me except that I think we should spell out "param" as
>> > "parameter" throughout, instead of abbreviating.
>>
>> Fine for me. I have updated the first patch as attached (still looking
>> at the second).
>
> hm, so this is to backpatch, not merely for master, yes?

I haven't thought about that as it is a cosmetic patch... But yes
that's harmless to backpatch to 9.5, and it would actually be good to
get a consistent code base with master I guess.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michael,

Thanks for your remarks.

> +      double constants such as <literal>3.14156</>,
> You meant perhaps 3.14159 :)

Indeed!

> +       <entry><literal><function>max(<replaceable>i</>,
> <replaceable>...</>)</></></>
> +       <entry>integer</>
> Such function declarations are better with optional arguments listed
> within brackets.

Why not. I did it that way because this is the standard C syntax for 
varargs.

> +      <row>
> +       <entry><literal><function>double(<replaceable>i</>)</></></>
> +       <entry>double</>
> +       <entry>cast to double</>
> +       <entry><literal>double(5432)</></>
> +       <entry><literal>5432.0</></>
> +      </row>
> +      <row>
> +       <entry><literal><function>int(<replaceable>x</>)</></></>
> +       <entry>integer</>
> +       <entry>cast to int</>
> +       <entry><literal>int(5.4 + 3.8)</></>
> +       <entry><literal>9</></>
> +      </row>
> If there are cast functions, why doesn't this patch introduce the
> concept of the data type for a variable defined in a script?

Because this would be a pain, and it is really useless as far as pgbench 
scripts are concerned: they really just need integers.

> Both concepts are linked, and for now as I can see the allocation of a 
> \set variable is actually always an integer.

Yes.

> In consequence, sqrt behavior is a bit surprising, for example this script:
> \set aid sqrt(2.0)
> SELECT :aid;
> returns that:
> SELECT 1;
> Shouldn't a user expect 1.414 instead? Fabien, am I missing a trick?

There is no trick. There are only integer variables in pgbench, so 1.4 is 
rounded to 1 by the assignment.

> It seems to me that this patch would gain in clarity by focusing a bit
> more on the infrastructure first and remove a couple of things that
> are not that mandatory for now...

This is more or less what I did in the beginning, and then someone said 
"please show a complete infra with various examples to show that the infra 
is really extensible". Now you say the reverse. This is *tiring* and is 
not a good use of the little time I can give.

> So the following things are not necessary as of now:

> - cast functions from/to int/double, because a result variable of a
> \set does not include the idea that a result variable can be something
> else than an integer. At least no options is given to the user to be
> able to make a direct use of a double value.
> - functions that return a double number: sqrt, pi
> - min and max have value because they allow a user to specify the
> expression once as a variable instead of writing it perhaps multiple
> times in SQL, still is it enough to justify having them as a set of
> functions available by default? I am not really sure either.

AFAICR this is because I was *ASKED* to show an infra which would deal 
with various cases: varargs, double/int functions/operators, overloading, 
so I added some examples in each category, hence the current list of 
functions in the patch.

> Really, I like this patch, and I think that it is great to be able to
> use a set of functions and methods within a pgbench script because
> this clearly can bring more clarity for a developer, still it seems
> that this patch is trying to do too many things at the same time:
> 1) Add infrastructure to support function calls and refactor the
> existing functions to use it. This is the FUNCTION stuff in the
> expression scanner.
> 2) Add the concept of double return type, this could be an extension
> of \set with a data type, or a new command like \setdouble. This
> should include as well the casting routines I guess. This is the
> DOUBLE stuff in the expression scanner.
> 3) Introduce new functions, as needed.

Yep. I started with (1), and was *ASKED* to do the others.

Adding double variables is pretty useless in my opinion, and potentially 
leads to issues in various places, so I'm not in a hurry to do that.

> 1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd
> like to have.. sqrt has for example value if a variable can be set
> double as a double type.

Sure. This is just an example of a double function so that if someone 
wants to add another one they can just update where it makes sense. Maybe 
log/exp would make more sense for pgbench.

> In conclusion, for this CF the patch doing the documentation fixes is
> "Ready for committer", the second patch introducing the function
> infrastructure should focus more on its core structure and should be
> marked as "Returned with feedback". Opinions are welcome.

My opinion is that to do and undo work because of random thoughts on the 
list is tiring.

What I can do is:

(1) fix the doc and bugs if any, obviously.

(2a) remove double stuff, just keep integer functions.
     I would rather keep min/max, though.

(2b) keep the patch with both int & double as is.

What I will *NOT* do is to add double variables without a convincing use 
case.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Dec 17, 2015 at 6:02 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Really, I like this patch, and I think that it is great to be able to
>> use a set of functions and methods within a pgbench script because
>> this clearly can bring more clarity for a developer, still it seems
>> that this patch is trying to do too many things at the same time:
>> 1) Add infrastructure to support function calls and refactor the
>> existing functions to use it. This is the FUNCTION stuff in the
>> expression scanner.
>> 2) Add the concept of double return type, this could be an extension
>> of \set with a data type, or a new command like \setdouble. This
>> should include as well the casting routines I guess. This is the
>> DOUBLE stuff in the expression scanner.
>> 3) Introduce new functions, as needed.
>
>
> Yep. I started with (1), and was *ASKED* to do the others.

That's the beginning of the thread... N developers have (N+1) opinions
as a wise guy watching this mailing list said once.

>> 1) and 2) seem worthwhile to me. 3) may depend on the use cases we'd
>> like to have.. sqrt has for example value if a variable can be set
>> double as a double type.
>
>
> Sure. This is just an example of a double function so that if someone wants
> to add another one they can just update where it makes sense. Maybe log/exp
> would make more sense for pgbench.

Perhaps..

>> In conclusion, for this CF the patch doing the documentation fixes is
>> "Ready for committer", the second patch introducing the function
>> infrastructure should focus more on its core structure and should be
>> marked as "Returned with feedback". Opinions are welcome.
>
>
> My opinion is that to do and undo work because of random thoughts on the
> list is tiring.
>
> What I can do is:
>
> (1) fix the doc and bugs if any, obviously.

Thanks.

> (2a) remove double stuff, just keep integer functions.
>      I would rather keep min/max, though.

(2a) sounds like a fine plan to get something committable. We could
keep min/max/abs, and remove sqrt/pi. What's actually the use case for
debug? I cannot wrap my mind around one.

> (2b) keep the patch with both int & double as is.

Functions returning double are not that useful... pgbench can live without them.

> What I will *NOT* do is to add double variables without a convincing use
> case.

I am not actually meaning that you should do it. I scrutinized the
thread and this patch and thought that there could be perhaps be some
use cases for double return values after seeing the cast functions. It
is not my intention to piss you off, I like this patch a lot, and you
have already proved that if someone wanted to get those additional
features, well the infrastructure being put in place would allow doing
that :)
If the second patch gets in a simplified shape I'll look at it again
then let's move on with a committer's opinion.

Regards,
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> (2a) remove double stuff, just keep integer functions.
>>      I would rather keep min/max, though.
>
> (2a) sounds like a fine plan to get something committable. We could keep 
> min/max/abs, and remove sqrt/pi. What's actually the use case for debug? 
> I cannot wrap my mind on one?

It was definitely useful to debug the double/int type stuff within 
expressions when writing a non trivial pgbench script. It is probably less 
interesting if there are only integers.

>> (2b) keep the patch with both int & double as is.
>
> Functions returning double are not that useful... pgbench can live 
> without them.

Sure.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Dec 17, 2015 at 10:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> (2a) remove double stuff, just keep integer functions.
>>>      I would rather keep min/max, though.
>>
>>
>> (2a) sounds like a fine plan to get something committable. We could keep
>> min/max/abs, and remove sqrt/pi. What's actually the use case for debug? I
>> cannot wrap my mind on one?
>
>
> It was definitely useful to debug the double/int type stuff within
> expressions when writing a non trivial pgbench script. It is probably less
> interesting if there are only integers.

OK for me. Let's leave without it as well if you think it is not that
useful. It could still be introduced later on.
-- 
Michael



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Fri, Dec 18, 2015 at 9:08 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Dec 17, 2015 at 10:33 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>
>>>> (2a) remove double stuff, just keep integer functions.
>>>>      I would rather keep min/max, though.
>>>
>>>
>>> (2a) sounds like a fine plan to get something committable. We could keep
>>> min/max/abs, and remove sqrt/pi. What's actually the use case for debug? I
>>> cannot wrap my mind on one?
>>
>>
>> It was definitely useful to debug the double/int type stuff within
>> expressions when writing a non trivial pgbench script. It is probably less
>> interesting if there are only integers.
>
> OK for me. Let's leave without it as well if you think it is not that
> useful. It could still be introduced later on.

s/leave/live. This morning is a bit hard...
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Dec 16, 2015 at 12:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Dec 16, 2015 at 6:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 14, 2015 at 7:25 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>> I have looked for now at the first patch and finished with the
>>> attached while looking at it. Perhaps a committer could look already
>>> at that?
>>
>> It looks fine to me except that I think we should spell out "param" as
>> "parameter" throughout, instead of abbreviating.
>
> Fine for me. I have updated the first patch as attached (still looking
> at the second).

Committed.  I was on the fence about whether to back-patch this
considering how close we are to release, but ended up deciding to go
for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michael,

> It was definitely useful to debug the double/int type stuff within 
> expressions when writing a non trivial pgbench script. It is probably less 
> interesting if there are only integers.

After looking again at the code, I remembered why doubles are useful: they 
are needed for random exponential & gaussian because the last parameter is 
a double.

I do not care about the sqrt, but doubles must be allowed to keep that, 
and the randoms are definitely useful for a pgbench script. Now the patch 
may just keep double constants, but it would look awkward, and the doc 
must explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4.

So I'm less keen on removing double expressions, because it removes a key 
feature. If it is a blocker I'll go for just the constant, but this looks 
to me like a stupid compromise.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sat, Dec 19, 2015 at 6:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> It was definitely useful to debug the double/int type stuff within
>> expressions when writing a non trivial pgbench script. It is probably less
>> interesting if there are only integers.
>
>
> After looking again at the code, I remembered why doubles are useful: they
> are needed for random exponential & gaussian because the last parameter is a
> double.
>
> I do not care about the sqrt, but double must be allowed to keep that, and
> the randoms are definitely useful for a pgbench script. Now the patch may
> just keep double constants, but it would look awkward, and the doc must
> explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4.
>
> So I'm less keen at removing double expressions, because it removes a key
> feature. If it is a blocker I'll go for just the constant, but this looks to
> me like a stupid compromise.

Hm, say that you do that in a script:
\set aid double(1.4)
\set bid random_gaussian(1, 10, :aid)
Then what is passed as third argument in random_gaussian is 1, and not
1.4, no? If all allocations within a variable are unconditionally
integers, why is it useful to make the cast function double()
user-visible? Now, by looking at the code, I agree that you would need
to keep things like DOUBLE and coerceToDouble(), as
PGBENCH_RANDOM_GAUSSIAN and its other friend are directly using them. I
am just doubting that it is actually necessary to make that visible at
user-level if they have no direct use..
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> After looking again at the code, I remembered why double are useful: there
>> are needed for random exponential & gaussian because the last parameter is a
>> double.
>>
>> I do not care about the sqrt, but double must be allowed to keep that, and
>> the randoms are definitely useful for a pgbench script. Now the patch may
>> just keep double constants, but it would look awkward, and the doc must
>> explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4.
>>
>> So I'm less keen at removing double expressions, because it removes a key
>> feature. If it is a blocker I'll go for just the constant, but this looks to
>> me like a stupid compromise.
>
> Hm, say that you do that in a script:
> \set aid double(1.4)
> \set bid random_gaussian(1, 10, :aid)
> Then what is passed as third argument in
> random_gaussian is 1, and not 1.4, no?

Indeed.

Maybe pgbench should just generate an error when a variable is assigned a 
double, so that the user must explicitly add an int() cast.

> If all allocations within a variable are unconditionally integers, why 
> is it useful to make the cast function double() user-visible?

I'm not sure whether we are talking about the same thing:
 - there is a "double" type managed within expressions, but not variables
 - there is a double() function, which takes an int and casts to double

I understood that you were suggesting to remove all "double" expressions,
but now it seems to be just about the double() function.

> Now, by looking at the code, I agree that you would need
> to keep things like DOUBLE and coerceToDouble(),
> PGBENCH_RANDOM_GAUSSIAN and its other friend are directly using it.

Yep.

> I am just doubting that it is actually necessary to make that visible at 
> user-level if they have no direct use..

If there are both ints and doubles, then being able to cast makes sense, so 
I just put both functions without deeper thinking.

So I would suggest to generate an error when a double expression is 
assigned to a variable, so as to avoid any surprise.

If both types are kept, I would like to keep the debug functions, which are 
really just a debug tool to have a look at what is going on within 
expressions.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sat, Dec 19, 2015 at 10:32 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>>> After looking again at the code, I remembered why double are useful:
>>> there
>>> are needed for random exponential & gaussian because the last parameter
>>> is a
>>> double.
>>>
>>> I do not care about the sqrt, but double must be allowed to keep that,
>>> and
>>> the randoms are definitely useful for a pgbench script. Now the patch may
>>> just keep double constants, but it would look awkward, and the doc must
>>> explain why 1.3 and 1+2 are okay, but not 1.3 + 2.4.
>>>
>>> So I'm less keen at removing double expressions, because it removes a key
>>> feature. If it is a blocker I'll go for just the constant, but this looks
>>> to
>>> me like a stupid compromise.
>>
>>
>> Hm, say that you do that in a script: \set aid double(1.4) \set bid
>> random_gaussian(1, 10, :aid) Then what is passed as third argument in
>> random_gaussian is 1, and not 1.4, no?
>
>
> Indeed.
>
> Maybe pgbench should just generate an error when a variable is assigned a
> double, so that the user must explicitly add an int() cast.

I would honestly just remove this whole int() and double() business
from what is available for the user, and clearly mention in the
documentation the return type and argument types of each function.
And that's already what your patch is doing.

>> If all allocations within a variable are unconditionally integers, why is
>> it useful to make the cast function double() user-visible?
>
> I'm not sure whether we are talking about the same thing:
>  - there a "double" type managed within expressions, but not variables
>  - there is a double() function, which takes an int and casts to double
>
> I understood that you were suggesting to remove all "double" expressions,
> but now it seems to be just about the double() function.

There is indeed a misunderstanding here: I meant from the start the
removal of only the "double" function. It would be nice to keep as
user-visible only things that have some meaning.

>> I am just doubting that it is actually necessary to make that visible at
>> user-level if they have no direct use..
>
> If there are both ints and doubles, then being able to cast make sense, so I
> just put both functions without deeper thinking.
> So I would suggest to generate an error when an double expression is
> assigned to a variable, so as to avoid any surprise.
> If both type are kept, I would like to keep the debug functions, which is
> really just a debug tool to have a look at what is going within expressions.

Well, if double return values were actually stored as doubles in
variables, having both would make sense. And honestly, something
like sqrt() that returns an integer when assigned to a variable is
really surprising... And as you mentioned upthread, there is no real
meaning in having double variable types that can be assigned.

(Moving this patch to next CF btw)
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michael,

>> I'm not sure whether we are talking about the same thing:
>>  - there a "double" type managed within expressions, but not variables
>>  - there is a double() function, which takes an int and casts to double
>>
>> I understood that you were suggesting to remove all "double" expressions,
>> but now it seems to be just about the double() function.
>
> There is indeed a misunderstanding here: I meant from the start the
> removal of only the "double" function.

Ok. I clearly misunderstood everything...

> It would be nice to keep as user-visible only things that have some 
> meaning.

Why not.

The purpose of some of these functions was to show how the function 
infrastructure can be extended, as was asked on the thread. The really 
useful ones in the bunch are the arithmetic operators and random generators.

> [...] Well, if there were doubles as return results really allocated as 
> doubles in variables having both would make sense. And honestly 
> something like sqrt that returns an integer when allocated in a variable 
> is really surprising.. And as you mentioned upthread there is no real 
> meaning to have doubles variable types that can be allocated.

So you would just like to remove double double(int) and double 
sqrt(double) from the patch, and basically that is all? int int(double)??
debug??? (hmmm, that one is useful for debugging a non-trivial expression).

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Dec 22, 2015 at 12:16 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> So you would just like to remove double double(int) and double sqrt(double)
> from the patch, and basically that is all? int int(double)??
> debug??? (hmmm, useful debugging a non trivial expression).

OK, so I am finally back on this item, and after a close look at the
code I am throwing away my concerns.

-                       if (!evaluateExpr(st, expr, &result))
+                       if (!evaluateExpr(thread, st, expr, &result))
                        {
                                st->ecnt++;
                                return true;
                        }

-                       sprintf(res, INT64_FORMAT, result);
+                       sprintf(res, INT64_FORMAT, INT(result));
Based on that, all the final results of a \set command will have an
integer format. Still, after going through this patch, allowing double
as a return type for nested function calls (first time "nested" is
written on this thread) is actually really useful, and that's what
makes sense for this feature. I am not sure why I haven't thought
about that before... So, even if the final result of a variable is an
integer, it is possible to do very fancy things like this:
\set aid debug(random_exponential(1, 100, pi()))
\set bid debug(random_exponential(101, 200, pi()))
\set cid debug(random_gaussian(:aid, :bid, double(:aid * pi())))
SELECT :cid;

That's actually where things like sqrt() and pi() gain a lot in power
by working directly on the integers returned by the random functions.

Something that bothered me though while testing: the debug() function
is useful, but it becomes difficult to read its results efficiently
when many outputs stack up, so I think that it would be useful to
be able to pass a string prefix so as to annotate a
debug entry; say "debug('foo', 5.2)" outputs:
debug(script=0,command=1): note: foo, int/double X
I guess that it would then be useful to allow concatenation of
strings using "+" with a new PgBenchExprType such as ENODE_STRING, but
perhaps I am asking too much. Thoughts are welcome regarding that; it
does not seem mandatory as of now, as this patch is already doing much.
We could remove some of the functions in the first shot of this patch
to simplify it a bit, but that does not look like a win as their
footprint on the code is low.

I haven't noticed at a quick glance any kind of memory leak, but this
deserves a closer look with valgrind for example; still, the patch
looks in good shape to me. And more comments, for example in pgbench.h,
would be welcome to explain the code further. I am fine with taking a
final look at that before handing it to a committer though. Does that
sound fine as a plan, Fabien?
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> Based on that all the final results of a \set command will have an
> integer format, still after going through this patch, allowing double
> as return type for nested function calls (first time "nested" is
> written on this thread) is actually really useful, and that's what
> makes sense for this feature.

Yep.

> [...] Something that bothered me though while testing: the debug() 
> function is useful, but it becomes difficult to see its results 
> efficiently when many outputs are stacking, so I think that it would be 
> useful to be able to pass a string prefix to have the possibility to 
> annotate a debug entry, say "debug('foo', 5.2)" outputs: 
> debug(script=0,command=1): note: foo, int/double X I guess that it would 
> be useful then to allow as well concatenation of strings using "+" with 
> a new PgBenchExprType as ENODE_STRING, but perhaps I am asking too much.

I think that the answer is yes:-)

Obviously all this is possible in the grand scheme of things, but it is 
also significant work for a tiny benefit, if any. I'd rather see "debug" as 
a simple tool for debugging a script with "pgbench -t 1" (i.e. one or a few 
transactions), so that the trace length is not an issue.

Once the script is satisfactory, remove the debug and run it for real.

Note that it does not make much sense to run with "debug" calls for many 
transactions because of the performance impact.

> Thoughts are welcome regarding that, it does not seem mandatory

Good.

> as of now as this patch is already doing much.

Yep.

> We could remove some of the functions in the first shot of this patch
> to simplify it a bit, but that does not look like a win as their
> footprint on the code is low.

That is one good thing about this function infrastructure: adding a new 
function has a small impact. Some functions are there more as a demo of 
how to do it than because they are useful as such (e.g. sqrt()), but that 
is not a bad thing.

> I haven't noticed at quick glance any kind of memory leaks but this
> deserves a closer look with valgrind for example, still the patch
> looks in good shape to me.

ISTM that there is no allocation in the evaluation part, all is done on 
the stack, so there is no risk of a leak there. If there is a leak in the 
parser we really don't care: it is run once when reading a script, and 
then after a run pgbench exits anyway.

> And more comments for example in pgbench.h would be welcome to explain 
> more the code.

Ok. I'm generally in favor of comments.

> I am fine to take a final look at that before handling it to a committer 
> though. Does that sound fine as a plan, Fabien?

I understand that you propose to add these comments?

I'm fine with that!:-)

If I misunderstood, tell me:-)

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Jan 7, 2016 at 7:00 PM, Fabien COELHO wrote:
> Obviously all this is possible in the grand scale of things, but this is
> also significant work for a tiny benefit, if any. I rather see "debug" as a
> simple tool for debugging a script with "pgbench -t 1" (i.e. one or few
> transactions), so that the trace length is not an issue.
> Once the script is satisfactory, remove the debug and run it for real.
> Note that it does not make much sense to run with "debug" calls for many
> transactions because of the performance impact.

Yep. That's exactly what I did during my tests.

>> And more comments for example in pgbench.h would be welcome to explain
>> more the code.
>
> Ok. I'm generally in favor of comments.

OK, I added some where I thought it was appropriate.

>> I am fine to take a final look at that before handling it to a committer
>> though. Does that sound fine as a plan, Fabien?
>
> I understand that you propose to add these comments?

Yep. I did as attached.

> I'm fine with that!:-)
> If I misunderstood, tell me:-)

I don't think you did, but I found other issues:
1) When specifying a negative parameter for the gaussian and exponential
functions, an assertion is triggered:
Assertion failed: (parameter > 0.0), function getExponentialRand, file
pgbench.c, line 494.
Abort trap: 6 (core dumped)
An explicit error is better.
2) When specifying a value between 0 and 2, pgbench will loop
infinitely. For example:
\set cid debug(random_gaussian(-100, 100, 0))
In both cases we just lack sanity checks with PGBENCH_RANDOM,
PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
those checks would be better if moved into getExponentialRand & co
with the assertions removed. I would personally have those functions
return a boolean status and have the result in a pointer as a function
argument.

Another thing bugging me is that ENODE_OPERATOR is actually useless
now that there is a proper function layer. Removing it and replacing
it with a set of functions would be more appropriate and make the code
simpler, at the cost of more functions and of changing the parser so
that an operator, once found, is transformed into a function expression.

I am attaching a patch where I tweaked a bit the docs and added some
comments for clarity. Patch is switched to "waiting on author" for the
time being.
Regards,
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> 1) When precising a negative parameter in the gaussian and exponential
> functions an assertion is triggered:
> Assertion failed: (parameter > 0.0), function getExponentialRand, file
> pgbench.c, line 494.
> Abort trap: 6 (core dumped)
> An explicit error is better.

Ok for assert -> error.

> 2) When precising a value between 0 and 2, pgbench will loop
> infinitely. For example:
> \set cid debug(random_gaussian(-100, 100, 0))
> In both cases we just lack sanity checks with PGBENCH_RANDOM,
> PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
> those checks would be better if moved into getExponentialRand & co
> with the assertions removed. I would personally have those functions
> return a boolean status and have the result in a pointer as a function
> argument.

ISTM that if pgbench is to be stopped, the simplest option is just to 
abort with a nicer error message from the get*Rand function; there is no 
need to change the function signature and transfer the error management 
upwards.

> Another thing itching me is that ENODE_OPERATOR is actually useless
> now that there is a proper function layer. Removing it and replacing
> it with a set of functions would be more adapted and make the code
> simpler, at the cost of more functions and changing the parser so as
> an operator found is transformed into a function expression.

Hmmm. Why not, although the operator management is slightly more efficient 
(eg always 2 operands...).

> I am attaching a patch where I tweaked a bit the docs and added some
> comments for clarity. Patch is switched to "waiting on author" for the
> time being.

Ok. I'm hesitating about removing the operator management, especially if 
I'm told to put it back afterwards.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Jan 12, 2016 at 5:52 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> 2) When precising a value between 0 and 2, pgbench will loop
>> infinitely. For example:
>> \set cid debug(random_gaussian(-100, 100, 0))
>> In both cases we just lack sanity checks with PGBENCH_RANDOM,
>> PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
>> those checks would be better if moved into getExponentialRand & co
>> with the assertions removed. I would personally have those functions
>> return a boolean status and have the result in a pointer as a function
>> argument.
>
>
> ISTM that if pgbench is to be stopped, the simplest option is just to abort
> with a nicer error message from the get*Rand function, there is no need to
> change the function signature and transfer the error management upwards.

That's fine to me, as long as the solution is elegant.

>> I am attaching a patch where I tweaked a bit the docs and added some
>> comments for clarity. Patch is switched to "waiting on author" for the
>> time being.
>
> Ok. I'm hesitating about removing the operator management, especially if I'm
> told to put it back afterwards.

I can understand that; things like that happen all the time here, and
that's not a straightforward patch that we have here. I am sure that
additional opinions would be good to have before taking one
decision or another. With the current status quo, let's just do what
you think is best.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

>> ISTM that if pgbench is to be stopped, the simplest option is just to abort
>> with a nicer error message from the get*Rand function, there is no need to
>> change the function signature and transfer the error management upwards.
>
> That's fine to me, as long as the solution is elegant.

Hmmm, this is subjective:-)

I've decided to stay with the current behavior (\setrandom), that is to 
abort the current transaction on errors but not to abort pgbench itself. 
The check is done before calling the functions, and I left an assert in the 
functions just to be sure. It is going to loop on errors, but that is 
what it already does anyway.

>> Ok. I'm hesitating about removing the operator management, especially if I'm
>> told to put it back afterwards.
>
> I can understand that, things like that happen all the time here and
> that's not a straight-forward patch that we have here. I am sure that
> additional opinions here would be good to have before taking one
> decision or another. With the current statu-quo, let's just do what
> you think is best.

I left the operators alone and just added function management next to them. 
I'll merge operators into functions only if it is a blocker.

I have assumed that your v4 is really v17, and this is v18...

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Jan 13, 2016 at 10:28 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Michaël,
>
>>> ISTM that if pgbench is to be stopped, the simplest option is just to
>>> abort
>>> with a nicer error message from the get*Rand function, there is no need
>>> to
>>> change the function signature and transfer the error management upwards.
>>
>>
>> That's fine to me, as long as the solution is elegant.
>
>
> Hmmm, this is subjective:-)

And dependent on personal opinions and views.

> I've decided to stay with the current behavior (\setrandom), that is to
> abort the current transaction on errors but not to abort pgbench itself. The
> check is done before calling the functions, and I let an assert in the
> functions just to be sure. It is going to loop on errors anyway, but this is
> what it already does anyway.

OK, yes, I see now that I missed that during my last review. This has the
disadvantage of doubling the amount of code dedicated to parameter
checks though :(
But based on the feedback perhaps other folks would feel that it would
be actually worth simply dropping the existing \setrandom command. I
won't object to that personally, such pgbench features are something
for hackers and devs mainly, so I guess that we could survive without
a deprecation period here.

>>> Ok. I'm hesitating about removing the operator management, especially if
>>> I'm
>>> told to put it back afterwards.
>>
>>
>> I can understand that, things like that happen all the time here and
>> that's not a straight-forward patch that we have here. I am sure that
>> additional opinions here would be good to have before taking one
>> decision or another. With the current statu-quo, let's just do what
>> you think is best.
>
>
> I let the operators alone and just adds functions management next to it.
> I'll merge operators as functions only if it is a blocker.

I think that's a blocker, but I am not the only one here and not a committer.

-                 fprintf(stderr, "gaussian parameter must be at least
%f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
+                fprintf(stderr,
+                           "random gaussian parameter must be greater than %f "
+                            "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter);
This looks like useless noise to me. Why change those messages?

-     if (parameter <= 0.0)
+    if (parameter < 0.0)     {

This bit is false, and triggers an assertion failure when the
exponential parameter is 0.
             fprintf(stderr,
-                       "exponential parameter must be greater than
zero (not \"%s\")\n",
-                        argv[5]);
+                      "random exponential parameter must be greater than 0.0 "
+                       "(got %f)\n", parameter);             st->ecnt++;
-            return true;
+           return false;
This diff is noise as well, and should be removed.

+               /*
+                * Note: this section could be removed, as the same
functionnality
+                * is available through \set xxx random_gaussian(...)
+                */
I think that you are right to do that. That's no fun to break existing
scripts, even if people doing that with pgbench are surely experienced
hackers.
                       }
-               case ENODE_VARIABLE:
Such diffs are noise as well.

int() should be strengthened regarding bounds. For example:
\set cid debug(int(int(9223372036854775807) +
double(9223372036854775807)))
debug(script=0,command=1): int -9223372036854775808
--
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

>> Hmmm, this is subjective:-)
>
> And dependent on personal opinions and views.
>
>> I've decided to stay with the current behavior (\setrandom), that is to
>> abort the current transaction on errors but not to abort pgbench itself. The
>> check is done before calling the functions, and I let an assert in the
>> functions just to be sure. It is going to loop on errors anyway, but this is
>> what it already does anyway.
>
> OK, yes I see now I missed that during my last review. This has the
> disadvantage to double the amount of code dedicated to parameter
> checks though :(

Yep, I noticed that obviously, but I expect that the pretty inelegant 
"\setrandom" code could go away soon, so this is really just temporary.

> But based on the feedback perhaps other folks would feel that it would
> be actually worth simply dropping the existing \setrandom command.

Yep, exactly my thoughts. I did not do it because there are two ways: 
actually remove it and be done, or build an equivalent \set at parse time, 
which would just remove the execution part but keep some ugly stuff in 
the parsing.

I would be in favor of just dropping it.

> I won't object to that personally, such pgbench features are something 
> for hackers and devs mainly, so I guess that we could survive without a 
> deprecation period here.

Yep. I can remove it, but I would like a clear go/vote before doing that.

>>> I can understand that, things like that happen all the time here and
>>> that's not a straight-forward patch that we have here. I am sure that
>>> additional opinions here would be good to have before taking one
>>> decision or another. With the current statu-quo, let's just do what
>>> you think is best.
>>
>>
>> I let the operators alone and just adds functions management next to it.
>> I'll merge operators as functions only if it is a blocker.
>
> I think that's a blocker, but I am not the only one here and not a committer.

Ok, I can remove that easily anyway.

> -                 fprintf(stderr, "gaussian parameter must be at least
> %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
> +                fprintf(stderr,
> +                           "random gaussian parameter must be greater than %f "
> +                            "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter);
> This looks like useless noise to me. Why changing those messages?

Because the message did not say that it was about random, and I think
that is the important part.

> -     if (parameter <= 0.0)
> +    if (parameter < 0.0)
>      {
>
> This bit is false, and triggers an assertion failure when the
> exponential parameter is 0.

Oops:-( Sorry.

>              fprintf(stderr,
> -                       "exponential parameter must be greater than
> zero (not \"%s\")\n",
> -                        argv[5]);
> +                      "random exponential parameter must be greater than 0.0 "
> +                       "(got %f)\n", parameter);
>              st->ecnt++;
> -            return true;
> +           return false;
> This diff is noise as well, and should be removed.

Ok, I can put "zero" and "not" back, but same remark as above: why not 
say that it is about random? This information is missing.

> +               /*
> +                * Note: this section could be removed, as the same
> functionnality
> +                * is available through \set xxx random_gaussian(...)
> +                */
> I think that you are right to do that. That's no fun to break existing
> scripts, even if people doing that with pgbench are surely experienced
> hackers.

Ok, but I would like a clear go or vote before doing this.

> -
>                case ENODE_VARIABLE:
> Such diffs are noise as well.

Yep.

> int() should be strengthened regarding bounds. For example:
> \set cid debug(int(int(9223372036854775807) +
> double(9223372036854775807)))
> debug(script=0,command=1): int -9223372036854775808

Hmmm. You mean just to check the double -> int conversion for overflow,
as in:
  SELECT (9223372036854775807::INT8 +
          9223372036854775807::DOUBLE PRECISION)::INT8;

Ok.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Jan 14, 2016 at 5:54 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I wrote:
>> -                 fprintf(stderr, "gaussian parameter must be at least
>> %f (not \"%s\")\n", MIN_GAUSSIAN_PARAM, argv[5]);
>> +                fprintf(stderr,
>> +                           "random gaussian parameter must be greater
>> than %f "
>> +                            "(got %f)\n", MIN_GAUSSIAN_PARAM, parameter);
>> This looks like useless noise to me. Why changing those messages?
>
>
> Because the message was not saying that it was about random, and I think
> that it is the important.
>>              fprintf(stderr,
>> -                       "exponential parameter must be greater than
>> zero (not \"%s\")\n",
>> -                        argv[5]);
>> +                      "random exponential parameter must be greater than
>> 0.0 "
>> +                       "(got %f)\n", parameter);
>>              st->ecnt++;
>> -            return true;
>> +           return false;
>> This diff is noise as well, and should be removed.
>
> Ok, I can put "zero" and "not" back, but same remark as above: why not say
> that it is about random? This information is missing.

Those things should be a separate patch then, committed separately as
they provide more verbose messages.

>> +               /*
>> +                * Note: this section could be removed, as the same
>> functionnality
>> +                * is available through \set xxx random_gaussian(...)
>> +                */
>> I think that you are right to do that. That's no fun to break existing
>> scripts, even if people doing that with pgbench are surely experienced
>> hackers.
>
> Ok, but I would like a clear go or vote before doing this.

For now, I am seeing opinions on those matters from nobody other than
you and me, and we lean in the same direction. If you think that
there is a possibility that somebody has a different opinion on those
matters, and it is likely, then let's keep the patch as-is and wait
for more input: it is easier to remove code than to add it back. I am
not sure what a committer would say, and it surely would be a waste of
time for everybody to just move back and forth.

>> int() should be strengthened regarding bounds. For example:
>> \set cid debug(int(int(9223372036854775807) +
>> double(9223372036854775807)))
>> debug(script=0,command=1): int -9223372036854775808
>
>
> Hmmm. You mean just to check the double -> int conversion for overflow,
> as in:
>
>   SELECT (9223372036854775807::INT8 +
>           9223372036854775807::DOUBLE PRECISION)::INT8;
>
> Ok.

Yes, that's what I mean. The job running into that should definitely
fail with a proper out-of-bound error message.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

Here is a v19:
 - avoid noisy changes
 - abort on double->int overflow
 - implement operators as functions

There is still \setrandom, that I can remove easily with a green light.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Fri, Jan 15, 2016 at 11:53 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a v19 :
>  - avoid noisy changes
>  - abort on double->int overflow
>  - implement operators as functions
>
> There is still \setrandom, that I can remove easily with a green light.

(I am not sure why *$%"# gmail broke the thread in my last email)

Thanks for the new patch and replacing the operator stuff by functions.

+       <entry>uniformly-distributed random integer in <literal>[lb,ub]</></>
Nitpick: when defining an interval like that, you may want to add a
space after the comma. For example seg.sgml does that. It would be
good to be consistent even here. And actually you wrote [ub, lb] in
two places, this should have been reversed.

+      /* beware that the list is reverse in make_func */
s/reverse/reversed/?
                                       }
+#ifdef DEBUG
Some noise.

With this example:
\set cid debug(sqrt(-1))
I get that:
debug(script=0,command=1): double nan
An error would be more logical, no? Do you want to emulate complex
numbers instead?

The basic operator functions also do not check for integer overflows.
Those three ones are just overflowing:
\set cid debug(9223372036854775807 + 1)
\set cid debug(-9223372036854775808 - 1)
\set cid debug(9223372036854775807 * 9223372036854775807)
debug(script=0,command=1): int -9223372036854775807
debug(script=0,command=2): int 9223372036854775807
debug(script=0,command=3): int 1
And this one generates a core dump:
\set cid debug(-9223372036854775808 / -1)
Floating point exception: 8 (core dumped)

A more general comment: what about splitting all the execution
functions into a separate file exprexec.c? evaluateExpr (renamed as
execExpr) is the root function, but then we have a set of static
sub-functions for each node, like execExprFunc, execExprVar,
execExprConst, etc.? This way we would save a bit of tab indentation,
as this patch makes the new code lines grow beyond 80 characters
because of all the switch/case stuff that gets more complicated.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> +       <entry>uniformly-distributed random integer in <literal>[lb,ub]</></>

> Nitpick: when defining an interval like that, you may want to add a
> space after the comma.

Why not.

> +      /* beware that the list is reverse in make_func */
> s/reverse/reversed/?

Indeed.

> +
> #ifdef DEBUG
> Some noise.

Ok.

> With this example:
> \set cid debug(sqrt(-1))
> I get that:
> debug(script=0,command=1): double nan
> An error would be more logical, no?

If "sqrt(-1)" as a double is NaN for the computer, I'm fine with that. It 
makes the code simpler to just let the math library do its thing and not 
bother.

> You want to emulate with complex numbers instead?

Nope.

> The basic operator functions also do not check for integer overflows.

This is a feature. I think that they should not check for overflow: as in 
C, this is just int64_t arithmetic "as is".

Moreover, it would be a new feature to add such a check if desirable, so 
it would belong in another patch; it is not related to adding functions.
Addition already overflows in the current code.

Finally, I can think of good reasons to use overflows deliberately, so I 
think that argues against such a change.

> Those three ones are just overflowing:
> \set cid debug(9223372036854775807 + 1)
> \set cid debug(-9223372036854775808 - 1)
> \set cid debug(9223372036854775807 * 9223372036854775807)
> debug(script=0,command=1): int -9223372036854775807
> debug(script=0,command=2): int 9223372036854775807
> debug(script=0,command=3): int 1

All these results are fine from my point of view.

> And this one generates a core dump:
> \set cid debug(-9223372036854775808 / -1)
> Floating point exception: 8 (core dumped)

This one is funny, but it is a fact of int64_t life: you cannot divide 
INT64_MIN by -1 because the result cannot be represented as an int64_t.
This is probably hardcoded in the processor. I do not think it is worth 
doing anything about it for pgbench.

> A more general comment: what about splitting all the execution
> functions into a separate file exprexec.c? evaluateExpr (renamed as
> execExpr) is the root function, but then we have a set of static
> sub-functions for each node, like execExprFunc, execExprVar,
> execExprConst, etc?

I do not see a strong case for renaming. The function part could be split 
because of the indentation, though.

> This way we would save a bit of tab-indentation, this patch making the 
> new code lines becoming larger than 80 characters because of all the 
> switch/case stuff that gets more complicated.

I agree that the code is pretty ugly, but this is partly due to postgres 
indentation rules for switch which are *NOT* reasonable, IMO.

I put the function evaluation in a function in the attached version.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sun, Jan 17, 2016 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> With this example:
>> \set cid debug(sqrt(-1))
>> I get that:
>> debug(script=0,command=1): double nan
>> An error would be more logical, no?
>
>
> If "sqrt(-1)" as a double is Nan for the computer, I'm fine with that. It
> makes the code simpler to just let the math library do its stuff and not
> bother.

Hm. OK..

>> The basic operator functions also do not check for integer overflows.
>
>
> This is a feature. I think that they should not check for overflow, as in C,
> this is just int64_t arithmetic "as is".

(int64_t is not an available type on Windows btw.)

> Moreover, it would be a new feature to add such a check if desirable, so it
> would belong to another patch, it is not related to adding functions.
> The addition already overflows in the current code.
>
> Finally I can think of good reason to use overflows deliberately, so I think
> it would argue against such a change.

Could you show up an example? I am curious about that.

>> Those three ones are just overflowing:
>> \set cid debug(9223372036854775807 + 1)
>> \set cid debug(-9223372036854775808 - 1)
>> \set cid debug(9223372036854775807 * 9223372036854775807)
>> debug(script=0,command=1): int -9223372036854775807
>> debug(script=0,command=2): int 9223372036854775807
>> debug(script=0,command=3): int 1
> All these results are fine from my point of view.

On HEAD we are getting similarly strange results, so I am fine to
withdraw, but that's still very strange to me. The first case
generates -9223372036854775808, the second one produces
9223372036854775807, and the third one generates 1. We could make the
error checks more consistent even in back-branches, though perhaps
that's not worth it for a client.

>> And this one generates a core dump:
>> \set cid debug(-9223372036854775808 / -1)
>> Floating point exception: 8 (core dumped)
>
> This one is funny, but it is a fact of int64_t life: you cannot divide
> INT64_MIN by -1 because the result cannot be represented as an int64_t.
> This is propably hardcoded in the processor. I do not think it is worth
> doing anything about it for pgbench.

This one, on the contrary, generates an error on HEAD, while your
patch causes a crash:
value "9223372036854775808" is out of range for type bigint
That's a regression, no? I am uncomfortable with leaving such holes
in the code, even if it is a client application.

>> A more general comment: what about splitting all the execution
>> functions into a separate file exprexec.c? evaluateExpr (renamed as
>> execExpr) is the root function, but then we have a set of static
>> sub-functions for each node, like execExprFunc, execExprVar,
>> execExprConst, etc?
>
> I do not see a strong case for renaming. The function part could be split
> because of the indentation, though.

The split makes sense to me regarding the fact that we have function
parsing, execution and the main code paths in different files. That's
not far from the backend that does similar split actually.

>> This way we would save a bit of tab-indentation, this patch making the new
>> code lines becoming larger than 80 characters because of all the switch/case
>> stuff that gets more complicated.
>
> I agree that the code is pretty ugly, but this is partly due to postgres
> indentation rules for switch which are *NOT* reasonable, IMO.
>
> I put the function evaluation in a function in the attached version.

Thanks, this makes the code a bit clearer.
-- 
Michael



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Mon, Jan 18, 2016 at 10:07 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Jan 17, 2016 at 3:10 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> I put the function evaluation in a function in the attached version.
>
> Thanks, this makes the code a bit clearer.

OK, so I had an extra look at this patch and I am marking it as ready
for committer. A couple of things to be aware of, and the result of
this thread with the patch in its current state:
- The patch is keeping \setrandom. Fabien and I are agreeing to purely
remove it, though it is kept in the patch because it is easier to
remove existing code rather than add it again per Fabien's concerns.
- INT64_MIN / -1 throws a core dump, and errors on HEAD. I think this
should be fixed, Fabien does not.
- There are not many overflow checks for the exiting int64 operators
and functions. HEAD doesn't do much, this patch makes it the situation
a bit worse even if there are a couple of checks for int() for
example. We do not do any checks for sqrt(-N) (N > 0) for example.
- It may be more interesting to have the function execution code into
a separate file for clarity. Not mandatory though.
Except those comments, all the other issues have been addressed. I
think this is a great patch, and greatly improves the extensibility of
pgbench.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>>> The basic operator functions also do not check for integer overflows.
>>
>> This is a feature. I think that they should not check for overflow, as in C,
>> this is just int64_t arithmetic "as is".
>
> (int64_t is not an available type on Windows btw.)

Possibly. I really meant "64 bits signed integers", whatever its name. 
"int64_t" is the standard C99 name.

>> Finally I can think of good reason to use overflows deliberately, so I 
>> think it would argue against such a change.

> Could you show up an example? I am curious about that.

The one I can think of is the use of "SUM" to aggregate hashes when 
computing a hash over a table. If SUM stopped on overflow, this usage 
would break. Now there could be a case for having overflow detection on 
SUM, but that would be another SUM, preferably with a distinct name.

>>> \set cid debug(9223372036854775807 * 9223372036854775807)
>>> debug(script=0,command=3): int 1
>> All these results are fine from my point of view.
>
> On HEAD we are getting similar strange results,

Yep, this is not new.

> so I am fine to withdraw but that's still very strange to me.

Arithmetic modulo 2**64 is pretty strange, I can agree with that:-)

> The first case is generating -9223372036854775808, the second one 
> compiles 9223372036854775807 and the third one generates 1.

This should be the "real" result modulo 2**64, if I'm not mistaken.

> Or we make the error checks even more consistent in back-branches, 
> perhaps that 's indeed not worth it for a client though.

Yep, that would be another patch.

>>> And this one generates a core dump:
>>> \set cid debug(-9223372036854775808 / -1)
>>> Floating point exception: 8 (core dumped)
>>
>> This one is funny, but it is a fact of int64_t life: you cannot divide
>> INT64_MIN by -1 because the result cannot be represented as an int64_t.
>> This is probably hardcoded in the processor. I do not think it is worth
>> doing anything about it for pgbench.
>
> This one, on the contrary, is generating an error on HEAD, and your 
> patch is causing a crash: value "9223372036854775808" is out of range 
> for type bigint That's a regression, no?

Hmmm, yes, somehow, but just for this one value, if you try the next:

pgbench 9.4.5: value "-9223372036854775809" is out of range for type bigint

I guess that the implementation before 9.5 converted 
"-9223372036854775808" as an int, which is INT64_MIN, so it was fine. Now 
it is parsed as "operator uminus" applied to "9223372036854775808", which 
is not fine because this would be INT64_MAX+1, which is not possible.

I would prefer just to neglect that as a very small (1/2**64) feature 
change rather than a meaningful regression, especially as the coding 
effort to fix this is significant and the value of handling it differently 
is nought.

> I am uncomfortable with the fact of letting such holes in the code, even 
> if that's a client application.

This is a 1-in-2**128 probability case which stops pgbench. It is obviously 
possible to add a check to catch it, and then generate an error message, 
but I would rather just ignore it and let pgbench stop on that.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> OK, so I had an extra look at this patch and I am marking it as ready
> for committer.

Ok.

> - INT64_MIN / -1 throws a core dump, and errors on HEAD. I think this
> should be fixed, Fabien does not.

Yep. Another point about this one is that it is not related to this patch 
about functions.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> OK, so I had an extra look at this patch and I am marking it as ready
>> for committer.
>
> Ok.

Attached is a rebase after recent changes in pgbench code & doc.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sat, Jan 16, 2016 at 1:10 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> All these results are fine from my point of view.
>
>> And this one generates a core dump:
>> \set cid debug(-9223372036854775808 / -1)
>> Floating point exception: 8 (core dumped)

That does not seem acceptable to me.  I don't want pgbench to die a
horrible death if a floating point exception occurs any more than I
want the server to do the same.  I want the error to be properly
caught and reported.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Jan 27, 2016 at 3:39 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Attached is a rebase after recent changes in pgbench code & doc.

+/* use short names in the evaluator */
+#define INT(v) coerceToInt(&v)
+#define DOUBLE(v) coerceToDouble(&v)
+#define SET_INT(pv, ival) setIntValue(pv, ival)
+#define SET_DOUBLE(pv, dval) setDoubleValue(pv, dval)

I don't like this at all.  It seems to me that this really obscures
the code.  The few extra characters are a small price to pay for not
having to go look up the macro definition to understand what the code
is doing.

The third hunk in pgbench.c unnecessarily deletes a blank line.

        /*
         * inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
+        * Assert((1.0 - cut) != 0.0);
         */
-       Assert((1.0 - cut) != 0.0);
        rand = -log(cut + (1.0 - cut) * uniform) / parameter;
+

Moving the Assert() into the comment seems like a bad plan.  If the
Assert is true, it shouldn't be commented out.  If it's not, it
shouldn't be there at all.

Commit e41beea0ddb74ef975f08b917a354ec33cb60830, which you wrote, went
to some trouble to display good context for error messages.  What you
have here seems like a huge step backwards:

+                       fprintf(stderr, "double to int overflow for %f\n", dval);
+                       exit(1);
+                       exit(1);

So, we're just going to give up on all of that error context reporting
that we added back then?  That would be sad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>>> And this one generates a core dump:
>>> \set cid debug(-9223372036854775808 / -1)
>>> Floating point exception: 8 (core dumped)
>
> That does not seem acceptable to me. I don't want pgbench to die a 
> horrible death if a floating point exception occurs any more than I want 
> the server to do the same.  I want the error to be properly caught and 
> reported.

Please note that this issue/bug/feature/whatever already exists just for 
these two values (1/2**128 probability), it is not related to this patch 
about functions.
  sh> pgbench --version
  pgbench (PostgreSQL) 9.5.0

  sh> cat div.sql
  \set i -9223372036854775807
  \set i :i - 1
  \set i :i / -1

  sh> pgbench -f div.sql
  starting vacuum...end.
  Floating point exception (core dumped)

I do not think that it is really worth fixing, but I will not prevent 
anyone to fix it.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>> Attached is a rebase after recent changes in pgbench code & doc.
>
> +/* use short names in the evaluator */
> +#define INT(v) coerceToInt(&v)
> +#define DOUBLE(v) coerceToDouble(&v)
> +#define SET_INT(pv, ival) setIntValue(pv, ival)
> +#define SET_DOUBLE(pv, dval) setDoubleValue(pv, dval)
>
> I don't like this at all.  It seems to me that this really obscures
> the code.  The few extra characters are a small price to pay for not
> having to go look up the macro definition to understand what the code
> is doing.

Hmmm. Postgres indentation rules for "switch" are peculiar, to say the 
least, and make it hard to write code that stays under 80 columns. The 
coerceToInt function name looks pretty long (I would rather have 
toInt/toDbl/setInt/setDbl) but I was "told" to use that, so I'm trying to 
find a tradeoff with a macro. Obviously I can substitute and get rather 
long lines that I personally find much uglier.

> The third hunk in pgbench.c unnecessary deletes a blank line.

Yep, that is possible.

>        /*
>         * inner expresion in (cut, 1] (if parameter > 0), rand in [0, 1)
> +        * Assert((1.0 - cut) != 0.0);
>         */
> -       Assert((1.0 - cut) != 0.0);
>        rand = -log(cut + (1.0 - cut) * uniform) / parameter;
> +
>
> Moving the Assert() into the comment seems like a bad plan.  If the
> Assert is true, it shouldn't be commented out.  If it's not, it
> shouldn't be there at all.

I put this assertion in when I initially wrote this code, but I think it 
is proven, so I moved it into a comment just as a reminder that this must 
hold, for anyone who might touch the code.

> Commit e41beea0ddb74ef975f08b917a354ec33cb60830, which you wrote, went
> to some trouble to display good context for error messages.  What you
> have here seems like a huge step backwards:
>
> +                       fprintf(stderr, "double to int overflow for %f\n", dval);
> +                       exit(1);
>
> So, we're just going to give up on all of that error context reporting
> that we added back then?  That would be sad.

Well, I'm a lazy programmer, so I'm trying to measure the benefit. IMO 
there is no benefit to handling this case better, especially as the 
various solutions I thought of were either ugly and/or had a significant 
impact on the code.

Note that in the best case the error would be detected and reported, the 
client stopped, and the other clients would go on... But then, if you 
started a bench and some clients die while it runs, your results are 
probably meaningless, so my opinion is that you are better off with an 
exit than with some message that you may miss and performance results 
computed with far fewer clients than you asked for.

Pgbench is a bench tool, not a production tool.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Jan 28, 2016 at 7:07 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I do not think that it is really worth fixing, but I will not prevent anyone
> to fix it.

I still think it is. Well, if there is consensus to address this one
and optionally the other integer overflows even on back branches, I'll
write a patch and let's call that a deal. This is not a problem from
my side.
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Jan 27, 2016 at 5:43 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Pgbench is a bench tool, not a production tool.

I don't really see how that's relevant.  When I run a program and it
dies after causing the operating system to send it a fatal signal, I
say to myself "self, that program has a bug".  Other people may have
different reactions, but that's mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>> Pgbench is a bench tool, not a production tool.
>
> I don't really see how that's relevant.

My point is that I do not see any added value for pgbench to keep on 
executing a performance bench if some clients die due to script errors: it 
is more useful to stop the whole bench and report the issue, so the user 
can fix their script, than to keep going on with the remaining clients, 
from a benchmarking perspective.

So I'm arguing that exiting, with an error message, is better than 
handling user errors.

> When I run a program and it dies after causing the operating system to 
> send it a fatal signal, I say to myself "self, that program has a bug". 
> Other people may have different reactions, but that's mine.

I was talking about an exit call generated on a float to int conversion 
error when there is an error in the user script. The bug is in the user 
script and this is clearly reported by pgbench.

However, your argument may be relevant for avoiding fatal signals such as 
the one generated by INT64_MIN / -1, because on that one the error message 
is terse, so how to fix the issue is not clear to the user.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> I do not think that it is really worth fixing, but I will not prevent anyone
>> to fix it.
>
> I still think it does. Well, if there is consensus to address this one
> and optionally the other integer overflows even on back branches, I'll
> write a patch and let's call that a deal. This is not a problem from
> my side.

My point is just about the cost-benefit of fixing a low probability issue 
that you can only encounter if you are looking for it, and not with any 
reasonable bench script.

Now adding a test somewhere might just help close the subject so we can 
move on to more useful things, so that would also be a win.

  /* these would raise an arithmetic error */
  if (lval == INT64_MIN && rval == -1)
  {
     fprintf(stderr, "cannot divide or modulo INT64_MIN by -1\n");
     return false;
  }

This may be backpatched to old supported versions.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Thu, Jan 28, 2016 at 1:36 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> So I'm arguing that exiting, with an error message, is better than handling
> user errors.

I'm not objecting to exiting with an error message, but I think
letting ourselves be killed by a signal is no good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> So I'm arguing that exiting, with an error message, is better than handling
>> user errors.
>
> I'm not objecting to exiting with an error message, but I think
> letting ourselves be killed by a signal is no good.

Ok, I understand this point for this purpose.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
<Oops, resent because I again put the wrong From address, sorry>

v22 compared to previous:
 - remove the short macros (although IMO it is a code degradation)
 - try not to remove/add blank lines
 - let some asserts "as is"
 - still exit on float to int overflow, see arguments in other mails
 - check for INT64_MIN / -1 (although I think it is useless)

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

v23 attached, which does not change the message but does the other fixes.

> +        if (coerceToInt(&lval) == INT64_MIN && coerceToInt(&rval) == -1)
> +        {
> +               fprintf(stderr, "cannot divide INT64_MIN by -1\n");
> +               return false;
> +        }
> Bike-shedding: "bigint out of range" to match what is done in the backend?

ISTM that it is clearer for the user to say that the issue is with the 
division? Otherwise the message does not help much. Well, not that it 
would be printed often...

> +/*
> + * Recursive evaluation of an expression in a pgbench script
> + * using the current state of variables.
> + * Returns whether the evaluation was ok,
> + * the value itself is returned through the retval pointer.
> + */
> Could you reformat this comment?

I can.

>                     fprintf(stderr,
> -                                "exponential parameter must be greater than zero (not \"%s\")\n",
> -                                argv[5]);
> +                         "exponential parameter must be greater than zero (not \"%s\")\n",
> +                          argv[5]);
> This is some unnecessary diff noise.

Indeed.

> +        setIntValue(retval,     getGaussianRand(thread, arg1, arg2, dparam));
> Some portions of the code are using tabs instead of spaces between
> function arguments.

Indeed.

> I would as well suggest fixing first the (INT64_MAX / -1) crash on HEAD 
> and back-branches with something like the patch attached, inspired from 
> int8.c.

I think it is overkill, but do as you feel.

Note that it must also handle modulo, but the code you suggest cannot be 
used for that.
  #include <stdint.h>

  int main(int argc, char* argv[])
  {
    int64_t l = INT64_MIN;
    int64_t r = -1;
    int64_t d = l % r;
    return 0;
  }
  // => Floating point exception (core dumped)
 

ISTM that the second condition can be simplified, as there is no possible 
issue if lval is positive?
   if (lval < 0 && *resval < 0) { ... }

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:


On Fri, Jan 29, 2016 at 3:40 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I would as well suggest fixing first the (INT64_MAX / -1) crash on HEAD and back-branches with something like the patch attached, inspired from int8.c.

I think it is overkill, but do as you feel.

Perhaps we could have Robert decide on this one first? That's a bug after all that had better be backpatched.
 
Note that it must also handle modulo, but the code you suggest cannot be used for that.

  #include <stdint.h>
  int main(int argc, char* argv[])
  {
    int64_t l = INT64_MIN;
    int64_t r = -1;
    int64_t d = l % r;
    return 0;
  }
  // => Floating point exception (core dumped)

Right, forgot this one, we just need to check if rval is -1 here, and return 0 as result. I am updating the fix as attached.
--
Michael
Attachment

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:

Hi Michaël,

>> I think it is overkill, but do as you feel.
>
> Perhaps we could have Robert decide on this one first? That's a bug after
> all that had better be backpatched.

Fine with me.

> [modulo...] Right, forgot this one, we just need to check if rval is -1 
> here, and return 0 as result. I am updating the fix as attached.

This looks to me like it works.

I still feel that the condition should be simplified, probably with:
  if (lval < 0 && *resval <= 0) ...

(instead of < in my previous suggestion, if some processors return 0 on 
-INT64_MIN). Also, a comment is needed to explain why such a bizarre 
condition is used/needed for just the INT64_MIN case.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Fri, Jan 29, 2016 at 6:05 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> (instead of < in my previous suggestion, if some processors return 0 on
> -INT64_MIN). Also, a comment is needed to explain why such a bizarre
> condition is used/needed for just the INT64_MIN case.

The last patch I sent has this bit:
+              /*
+               * Some machines throw a floating-point exception
+               * for INT64_MIN % -1, the correct answer being
+               * zero in any case.
+               */
How would you reformulate that à-la-Fabien?
--
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> Also, a comment is needed to explain why such a bizarre
>> condition is used/needed for just the INT64_MIN case.
>
> The last patch I sent has this bit:
> +              /*
> +               * Some machines throw a floating-point exception
> +               * for INT64_MIN % -1, the correct answer being
> +               * zero in any case.
> +               */
> How would you reformulate that à-la-Fabien?

This one about modulo is fine.

I was refering to this other one in the division case:

+            /* overflow check (needed for INT64_MIN) */
+            if (lval != 0 && (*retval < 0 == lval < 0))

Why not use "if (lval == INT64_MIN)" instead of this complicated 
condition? If it is really needed for some reason, I think that a comment 
could help.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> +            /* overflow check (needed for INT64_MIN) */
> +            if (lval != 0 && (*retval < 0 == lval < 0))
>
> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
> If it is really needed for some reason, I think that a comment could help.

Checking for PG_INT64_MIN only would be fine as well, so let's do so.
I thought honestly that we had better check if the result and the left
argument are not of the same sign, but well.
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
v22 compared to previous:
 - remove the short macros (although IMO it is a code degradation)
 - try not to remove/add blank lines
 - let some asserts "as is"
 - still exit on float to int overflow, see arguments in other mails
 - check for INT64_MIN / -1 (although I think it is useless)

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:


On Fri, Jan 29, 2016 at 5:16 AM, Fabien COELHO <fabien.coelho@mines-paristech.fr> wrote:
> v22 compared to previous:

Thanks for the new patch!

>  - remove the short macros (although IMO it is a code degradation)

FWIW, I like this suggestion from Robert.

>  - check for INT64_MIN / -1 (although I think it is useless)
>  - try not to remove/add blank lines
>  - let some assert "as is"
>  - still exit on float to int overflow, see arguments in other mails

+make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+       return make_func(find_func(operator),
+                                        /* beware that the list is reversed in make_func */
+                                        make_elist(rexpr, make_elist(lexpr, NULL)));
+}
I think that this should use as argument the function ID, aka PGBENCH_ADD or similar instead of find_func() with an operator character. This saves a couple of instructions.

+ * - nargs: number of arguments (-1 is a special value for min & max)
My fault perhaps, it may be better to mention here that -1 means that min and max need at least one argument, and that the number of arguments is not fixed.

+                                       case PGBENCH_MOD:
+                                               if (coerceToInt(&rval) == 0)
                                                {
                                                        fprintf(stderr, "division by zero\n");
                                                        return false;
                                                }
-                                               *retval = lval % rval;
+                                               if (coerceToInt(&lval) == INT64_MIN && coerceToInt(&rval) == -1)
+                                               {
+                                                       fprintf(stderr, "cannot divide INT64_MIN by -1\n");
+                                                       return false;
+                                               }
For mod() there is no need to have an error, returning 0 is fine. You can actually do it unconditionally when rval == -1.

+               setDoubleValue(retval, d < 0.0? -d: d);
Nitpick: lack of spaces around the question mark and colon.

+typedef enum
+{
+       PGBT_NONE,
+       PGBT_INT,
+       PGBT_DOUBLE
+} PgBenchValueType;
NONE is used nowhere, but I think that you could use it for an assertion check here:
+                       if (retval->type == PGBT_INT)
+                               fprintf(stderr, "int " INT64_FORMAT "\n", retval->u.ival);
+                       else if (retval->type == PGBT_DOUBLE)
+                               fprintf(stderr, "double %f\n", retval->u.dval);
+                       else
+                               fprintf(stderr, "none\n");
Just replace the "none" message by Assert(type != PGBT_NONE) for example.

Another remaining item: should support for \setrandom be dropped? As the patch is presented this remains intact.
--
Michael

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

>>  - remove the short macros (although IMO it is a code degradation)
>
> FWIW, I like this suggestion from Robert.

I'm not especially fond of macros; my main reservation is that because of 
the length of the function names this necessarily creates lines longer 
than 80 columns or awkward and repeated newlines within expressions, which 
are both ugly.

> +make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
> +{
> +       return make_func(find_func(operator),
> +                                        /* beware that the list is
> reversed in make_func */
> +                                        make_elist(rexpr,
> make_elist(lexpr, NULL)));
> +}
> I think that this should use as argument the function ID, aka PGBENCH_ADD
> or similar instead of find_func() with an operator character. This saves a
> couple of instructions.

Not that simple: the number is the index in the array of functions, not 
the enum value; they are currently off by one, and there may some day be 
the same function under different names, so I do not think it would be a 
good idea to require the orders to be identical.
Also, this minor search is only executed when parsing the script.

> + * - nargs: number of arguments (-1 is a special value for min & max)
> My fault perhaps, it may be better to mention here that -1 means that min
> and max need at least one argument, and that the number of arguments is not
> fixed.

Ok for a better comment.

> For mod() there is no need to have an error, returning 0 is fine. You can
> actually do it unconditionally when rval == -1.

Ok.

> +               setDoubleValue(retval, d < 0.0? -d: d);
> Nitpick: lack of spaces between the question mark.

Ok.

> NONE is used nowhere, but I think that you could use it for an assertion
> check here: [...]
> Just replace the "none" message by Assert(type != PGBT_NONE) for example.

I've added a use of the macro.

> Another remaining item: should support for \setrandom be dropped? As the
> patch is presented this remains intact.

As you know my opinion is "yes", but I have not received a clear go about 
that from a committer, and I'm not motivated to remove and then re-add 
code to the patch.

If nothing clear is said, I'll do a patch just for that once the functions 
are added, and submit it to the next CF.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> +            /* overflow check (needed for INT64_MIN) */
>> +            if (lval != 0 && (*retval < 0 == lval < 0))
>>
>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>> If it is really needed for some reason, I think that a comment could help.
>
> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
> I thought honestly that we had better check if the result and the left
> argument are not of the same sign, but well.

Committed and back-patched to 9.5.  Doesn't apply further back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Yet another rebase, so as to propagate the same special-case checks into 
/ and %.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> +            /* overflow check (needed for INT64_MIN) */
>>> +            if (lval != 0 && (*retval < 0 == lval < 0))
>>>
>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>> If it is really needed for some reason, I think that a comment could help.
>>
>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>> I thought honestly that we had better check if the result and the left
>> argument are not of the same sign, but well.
>
> Committed and back-patched to 9.5.  Doesn't apply further back.

OK, here are patches for 9.1~9.4. The main differences are that in
9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
that's int32. In the latter case pgbench blows up the same way with
that:
\set i -2147483648
\set i :i / -1
select :i;
In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
at the top of pgbench.c. I thing that's fine.
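The special case under discussion can be sketched as follows. This is not the committed pgbench code, just a minimal illustration of why the guard is needed: in two's-complement arithmetic, `INT64_MIN / -1` overflows (the mathematical result, 2^63, is not representable) and traps on most platforms, so it must be caught before dividing:

```c
#include <stdint.h>
#include <stdbool.h>

/* returns false instead of trapping on the two problem cases */
static bool
safe_div(int64_t lval, int64_t rval, int64_t *result)
{
    if (rval == 0)
        return false;           /* division by zero */
    if (rval == -1)
    {
        if (lval == INT64_MIN)
            return false;       /* -INT64_MIN is not representable */
        *result = -lval;        /* x / -1 == -x, without dividing */
        return true;
    }
    *result = lval / rval;
    return true;
}
```

The analogous guard for `%` is simpler: `x % -1` is always 0, so it can be returned unconditionally when `rval == -1`.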
--
Michael

Attachment

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>>> +            /* overflow check (needed for INT64_MIN) */
>>>> +            if (lval != 0 && (*retval < 0 == lval < 0))
>>>>
>>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>>> If it is really needed for some reason, I think that a comment could help.
>>>
>>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>>> I thought honestly that we had better check if the result and the left
>>> argument are not of the same sign, but well.
>>
>> Committed and back-patched to 9.5.  Doesn't apply further back.
>
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with
> that:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

Oh, gosh, I should have said more clearly that I didn't really see a
need to fix this all the way back.  But I guess we could.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:


On Tue, Feb 2, 2016 at 1:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>>> +            /* overflow check (needed for INT64_MIN) */
>>>> +            if (lval != 0 && (*retval < 0 == lval < 0))
>>>>
>>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>>> If it is really needed for some reason, I think that a comment could help.
>>>
>>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>>> I thought honestly that we had better check if the result and the left
>>> argument are not of the same sign, but well.
>>
>> Committed and back-patched to 9.5.  Doesn't apply further back.
>
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with
> that:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

Oh, gosh, I should have said more clearly that I didn't really see a
need to fix this all the way back.  But I guess we could.

And now there are patches. Well, nobody has complained about that until now except me... So we could live without patching back-branches, but it don't think it hurts much to fix those holes.
--
Michael

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Feb 2, 2016 at 1:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> And now there are patches. Well, nobody has complained about that until now except me... So we could live without patching back-branches, but it don't think it hurts much to fix those holes.

Meh, s/it don't/I don't/
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with
> that:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

I thing so too.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Feb 3, 2016 at 11:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> OK, here are patches for 9.1~9.4. The main differences are that in
>> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
>> that's int32. In the latter case pgbench blows up the same way with
>> that:
>> \set i -2147483648
>> \set i :i / -1
>> select :i;
>> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
>> at the top of pgbench.c. I thing that's fine.
>
> I thing so too.  Committed.

Thanks for thinging so.
-- 
Michael



Re: extend pgbench expressions with functions

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> Hello Michaël,
> 
> v23 attached, which does not change the message but does the other fixes.

This doesn't apply anymore -- please rebase and submit to the next CF.
I closed it here as returned with feedback.

Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello,

>> v23 attached, which does not change the message but does the other fixes.
>
> This doesn't apply anymore

Indeed, but the latest version was really v25.

> -- please rebase and submit to the next CF.

I already provided it as v25 on Feb 1st.

> I closed it here as returned with feedback.

I turned it to "moved to next CF" as the patch is already on the queue.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Alvaro Herrera
Date:
Fabien COELHO wrote:
> 
> Hello,
> 
> >>v23 attached, which does not change the message but does the other fixes.
> >
> >This doesn't apply anymore
> 
> Indeed, but the latest version was really v25.
> 
> >-- please rebase and submit to the next CF.
> 
> I already provided it as v25 on Feb 1st.
> 
> >I closed it here as returned with feedback.
> 
> I turned it to "moved to next CF" as the patch is already on the queue.

Ah, I didn't see v25 anywhere.  What you did should be fine, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Feb 9, 2016 at 5:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Fabien COELHO wrote:
>>
>> Hello,
>>
>> >>v23 attached, which does not change the message but does the other fixes.
>> >
>> >This doesn't apply anymore
>>
>> Indeed, but the latest version was really v25.
>>
>> >-- please rebase and submit to the next CF.
>>
>> I already provided it as v25 on Feb 1st.
>>
>> >I closed it here as returned with feedback.
>>
>> I turned it to "moved to next CF" as the patch is already on the queue.
>
> Ah, I didn't see v25 anywhere.  What you did should be fine, thanks.

I just had another look at this patch.

+      <replaceable>parameter</>%  of the time.
Nitpick: double space here.


+               switch (func)
+               {
[...]
+                   }
+                   default:
+                       return false;
+               }
In evalFunc(), the default case in switch for the operator functions
should never be reached. Adding for example Assert(0) is something to
consider.

PGBT_NONE and PGBENCH_NONE are used nowhere. Why not remove them, or
replace the code paths where they would be used with assertions, to
trigger errors for future developments?

Other than that the patch looks in pretty good shape to me.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> +      <replaceable>parameter</>%  of the time.
> Nitpick: double space here.

Ok.

> +                   default:
> +                       return false;
>                }
> In evalFunc(), the default case in switch for the operator functions
> should never be reached. Adding for example Assert(0) is something to
> consider.

Ok for Assert + a comment.

> PGBT_NONE and PGBENCH_NONE are used nowhere. Why not removing them

Ok.

v26 attached implements these changes.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Thu, Feb 11, 2016 at 12:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> v26 attached implements these changes.

+    /* the argument list has been built in reverse order, it is fixed here */
+    expr->u.function.args = reverse_elist(args);
Hm. I may be missing something, but why is that necessary? This is
basically doing a double-reversion to put all the arguments in the
correct order when parsing the function arguments.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> +    /* the argument list has been built in reverse order, it is fixed here */
> +    expr->u.function.args = reverse_elist(args);
> Hm. I may be missing something, but why is that necessary? This is
> basically doing a double-reversion to put all the arguments in the
> correct order when parsing the function arguments.

This is because the expression list is parsed left to right and the list 
is built as a stack to avoid looking for the last argument to append the 
next expression, but then the list is in reverse order at the end of 
parsing, so it is reversed once to make it right. This way the complexity 
is kept as O(n).

If this is too much I can switch to O(n**2) by appending each expression 
at the end of the list.
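The prepend-then-reverse scheme described above can be sketched like this (hypothetical minimal types, not the actual pgbench structures): each argument is pushed on the front in O(1) while parsing left to right, then one O(n) pass restores parse order:

```c
#include <stdlib.h>

/* stand-in for the expression list link */
typedef struct PgBenchExprLink
{
    int                     val;    /* stands in for the expression */
    struct PgBenchExprLink *next;
} PgBenchExprLink;

/* O(1) prepend: the new element becomes the head */
static PgBenchExprLink *
make_elist(int val, PgBenchExprLink *tail)
{
    PgBenchExprLink *e = malloc(sizeof(*e));
    e->val = val;
    e->next = tail;
    return e;
}

/* single in-place pass to undo the reversal once parsing is done */
static PgBenchExprLink *
reverse_elist(PgBenchExprLink *head)
{
    PgBenchExprLink *prev = NULL;
    while (head != NULL)
    {
        PgBenchExprLink *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
```

Parsing `f(1, 2, 3)` left to right thus builds [3, 2, 1], and the final reversal yields [1, 2, 3], keeping the whole build O(n).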

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Fri, Feb 12, 2016 at 2:41 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Michaël,
>
>> +    /* the argument list has been built in reverse order, it is fixed
>> here */
>> +    expr->u.function.args = reverse_elist(args);
>> Hm. I may be missing something, but why is that necessary? This is
>> basically doing a double-reversion to put all the arguments in the
>> correct order when parsing the function arguments.
>
> This is because the expression list is parsed left to right and the list is
> built as a stack to avoid looking for the last argument to append the next
> expression, but then the list is in reverse order at the end of parsing, so
> it is reversed once to make it right. This way the complexity is kept as
> O(n).
>
> If this is too much I can switch to O(n**2) by appending each expression at
> the end of the list.

(this one has been mentioned by Alvaro offlist)
Using a pointer to the tail of the list would make the code simple,
and save a couple of lines.

Another thing that could be considered is also to move list.c and
pg_list.h into src/common and reuse that. There are other frontend
utilities that emulate the same kind of facilities, have a look for
example at the other copycats in pg_dump and pg_basebackup.
--
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> Using a pointer to the tail of the list would make the code simple,
> and save a couple of lines.

I did that, see v27 attached.

Note that it does not save any lines: the reverse function is removed, but 
you need another type to keep the head & tail (the link type is not 
enough), and then you have to manage that in the code. Whether it is 
"simpler" is debatable; it probably performs more tests when executed.
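The head-and-tail variant can be sketched as follows (again with hypothetical minimal types): the extra struct carries both ends of the list, so each argument is appended in O(1) with no final reversal, at the cost of a branch per append:

```c
#include <stdlib.h>

typedef struct Link { int val; struct Link *next; } Link;

/* the extra type: both ends of the list must be carried around */
typedef struct { Link *head; Link *tail; } ElistHead;

static void
append_elist(ElistHead *list, int val)
{
    Link *e = malloc(sizeof(*e));
    e->val = val;
    e->next = NULL;
    if (list->tail != NULL)
        list->tail->next = e;   /* link after the current tail */
    else
        list->head = e;         /* first element starts the list */
    list->tail = e;
}
```

Both designs are O(n) overall; the trade-off is one reversal pass versus an extra struct and an empty-list test on every append.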

However, it really saves having to answer the question "why is the list 
reversed?", which is definitely a win from my point of view:-)

> Another thing that could be considered is also to move list.c [...]

I think that this option is too much bother for the small internal purpose 
at hand.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Fri, Feb 12, 2016 at 5:06 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I think that this option is too much bother for the small internal purpose
> at hand.

Yeah, I'm really frustrated with this whole thread.  There's been a
huge amount of discussion on this patch, but I don't really feel like
it's converging towards something I could actually commit.

For example, I just realized that this patch allows values to be
either a double or an integer and extends the operators to handle
double values.  But variables can still only be integers.  Please name
any programming language that has such a restriction.  The only ones I
can think of are command shells, and those at least flatten everything
to string rather than integer so that you can store the value without
loss of precision - just with loss of type-safety.  I think designing
this in this way is quite short-sighted.  I don't think variables
should be explicitly typed but they should be able to store a value of
any type that expression evaluation can generate.

Also, as I said back in November, there's really two completely
separate enhancements in here.  One of them is to support a new data
type (doubles) and the other is to support functions.  Those should
really be separate patches.  If neither of you are willing to split
this patch, I'm not willing to commit it.  Going over half of this
patch at a time and getting it committed is going to be a lot of work
for me, but I'm willing to do it.  I'm not willing to go over the
whole thing at once - that's going to take more time than I have, and
produce what will in my opinion be an inferior commit history.  If
somebody else is willing to commit the whole thing as one patch, I'm
not going to object, but I won't do it myself.

I would not worry too much about the list thing at this point.  I'm
sure something simple is fine for that.  I actually think the patch is
now in decent shape as far as code-cleanliness is concerned, but I'm
not sure we've really looked hard enough at the design.  For example,
I find implementing operators as functions in disguise not to be one
of PostgreSQL's most awesome design decisions, and here we are copying
that into pgbench for, well, no benefit that I can see, really.  Maybe
it's a good idea and maybe it's a bad idea, but how do we know?  That
stuff deserves more discussion.  Code cleanup is good, so I do think
it's good that a lot of effort has been put in there, but I don't
think more code cleanup is what's going to get this patch over the
finish line.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> For example, I just realized that this patch allows values to be
> either a double or an integer and extends the operators to handle
> double values.  But variables can still only be integers.

Indeed.

> [...] at least flatten everything to string rather than integer so that 
> you can store the value without loss of precision - just with loss of 
> type-safety.  I think designing this in this way is quite short-sighted.

Note that I'm not responsible for this design, which is preexisting. 
Extending variables to be able to store doubles could also be done in 
another patch.

> I don't think variables should be explicitly typed but they should be 
> able to store a value of any type that expression evaluation can 
> generate.

Doubles are not really needed that much; they just provide something for 
the random_* functions' parameter, otherwise they are useless as far as 
pgbench is concerned.

> Also, as I said back in November, there's really two completely
> separate enhancements in here.  One of them is to support a new data
> type (doubles) and the other is to support functions.

Yep. The first part is precisely the patch I initially submitted 5 CFs ago.

Then I was asked to put more things in it to show that it can indeed handle 
another type. Then I was told "you should not have done that". What can I 
say?

> Those should really be separate patches.

They could.

> [...] I find implementing operators as functions in disguise not to be 
> one of PostgreSQL's most awesome design decisions, and here we are 
> copying that into pgbench for, well, no benefit that I can see, really.

Well, I did that initially, then I was asked to implement operators as 
functions. It probably saves some lines, so it is not too bad, even if the 
benefit is limited.

> Maybe it's a good idea and maybe it's a bad idea, but how do we know?

This is just pgbench, a tool for testing performance by running dummy 
transactions, not a production thing, so I think that it really does not 
matter. There is no user visible changes wrt operators.

> [...] If neither of you are willing to split this patch, I'm not willing 
> to commit it.

If I'm reading you correctly, you would consider committing it:
 - if the function & double stuff are separated ?
 - for the double part, if variables can be double ?

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sat, Feb 13, 2016 at 4:37 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> For example, I just realized that this patch allows values to be
>> either a double or an integer and extends the operators to handle
>> double values.  But variables can still only be integers.
>
> Indeed.

That's exactly the first impression I got about this patch when sending [1].
[1]: http://www.postgresql.org/message-id/CAB7nPqRHVFvnrAHaMarOniedeC90pf3-Wn+U5ccB4reD-HtJnw@mail.gmail.com
By focusing only on integers the patch would largely gain in
simplicity. Now being able to have double values is actually useful
for nested function calls, nothing else.

>> I don't think variables should be explicitly typed but they should be able
>> to store a value of any type that expression evaluation can generate.
>
> Doubles are not really needed that much, it is just to provide something to
> random_* functions parameter, otherwise it is useless as far as pgbench is
> really concerned.

+1.

>> Also, as I said back in November, there's really two completely
>> separate enhancements in here.  One of them is to support a new data
>> type (doubles) and the other is to support functions.
>
> Yep. The first part is precisely the patch I initially submitted 5 CF ago.
> Then I'm asked to put more things in it to show that it can indeed handle
> another type. Then I'm told "you should not have done that". What can I say?

I think that you did your job here by answering the comments of all
the reviewers that had a look at this patch. That's not an easy task.

>> [...] I find implementing operators as functions in disguise not to be one
>> of PostgreSQL's most awesome design decisions, and here we are copying that
>> into pgbench for, well, no benefit that I can see, really.
>
> Well, I did that initially, then I was asked to implement operators as
> functions. It probably saves some lines, so it is not too bad, even if the
> benefit is limited.

Put the blame on me for this one then. I suggested this idea because
Postgres is doing the same, and because it simplifies slightly the
union structure in charge of holding the parsed structures, making the
code a bit more readable IMO.

>> [...] If neither of you are willing to split this patch, I'm not willing
>> to commit it.
>
> If I'm reading you correctly, you would consider committing it:
>
>  - if the function & double stuff are separated ?
>  - for the double part, if variables can be double ?

I just double-checked and could not see a clear use case mentioned in
this thread for double return types, so I would suggest focusing on
the integer portion with min(), max(), abs(), debug() and the existing
functions refactored. That's what your first versions did. If someone
is wishing to implement double types, this someone could do it, the
infrastructure that this patch puts in place has already proved that
it can be easily extended.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

>>  - if the function & double stuff are separated ?
>>  - for the double part, if variables can be double ?
>
> I just double-checked and could not see a clear use case mentioned in
> this thread for double return types,

Alas there is one: the non-uniform random functions, which take a double 
parameter.

Once random functions are there, the \setrandom horror code could be 
removed, which would be a real benefit, IMO:-)

So I see a good case to have some support for doubles.

> so I would suggest focusing on the integer portion with min(), max(), 
> abs(), debug() and the existing functions refactored. That's what your 
> first versions did. If someone is wishing to implement double types, 
> this someone could do it, the infrastructure that this patch puts in 
> place has already proved that it can be easily extended.

Adding double is not too big a deal. I just stopped short of double 
variables because I could not see any realistic use for them. My idea was 
to postpone that till it is actually needed, "never" being the most 
probable course.

Now if this is a blocker for the committer, then I will make the effort 
regardless of what I think of the usefulness of the feature.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> But variables can still only be integers.

Version 28 attached has double variables, although typing is based on 
guessing because it is just a string.

Any comment?

> [...] two separate enhancements in here.  One of them is to support a 
> new data type (doubles) and the other is to support functions.

The two features are highly intermixed, so they can only be dependent 
patches: first add the function infrastructure, probably with some support 
for doubles although it would not yet be used, then add doubles & their 
functions.

A real pain is the documentation, because it means writing a documentation 
with only integer functions, then overwriting it with doubles. This is 
dumb work, really, for the sake of "a cleaner git history", the beauty of 
it no one will ever contemplate...

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sun, Feb 14, 2016 at 12:37 AM, Fabien COELHO wrote:
> The two features are highly intermix, so it can only be dependent patches,
> first to add a function infrastructure and probably some support for doubles
> altough it would not be used, then to add doubles & their functions.
>
> A real pain is the documentation, because it means writing a documentation
> with only integer functions, then overwriting it with doubles. This is dumb
> work, really, for the sake of "a cleaner git history", the beauty of it no
> one will ever contemplate...

FWIW, I care a lot about splitting as much as possible patches where
it is possible to have a clean history. So I would be fine to do a
portion of the legwork and extract from this patch something smaller
that adds only functions as a first step, with the minimum set of
functions I mentioned upthread. Robert, Alvaro, Fabien, does that
sound fine to you?
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sat, Feb 13, 2016 at 6:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Sun, Feb 14, 2016 at 12:37 AM, Fabien COELHO wrote:
>> The two features are highly intermix, so it can only be dependent patches,
>> first to add a function infrastructure and probably some support for doubles
>> altough it would not be used, then to add doubles & their functions.
>>
>> A real pain is the documentation, because it means writing a documentation
>> with only integer functions, then overwriting it with doubles. This is dumb
>> work, really, for the sake of "a cleaner git history", the beauty of it no
>> one will ever contemplate...
>
> FWIW, I care a lot about splitting as much as possible patches where
> it is possible to have a clean history. So I would be fine to do a
> portion of the legwork and extract from this patch something smaller
> that adds only functions as a first step, with the minimum set of
> functions I mentioned upthread. Robert, Alvaro, Fabien, does that
> sound fine to you?

I'd be delighted.  I would really like to get this feature in, but I'm
not going to do it if it requires an unreasonable amount of work on my
part - and what you propose would help a lot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sat, Feb 13, 2016 at 10:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> A real pain is the documentation, because it means writing a documentation
> with only integer functions, then overwriting it with doubles. This is dumb
> work, really, for the sake of "a cleaner git history", the beauty of it no
> one will ever contemplate...

You know, you make comments like this pretty much every time anybody
suggests that you should change anything in any patch you write.  It
doesn't matter whether the change is suggested by Heikki, or by
Michael, or by Andres, or by me.  You pretty much always come back and
say something that amounts to "changing the patch I already wrote is a
waste of time".  That gets a little disheartening after a while.  This
community's procedure is that patches have to be reviewed and reach
consensus in order to get committed, and in my opinion that is
generally not "dumb" but rather something that enhances the end
result.  I value your contributions to the community and I hope you
will continue making them, but I don't like it when people call my
ideas dumb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sun, Feb 14, 2016 at 10:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Feb 13, 2016 at 6:19 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Sun, Feb 14, 2016 at 12:37 AM, Fabien COELHO wrote:
>>> The two features are highly intermix, so it can only be dependent patches,
>>> first to add a function infrastructure and probably some support for doubles
>>> altough it would not be used, then to add doubles & their functions.
>>>
>>> A real pain is the documentation, because it means writing a documentation
>>> with only integer functions, then overwriting it with doubles. This is dumb
>>> work, really, for the sake of "a cleaner git history", the beauty of it no
>>> one will ever contemplate...
>>
>> FWIW, I care a lot about splitting as much as possible patches where
>> it is possible to have a clean history. So I would be fine to do a
>> portion of the legwork and extract from this patch something smaller
>> that adds only functions as a first step, with the minimum set of
>> functions I mentioned upthread. Robert, Alvaro, Fabien, does that
>> sound fine to you?
>
> I'd be delighted.  I would really like to get this feature in, but I'm
> not going to do it if it requires an unreasonable amount of work on my
> part - and what you propose would help a lot.

OK, I'll see about producing a patch then for this basic
infrastructure, with the rest built on top of it as a secondary patch.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> You know, you make comments like this pretty much every time anybody
> suggests that you should change anything in any patch you write.

Well, not every time...

I'm tired of doing and undoing things: I do a limited A because I'm 
cautious not to spend too much time that would go down the drain. Then I'm 
told "do A+B" to show that A is worthwhile. Then I'm told "you should not 
have done B with A, but submit A for the infrastructure and then B for the 
additional features", which was precisely my initial intent...

So this is really the going forward and backwards because of the process 
that makes me write these remarks.

> [...] I don't like it when people call my ideas dumb.

Your idea is not dumb, I'm sorry if I have implied that, and I apologise. 
I like a clean history as much as everyone else.

Having to unmix features and documentation is still "dumb work", even if 
the purpose is not dumb; I stand by the word for that part.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> So I would be fine to do a portion of the legwork and extract from this 
> patch something smaller that adds only functions as a first step, with 
> the minimum set of functions I mentioned upthread. Robert, Alvaro, 
> Fabien, does that sound fine to you?

Thanks, but this is my c*, I have a few hours of travelling this evening, 
I'll do it then.

I'll be happy if you do the review of the resulting split.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Sun, Feb 14, 2016 at 4:42 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> So I would be fine to do a portion of the legwork and extract from this
>> patch something smaller that adds only functions as a first step, with the
>> minimum set of functions I mentioned upthread. Robert, Alvaro, Fabien, does
>> that sound fine to you?
>
>
> Thanks, but this is my c*, I have a few hours of travelling this evening,
> I'll do it then.
>
> I'll be happy if you do the review of the resulting split.

OK, I am fine with this scenario as well. I have luckily done nothing yet.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

>> I'll be happy if you do the review of the resulting split.
>
> OK, I am fine with this scenario as well. I have luckily done nothing yet.

Here is a 3 part v29:

a) add support for integer functions in pgbench, including turning
   operators as functions, as well as some minimal infrastructure for
   additional types (this allows to minimize the diffs with the next
   patch which adds double).

b) add support for doubles, including setting double variables.
   Note that variables are not explicitly typed. Add some
   double-related functions, most interesting of them for pgbench
   are the randoms.

c) remove \setrandom support (as thanks to part b \set x random(...) does
   the same). Prints an error pointing to the replacement if \setrandom is
   used in a pgbench script.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Here is a 3 part v29:
>
> a) add support for integer functions in pgbench, including turning
>    operators as functions, as well as some minimal infrastructure for
>    additional types (this allows to minimize the diffs with the next
>    patch which adds double).
>
> b) add support for doubles, including setting double variables.
>    Note that variables are not explicitly typed. Add some
>    double-related functions, most interesting of them for pgbench
>    are the randoms.
>
> c) remove \setrandom support (as thanks to part b \set x random(...) does
>    the same). Prints an error pointing to the replacement if \setrandom is
>    used in a pgbench script.

Thanks, I'll review these as soon as I can get to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Mon, Feb 15, 2016 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Here is a 3 part v29:
>>
>> a) add support for integer functions in pgbench, including turning
>>    operators as functions, as well as some minimal infrastructure for
>>    additional types (this allows to minimize the diffs with the next
>>    patch which adds double).
>>
>> b) add support for doubles, including setting double variables.
>>    Note that variables are not explicitly typed. Add some
>>    double-related functions, most interesting of them for pgbench
>>    are the randoms.
>>
>> c) remove \setrandom support (as thanks to part b \set x random(...) does
>>    the same). Prints an error pointing to the replacement if \setrandom is
>>    used in a pgbench script.
>
> Thanks, I'll review these as soon as I can get to it.

I got around to look at (a) in this set.

+   if ((PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
+        PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args)) ||
+       /* check at least one arg for min & max */
+       (PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
+        elist_length(args) == 0))
+       expr_yyerror_more("unexpected number of arguments",
+                         PGBENCH_FUNCTIONS[fnumber].fname);
We could split that into two parts, each one with a more precise error message:
- For functions that expect at least one argument: "at least one
argument was expected, none found".
- For functions that expect N arguments: "N arguments expected, but M found"
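A self-contained sketch of what that split might look like. The helper name `check_arg_count` and the exact message wording are illustrative assumptions, not the actual pgbench code; `nargs == -1` marks a variadic function (min/max) that needs at least one argument:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns NULL when the argument count is valid,
 * otherwise a precise error message for the failure mode. */
static const char *
check_arg_count(const char *fname, int expected_nargs, int actual_nargs)
{
    static char msg[128];

    /* variadic function (nargs == -1): needs at least one argument */
    if (expected_nargs == -1 && actual_nargs == 0)
    {
        snprintf(msg, sizeof(msg),
                 "%s: at least one argument expected, none found", fname);
        return msg;
    }
    /* fixed-arity function: exact count required */
    if (expected_nargs >= 0 && expected_nargs != actual_nargs)
    {
        snprintf(msg, sizeof(msg),
                 "%s: %d arguments expected, but %d found",
                 fname, expected_nargs, actual_nargs);
        return msg;
    }
    return NULL;
}
```

The two branches keep the two failure modes separate, so each message can state exactly what was expected.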

+       "\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
        "SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
    }
};

-/* Function prototypes */
Noise.

+ * Recursive evaluation of int or double expressions
+ *
+ * Note that currently only integer variables are available, with values
+ * stored as text.
This comment is incorrect, we only care about integers in this patch.

Taking patch 1 as a completely independent thing, there is no need to
introduce PgBenchValueType yet. Similar remark for setIntValue and
coerceToInt. They are useful afterwards when introducing double types
to be able to handle double input parameters for the gaussian and
other functions.

-                       /*
-                        * INT64_MIN / -1 is problematic, since the result
-                        * can't be represented on a two's-complement machine.
-                        * Some machines produce INT64_MIN, some produce zero,
-                        * some throw an exception. We can dodge the problem
-                        * by recognizing that division by -1 is the same as
-                        * negation.
-                        */
-                       if (rval == -1)
+                       if (coerceToInt(&rval) == -1)
                        {
-                           *retval = -lval;
-
-                           /* overflow check (needed for INT64_MIN) */
-                           if (lval == PG_INT64_MIN)
-                           {
-                               fprintf(stderr, "bigint out of range\n");
-                               return false;
-                           }
+                           setIntValue(retval, 0);
+                           return true;
                        }
(INT64_MIN / -1) should error. (INT64_MIN % -1) should result in 0.
This is missing the division handling.
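For reference, a self-contained sketch of the semantics described above (`safe_div_mod` is a hypothetical helper for illustration, not pgbench code): division by zero and INT64_MIN / -1 are errors, while INT64_MIN % -1 yields 0:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper sketching the edge cases under discussion. */
static bool
safe_div_mod(int64_t lval, int64_t rval, bool is_mod, int64_t *result)
{
    if (rval == 0)
    {
        fprintf(stderr, "division by zero\n");
        return false;
    }
    if (rval == -1)
    {
        if (is_mod)
        {
            *result = 0;        /* x % -1 is always 0, must not trap */
            return true;
        }
        if (lval == INT64_MIN)
        {
            /* -INT64_MIN is not representable in two's complement */
            fprintf(stderr, "bigint out of range\n");
            return false;
        }
        *result = -lval;        /* division by -1 is negation */
        return true;
    }
    *result = is_mod ? lval % rval : lval / rval;
    return true;
}
```

The point is that the `-1` branch must distinguish `/` from `%`: folding them together loses the overflow error for division.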
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> + * Recursive evaluation of int or double expressions
> + *
> + * Note that currently only integer variables are available, with values
> + * stored as text.
> This comment is incorrect, we only care about integers in this patch.

Indeed!

> Taking patch 1 as a completely independent thing, there is no need to 
> introduce PgBenchValueType yet. Similar remark for setIntValue and 
> coerceToInt. They are useful afterwards when introducing double types to 
> be able to handle double input parameters for the gaussian and other 
> functions.

Yes. This is exactly the pain I'm trying to avoid, creating a different 
implementation for the first patch, which is just overriden when the 
second part is applied...

So I'm trying to compromise, having a several part patch *but* having the 
infrastructure ready for the second patch which adds the double type.

Note that the first patch without the second is a loss of time for 
everyone, as nearly the only useful functions are the randoms, which 
require a double argument, so it does not make sense to apply the first 
one if the second one is not to be applied, I think.

> [...]
> (INT64_MIN / -1) should error.
> (INT64_MIN % -1) should result in 0.
> This is missing the division handling.

Oops, indeed I totally messed up when merging the handling of / and %:-(

I have found another issue in the (a) patch: the internal scripts were 
using the future random function which do not yet exist, as they are in 
patch (b).

Here is a three part v30, which still includes the infrastructure for 
future types in the a patch, see my argumentation above.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Feb 15, 2016 at 4:58 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Indeed!
>
>> Taking patch 1 as a completely independent thing, there is no need to
>> introduce PgBenchValueType yet. Similar remark for setIntValue and
>> coerceToInt. They are useful afterwards when introducing double types to be
>> able to handle double input parameters for the gaussian and other functions.
>
> Yes. This is exactly the pain I'm trying to avoid, creating a different
> implementation for the first patch, which is just overriden when the second
> part is applied...

Splitting a patch means breaking it into independently committable
sub-patches, not just throwing each line of the diff into a different
pile as best you can.  I'm with Michael: that part doesn't belong in
this patch.  If we want to have an infrastructure refactoring patch
that just replaces every relevant use of int64 with PgBenchValue, a
union supporting only integer values, then we can do that first and
have a later patch introduce double as a separate change.  But we
can't have things that are logically part of patch #2 just tossed in
with patch #1.

I was in the middle of ripping that out of the patch when I realized
that evalFunc() is pretty badly designed.  What you've done is made it
the job of each particular function to evaluate its arguments.  I
don't think that's a good plan.  I think that when we discover that
we've got a function, we should evaluate all of the arguments that
were passed to it using common code that is shared across all types of
functions and operators.  Then, the evaluated arguments should be
passed to the function-specific code, which can do as it likes with
them.  This way, you have less code that is specific to particular
operations and more common code, which is generally a good thing.
Every expression evaluation engine that I've ever heard of works this
way - see, for example, the PostgreSQL backend.

I experimented with trying to do this and ran into a problem: where
exactly would you store the evaluated arguments when you don't know
how many of them there will be?  And even if you did know how many of
them there will be, wouldn't that mean that evalFunc or evaluateExpr
would have to palloc a buffer of the correct size for each invocation?
 That's far more heavyweight than the current implementation, and
minimizing CPU usage inside pgbench is a concern.  It would be
interesting to do some pgbench runs with this patch, or the final
patch, and see what effect it has on the TPS numbers, if any, and I
think we should.  But the first concern is to minimize any negative
impact, so let's talk about how to do that.

I think we should limit the number of arguments to a function to, say,
16, so that an array of int64s or PgBenchValues long enough to hold an
entire argument list can be stack-allocated.  The backend's limit is
higher, but the only reason we need a value higher than 2 here is
because you've chosen to introduce variable-argument functions; but I
think 16 is enough for any practical purpose.  Then, I think we should
also change the parse tree representation so that it transforms the
linked list into an array stored within the PgBenchExpr, so that you
can access the expression for argument i as expr->u.arg[i].  Then we
can write this is a loop that evaluates each expression in an array of
expressions and stores the result in an array of values.  That seems
like it would be both cleaner and faster than what you've got here
right now.  It's also more similar to what you did with the function
name itself: the most trivial thing the parser could do is store a
pointer to the function or operator name, but that would be slow, so
instead it looks up the function and stores a constant.
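A minimal, hypothetical sketch of this evaluation scheme (the `Expr` node, `ENODE_*` tags, and function names are simplified stand-ins, not the actual pgbench structures): arguments are evaluated by common code into a stack-allocated array, and only then does the function-specific code run:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FARGS 16            /* cap allows stack allocation */

typedef enum { ENODE_CONST, ENODE_FUNC_ADD } ExprType;

/* Arguments stored as an array within the node, so argument i is
 * reachable as expr->args[i] (cf. expr->u.arg[i] above). */
typedef struct Expr
{
    ExprType     type;
    int64_t      ival;              /* ENODE_CONST */
    int          nargs;             /* ENODE_FUNC_* */
    struct Expr *args[MAX_FARGS];
} Expr;

static bool evaluateExpr(const Expr *expr, int64_t *retval);

static bool
evalFunc(const Expr *expr, int64_t *retval)
{
    int64_t vargs[MAX_FARGS];       /* no per-call heap allocation */
    int     i;

    /* common code: evaluate every argument first */
    for (i = 0; i < expr->nargs; i++)
        if (!evaluateExpr(expr->args[i], &vargs[i]))
            return false;

    /* function-specific code consumes already-evaluated values */
    switch (expr->type)
    {
        case ENODE_FUNC_ADD:
            *retval = vargs[0] + vargs[1];
            return true;
        default:
            return false;
    }
}

static bool
evaluateExpr(const Expr *expr, int64_t *retval)
{
    if (expr->type == ENODE_CONST)
    {
        *retval = expr->ival;
        return true;
    }
    return evalFunc(expr, retval);
}
```

The 16-argument cap is what makes the `vargs` array a cheap stack allocation rather than a palloc per invocation.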

I also went over your documentation changes.  I think you inserted the
new table smack dab in the middle of a section in a place that it
doesn't really belong.  The version attached makes it into its own
<refsect2>, puts it in a new section a bit further down so that it
doesn't break up the flow, and has a few other tweaks that I think are
improvements.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I experimented with trying to do this and ran into a problem: where
> exactly would you store the evaluated arguments when you don't know
> how many of them there will be?  And even if you did know how many of
> them there will be, wouldn't that mean that evalFunc or evaluateExpr
> would have to palloc a buffer of the correct size for each invocation?
>  That's far more heavyweight than the current implementation, and
> minimizing CPU usage inside pgbench is a concern.  It would be
> interesting to do some pgbench runs with this patch, or the final
> patch, and see what effect it has on the TPS numbers, if any, and I
> think we should.  But the first concern is to minimize any negative
> impact, so let's talk about how to do that.

Good point. One simple idea here would be to use a custom pgbench
script that has no SQL commands and just calculates the values of some
parameters to measure the impact without depending on the backend,
with a fixed number of transactions.
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Tue, Feb 16, 2016 at 1:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Feb 16, 2016 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I experimented with trying to do this and ran into a problem: where
>> exactly would you store the evaluated arguments when you don't know
>> how many of them there will be?  And even if you did know how many of
>> them there will be, wouldn't that mean that evalFunc or evaluateExpr
>> would have to palloc a buffer of the correct size for each invocation?
>>  That's far more heavyweight than the current implementation, and
>> minimizing CPU usage inside pgbench is a concern.  It would be
>> interesting to do some pgbench runs with this patch, or the final
>> patch, and see what effect it has on the TPS numbers, if any, and I
>> think we should.  But the first concern is to minimize any negative
>> impact, so let's talk about how to do that.
>
> Good point. One simple idea here would be to use a custom pgbench
> script that has no SQL commands and just calculates the values of some
> parameters to measure the impact without depending on the backend,
> with a fixed number of transactions.

Sure, we could do that.  But whether it materially changes pgbench -S
results, say, is a lot more important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> [...] But we can't have things that are logically part of patch #2 just 
> tossed in with patch #1.

So you want integer functions without type in patch #1.

> I was in the middle of ripping that out of the patch when I realized
> that evalFunc() is pretty badly designed.

Probably, we are just at v30:-)

> What you've done is made it the job of each particular function to 
> evaluate its arguments.

Yep.

I did that for the multiple issues you raise below, and some you did not 
yet foresee: handling a variable number of arguments (0, 1, 2, 3, n), 
avoiding dynamic structures or arbitrary limitations, checking for a valid 
number of arguments, and also the fact that the evaluation call was not 
too horrible (2 lines per evaluation, factored out by groups of functions 
[operators, min/max, randoms, ...], it is not fully repeated).

There are 5 sub-expression evaluations in the function, totalling 10 LOCs.

TEN.

> I don't think that's a good plan.

The one you suggest does not strike me as intrinsically better: it is a 
trade of some code ugliness (5 repeated evals = 10 lines, a little 
more with the double functions, probably 20 lines) for other uglinesses 
(a limit on the number of args or dynamic allocation, an array length to 
manage and test somehow, list-to-array conversion code, things that will 
mean far more than the few lines of repeated code under discussion).

So I think that it is just a choice between two plans, really, the better 
of which is debatable.

> I experimented with trying to do this and ran into a problem:

Yep. There are other problems, all of them obviously solvable, but solving 
them means that a lot more than 10 lines will be added to avoid the 5*2 
lines of repetition.

My opinion is that the submitted version is simple and fine for the 
purpose, and the plan you suggest replaces 5*2 repetitions by a lot of 
code and complexity, which is not desirable and should be avoided.

However, for obvious reasons the committer opinion prevails:-)

After considering the various points I raised above, could you confirm 
that you do still require the implementation of this array approach before 
I spend time doing such a thing?

> I also went over your documentation changes.

Thanks, this looks better.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>> Good point. One simple idea here would be to use a custom pgbench
>> script that has no SQL commands and just calculates the values of some
>> parameters to measure the impact without depending on the backend,
>> with a fixed number of transactions.
>
> Sure, we could do that.  But whether it materially changes pgbench -S
> results, say, is a lot more important.

Indeed. Several runs on my laptop:

  ~ 400000-540000 tps with master using:
    \set naccounts 100000 * :scale
    \setrandom aid 1 :naccounts

  ~ 430000-530000 tps with full function patch using:
    \set naccounts 100000 * :scale
    \setrandom aid 1 :naccounts

  ~ 730000-890000 tps with full function patch using:
    \set aid random(1, 100000 * :scale)

The performance is pretty similar on the same script. The real pain is 
variable management, avoiding some is a win.

However, as you suggest, the tps impact even with -M prepared -S is 
nought, because the internal scripting time in pgbench is much smaller 
than the time to do actual connecting and querying.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Tue, Feb 16, 2016 at 3:53 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> My opinion is that the submitted version is simple and fine for the purpose,
> and the plan you suggest replaces 5*2 repetitions by a lot of code and
> complexity, which is not desirable and should be avoided.
>
> However, for obvious reasons the committer opinion prevails:-)

You're welcome to solicit other opinions.  I'm not unwilling to give
way if there's a chorus of support for the way you've done it.  But to
me it sounds like you're saying that it doesn't really matter whether
the system is scalable and maintainable because we only have 5
functions right now, which is a design philosophy with which I just
don't agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Tue, Feb 16, 2016 at 5:18 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> Good point. One simple idea here would be to use a custom pgbench
>>> script that has no SQL commands and just calculates the values of some
>>> parameters to measure the impact without depending on the backend,
>>> with a fixed number of transactions.
>>
>> Sure, we could do that.  But whether it materially changes pgbench -S
>> results, say, is a lot more important.
>
>
> Indeed. Several runs on my laptop:
>
>   ~ 400000-540000 tps with master using:
>     \set naccounts 100000 * :scale
>     \setrandom aid 1 :naccounts
>
>   ~ 430000-530000 tps with full function patch using:
>     \set naccounts 100000 * :scale
>     \setrandom aid 1 :naccounts
>
>   ~ 730000-890000 tps with full function patch using:
>     \set aid random(1, 100000 * :scale)
>
> The performance is pretty similar on the same script. The real pain is
> variable management, avoiding some is a win.

Wow, that's pretty nice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>> However, for obvious reasons the committer opinion prevails:-)
>
> You're welcome to solicit other opinions.  I'm not unwilling to give
> way if there's a chorus of support for the way you've done it.

Hmmm. I do not expect much chorus on such a minor subject:-)

> But to me it sounds like you're saying that it doesn't really matter 
> whether the system is scalable and maintainable because we only have 5 
> functions right now, which is a design philosophy with which I just 
> don't agree.

The design does not aim at scalability but at simplicity, otherwise I 
would have done things quite differently: with many functions the "switch" 
based selection does not scale anyway.

Anyway, attached are two patches, one for each approach.

The array (second patch) is not too bad if one agrees with a maximum 
number of arguments; also, as I did not change the list structure 
coming from the parser, it does not need to manage the number of 
arguments in the function structure. The code for min/max is also simpler 
when dealing with an array instead of a linked list. I do not much like 
array references in the code, so I tried to avoid them by using lval/rval 
scalars for operator operands.

Please choose the one you prefer, and I'll adapt the remaining stuff to 
your choice.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Feb 16, 2016 at 11:12 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Robert,
>
>>> However, for obvious reasons the committer opinion prevails:-)
>>
>>
>> You're welcome to solicit other opinions.  I'm not unwilling to give
>> way if there's a chorus of support for the way you've done it.
>
>
> Hmmm. I do not expect much chorus on such a minor subject:-)
>
>> But to me it sounds like you're saying that it doesn't really matter
>> whether the system is scalable and maintainable because we only have 5
>> functions right now, which is a design philosophy with which I just don't
>> agree.
>
>
> The design does not aim at scalability but at simplicity, otherwise I would
> have done things quite differently: with many functions the "switch" based
> selection does not scale anyway.
>
> Anyway, attached are two patches, one for each approach.
>
> The array (second patch) is not too bad if one agrees with a maximum number
> of arguments, and also as I did not change the list structure coming from
> the parser, so it does not need to manage the number of arguments in the
> function structure. The code for min/max is also simpler when dealing with
> an array instead of a linked list. I do not like much array references in
> the code, so I tried to avoid them by using lval/rval scalars for operator
> operands.
>
> Please choose the one you prefer, and I'll adapt the remaining stuff to your
> choice.

For those two patches and HEAD, it is possible to do a direct
performance comparison for simple operator functions, as those are
being switched to behave as functions. So doing the following for 50M
"transactions" on my laptop:
\set aid 1 + 1
pgbench -f addition.sql -t 50000000

I have the following:
HEAD: 3.5~3.7M TPS
list method: 3.6~3.7M TPS
array method: 3.4~3.5M TPS
So all approaches have a comparable performance.

Btw, patch 2 is returning a warning for me:
pgbench.c:1060:16: warning: comparison of constant
-9223372036854775808 with expression of type 'int' is always false
[-Wtautological-constant-out-of-range-compare]
                                                if (lval == PG_INT64_MIN)
It is trying to compare a 32-bit integer with an int64 value; evalFunc
needed an int64.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Michaël,

> \set aid 1 + 1
> pgbench -f addition.sql -t 50000000
>
> I have the following:
> HEAD: 3.5~3.7M TPS
> list method: 3.6~3.7M TPS
> array method: 3.4~3.5M TPS
> So all approaches have a comparable performance.

Yep, the execution trace is pretty similar in all cases, maybe with a 
little more work for the array method, although I'm surprised that the 
difference is discernible.

> Btw, patch 2 is returning a warning for me:
> It is trying to compare a 32b integer with an int64 value, evalFunc
> needed an int64.

Indeed. My gcc 4.8.4 with -Wall does not show the warning, too bad.

Attached is the fixed patch for the array method.

-- 
Fabien

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Feb 17, 2016 at 5:22 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> \set aid 1 + 1
>> pgbench -f addition.sql -t 50000000
>>
>> I have the following:
>> HEAD: 3.5~3.7M TPS
>> list method: 3.6~3.7M TPS
>> array method: 3.4~3.5M TPS
>> So all approaches have a comparable performance.
>
> Yep, the execution trace is pretty similar in all cases, maybe with a little
> more work for the array method, although I'm surprised that the difference is
> discernible.

Nah, that's mostly noise I think. My conclusion is that all approaches
are rather equivalent for the operators.
-- 
Michael



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Feb 17, 2016 at 3:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Indeed. My gcc 4.8.4 with -Wall does not show the warning, too bad.
>
> Attached is the fixed patch for the array method.

Committed with a few tweaks, including running pgindent over some of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Mar 2, 2016 at 3:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 17, 2016 at 3:22 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Indeed. My gcc 4.8.4 with -Wall does not show the warning, too bad.
>>
>> Attached is the fixed patch for the array method.
>
> Committed with a few tweaks, including running pgindent over some of it.

Thanks. So the first set of functions is in, and the operators are
executed as functions as well. Fabien, are you planning to send
rebased versions of the rest? By that I mean the switch of the
existing subcommands into equivalent functions and the handling of
double values as parameters for those functions.
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> Committed with a few tweaks, including running pgindent over some of it.

Thanks!

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>>> Attached is the fixed patch for the array method.
>>
>> Committed with a few tweaks, including running pgindent over some of it.
>
> Thanks. So the first set of functions is in, and the operators are
> executed as functions as well. Fabien, are you planning to send
> rebased versions of the rest? By that I mean the switch of the
> existing subcommands into equivalent functions and the handling of
> double values as parameters for those functions.

Here they are:
 - 32-b: add double functions, including double variables
 - 32-c: remove \setrandom support (advice to use \set + random instead)

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> - 32-b: add double functions, including double variables
> - 32-c: remove \setrandom support (advice to use \set + random instead)

Here is a rebased version after Tom's updates, 33-b & 33-c. I also 
extended the floating point syntax to accept signed exponents, and 
changed the regex style to match Tom's changes.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Mar 7, 2016 at 4:58 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> - 32-b: add double functions, including double variables
>> - 32-c: remove \setrandom support (advice to use \set + random instead)
>
> Here is a rebased version after Tom's updates, 33-b & 33-c. I also extended
> the floating point syntax to accept signed exponents, and changed the
> regex style to match Tom's changes.

Having a look at 33-b, this looks pretty good now, but:

// comments are not allowed.  I'd just remove the two you have.

It makes no sense to exit(1) and then return 0, so don't do that.  I
might write this code as:

if (pval->type == PGBT_INT)
    return pval->u.ival;

Assert(pval->type == PGBT_DOUBLE);
/* do double stuff */

This would get rid of the internal-error case here altogether in favor
of testing it via an assertion.

I think that coerceToInt() should not exit(1) when an overflow occurs;
instead, I think the function should be declared as bool
coerceToInt(PgBenchValue *pval, int64 *result), and the error return
should be propagated back to the caller, which can then return false
as we do for other error cases.

I think that some of the places you've used coerceToInt() are not
appropriate.  Like, for example:

+                       if (coerceToInt(rval) == 0)
+                       {
+                           fprintf(stderr, "division by zero\n");
+                           return false;
+                       }

Now, if rval is out of range of an integer, that is going to overflow
while trying to see whether it should divide by zero.  Please work a
little harder here and in similar cases.  Maybe add a helper function
checkIntegerEquality(PgBenchValue *, int64).
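As a rough illustration of these two suggestions, here is a hypothetical sketch (the `PgBenchValue` type and both helpers are simplified stand-ins for the pgbench ones, and the double-range check in particular is only approximate):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the pgbench value type. */
typedef enum { PGBT_INT, PGBT_DOUBLE } PgBenchValueType;

typedef struct PgBenchValue
{
    PgBenchValueType type;
    union
    {
        int64_t ival;
        double  dval;
    } u;
} PgBenchValue;

/* Report overflow to the caller instead of exiting, so the error can
 * propagate upward like the other error cases. */
static bool
coerceToInt(const PgBenchValue *pval, int64_t *result)
{
    if (pval->type == PGBT_INT)
    {
        *result = pval->u.ival;
        return true;
    }
    assert(pval->type == PGBT_DOUBLE);
    /* Approximate bound only: exact INT64_MAX handling needs more care
     * because (double) INT64_MAX rounds up. */
    if (pval->u.dval < -9.0e18 || pval->u.dval > 9.0e18)
    {
        fprintf(stderr, "double to int overflow\n");
        return false;
    }
    *result = (int64_t) pval->u.dval;
    return true;
}

/* Equality test that never coerces, so checking a divisor for zero
 * cannot itself overflow on an out-of-range double. */
static bool
checkIntegerEquality(const PgBenchValue *pval, int64_t ival)
{
    if (pval->type == PGBT_INT)
        return pval->u.ival == ival;
    return pval->u.dval == (double) ival;
}
```

With these, a division-by-zero test compares against 0 directly and only coerces once the divisor is known to be usable.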

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> Having a look at 33-b, this looks pretty good now, but:
>
> // comments are not allowed.  I'd just remove the two you have.

Oops, C89 did not make it everywhere yet!

> It makes no sense to exit(1) and then return 0, so don't do that. [...]
> This would get rid of the internal-error case here altogether in favor
> of testing it via an assertion.

Ok. Fine with me.

> I think that coerceToInt() should not exit(1) when an overflow occurs;
> instead, I think the function should be declared as bool
> coerceToInt(PgBenchValue *pval, int64 *result), and the error return
> should be propagated back to the caller, which can then return false
> as we do for other error cases.

This is the pain (ugly repeated code) that I wish to avoid, because then 
we cannot write a simple addition for doing an addition, but have to do 
several ugly tests instead. Ok, elegance is probably not a sufficient 
argument against doing that.

Moreover, I do not see any condition in which doing what you suggest makes 
much sense from the user perspective: if there is an internal error in the 
bench code it seems much more logical to ask the user for a sensible 
bench script, because I would not know how to interpret tps on a bench 
with internal failures in the client code anyway.

For me, exiting immediately on internal script errors is the right action.

If this is a blocker I'll do them, but I'm convinced it is not what should 
be done.

> I think that some of the places you've used coerceToInt() are not
> appropriate.  Like, for example:
>
> +                       if (coerceToInt(rval) == 0)
> +                       {
> +                           fprintf(stderr, "division by zero\n");
> +                           return false;
> +                       }
>
> Now, if rval is out of range of an integer, that is going to overflow
> while trying to see whether it should divide by zero.  Please work a
> little harder here and in similar cases.

Ok.

>  Maybe add a helper function
> checkIntegerEquality(PgBenchValue *, int64).

Maybe.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Tue, Mar 8, 2016 at 11:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> If this is a blocker I'll do them, but I'm convinced it is not what should be
> done.

Well, I think it's a blocker.  Exiting within deeply-nested code
instead of propagating the error upward does not strike me as a good
practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert.

Here is a v34 b & c.

> // comments are not allowed.  I'd just remove the two you have.

Back to the eighties!

> It makes no sense to exit(1) and then return 0, so don't do that.  I
> might write this code as:
> This would get rid of the internal-error case here altogether in favor
> of testing it via an assertion.

I've put assertions instead of exit in some places.

> I think that coerceToInt() should not exit(1) when an overflow occurs;

I think that it should, because the only sane option for the user is to 
fix the script and relaunch the bench: counting errors has no added value 
for the user.

The attached version does some error handling instead, too bad.
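A sketch of what that error handling might look like (the names and the
PgBenchValue layout here are assumptions, not the committed code): the
coercion reports failure instead of calling exit(1), so the caller can
propagate the error upward.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <math.h>

/* Simplified stand-in for pgbench's PgBenchValue. */
typedef enum { PGBT_INT, PGBT_DOUBLE } PgBenchValueType;

typedef struct
{
    PgBenchValueType type;
    union { int64_t ival; double dval; } u;
} PgBenchValue;

/*
 * Hypothetical coercion: convert a value to int64, returning
 * false (rather than exiting) when a double is NaN or out of
 * the int64 range, so callers can propagate the error upward.
 */
static bool
coerceToInt(const PgBenchValue *v, int64_t *result)
{
    if (v->type == PGBT_INT)
    {
        *result = v->u.ival;
        return true;
    }

    /* double: reject NaN and values outside [-2^63, 2^63) */
    if (isnan(v->u.dval) ||
        v->u.dval < -9223372036854775808.0 ||
        v->u.dval >= 9223372036854775808.0)
    {
        fprintf(stderr, "double to int overflow for %f\n", v->u.dval);
        return false;
    }
    *result = (int64_t) v->u.dval;
    return true;
}
```

The upper bound uses `>=` because 2^63 itself is exactly representable as
a double but is one past the largest int64.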

> Now, if rval is out of range of an integer, that is going to overflow
> while trying to see whether it should divide by zero.

I could not find a place where there was such a potential issue. If the 
value is zero, it cannot overflow when cast to int. If it is not zero but 
it overflows, then it is an overflow, so it should overflow. Maybe I 
misunderstood your point.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Tue, Mar 8, 2016 at 3:52 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> [ new patch and assorted color commentary ]

This is not acceptable:

+               /* guess double type (n for "inf", "-inf" and "nan") */
+               if (strchr(var, '.') != NULL || strchr(var, 'n') != NULL)
+               {
+                       double dv;
+                       sscanf(var, "%lf", &dv);
+                       setDoubleValue(retval, dv);
+               }
+               else
+                       setIntValue(retval, strtoint64(var));

That totally breaks the error handling which someone carefully established here.

+                               PgBenchValue *varg = & vargs[0];

Extra space.

+                               if (!coerceToDouble(st, & vargs[0], &dval))
+                                       return false;

Another extra space.

-       int                     nargs = 0;
-       int64           iargs[MAX_FARGS];
-       PgBenchExprLink *l = args;
+       int                                     nargs = 0;
+       PgBenchValue            vargs[MAX_FARGS];
+       PgBenchExprLink    *l = args;

Completely unnecessary reindentation of the first and third lines.

+                                       setIntValue(retval, i < 0? -i: i);

Not project style.

Please fix the whitespace damage git diff --check complains about, and
check for other places where you haven't followed project style.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

Here is a v35 b & c.

> This is not acceptable:
>
> +               /* guess double type (n for "inf", "-inf" and "nan") */
> +               if (strchr(var, '.') != NULL || strchr(var, 'n') != NULL)
> +               {
> +                       double dv;
> +                       sscanf(var, "%lf", &dv);
> +                       setDoubleValue(retval, dv);
> +               }
> +               else
> +                       setIntValue(retval, strtoint64(var));
>
> That totally breaks the error handling which someone carefully established here.

I'm not sure what is "not acceptable" as it "totally breaks the error 
handling" in the above code.

I assumed that you want to check that sscanf can read what sprintf 
generated when handling "\set". I'd guess that libc would be broken if 
that were the case. I've made pgbench check that it is not.

If it was something else, you have to spell it out for me.

> Extra space. [...] Another extra space.

Indeed.

> -       int                     nargs = 0;
> -       int64           iargs[MAX_FARGS];
> -       PgBenchExprLink *l = args;
> +       int                                     nargs = 0;
> +       PgBenchValue            vargs[MAX_FARGS];
> +       PgBenchExprLink    *l = args;
>
> Completely unnecessary reindentation of the first and third lines.

It just looked better to me.

> +                                       setIntValue(retval, i < 0? -i: i);
>
> Not project style.

Indeed.

> Please fix the whitespace damage git diff --check complains about,

"git diff --check" does not seem to complain on the full v35-b.

> and check for other places where you haven't followed project style.

I've found some more instances of "& ref".

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Wed, Mar 9, 2016 at 3:09 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I'm not sure what is "not acceptable" as it "totally breaks the error
> handling" in the above code.
>
> I assumed that you want to check that sscanf can read what sprintf generated
> when handling "\set". I'd guess that libc would be broken if it was the
> case. I've made pgbench check that it is not.

If the variable contains something that is not an integer, the
existing code will end up here:

fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);

With your patch, you get different behavior depending on exactly how
the input is malformed.  I have a strong suspicion you're going to
tell me that this is another one of those cases where it's OK to make
the error handling worse than it is today, but I'm tired of arguing
that point.  I'm not going to commit it this way, and frankly, neither
is anyone else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

> [...] With your patch, you get different behavior depending on exactly 
> how the input is malformed.

I understand that you require only one possible error message on malformed 
input, instead of failing when converting to double if the input looked 
like a double (there was a '.' clue in it, but that is not proof), or 
something else if it was assumed to be an int.

So I'm going to assume that you do not like the type guessing.

> I'm not going to commit it this way, and frankly, neither is anyone 
> else.

Hmmm. Probably my unconscious self is trying to reach 42.

Here is a v36 which inspects the string very carefully to decide whether 
it is an int or a double. You may, or may not, find it to your taste, I 
can't say.
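The idea, roughly (a sketch with assumed names, not the actual v36 code):
attempt a full integer parse first, then fall back to a full double parse,
and require that the whole string be consumed in either case, so every
malformed input is rejected uniformly instead of behaving differently
depending on how it is broken.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <errno.h>

typedef enum { VAR_INT, VAR_DOUBLE, VAR_BAD } VarClass;

/*
 * Hypothetical classifier: decide whether a variable's string
 * value is a well-formed integer, a well-formed double, or
 * malformed.  Both branches demand that the parse consume the
 * entire string (end points at the terminating NUL), so inputs
 * like "12x" fail the same way regardless of their prefix.
 */
static VarClass
classifyVariable(const char *s, int64_t *ival, double *dval)
{
    char *end;

    errno = 0;
    *ival = strtoll(s, &end, 10);
    if (errno == 0 && end != s && *end == '\0')
        return VAR_INT;

    /* not a clean integer: try a full double parse instead
     * (this also accepts "inf", "nan" and exponent forms) */
    errno = 0;
    *dval = strtod(s, &end);
    if (errno == 0 && end != s && *end == '\0')
        return VAR_DOUBLE;

    return VAR_BAD;
}
```

Note that an integer literal too large for int64 sets errno to ERANGE in
the first branch and so falls through to the double parse; whether that is
the desired behavior is a policy choice.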

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> Here is a v36 which inspects the string very carefully to decide whether it is 
> an int or a double. You may, or may not, find it to your taste, I can't say.

Here is a v37 which is mostly a rebase after recent changes. Also I 
noticed that I was double counting errors in the previous version, so I 
fixed it.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
v38 is a simple rebase, trying to keep up-to-date with Tom's work.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
> v38 is a simple rebase, trying to keep up-to-date with Tom's work.

v39 is yet another rebase: 42 is in sight!

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Mon, Mar 21, 2016 at 6:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> v38 is a simple rebase, trying to keep up-to-date with Tom's work.
>
>
> v39 is yet another rebase: 42 is in sight!

What's up with v42? Is that your personal record?
-- 
Michael



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Bonjour Michaël,

>> v39 is yet another rebase: 42 is in sight!
>
> What's up with v42? Is that your personal record?

It is just a (geek) joke, see:

https://en.wikipedia.org/wiki/42_%28number%29#Hitchhiker.27s_Guide_to_the_Galaxy

It is the answer to the "Ultimate Question of Life, The Universe and 
Everything", computed by the supercomputer "Deep Thought" in 7.5 million 
years.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
v40 is yet another rebase.

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> v40 is yet another rebase.

Thanks.  Committed after removing an unnecessary parameter from the
coerceToXXX() functions.

I guess the question here is whether we want the part-c patch, which
removes the historical \setrandom syntax.  That's technically no
longer needed since we now can use random() as a function directly
inside \set commands, but we might want it anyway for backward
compatibility.

Anybody have an opinion on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> v40 is yet another rebase.
>
> Thanks.  Committed after removing an unnecessary parameter from the
> coerceToXXX() functions.
>
> I guess the question here is whether we want the part-c patch, which
> removes the historical \setrandom syntax.  That's technically no
> longer needed since we now can use random() as a function directly
> inside \set commands, but we might want it anyway for backward
> compatibility.
>
> Anybody have an opinion on that?

+1 for nuking it. That's not worth the trouble maintaining it.
-- 
Michael



Re: extend pgbench expressions with functions

From
Stephen Frost
Date:
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> v40 is yet another rebase.
> >
> > Thanks.  Committed after removing an unnecessary parameter from the
> > coerceToXXX() functions.
> >
> > I guess the question here is whether we want the part-c patch, which
> > removes the historical \setrandom syntax.  That's technically no
> > longer needed since we now can use random() as a function directly
> > inside \set commands, but we might want it anyway for backward
> > compatibility.
> >
> > Anybody have an opinion on that?
>
> +1 for nuking it. That's not worth the trouble maintaining it.

If we don't nuke it, it'll never die.

See also: pg_shadow

Thanks!

Stephen

Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
>> I guess the question here is whether we want the part-c patch, which
>> removes the historical \setrandom syntax.  That's technically no
>> longer needed since we now can use random() as a function directly
>> inside \set commands, but we might want it anyway for backward
>> compatibility.

This patch is indeed a proposal.

>> Anybody have an opinion on that?
>
> +1 for nuking it. That's not worth the trouble maintaining it.

I share Michaël's opinion.

Some arguments for removing it:
 - \setrandom is syntactically inhomogeneous in the overall syntax,
   and is now redundant
 - switching to the \set syntax is pretty easy, see attached script
 - custom scripts are short; they are used by few but advanced users,
   for whom updating would not be an issue
 - the parsing & execution code is lengthy, repetitive...

-- 
Fabien.

Re: extend pgbench expressions with functions

From
Robert Haas
Date:
On Mon, Mar 28, 2016 at 9:36 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Tue, Mar 29, 2016 at 9:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Sun, Mar 27, 2016 at 5:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> >> v40 is yet another rebase.
>> >
>> > Thanks.  Committed after removing an unnecessary parameter from the
>> > coerceToXXX() functions.
>> >
>> > I guess the question here is whether we want the part-c patch, which
>> > removes the historical \setrandom syntax.  That's technically no
>> > longer needed since we now can use random() as a function directly
>> > inside \set commands, but we might want it anyway for backward
>> > compatibility.
>> >
>> > Anybody have an opinion on that?
>>
>> +1 for nuking it. That's not worth the trouble maintaining it.
>
> If we don't nuke it, it'll never die.
>
> See also: pg_shadow

Hearing no objections, BOOM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: extend pgbench expressions with functions

From
Fabien COELHO
Date:
Hello Robert,

>> If we don't nuke it, it'll never die.
>
> Hearing no objections, BOOM.

FIZZ! :-)

Thanks for the commits, and apologies for the portability bugs.

-- 
Fabien.



Re: extend pgbench expressions with functions

From
Michael Paquier
Date:
On Wed, Mar 30, 2016 at 1:23 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello Robert,
>
>>> If we don't nuke it, it'll never die.
>>
>>
>> Hearing no objections, BOOM.
>
>
> FIZZ! :-)
>
> Thanks for the commits, and apology for the portability bugs.

Thanks for the additions, Fabien. This has been a long trip.
-- 
Michael