Re: extend pgbench expressions with functions - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: extend pgbench expressions with functions
Date
Msg-id 20150918.115826.177771694.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: extend pgbench expressions with functions  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: extend pgbench expressions with functions
Re: extend pgbench expressions with functions
List pgsql-hackers
Hello, 

At Tue, 15 Sep 2015 19:26:51 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in
<alpine.DEB.2.10.1509151844080.6503@sto>
> > 1.evalInt and evalDouble are mutually calling on single expr
> > node.
> 
> Indeed.
> 
> > It looks simple and seems to work but could easily broken
> > to fall into infinite call and finally (in a moment) exceeds
> > stack depth.
> 
> The recursion is on the tree-structure of the expression, which is
> evaluated depth-first. As the tree must be finite, and there is no
> such thing as recursive functions in the expression tree, this should
> never happen. If it happened, it would be a bug in the eval functions,
> probably a forgotten "case", and could be fixed there easily.

My description should have been obscure. Indeed the call tree is
finite for *sane* expression node. But it makes infinit call for
a value of expr->etype unknown by both evalDouble and
evalInt. This is what I intended to express by the words 'easily
broken'. AFAICS most of switches for nodes in the core would
stops and emit the message like following,

| default:
|   elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

So I think the following code would do well.

| evalDoubld()
| {
|    if (IS_INT_EXPR_NODE(expr->etype))
|    {
|       if (!evalInt(...))
|          return false;
|       *retval = (double) ival;
|       return true;
|    }     
| 
|    switch (expr->etype)
|    {
|      case ENODE_DOUBLE_CONSTANT:
|      ...
|      default:
|         elog(ERROR, "unrecognized expr type: %d", (int) expr->etype);

I suppose here's a big difference between SEGV after run through
the stack bottom and server's return with ERROR for the same
bug. But IS_INT_EXPR_NODE() would be redundant with the
switch-case..

Do you think that I'm taking the difference too serious? However
this is needelss discussion if the following change is acceptable
for you.

> > I think it is better not to do so. For example, reunioning
> > evalDouble and evalInt into one evalulateExpr() which can return
> > both double and int result would do so. The interface would be
> > something like the follwings or others. Some individual
> > functions might be better to be split out.
> >
> >  static ExprRetType evaluateExpr(TState,CState,PgBenchExpr,
> >                                  int *iret, double *dret);
> 
> That would not work as simply as that: this signature does not say
> whether a double or an int is expected, and without this information
> you would not know what to do on some operators (the typing is
> explicit and descendant in the current implementation).
> Even if you add this information, or you use the returned type
> information to do the typing dynamically, that would mean testing the
> result types and implementing the necessary casts depending on this
> information for every function within the generic eval, which would
> result in repetitive and ugly code for every function and operator:
> for instance for '+', you would have to implement 4 cases explicitely,
> i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast
> right and double add", and finally "double add". Then you have other
> operators to consider... Although some sharing can be done, this would
> end in lengthy & ugly code.

Yes, I also think it is ugly as I anticipated:(.

By the way, the complexity comes from separating integer and
double. If there is no serios reason to separate them, handling
all values as double makes things far simpler.

Could you let me know the reason why it strictly separates
integer and double?

|   evaluateExpr(..., double *retval)
|   {
|      switch (expr->etype)
|      {
|        case ENODE_CONSTANT:
|          *retval = expr->u.constant.val;
|          return true;
|        case ENODE_OPERATOR
|          /* Nothing to worry about */
|        case ENODE_FUNCTION:
|        {
|            switch (func)
|            { 
|               case PGBENCH_ABS/SQRT/DEBUG:
|                 /* Nothing to worry about, too */
|               /* case PGBENCH_DOUBLE is useless ever after */
|               case PGBENCH_INT:
|                 double arg;
|                 if (!evalulateExpr(..., &arg);
|                    return false;
|         *retval = floor(arg);

I don't see no problem in possible errors of floating point
calculations for this purpose. Is there any?         
Anyway set meta command finally converts the result as integer
regardless of the real value so the following conversion does the
same as your current code.

in doCustom()
> else if (pg_strcasecmp(argv[0], "set") == 0)
> {
>     char        res[64];
>     PgBenchExpr *expr = commands[st->state]->expr;
>     int64        result;
> 
-     if (!evalInt(thread, st, expr, &result))
+     if (!evaluateExpr(thread, st, expr, &dresult))
+           result = (int64)dresult;

This should make exprparse.y simpler.

> A possible alternative could be to explicitely and statically type all
> operations, adding the necessary casts explicitely, distinguishing
> operators, but the would mean more non-obvious code to "compile" the
> parsed expressions, I'm not sure that pgbench deserves this.
> The two eval functions and the descendant typing allow to hide/ignore
> these issues because each eval function handles just one type, type
> boundaries are explicitely handled by calling the other function, so
> there is no reason to test and cast everything all the time.
> 
> So I thing that it would not be an improvement to try to go the way
> you suggest.



> > 2. You combined abs/sqrt/ddebug and then make a branch for
> >  them. It'd be better to split them even if eval*() looks to be
> >  duplicate.
> 
> Hm, why not.

Thanks.

> Here is a v10 where I did that when it did not add too much code
> repetition (eg I kept the shared branches for MIN & MAX and for the 3
> RANDOM functions so as to avoid large copy pastes).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: creating extension including dependencies
Next
From: Euler Taveira
Date:
Subject: vacuumdb sentence