Re: extend pgbench expressions with functions - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: extend pgbench expressions with functions
Date
Msg-id alpine.DEB.2.10.1511040930040.16208@sto
Whole thread Raw
In response to Re: extend pgbench expressions with functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: extend pgbench expressions with functions
Re: extend pgbench expressions with functions
Re: extend pgbench expressions with functions
List pgsql-hackers
Hello Robert,

> 1. I think there should really be two patches here, the first adding
> functions, and the second adding doubles.  Those seem like separate
> changes.  And offhand, the double stuff looks a lot less useful that
> the function call syntax.

I first submitted the infrastructure part, but I was asked to show how 
more functions could be included, especially random variants. As random 
gaussian/exponential functions require a double argument, there must be 
some support for double values.

Now, it could certainly be split in two patches, but this is rather 
artificial, IMO.

> 2. ddebug and idebug seem like a lousy idea to me.

It was really useful to me for debugging and testing.

> If we really need to be able to print stuff from pgbench, which I kind 
> of doubt, then maybe we should have a string data type and a print() 
> function, or maybe a string data type and a toplevel \echo command.

The *debug functions allow to intercept the value computed within an 
expression. If you rely on variables and some echo (which does not exist) 
this means that there should be double variables as well, which is not 
currently the case, and which I do not see as useful for the kind of 
script written for pgbench. Adding the string type is more work, and
I do not see a good use case for those.

So the *debug functions are really just a lightweight solution for 
debugging type related issues in expressions. I can drop them if this is a 
blocker, but the are really useful for testing quickly a script.

> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
> and evalDouble().

As explained above in the thread (I think), the reason is that having one 
overloaded expression evaluation which handles types conversion would 
produce pretty heavy code, and the two functions with the descending 
typing allows to have a much smaller code with the same effect.

The issue is that with two types all functions must handle argument type 
conversion explicitely.

For instance for "random_gaussian(int, int, double)", it may be called 
with any combination of 3 int/double arguments, each one must be tested 
and possibly converted to the target type before calling the actual 
function. For overloaded operators or functions (arithmetics, abs...) 
there is also the decision about which operator is called and then what 
conversions are necessary.

With the descending typing and two functions cross recursion all these 
explicit tests and conversion disappear because the function evaluation 
calls evalInt or evalDouble depending on the expected types.

Basically, the code is significantly shorter and elegant with this option.

> That doesn't seem similar to what I've seen in other 
> expression-evaluation engines.

Probably. This is because I choose a descending typing to simplify the 
implementation. Changing this would bring no real practical benefit from 
the usage point of view, but would add significant more verbose and ugly 
code to test and handle type conversions everywhere, so I'm not keen to do 
that.

> Perhaps I could find out by reading the comments, but actually not, 
> because this entire patch seems to add only one comment:
>
> +       /* reset column count for this scan */

There are a few others, really:-)

> While I'm not a fan of excessive commenting, I think a little more
> explanation here would be good.

I can certainly add more comments to the code, especially around the eval 
cross recursion functions.

> 4. The table of functions in pgbench.sgml seems to leave something to
> be desired.  We added a pretty detailed write-up on the Gaussian and
> exponential options to \setrandom, but exporand() has only this
> description:

Yep. The idea was *not* to replicate the (painful) explanations about 
random functions, but that it should be shared between the function and 
the \set variants.

> +      <row>
> +       <entry><literal><function>exporand(<replaceable>i</>,
> <replaceable>j</>, <replaceable>t</>)</></></>
> +       <entry>integer</>
> +       <entry>exponentially distributed random integer in the bounds,
> see below</>
> +       <entry><literal>exporand(1, 10, 3.0)</></>
> +       <entry>int between <literal>1</> and <literal>10</></>
> +      </row>
>
> That's not very helpful.

The table explanation must be kept short for the table format...

> Without looking at the example, there's no way to guess what i and j 
> mean, and even with looking at the example, there's no way to guess what 
> t means.  If, as I'm guessing, exporand() and guassrand() behave like 
> \setrandom with the exponential and/or Gaussian options, then the 
> documentation for one of those things should contain all of the detailed 
> information and the documentation for the other should refer to it.

Indeed, that was the idea, but it seems that I forgot the pointer:-)

> More than likely, exporand() and gaussrand() should get the detailed 
> explanation, and \setrandom should be document as a deprecated 
> alternative to \set ... {gauss,expo,}rand(...)

Ok, the description can be moved to the function part and the \set 
version reference the other.

> 5. I liked Heikki's proposed function names random_gaussian(min, max,
> threshold) and random_exponential(min, max, threshold) better than the
> ones you've picked here.  I think random() would be OK instead of his
> suggestion of random_uniform(), though.

Ok.

I'll submit an updated version of the patch, which addresses points 4 & 5 
and documents 3, and wait for feedback on the explanations I gave before 
doing anything for about 1 & 2, as I think that the implied changes are 
not desirable. I'm not keen at all on changing the cross recursion 
implementation (3), this would just be pretty ugly code without actual 
user benefit.

-- 
Fabien.



pgsql-hackers by date:

Previous
From: Kouhei Kaigai
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual
Next
From: Fabien COELHO
Date:
Subject: Re: extend pgbench expressions with functions