Re: extend pgbench expressions with functions - Mailing list pgsql-hackers

From: Fabien COELHO
Subject: Re: extend pgbench expressions with functions
Msg-id: alpine.DEB.2.10.1511052018410.8244@sto
In response to: Re: extend pgbench expressions with functions (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: extend pgbench expressions with functions
List: pgsql-hackers
Hello Robert,

>>> 2. ddebug and idebug seem like a lousy idea to me.
>>
>> It was really useful to me for debugging and testing.
>
> That doesn't mean it belongs in the final patch.

I think it is useful when debugging a script, not just for debugging the
evaluation code itself.

>>> 3. I'm perplexed by why you've replaced evaluateExpr() with evalInt()
>>> and evalDouble().
>>
>> Basically, the code is significantly shorter and elegant with this option.
>
> I find that pretty hard to swallow.

Hmm, maybe with some more explanations?

> If the backend took this approach,

Sure, I would never suggest doing anything like that in the backend!

> we'd have a separate evaluate function for every datatype, which would
> make it completely impossible to support the creation of new datatypes.

In the backend the implementation is generic about types (one can add a
type dynamically to the system), which is another abstraction level; it
does not compare in any way.

> And I find this code to be quite difficult to follow.

It is really the function *you* wrote, and there is just one version for
int and one for double.

> What I think we should have is a type like PgBenchValue that represents
> a value which may be an integer or a double or whatever else we want to
> support, but not an expression - specifically a value. Then when you
> invoke a function or an operator it gets passed one or more PgBenchValue
> objects and can do whatever it wants with those, storing the result into
> an output PgBenchValue. So add might look like this:

Sure, I do know what it looks like, and I want to avoid it, because this
is just a lot of ugly code which is useless for pgbench's purposes.

> void
> add(PgBenchValue *result, PgBenchValue *x, PgBenchValue *y)
> {
>     if (x->type == PGBT_INTEGER && y->type == PGBT_INTEGER)
>     {
>         result->type = PGBT_INTEGER;
>         result->u.ival = x->u.ival + y->u.ival;
>         return;
>     }
>     if (x->type == PGBT_DOUBLE && y->type == PGBT_DOUBLE)
>     {
>         result->type = PGBT_DOUBLE;
>         result->u.dval = x->u.dval + y->u.dval;
>         return;
>     }
>     /* cross-type cases, if desired, go here */

Why reject 1 + 2.0? So the cross-type cases are really required for user
sanity, which adds:

    if (x->type == PGBT_DOUBLE && y->type == PGBT_INTEGER)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.dval + y->u.ival;
        return;
    }
    if (x->type == PGBT_INTEGER && y->type == PGBT_DOUBLE)
    {
        result->type = PGBT_DOUBLE;
        result->u.dval = x->u.ival + y->u.dval;
        return;
    }

> }

For the '+' overloaded operator with conversions there are 4 cases
(2 types ** 2 arguments) to handle. For all 5 binary operators
(+ - * / %), that makes 20 cases to handle. Then for every function you
have to deal with type conversion as well: number of types ** number of
arguments cases per function. That is 2 cases for abs, 4 cases for
random, 8 cases for each random_exp*, 8 for random_gauss*, and so on.
Some thinking would be required for n-ary functions (min & max).

Basically it can be done, no technical issue, it is just a matter of
writing a *lot* of repetitive code, hundreds of lines of it. As I think
it does not bring any value for pgbench's purposes, I used the other
approach, which reduces the code size by avoiding the combinatorial
"cross-type" conversions.

> The way you have it, the logic for every function and operator exists
> in two copies: one in the integer function, and the other in the
> double function.

Yep, but only 2 cases to handle, instead of 4 cases in the above
example; that is the point.
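To make the contrast concrete, here is a rough sketch of the
eval-per-type dispatch for the same addition (hypothetical, simplified
node and field names; not the actual patch code):

    #include <stdint.h>

    typedef enum { ENODE_INT_CONST, ENODE_DOUBLE_CONST, ENODE_ADD } ExprKind;

    typedef struct Expr
    {
        ExprKind     kind;
        int64_t      ival;            /* ENODE_INT_CONST */
        double       dval;            /* ENODE_DOUBLE_CONST */
        struct Expr *left, *right;    /* ENODE_ADD */
    } Expr;

    static int64_t
    evalInt(Expr *e)
    {
        switch (e->kind)
        {
            case ENODE_INT_CONST:
                return e->ival;
            case ENODE_DOUBLE_CONST:
                return (int64_t) e->dval;   /* double -> int coercion at the leaf */
            case ENODE_ADD:
                /* the single integer case for '+' */
                return evalInt(e->left) + evalInt(e->right);
        }
        return 0;                           /* not reached */
    }

    static double
    evalDouble(Expr *e)
    {
        switch (e->kind)
        {
            case ENODE_INT_CONST:
                return (double) e->ival;    /* int -> double coercion at the leaf */
            case ENODE_DOUBLE_CONST:
                return e->dval;
            case ENODE_ADD:
                /* the single double case for '+' */
                return evalDouble(e->left) + evalDouble(e->right);
        }
        return 0.0;                         /* not reached */
    }

Each evaluator coerces leaves to its own type, so 1 + 2.0 just works,
and '+' still appears only twice in total instead of the four (or more)
cases of the PgBenchValue version.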
> As soon as we add another data type - strings, dates, whatever - we'd
> need to have cases for all of these things in those functions as well,
> and every time we add a function, we have to update all the case
> statements for every datatype's evaluation function. That's just not a
> scalable model.

On the contrary, it is reasonably scalable.

With the eval-per-type approach, for every added type one implements one
eval function which handles all functions and operators, each eval
function roughly the same size as the current ones.

With the above approach, the overloaded add function handling 2 operands
over 3 types ('post' + 'gres' -> 'postgres') would have to deal with
3**2 = 9 type combinations instead of 4, so basically it would more than
double the code size. If you add dates on top of that, 4**2 = 16 cases
just for one operator. No difficulty there, just a lot of lines...

Basically, the code size complexity of the above approach is:

    #functions * (#types ** #args)

while for the approach in the submitted patch it is:

    #types * #functions

Now I obviously agree that the approach is not generic and should not be
used in other contexts; it is just that it is simple/short and it fits
the bill for pgbench.

Note that I do not really envision adding more types for pgbench
scripts. The double type is just a side effect of the non-uniform
randoms. What I plan is to add a few functions, especially a
pseudo-random permutation and/or a parametric hash, that would allow
running more realistic tests with non-uniform distributions.

--
Fabien.