Thread: add modulo (%) operator to pgbench
This patch is pretty trivial. Add modulo operator to pgbench. This is useful to compute a permutation for tests with non uniform accesses (exponential or gaussian), so as to avoid trivial correlations between neighbour keys. -- Fabien.
> This patch is pretty trivial. Another slightly less trivial but more useful version. The issue is that there are 3 definitions of modulo, two of which are fine (Knuth floored division and Euclidian), and the last one much less useful. Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of the other two. The attached patch adds all versions, with "%" and "mod" consistent with their C and SQL unfortunate counterparts, and "fmod" and "emod" the sane ones. > Add modulo operator to pgbench. > > This is useful to compute a permutation for tests with non uniform > accesses (exponential or gaussian), so as to avoid trivial correlations > between neighbour keys. -- Fabien.
On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> This patch is pretty trivial. > Another slightly less trivial but more useful version. > > The issue is that there are 3 definitions of modulo, two of which are fine > (Knuth floored division and Euclidian), and the last one much less useful. > Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of > the other two. The attached patch adds all versions, with "%" and "mod" > consistent with their C and SQL unfortunate counterparts, and "fmod" and > "emod" the sane ones. Three different modulo operators seems like a lot for a language that doesn't even have a real expression syntax, but I'll yield to whatever the consensus is on this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> This patch is pretty trivial. > > Another slightly less trivial but more useful version. > > > > The issue is that there are 3 definitions of modulo, two of which are fine > > (Knuth floored division and Euclidian), and the last one much less useful. > > Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of > > the other two. The attached patch adds all versions, with "%" and "mod" > > consistent with their C and SQL unfortunate counterparts, and "fmod" and > > "emod" the sane ones. > > Three different modulo operators seems like a lot for a language that > doesn't even have a real expression syntax, but I'll yield to whatever > the consensus is on this one. I wonder if it would be necessary to offer the division operator semantics corresponding to whatever additional modulo operator we choose to offer. That is, if we add emod, do we need "ediv" as well? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hello Robert, >> The issue is that there are 3 definitions of modulo, two of which are fine >> (Knuth floored division and Euclidian), and the last one much less useful. >> Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of >> the other two. The attached patch adds all versions, with "%" and "mod" >> consistent with their C and SQL unfortunate counterparts, and "fmod" and >> "emod" the sane ones. > > Three different modulo operators seems like a lot for a language that > doesn't even have a real expression syntax, but I'll yield to whatever > the consensus is on this one. I agree that it is overkill. In fact there is a link: if there was a real expression syntax, the remainder sign could be fixed afterwards, so the standard C/SQL version would do. If it is not available, the modulo operator must be right. If there is only one modulo added, I would rather have the Knuth version. However I was afraid that someone would object if "%" does not return the same result than the C/PostgreSQL versions (even if I think that nearly nobody has a clue about what % returns when arguments are negative), hence the 3 modulo version to counter this potential critic. But I would prefer just one version with the Knuth (or Euclidian) definitions. -- Fabien.
Hello Alvaro, > I wonder if it would be necessary to offer the division operator > semantics corresponding to whatever additional modulo operator we choose > to offer. That is, if we add emod, do we need "ediv" as well? I would make sense, however I do not need it, and I'm not sure of a use case where it would be needed, so I do not think that it is "necessary". If it happens to be, it could be added then quite easily. -- Fabien.
> Three different modulo operators seems like a lot for a language that > doesn't even have a real expression syntax, but I'll yield to whatever > the consensus is on this one. Here is a third simpler patch which only implements the Knuth's modulo, where the remainder has the same sign as the divisor. I would prefer this version 3 (one simple modulo based on Knuth definition) or if there is a problem version 2 (all 3 modulos). Version 1 which provides a modulo compatible with C & SQL is really useless to me. -- Fabien.
On Tue, Aug 5, 2014 at 5:53 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Mon, Aug 4, 2014 at 5:20 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> >> This patch is pretty trivial. >> > Another slightly less trivial but more useful version. >> > >> > The issue is that there are 3 definitions of modulo, two of which are fine >> > (Knuth floored division and Euclidian), and the last one much less useful. >> > Alas, C (%) & SQL (MOD) choose the bad definition:-( I really need any of >> > the other two. The attached patch adds all versions, with "%" and "mod" >> > consistent with their C and SQL unfortunate counterparts, and "fmod" and >> > "emod" the sane ones. >> >> Three different modulo operators seems like a lot for a language that >> doesn't even have a real expression syntax, but I'll yield to whatever >> the consensus is on this one. > > I wonder if it would be necessary to offer the division operator > semantics corresponding to whatever additional modulo operator we choose > to offer. That is, if we add emod, do we need "ediv" as well? Maybe we ought to break down and support a real expression syntax. Sounds like that would be better all around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Maybe we ought to break down and support a real expression syntax. > Sounds like that would be better all around. Adding operators is more or less orthogonal with providing a new expression syntax. I'm not sure that there is currently a crying need for it (a syntax). It would be a significant project. It would raise the question "where to stop"... And I just really need a few more functions/operators which can be fitted in the current implementation quite easily. -- Fabien.
2014-08-06 23:38 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Here is a third simpler patch which only implements the Knuth's modulo, where the remainder has the same sign as the divisor.Three different modulo operators seems like a lot for a language that
doesn't even have a real expression syntax, but I'll yield to whatever
the consensus is on this one.
I would prefer this version 3 (one simple modulo based on Knuth definition) or if there is a problem version 2 (all 3 modulos). Version 1 which provides a modulo compatible with C & SQL is really useless to me.
I like version 3, it is simple and practical. And it's enough in pgbench.
If someone wants to use other implementation of modulo algorithm, he just changes his source code.
Best regards,
--
Mitsumasa KONDO
Hi,
Here is the review result.
#1. Patch compatibility
Little bit hunk, but it can patch to latest master.
#2. Functionality
No problem.
#3. Documentation
I think modulo operator explanation should put at last at the doc line.
Because the others are more frequently used.
#4. Algorithm
You proposed three modulo algorithm, that are
1. general modulo, 2. floor modulo and 3. euclidian modulo.
They calculate different value when modulo2 or reminder is negative number.
Calculate examples are here,
1. general modulo (patch1)
5 % 3 = 2
5 % -3 = 1
-5 % 3 = -1
2. floor modulo (patch2, 3)
5 % 3 = 2
5 % -3 = -2
-5 % 3 = 2
3. euclidian modulo (patch2)
5 % 3 = 2
5 % -3 = 4
-5 % 3 = 2
That's all.
I think if we want to create equal possibility and inutuitive random generator, we select floor modulo, as you see the upper example. It can create contrapositive random number. 1 and 2 method cannot.
I think euclidian modulo doesn't need by a lot of people. If we add it, many people will confuse, because they doesn't know the mathematic algorithms.
So I like patch3 which is simple and practical.
If you agree or reply my comment, I will mark ready for commiter.
Best Regards,
--
Mitsumsasa KONDO
Hello Mutsumara-san, > #3. Documentation > I think modulo operator explanation should put at last at the doc line. > Because the others are more frequently used. > So I like patch3 which is simple and practical. Ok. > If you agree or reply my comment, I will mark ready for commiter. Please find attached v4, which is v3 plus an improved documentation which is clearer about the sign of the remainder. -- Fabien.
Hi Fabien-san,
Thank you for your fast work!
2014-09-08 23:08 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Hello Mutsumara-san,#3. Documentation
I think modulo operator explanation should put at last at the doc line.
Because the others are more frequently used.So I like patch3 which is simple and practical.
Ok.If you agree or reply my comment, I will mark ready for commiter.
Please find attached v4, which is v3 plus an improved documentation
which is clearer about the sign of the remainder.
The attached is seemed no problem. But I'd like to say about order of explanation in five formulas.
Fix version is here. Please confirm, and I mark it for ready for commiter.
Best regards,
--
Mitsumasa KONDO
Attachment
> The attached is seemed no problem. But I'd like to say about order of > explanation in five formulas. > > Fix version is here. Please confirm, and I mark it for ready for > commiter. I'm ok with this version. -- Fabien.
On Wed, Aug 6, 2014 at 3:22 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Maybe we ought to break down and support a real expression syntax. >> Sounds like that would be better all around. > > Adding operators is more or less orthogonal with providing a new expression > syntax. I'm not sure that there is currently a crying need for it (a > syntax). It would be a significant project. It would raise the question > "where to stop"... And I just really need a few more functions/operators > which can be fitted in the current implementation quite easily. Writing a simple expression parser for pgbench using flex and bison would not be an inordinately difficult problem. And it would let you add abs() and ?: and thereby avoid the need to hard-code multiple versions of the modulo operator in the patch. The fact that you're agonizing about which modulo to add is a sign that the language is too impoverished to let you do anything non-trivial. That's why C only has one modulo operator: as the patch demonstrates, the differences between the version can be patched over with a very short C expression. If we had real expressions in pgbench, you could do the same thing there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > Writing a simple expression parser for pgbench using flex and bison > would not be an inordinately difficult problem. Sure. Note that there is not only the parsing but also the evaluating to think of, which mean a data structure to represent the expressions which would be more complex than the current one. Although it is not difficult as such, it would mean breaking pgbench heavily, which would mean more time and energy than I have available. > And it would let you add abs() and ?: and thereby avoid the need to > hard-code multiple versions of the modulo operator in the patch. It would also mean to *implement* abs and ?: in the "execution" code to compute the parsed expression. > The fact that you're agonizing about which modulo to add is a sign that > the language is too impoverished to let you do anything non-trivial. I'm not agonizing about which modulo to use:-) I know I do not want the C/SQL version which is never really useful if you have signed data. I overlooked this detail when submitting the first patch, and produced a stupid patch with all 3 possible modulos, before going to the sane solution which is to use the floored division Knuth version. If I had thought a bit, I would have sent v3 as v1 directly. > That's why C only has one modulo operator: as the patch demonstrates, > the differences between the version can be patched over with a very > short C expression. If we had real expressions in pgbench, you could do > the same thing there. Sure. I agree that pgbench is limited and that real expressions would have helped, but it is also quite useful and easy to extend "as is". Maybe the "add an expression parser to pgbench" could be added in the postgresql todo wiki? -- Fabien.
On Tue, Sep 9, 2014 at 11:07 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> The fact that you're agonizing about which modulo to add is a sign that >> the language is too impoverished to let you do anything non-trivial. > > I'm not agonizing about which modulo to use:-) I know I do not want the > C/SQL version which is never really useful if you have signed data. I > overlooked this detail when submitting the first patch, and produced a > stupid patch with all 3 possible modulos, before going to the sane solution > which is to use the floored division Knuth version. If I had thought a bit, > I would have sent v3 as v1 directly. Sure, and I would have looked at that patch and complained that you were implementing a modulo operator with different semantics than C. And then we'd be right back where we are now. The problem here isn't that I don't know what you want, or that what you want is unreasonable. The problem is that we can't cater to every slightly-different thing that somebody wants to do with pgbench. If we try, we'll end up with a neverending jumble of features most of which never get used by 1 or 2 people. Features for any part of PostgreSQL need to be of reasonably general interest, not something that is super-specialized to one particular set of needs. If I commit what you're asking me to commit here, I think that's exactly what I'll be doing, and I don't think that's a good idea. In all seriousness, sometimes the right solution to these problems is just to patch your own copy. I've done that with pgbench at least once and with PostgreSQL in general more times than I can conveniently count. I've learned a lot of useful things that way, but I can't expect all of the debugging instrumentation and trial features that I've created to get incorporated into the product. It's not reasonable, and it's not worth the time it would take me to make the code general and maintainable, so the code just gets dropped on the floor. In a perfect world, that wouldn't happen, but in a perfect world they'd pay me the same salary they pay Linus Torvalds. In this particular instance, we got onto this topic in the first place because of the Gaussian and exponential patches, and the desire to have the "hot" portion of the keys distributed in some random way through the data set rather than having everything be clustered at the front. As you yourself said, the best case for this patch is to allow a poor-man's approximation of that. Adding a weird non-standard operator for a poor approximation of the feature we're really looking for just doesn't feel like the right call. I recognize that you have a different view, and if enough people agree with you, that argument may win the day. But my opinion is that we are too far down in the weeds. We should be trying to create features that will have general appeal to pgbench users, not features that solve narrow problems for individual developers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > Sure, and I would have looked at that patch and complained that you > were implementing a modulo operator with different semantics than C. > And then we'd be right back where we are now. > [...] Probably. To be clear about my intent, which is a summary of what you already know: I would like to be able to generate a reasonable non uniform throttled workload with pgbench. I do think such a feature is worth having for anybody who would like to do something realistic with pgbench, and that it is in the "general interest" of postgresql to have such features. In particular, given the current state of abysmal performance under some trivial load with pg, I really think that it is really worth having a better tool, and I think that my effort with rate and progress helped to put these hidden problems into the light, and I do hope that they are going to be solved quickly. Now if I submitted a big patch with all the necessary features in it, someone would ask to break it into pieces. So they are submitted one by one (progress, throttling, limit, modulo, ...). Note I did not start with the non uniform stuff, but Mitsumasa-san sent a gaussian distribution patch and I jumped onto the wagon to complement it with an exponential distribution patch. I knew when doing it that is was not enough, but as I said "one piece at a time", given the effort required to pass simple patch. What is still needed for the overall purpose is the ability to scatter the distribution. This is really: (1) a positive remainder modulo, which is the trivial patch under discussion (2) a fast but good quality for the purpose hash function also a rather small patch, not submitted yet. (3) maybe the '|' operator to do a TP*-like non-uniform load, which is really periodic so I do not like it. a trivialpatch, not submitted yet. If you do not want one of these pieces (1 & 2), basically the interest of gaussian/exponential addition is much reduced, and this is just a half baked effort aborted because you did not want what was required to make it useful. Well, I can only disagree, but you are a committer and I'm not! -- Fabien.
On Wed, Sep 10, 2014 at 4:48 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Note I did not start with the non uniform stuff, but Mitsumasa-san sent a > gaussian distribution patch and I jumped onto the wagon to complement it > with an exponential distribution patch. I knew when doing it that is was not > enough, but as I said "one piece at a time", given the effort required to > pass simple patch. > > What is still needed for the overall purpose is the ability to scatter the > distribution. This is really: > > (1) a positive remainder modulo, which is the trivial patch under > discussion > > (2) a fast but good quality for the purpose hash function > also a rather small patch, not submitted yet. > > (3) maybe the '|' operator to do a TP*-like non-uniform load, > which is really periodic so I do not like it. > a trivial patch, not submitted yet. > > If you do not want one of these pieces (1 & 2), basically the interest of > gaussian/exponential addition is much reduced, and this is just a half baked > effort aborted because you did not want what was required to make it useful. > Well, I can only disagree, but you are a committer and I'm not! I am not objecting to the functionality; I'm objecting to bolting on ad-hoc operators one at a time. I think an expression syntax would let us do this in a much more scalable way. If I had time, I'd go do that, but I don't. We could add abs(x) and hash(x) and it would all be grand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > I am not objecting to the functionality; I'm objecting to bolting on > ad-hoc operators one at a time. I think an expression syntax would > let us do this in a much more scalable way. If I had time, I'd go do > that, but I don't. We could add abs(x) and hash(x) and it would all > be grand. Ok. I do agree that an expression syntax would be great! However, that would not diminish nor change much the amount and kind of code necessary to add an operator or a function: the parser would have to be updated, and the expression structure, and the executor. Currently the pgbench "parser" and expression are very limited, but they are also very generic so that nothing is needed to add a new operator there, only the execution code needs to be updated, and the update would be basically the same (if this is this function or operator, actually do it...) if the execution part of a real expression syntax would have to be updated. So although I agree that a real expression syntax would be great "nice to have", I do not think that it is valid objection to block this patch. Moreover, upgrading pgbench to handle an actual expression syntax is not so trivial, because for instance some parts of the text needs to be parsed (set syntax) while others would need not to be pased (SQL commands), so some juggling would be needed in the lexer, or maybe only call the parser on some lines and not others... Everything is possible, but this one would require some careful thinking. -- Fabien.
2014-09-11 15:47 GMT+09:00 Fabien COELHO <coelho@cri.ensmp.fr>:
Because if we use the negative value in modulo calculation, it just set negative value as both positive values,
Hello Robert,I am not objecting to the functionality; I'm objecting to bolting on
ad-hoc operators one at a time. I think an expression syntax would
let us do this in a much more scalable way. If I had time, I'd go do
that, but I don't. We could add abs(x) and hash(x) and it would all
be grand.
Ok. I do agree that an expression syntax would be great!
Yes, it's not bad.
However, will we need some method of modulo calculation?
I don't think so much. I think most intuitive modulo calculation method is floor modulo,
it is easy to expect the result than others. This strong point is good for benchmark script users.
But I don't have any strong opinion about this patch, not blocking:)
Best Regards
--
Mistumasa KONDO
On Thu, Sep 11, 2014 at 2:47 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Ok. I do agree that an expression syntax would be great! > > However, that would not diminish nor change much the amount and kind of code > necessary to add an operator or a function That's not really true. You can't really add abs(x) or hash(x) right now because the current code only supports this syntax: \set varname operand1 [ operator operand2 ] There's no way to add support for a unary operator with that syntax. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>> However, that would not diminish nor change much the amount and kind of >> code necessary to add an operator or a function > > That's not really true. You can't really add abs(x) or hash(x) right > now because the current code only supports this syntax: > > \set varname operand1 [ operator operand2 ] > > There's no way to add support for a unary operator with that syntax. Hmmm. If you accept a postfix syntax, there is:-) But this is not convincing. Adding a unary function with a clean syntax indeed requires doing something with the parser. -- Fabien.
Fabien, * Fabien COELHO (coelho@cri.ensmp.fr) wrote: > >That's not really true. You can't really add abs(x) or hash(x) right > >now because the current code only supports this syntax: > > > >\set varname operand1 [ operator operand2 ] > > > >There's no way to add support for a unary operator with that syntax. > > Hmmm. If you accept a postfix syntax, there is:-) > > But this is not convincing. Adding a unary function with a clean > syntax indeed requires doing something with the parser. Based on the discussion so far, it sounds like you're coming around to agree with Robert (as I'm leaning towards also) that we'd be better off building a real syntax here instead. If so, do you anticipate having a patch to do so in the next few days, or...? Thanks! Stephen
Hello Stephen, >> But this is not convincing. Adding a unary function with a clean >> syntax indeed requires doing something with the parser. > > Based on the discussion so far, it sounds like you're coming around to > agree with Robert (as I'm leaning towards also) that we'd be better off > building a real syntax here instead. Not exactly. My actual opinion is that it is really an orthogonal issue. ISTM that a similar code would be required somehow for the executor part of an hypothetical real syntax if it was to support modulo. > If so, do you anticipate having a patch to do so in the next few days, > or...? Developping a full expression syntax is a significant project that I'm not planing to undertake in the short or medium term, especially as I'm not convinced that it is worth it: It would involve many changes, and I'm afraid that the likelyhood of the patch being rejected on some ground is high. So my opinion is that this small modulo operator patch is both useful and harmless, so it should be committed. If someday there is a nice real syntax implemented, that will be great as well. -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > So my opinion is that this small modulo operator patch is both useful and > harmless, so it should be committed. You've really failed to make that case --- in particular, AFAICS there is not even consensus on the exact semantics that the operator should have. So I'm inclined to reject rather than put in something that may cause surprises down the road. The usefulness doesn't seem great enough to take that risk. The way forward, if we think there is enough value in it (I'm not sure there is), would be to build enough expression infrastructure so that we could inexpensively add both operators and functions. Then we could add a modulo operator with whatever semantics seem most popular, and some function(s) for the other semantics, and there would not be so much riding on choosing the "right" semantics. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Fabien COELHO <coelho@cri.ensmp.fr> writes: > > So my opinion is that this small modulo operator patch is both useful and > > harmless, so it should be committed. > > You've really failed to make that case --- in particular, AFAICS there is > not even consensus on the exact semantics that the operator should have. > So I'm inclined to reject rather than put in something that may cause > surprises down the road. The usefulness doesn't seem great enough to > take that risk. Agreed. > The way forward, if we think there is enough value in it (I'm not > sure there is), would be to build enough expression infrastructure > so that we could inexpensively add both operators and functions. > Then we could add a modulo operator with whatever semantics seem > most popular, and some function(s) for the other semantics, and > there would not be so much riding on choosing the "right" semantics. Indeed and there's plenty of time to make it happen for 9.5. Personally, I'd really like to see as I feel it'd help with the performance farm goal which has been discussed many times over. Fabien, I'd ask that you not be discouraged by this and continue to work with pgbench and work on such an improvement, if you're able to take it on with your other committments. I do see value in it and I feel it will help reproducability, a key and important aspect of performance analysis, much more so than just hacking a local copy of pgbench would. Thanks! Stephen
>> So my opinion is that this small modulo operator patch is both useful and >> harmless, so it should be committed. > > You've really failed to make that case --- in particular, AFAICS there is > not even consensus on the exact semantics that the operator should have. There is. Basically whatever with a positive remainder when the divisor is positive is fine. The only one to avoid is the dividend signed version, which happen to be the one of C and SQL, a very unfortunate choice in both case as soon as you have negative numbers. > So I'm inclined to reject rather than put in something that may cause > surprises down the road. The usefulness doesn't seem great enough to > take that risk. If you reject it, you can also remove the gaussian and exponential random distribution which is near useless without a mean to add a minimal pseudo-random stage, for which "(x * something) % size" is a reasonable approximation, hence the modulo submission. > The way forward, if we think there is enough value in it (I'm not > sure there is), would be to build enough expression infrastructure > so that we could inexpensively add both operators and functions. The modulo patch is basically 10 lines of code, I would not call that "expensive"... As I explained earlier, it would NOT be any different with an "expression infrastructure", as you would have to add a lex for the operator, then a yacc rule for the construction, the operator would need to be represented in a data structure, and the executor should be able to handle the case including errors... there is no way that this would be less that 10 lines of code. It would basically include the very same lines for the executor part. > Then we could add a modulo operator with whatever semantics seem > most popular, and some function(s) for the other semantics, and > there would not be so much riding on choosing the "right" semantics. The semantics is clear. I just choose the wrong one in the first patch:-) -- Fabien.
On 09/23/2014 09:15 PM, Fabien COELHO wrote: >> So I'm inclined to reject rather than put in something that may cause >> surprises down the road. The usefulness doesn't seem great enough to >> take that risk. Marked as "Returned with feedback". > If you reject it, you can also remove the gaussian and exponential random > distribution which is near useless without a mean to add a minimal > pseudo-random stage, for which "(x * something) % size" is a reasonable > approximation, hence the modulo submission. I'm confused. The gaussian and exponential patch was already committed. Are you saying that it's not actually useful, and should be reverted? That doesn't make sense to me, gaussian and exponential distributed values seems quite useful to me in test scripts. I don't understand what that pseudo-random stage you're talking about is. Can you elaborate? - Heikki
Hello Heikki, >> If you reject it, you can also remove the gaussian and exponential random >> distribution which is near useless without a mean to add a minimal >> pseudo-random stage, for which "(x * something) % size" is a reasonable >> approximation, hence the modulo submission. > > I'm confused. The gaussian and exponential patch was already committed. Yes. They are significant patches that really involved significant work, and which are only really useful with a complementary stupid 10 lines patch which is being rejected without understanding why it is needed. > Are you saying that it's not actually useful, and should be reverted? > That doesn't make sense to me, gaussian and exponential distributed > values seems quite useful to me in test scripts. Yes and no. Currently these distributions are achieved by mapping a continuous function onto integers, so that neighboring integers get neighboring number of draws, say with size=7: #draws 10 6 3 1 0 0 0 // some exponential distribution int drawn 0 1 2 3 4 5 6 Although having an exponential distribution of accesses on tuples is quite reasonable, the likelyhood there would be so much correlation between neighboring values is not realistic at all. You need some additional shuffling to get there. > I don't understand what that pseudo-random stage you're talking about is. Can > you elaborate? The pseudo random stage is just a way to scatter the values. A basic approach to achieve this is "i' = (i * large-prime) % size", if you have a modulo. For instance with prime=5 you may get something like: #draws 10 6 3 1 0 0 0 int drawn 0 1 2 3 4 5 6 (i) scattered 0 5 3 1 6 4 2 (i' = 5 i % 7) So the distribution becomes: #draws 10 1 0 3 0 6 0 scattered 0 1 2 3 4 5 6 Which is more interesting from a testing perspective because it removes the neighboring value correlation. A better approach is to use a hash function. "i' = hash(i) % size", although it skews the distribution some more, but the quality of the shuffling is much better than with the mult-modulo version above. Note that you need a modulo as well... I must say that I'm appaled by a decision process which leads to such results, with significant patches passed, and the tiny complement to make it really useful (I mean not on the paper or on the feature list, but in real life) is rejected... -- Fabien.
On 09/24/2014 10:45 AM, Fabien COELHO wrote: > Currently these distributions are achieved by mapping a continuous > function onto integers, so that neighboring integers get neighboring > number of draws, say with size=7: > > #draws 10 6 3 1 0 0 0 // some exponential distribution > int drawn 0 1 2 3 4 5 6 > > Although having an exponential distribution of accesses on tuples is quite > reasonable, the likelyhood there would be so much correlation between > neighboring values is not realistic at all. You need some additional > shuffling to get there. > >> I don't understand what that pseudo-random stage you're talking about is. Can >> you elaborate? > > The pseudo random stage is just a way to scatter the values. A basic > approach to achieve this is "i' = (i * large-prime) % size", if you have a > modulo. For instance with prime=5 you may get something like: > > #draws 10 6 3 1 0 0 0 > int drawn 0 1 2 3 4 5 6 (i) > scattered 0 5 3 1 6 4 2 (i' = 5 i % 7) > > So the distribution becomes: > > #draws 10 1 0 3 0 6 0 > scattered 0 1 2 3 4 5 6 > > Which is more interesting from a testing perspective because it removes > the neighboring value correlation. Depends on what you're testing. Yeah, shuffling like that makes sense for a primary key. Or not: very often, recently inserted rows are also queried more often, so that there is indeed a strong correlation between the integer key and the access frequency. Or imagine that you have a table that stores the height of people in centimeters. To populate that, you would want to use a gaussian distributed variable, without shuffling. For shuffling, perhaps we should provide a pgbench function or operator that does that directly, instead of having to implement it using * and %. Something like hash(x, min, max), where x is the input variable (gaussian distributed, or whatever you want), and min and max are the range to map it to. > I must say that I'm appaled by a decision process which leads to such > results, with significant patches passed, and the tiny complement to make > it really useful (I mean not on the paper or on the feature list, but in > real life) is rejected... The idea of a modulo operator was not rejected, we'd just like to have the infrastructure in place first. - Heikki
Fabien COELHO <coelho@cri.ensmp.fr> wrote: >>> So my opinion is that this small modulo operator patch is both useful and >>> harmless, so it should be committed. >> >> You've really failed to make that case --- in particular, AFAICS there is >> not even consensus on the exact semantics that the operator should have. > > There is. Basically whatever with a positive remainder when the divisor is > positive is fine. The only one to avoid is the dividend signed version, > which happen to be the one of C and SQL, a very unfortunate choice in both > case as soon as you have negative numbers. No, it depends totally on the application. For financial and physical inventory purposes where I have had occasion to use it, the properties which were important were: Assuming that all values are integers, for: x = a / b; y = a % b; If b is zero either statement must generate an error. If a and b have the same sign, x must be positive; else x must benegative. It must hold that abs(x) is equal to abs(a) / abs(b). It must hold that ((x * b) + y) is equal to a. This is exactly what the languages I was using did, and I was glad. I find it convenient that C and SQL behave this way. You are proposing that these not all hold, which in a lot of situations could cause big problems. You seem familiar enough with your own use case that I believe you when you say you don't want these semantics, but that just points out the need to support both. >> Then we could add a modulo operator with whatever semantics seem >> most popular, and some function(s) for the other semantics, and >> there would not be so much riding on choosing the "right" > semantics. > > The semantics is clear. I disagree. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > Assuming that all values are integers, for: > x = a / b; > y = a % b; > If b is zero either statement must generate an error. > If a and b have the same sign, x must be positive; else x must be negative. > It must hold that abs(x) is equal to abs(a) / abs(b). > It must hold that ((x * b) + y) is equal to a. Not sure about the third of those statements, but the last one is definitely a requirement. I think the only defensible choice, really, is that % should be defined so that a = ((a / b) * b) + (a % b). It is perfectly reasonable to provide other division/modulus semantics as functions, preferably in matching pairs that also satisfy this axiom. But the two operators need to agree, or you'll have surprised users. regards, tom lane
> No, it depends totally on the application. For financial and > physical inventory purposes where I have had occasion to use it, > the properties which were important were: > [...] Hmmm. Probably I'm biased towards my compiler with an integer linear flavor field, where C-like "%" is always a pain, however you look at it. I'm not sure of physical inventories with negative numbers though. In accounting, I thought that a negative number was a positive number with "debit" written above. In finance, no problem to get big deficits:-) Now about the use case, I'm not sure that you would like to write your financial and physical inventory stuff within a pgbench test script, whereas in such a script I do expect when doing a modulo with the size of a table to have a positive result so that I can expect to find a tuple there, hence the desired "positive remainder" property for negative dividends. -- Fabien.
> The idea of a modulo operator was not rejected, we'd just like to have the > infrastructure in place first. Sigh. How to transform a trivial 10 lines patch into a probably 500+ lines project involving flex & bison & some non trivial data structures, and which may get rejected on any ground... Maybe I'll set that as a student project. -- Fabien.
On 09/24/2014 09:34 PM, Fabien COELHO wrote: > >> The idea of a modulo operator was not rejected, we'd just like to have the >> infrastructure in place first. > > Sigh. > > How to transform a trivial 10 lines patch into a probably 500+ lines > project involving flex & bison & some non trivial data structures, and > which may get rejected on any ground... That's how we get good features. It's very common for someone to need X, and to post a patch that does X. Other people pop up that need Y and Z which are similar, and you end up implementing those too. It's more work for you in the short term, but it benefits everyone in the long run. - Heikki
On Wed, Sep 24, 2014 at 2:34 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Sigh. > > How to transform a trivial 10 lines patch into a probably 500+ lines project > involving flex & bison & some non trivial data structures, and which may get > rejected on any ground... > > Maybe I'll set that as a student project. I think you're making a mountain out of a molehill. I implemented this today in about three hours. The patch is attached. It needs more testing, documentation, and possibly some makefile adjustments, but it seems to basically work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 9/24/14, 10:10 PM, Robert Haas wrote: > I think you're making a mountain out of a molehill. I implemented this > today in about three hours. I think you're greatly understating your own efficiency at shift/reducing parser mountains down to molehills. Fabien even guessed the LOC size of the resulting patch with less than a 9% error. That's some top notch software metrics and development work there boys; kudos all around. Let's get this operator support whipped into shape, then we can add the 2 to 3 versions of the modulo operator needed to make the major use cases work. (There was never going to be just one hacked in with a quick patch that satisfied the multiple ways you can do this) Then onto the last missing pieces of Fabien's abnormally distributed test workload vision. He seems kind of stressed about the process lately; not sure what to say about it. Yes, the PostgreSQL community is hostile to short, targeted feature improvements unless they come already fit into a large, standards compliant strategy for improving the database. That doesn't work well when approached by scratch an itch stye development. Nothing we can really do about it -- Greg Smith greg.smith@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Hello Robert, > I think you're making a mountain out of a molehill. Probably. I tend to try the minimum effort first. > I implemented this today in about three hours. The patch is attached. Great! Your patch is 544 lines, my size evaluation was quite good:-) Note that I probably spent 10 minutes on the 10 lines patch, but I probably spent hours writing mails about it. > It needs more testing, documentation, and possibly some makefile > adjustments, but it seems to basically work. I'll try to do that, but not in the short term. Note that the modulo is not the one I need, but probably I can make do with an abs() function and/or a conditional. -- Fabien.
>> I think you're making a mountain out of a molehill. I implemented this >> today in about three hours. > > I think you're greatly understating your own efficiency at shift/reducing > parser mountains down to molehills. Fabien even guessed the LOC size of the > resulting patch with less than a 9% error. As I do research and also teach compilers, it was not that hard for me to provide a reasonable size estimation. A also noticed Robert's 3 lines per minutes outstanding productivity. Once you include argumenting, testing, debuging and documenting, that may go down a bit, but it is very good anyway. You can also notice that in the patch the handling of '%' involves 11 lines (1 lexer, 1 parser, 9 executor), basically the very same size as the patch I submitted, for a very similar code. > [...]. Yes, the PostgreSQL community is hostile to > short, targeted feature improvements unless they come already fit into a > large, standards compliant strategy for improving the database. That doesn't > work well when approached by scratch an itch stye development. Indeed I tend to avoid spending hours on something when I can spend minutes, and if I spend hours I'm really cross if a patch is rejected, whereas I'm less crossed if I just lost minutes. I had bad experiences with patch submissions in the past when "things are changed and someone does not want it", and it seems that this patch falls within this risk, hence my reluctance to spend the time. Now, as the patch is submitted by a core dev, I think the likelyhood that some version will be committed is high, so all is well, and I gladly review and test it when I have time, alas not in the short term. -- Fabien.
On Thu, Sep 25, 2014 at 1:27 AM, Gregory Smith <gregsmithpgsql@gmail.com> wrote: > On 9/24/14, 10:10 PM, Robert Haas wrote: >> I think you're making a mountain out of a molehill. I implemented this >> today in about three hours. > > I think you're greatly understating your own efficiency at shift/reducing > parser mountains down to molehills. Fabien even guessed the LOC size of the > resulting patch with less than a 9% error. That's some top notch software > metrics and development work there boys; kudos all around. Well, I blame Heikki. I used to think that this kind of thing was really hard, and a few years ago I might have had Fabien's reaction, but then I saw Heikki bust out a shift/reduce parser for the isolation tester in no time, so I decided it must not be that hard. So it proved. I copied all that hard parts from other parts of the PostgreSQL code base - my references were the isolation tester lexer and parser, the contrib/seg parser, and the main parser. I couldn't do it that fast from scratch, not even close, but adapting something that's already known to work is much easier. > Let's get this operator support whipped into shape, then we can add the 2 to > 3 versions of the modulo operator needed to make the major use cases work. > (There was never going to be just one hacked in with a quick patch that > satisfied the multiple ways you can do this) I don't think adding more versions of the modulo operator is the right way forward: I think we should add ? : for conditionals and some kind of function thing like abs(x). Or maybe come up with a more sophisticated rehashing algorithm and expose that directly as hash(x). That's my whole reason for not wanting to adopt Fabien's approach in the first place: I was cool with exposing C's modulo operator, but any other modulo semantics seem like they should be built up from general-purpose primitives. Anyway, I think the first thing is that somebody needs to spend some time testing, polishing, and documenting this patch, before we start adding to it. I'm hoping someone else will volunteer - other tasks beckon. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 9/25/14, 8:38 AM, Robert Haas wrote: > That's my whole reason for not wanting to adopt Fabien's approach in > the first place: I was cool with exposing C's modulo operator, but any > other modulo semantics seem like they should be built up from > general-purpose primitives. Maybe; I don't quite understand his requirements well enough yet to know if that's possible, or if it's easier to give him a full special operator of his own. But since what you did makes that easier, too, forward progress regardless. > Anyway, I think the first thing is that somebody needs to spend some > time testing, polishing, and documenting this patch, before we start > adding to it. I'm hoping someone else will volunteer - other tasks > beckon. I bouncing it to here for you, and I expect to help with those parts presumably in addition to Fabien's help: https://commitfest.postgresql.org/action/patch_view?id=1581
On 09/25/2014 05:10 AM, Robert Haas wrote: > On Wed, Sep 24, 2014 at 2:34 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Sigh. >> >> How to transform a trivial 10 lines patch into a probably 500+ lines project >> involving flex & bison & some non trivial data structures, and which may get >> rejected on any ground... >> >> Maybe I'll set that as a student project. > > I think you're making a mountain out of a molehill. I implemented > this today in about three hours. The patch is attached. It needs > more testing, documentation, and possibly some makefile adjustments, > but it seems to basically work. Looks good to me. The new modulo operator needs documentation, and it could use a pgindent run, but other than that I think this is ready for commit. It would be nice to go even further, and replace process_file, process_builtin, and process_commands altogether with a Bison parser. But this is definitely a step in the right direction. - Heikki
On 11/24/2014 07:26 AM, Heikki Linnakangas wrote: > On 09/25/2014 05:10 AM, Robert Haas wrote: >> On Wed, Sep 24, 2014 at 2:34 PM, Fabien COELHO <coelho@cri.ensmp.fr> >> wrote: >>> Sigh. >>> >>> How to transform a trivial 10 lines patch into a probably 500+ lines >>> project >>> involving flex & bison & some non trivial data structures, and which >>> may get >>> rejected on any ground... >>> >>> Maybe I'll set that as a student project. >> >> I think you're making a mountain out of a molehill. I implemented >> this today in about three hours. The patch is attached. It needs >> more testing, documentation, and possibly some makefile adjustments, >> but it seems to basically work. > > Looks good to me. The new modulo operator needs documentation, and it > could use a pgindent run, but other than that I think this is ready > for commit. > > It would be nice to go even further, and replace process_file, > process_builtin, and process_commands altogether with a Bison parser. > But this is definitely a step in the right direction. > As it doesn't have documentation, I'm inclined to say we should mark this as Waiting on Author or Returned with Feedback. cheers andrew
> As it doesn't have documentation, I'm inclined to say we should mark this as > Waiting on Author or Returned with Feedback. I'm planing to have a detailed look at Robert's patch before the end of the year. I could update pgbench documentation while doing that. -- Fabien.
On 12/13/2014 01:19 PM, Fabien COELHO wrote: > >> As it doesn't have documentation, I'm inclined to say we should mark >> this as Waiting on Author or Returned with Feedback. > > I'm planing to have a detailed look at Robert's patch before the end > of the year. I could update pgbench documentation while doing that. > Ok, I think for now that means Returned with Feedback. cheers andrew
Here is a review & updated version of the patch. Patch applies and compiles without problem on current head, and worked for the various tests I made with custom scripts. This patch is a good thing, and I recommand warmly its inclusion. It extends greatly pgbench custom capabilities by allowing arbitrary integer expressions, and will allow to add other functions (I'll send abs() & a hash(), and possibly others). I'm not sure about what Makefile changes could be necessary for various environments, it looks ok to me as it is. I have included the following additional changes in v2: (1) small update to pgbench documentation. English proof reading is welcome. (2) improve error reporting to display the file and line from where an error is raised, and also the column on syntaxerrors (yyline is always 1...). The prior status was that there were no way to get this information, which wasquite annoying. It could probably be improved further. (3) spacing fixed in a few places. If Robert is ok with these changes, I think it could be applied. -- Fabien.
> I'm not sure about what Makefile changes could be necessary for > various environments, it looks ok to me as it is. Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3. -- Fabien.
On 1 January 2015 at 04:02, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I'm not sure about what Makefile changes could be necessary for
various environments, it looks ok to me as it is.
Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3.
I was about to go and look at this, but I had a problem when attempting to compile with MSVC.
The attached patch seems to fix it.
Regards
David Rowley
Attachment
> I was about to go and look at this, but I had a problem when attempting to > compile with MSVC. Thanks! Here is a v4 which includes your fix. -- Fabien.
On 1 January 2015 at 21:23, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I was about to go and look at this, but I had a problem when attempting to
compile with MSVC.
Thanks! Here is a v4 which includes your fix.
Thank you.
I've had a quick look at the patch as I'm quite interested in seeing some improvements in this area.
At the moment I feel the patch is a bit half done. I really think that since the gaussian and exponential stuff was added in commit ed802e7d, that this should now be changed so that we have functions like random(), erandom() and grandom() and the way to use this becomes:
\set name random(1,10)
\set name erandom(1,10)
And we completely pull out the new \\setrandom additions added in that commit. We'd likely keep \\setrandom name 1 10 for backwards compatibility.
Does anyone else feel strongly about fixing this now, while we have the chance?
Regards
David Rowley
Hello David, > At the moment I feel the patch is a bit half done. I really think that > since the gaussian and exponential stuff was added in commit ed802e7d, that > this should now be changed so that we have functions like random(), > erandom() and grandom() and the way to use this becomes: > \set name random(1,10) > \set name erandom(1,10) > > And we completely pull out the new \\setrandom additions added in that > commit. We'd likely keep \\setrandom name 1 10 for backwards compatibility. > Does anyone else feel strongly about fixing this now, while we have the > chance? I thought about adding functions, possibly random, very probably abs & some hash, but I felt it would be more for a second round. The other point is that although uniform random is fine, the gaussian and exponential ones require an additional floating point argument which means handling some typing. The current patch is "just" about handling operators as before, although in a much nicer and extensible way, thus I would suggest to let Robert's patch more or less as it is, and people will be able to propose new extensions later on. -- Fabien.
David Rowley wrote: > At the moment I feel the patch is a bit half done. I really think that > since the gaussian and exponential stuff was added in commit ed802e7d, that > this should now be changed so that we have functions like random(), > erandom() and grandom() and the way to use this becomes: > > \set name random(1,10) > \set name erandom(1,10) > > And we completely pull out the new \\setrandom additions added in that > commit. Sounds good to me. The current syntax is rather messy IMV, and presumably a function-based syntax might be better. > We'd likely keep \\setrandom name 1 10 for backwards compatibility. Yep. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1 January 2015 at 21:23, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I was about to go and look at this, but I had a problem when attempting to
compile with MSVC.
Thanks! Here is a v4 which includes your fix.
I'm also just looking at you ERROR() macro, it seems that core code is quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is not defined. I'd say this needs to be fixed too. Have a look at the trick used in elog.h for when HAVE__VA_ARGS is not defined.
Regards
David Rowley
> I'm also just looking at you ERROR() macro, it seems that core code is > quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is > not defined. I'd say this needs to be fixed too. Have a look at the trick > used in elog.h for when HAVE__VA_ARGS is not defined. Indeed. The simplest solution seems to expand this macro so that these is no macro:-) I'll do that. -- Fabien.
> I'm also just looking at you ERROR() macro, it seems that core code is > quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is > not defined. I'd say this needs to be fixed too. Have a look at the trick > used in elog.h for when HAVE__VA_ARGS is not defined. Here is a v5 with the vararg macro expanded. -- Fabien.
David Rowley wrote: > I'm also just looking at you ERROR() macro, it seems that core code is > quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is > not defined. I'd say this needs to be fixed too. Have a look at the trick > used in elog.h for when HAVE__VA_ARGS is not defined. Hm, I just looked at the previous version which used ERROR rather than LOCATE and LOCATION, and I liked that approach better. Can we get that back? I understand that for the purposes of this patch it's easier to not change existing fprintf()/exit() calls, though. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Fabien COELHO wrote: > > >I'm also just looking at you ERROR() macro, it seems that core code is > >quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is > >not defined. I'd say this needs to be fixed too. Have a look at the trick > >used in elog.h for when HAVE__VA_ARGS is not defined. > > Here is a v5 with the vararg macro expanded. Thanks. On top of evaluateExpr() we need a comment (generally I think pgbench could do with more comments; not saying your patch should add them, just expressing an opinion.) Also, intuitively I would say that the return values of that function should be reversed: return true if things are good. Can we cause a stack overflow in this function with a complex enough expression? I wonder about LOCATE and LOCATION. Can we do away with the latter, and keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or similar? I would just expand an ad-hoc fprintf in the single place where the other macro is used. Are we okay with only integer operands? Is this something we would expand in the future? Is the gaussian/exp random stuff going to work with integer operands, if we want to change it to use function syntax, as expressed elsewhere? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro, > On top of evaluateExpr() we need a comment (generally I think pgbench > could do with more comments; not saying your patch should add them, just > expressing an opinion.) Having spent some time in pgbench, I agree that more comments are a good thing. > Also, intuitively I would say that the return values of that function > should be reversed: return true if things are good. Ok. > Can we cause a stack overflow in this function with a complex > enough expression? Not for any practical purpose, IMO. > I wonder about LOCATE and LOCATION. Can we do away with the latter, and > keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or > similar? I would just expand an ad-hoc fprintf in the single place > where the other macro is used. I think that all "location" information should always be the same, so having it defined only once helps maintenance. If someone fixes the macro and there is one expanded version it is likely that it would not be changed. Maybe we could do with only one macro, though. > Are we okay with only integer operands? Is this something we would > expand in the future? Probably > Is the gaussian/exp random stuff going to work with integer operands, No, it will need a floating point parameter, but I was thinking of only adding constants floats as function arguments in a first approach, and not allow an expression syntax on these, something like: \set n exprand(1, :size+3, 2.0) + 1 But not \set n exprand(1, :size+3, :size/3.14159) + 1 That is debatable. Otherwise we have to take care of typing issues, which would complicate the code significantly with two dynamic types (int & float) to handle, propagate and so in the expression evaluation. It is possible though, but it seems to me that it is a lot of bother for a small added value. Anyway, I suggest to keep that for another round and keep the Robert's isofunctional patch as it is before extending. > if we want to change it to use function syntax, as expressed elsewhere? I think I'll add a function syntax, and add a new node type to handle these, but the current syntax should/might be preserved for upward compatibility. -- Fabien.
>> I'm also just looking at you ERROR() macro, it seems that core code is >> quite careful not to use __VA_ARGS__ on compilers where HAVE__VA_ARGS is >> not defined. I'd say this needs to be fixed too. Have a look at the trick >> used in elog.h for when HAVE__VA_ARGS is not defined. > > Hm, I just looked at the previous version which used ERROR rather than > LOCATE and LOCATION, and I liked that approach better. Can we get that > back? It seems that postgresql must be able to compile with a preprocessor which does not handle varargs macros, so I thought it simpler to just expand the macro. If we must have it without a vararg macro, it means creating an adhoc vararg function to handle the vfprintf and exit. Oviously it can be done, if vfprintf is available. The prior style was to repeat fprintf/exit everywhere, I wanted to improve a little, but not to bother too much about portability constraints with pre C99 compilers. > I understand that for the purposes of this patch it's easier to > not change existing fprintf()/exit() calls, though. The issue is really the portability constraints. C99 is only 16 years old, so it is a minor language:-) -- Fabien.
Hello Alvaro, Here is a v6 with most of your suggestions applied. > On top of evaluateExpr() we need a comment (generally I think pgbench > could do with more comments; not saying your patch should add them, just > expressing an opinion.) Also, intuitively I would say that the return > values of that function should be reversed: return true if things are > good. Comment & inverted return value done. > I wonder about LOCATE and LOCATION. Can we do away with the latter, and > keep only LOCATE perhaps with a better name such as PRINT_ERROR_AT or > similar? I would just expand an ad-hoc fprintf in the single place > where the other macro is used. I've used just one PRINT_ERROR_AT() macro consistently. > Are we okay with only integer operands? Is this something we would > expand in the future? Is the gaussian/exp random stuff going to work > with integer operands, if we want to change it to use function syntax, > as expressed elsewhere? Nothing for now, I feel it is for a later round. > [other mail] bring ERROR() macro back I also prefer the code with it, but the cost-benefit of a pre-C99 compatible implementation seems quite low, and it does imply less (style) changes with the previous situation as it is. -- Fabien.
On Mon, Jan 5, 2015 at 10:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Anyway, I suggest to keep that for another round and keep the Robert's > isofunctional patch as it is before extending. +1. Let's please get the basic thing committed, and then people can write more patches to extend and improve it. There is no reason to squash-merge every enhancement someone might want here into what I wrote. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Mon, Jan 5, 2015 at 10:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> Anyway, I suggest to keep that for another round and keep the Robert's >> isofunctional patch as it is before extending. > > +1. Let's please get the basic thing committed, and then people can > write more patches to extend and improve it. There is no reason to > squash-merge every enhancement someone might want here into what I > wrote. From my point of view the latest v6 of the patch is pretty "ready for committers", but I'm not sure whom should decide that... Should I just update the state in the commitfest interface? -- Fabien.
* Fabien COELHO (coelho@cri.ensmp.fr) wrote: > > >On Mon, Jan 5, 2015 at 10:37 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >>Anyway, I suggest to keep that for another round and keep the Robert's > >>isofunctional patch as it is before extending. > > > >+1. Let's please get the basic thing committed, and then people can > >write more patches to extend and improve it. There is no reason to > >squash-merge every enhancement someone might want here into what I > >wrote. > > >From my point of view the latest v6 of the patch is pretty "ready > >for > committers", but I'm not sure whom should decide that... > > Should I just update the state in the commitfest interface? I took a look through the patch and the discussion and it certainly seems ready to me. I agree with Robert- let's go ahead and get this in and then folks can build on top of it. I'm guessing it was added as "Needs Review" since that's the default for a new entry, but it's clearly had review from multiple people, committers and non-committers alike, so I went ahead and marked it as 'ready for committer' to make that clear to folks looking at the CF app. Robert, as this is mostly your code (and you're marked as author on the CF app), do you want to do the actual commit, or are you impartial, or would you prefer someone else handle it, or..? I'm interested in this also and would be happy to help in any way I can. Thanks! Stephen
On Sat, Feb 28, 2015 at 12:06 PM, Stephen Frost <sfrost@snowman.net> wrote: > I took a look through the patch and the discussion and it certainly > seems ready to me. I agree with Robert- let's go ahead and get this in > and then folks can build on top of it. I'm guessing it was added as > "Needs Review" since that's the default for a new entry, but it's > clearly had review from multiple people, committers and non-committers > alike, so I went ahead and marked it as 'ready for committer' to make > that clear to folks looking at the CF app. > > Robert, as this is mostly your code (and you're marked as author on the > CF app), do you want to do the actual commit, or are you impartial, or > would you prefer someone else handle it, or..? I'm interested in this > also and would be happy to help in any way I can. Yeah, I'll take care of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Mar 1, 2015 at 1:42 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sat, Feb 28, 2015 at 12:06 PM, Stephen Frost <sfrost@snowman.net> wrote: >> I took a look through the patch and the discussion and it certainly >> seems ready to me. I agree with Robert- let's go ahead and get this in >> and then folks can build on top of it. I'm guessing it was added as >> "Needs Review" since that's the default for a new entry, but it's >> clearly had review from multiple people, committers and non-committers >> alike, so I went ahead and marked it as 'ready for committer' to make >> that clear to folks looking at the CF app. >> >> Robert, as this is mostly your code (and you're marked as author on the >> CF app), do you want to do the actual commit, or are you impartial, or >> would you prefer someone else handle it, or..? I'm interested in this >> also and would be happy to help in any way I can. > > Yeah, I'll take care of it. Done. I removed some of the new error-reporting stuff because (1) I wasn't sure it was right and (2) it touched more places than just the ones directly relevant to the patch. I think those things can be resubmitted as a separate patch, but I'd like to have a more robust discussion about what we want the error reporting to look like rather than just sliding it into this patch. I also removed the bit about using "uniform" as an argument to setrandom, which may or may not be something we want but doesn't seem related to the rest of this. The rest, I committed with minor wordsmithing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello Robert, > Done. Great! > I removed some of the new error-reporting stuff because (1) I wasn't > sure it was right Hmmm. Although I'm not sure it was right, I'm sure that not enough error reporting is too rough: sh> ./pgbench -f bad.sql syntax error at column 25 set: parse error > and (2) it touched more places than just the ones directly relevant to > the patch. It just tried to report errors consistently from all places. > I think those things can be resubmitted as a separate patch, Yep, I'll do that. > but I'd like to have a more robust discussion about what we want the > error reporting to look like rather than just sliding it into this > patch. As an input, I suggest that the error reporting feature should provide a clue about where the issue is, including a line number and possibly the offending line. Not sure what else is needed. -- Fabien.
On Mon, Mar 2, 2015 at 5:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >> but I'd like to have a more robust discussion about what we want the error >> reporting to look like rather than just sliding it into this patch. > > As an input, I suggest that the error reporting feature should provide a > clue about where the issue is, including a line number and possibly the > offending line. Not sure what else is needed. I agree. But I think it might be better to try to put everything related to a single error on one line, in a consistent format, e.g.: bad.sql:3: syntax error in set command at column 25 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Mon, Mar 2, 2015 at 5:41 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote: >>> but I'd like to have a more robust discussion about what we want the error >>> reporting to look like rather than just sliding it into this patch. >> >> As an input, I suggest that the error reporting feature should provide a >> clue about where the issue is, including a line number and possibly the >> offending line. Not sure what else is needed. > > I agree. But I think it might be better to try to put everything > related to a single error on one line, in a consistent format, e.g.: > > bad.sql:3: syntax error in set command at column 25 Hmmm... sure that's better. I'll look into that. -- Fabien.