Thread: pgbench --tuple-size option
After publishing some test results with pgbench on SSD with varying page size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and that results may be different with other tuple sizes.

This patch adds an option to change the default tuple size, so that this can be tested easily.

-- Fabien.
On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:
>
> After publishing some test results with pgbench on SSD with varying page
> size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and
> that results may be different with other tuple sizes.
>
> This patch adds an option to change the default tuple size, so that this can
> be tested easily.

I don't think it's beneficial to put this into pgbench. There really isn't a relevant benefit over using a custom script here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello Andres,

>> This patch adds an option to change the default tuple size, so that this can
>> be tested easily.
>
> I don't think it's beneficial to put this into pgbench. There really
> isn't a relevant benefit over using a custom script here.

The scripts to run are the standard ones. The difference is in the *initialization* phase (-i), namely the filler attribute size. There is no custom script for initialization in pgbench, so ISTM that this argument does not apply here.

-- Fabien.
On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:
>
> Hello Andres,
>
> >>This patch adds an option to change the default tuple size, so that this can
> >>be tested easily.
> >
> >I don't think it's beneficial to put this into pgbench. There really
> >isn't a relevant benefit over using a custom script here.
>
> The scripts to run are the standard ones. The difference is in the
> *initialization* phase (-i), namely the filler attribute size. There is no
> custom script for initialization in pgbench, so ISTM that this argument does
> not apply here.

The custom initialization is to run a manual ALTER after the initialization.

Greetings,

Andres Freund
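[Editor's note] The workaround described above could look roughly like the following. This is a minimal sketch, assuming the stock pgbench table names (pgbench_accounts, pgbench_branches, pgbench_tellers) and their `filler` columns. The helper name `filler_alter_statements` and the width of 1000 are purely illustrative and not part of pgbench. The code only builds the SQL text that one would run (for instance through psql) after `pgbench -i`. Changing the width of a char(n) column this way is expected to rewrite the table.

```python
# Hypothetical helper: builds the ALTER statements to run after "pgbench -i".
# Table and column names are assumed to be those of the stock pgbench schema.
PGBENCH_TABLES = ("pgbench_accounts", "pgbench_branches", "pgbench_tellers")


def filler_alter_statements(width, tables=PGBENCH_TABLES):
    """Return one ALTER TABLE statement per table, setting filler to char(width)."""
    if not isinstance(width, int) or width < 1:
        raise ValueError("width must be a positive integer")
    return [
        f"ALTER TABLE {table} ALTER COLUMN filler TYPE char({width});"
        for table in tables
    ]


if __name__ == "__main__":
    # Print the statements so they can be reviewed or piped into psql.
    for statement in filler_alter_statements(1000):
        print(statement)
```

The standard pgbench scripts can then be run unchanged against the widened tables.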
>>> I don't think it's beneficial to put this into pgbench. There really
>>> isn't a relevant benefit over using a custom script here.
>>
>> The scripts to run are the standard ones. The difference is in the
>> *initialization* phase (-i), namely the filler attribute size. There is no
>> custom script for initialization in pgbench, so ISTM that this argument does
>> not apply here.
>
> The custom initialization is to run a manual ALTER after the
> initialization.

Sure, it can be done this way.

I'm not sure about the implications of ALTER on the table storage, so I prefer all benchmarks to run in exactly the same straightforward way in all cases, to avoid unwanted effects on what I'm trying to measure, which is already noisy and unstable enough.

-- Fabien.
On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:
>
> >>>I don't think it's beneficial to put this into pgbench. There really
> >>>isn't a relevant benefit over using a custom script here.
> >>
> >>The scripts to run are the standard ones. The difference is in the
> >>*initialization* phase (-i), namely the filler attribute size. There is no
> >>custom script for initialization in pgbench, so ISTM that this argument does
> >>not apply here.
> >
> >The custom initialization is to run a manual ALTER after the
> >initialization.
>
> Sure, it can be done this way.
>
> I'm not sure about the implication of ALTER on the table storage,

Should be fine in this case. But if that's what you're concerned about - understandably - it seems to make more sense to split -i into two: one step to create the tables, and another to fill them. That'd allow doing manual stuff in between.

Greetings,

Andres Freund
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case. But if that's what you're concerned about -
> understandably -

Indeed, my (long) experience with benchmarks is that it is much more complicated than it looks if you want to really understand what you are getting, and to get anything meaningful.

> it seems to make more sense to split -i into two. One to create the
> tables, and another to fill them. That'd allow to do manual stuff
> in between.

Hmmm. This would mean many more changes than the pretty trivial patch I submitted: more options (two-part init plus compatibility with the previous case), splitting the "init" function, having a dependency and new error cases to check (you must have the tables to fill them), and some options applying to the first part while others apply to the second part, which would lead in any case to significantly more complicated documentation... That is a lot of trouble for my use case, which is to answer Josh's pertinent comments and to be able to test the "tuple size" factor easily. Moreover, I would reject it myself as too much trouble for a small benefit.

Feel free to reject the patch if you do not want it. I think that its cost/benefit is reasonable (one small option, small code changes, some benefit for people who want to measure performance in various cases).

-- Fabien.
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
> >it seems to make more sense to split -i into two. One to create the
> >tables, and another to fill them. That'd allow to do manual stuff
> >in between.
>
> Hmmm. This would mean much more changes than the pretty trivial patch I
> submitted

FWIW, I find that patch really ugly. Adding the filler's width in a printf, after the actual DDL declaration. Without so much as a comment. Brr.

>: more options (2 parts init + compatibility with the previous
> case), splitting the "init" function, having a dependency and new error
> cases to check (you must have the table to fill them), some options apply to
> first part while other apply to second part, which would lead in any case to
> a significantly more complicated documentation... a lot of trouble for my use
> case to answer Josh pertinent comments, and to be able to test the "tuple
> size" factor easily. Moreover, I would reject it myself as too much trouble
> for a small benefit.

Well, it's something more generic, because it allows you to do more...

> Feel free to reject the patch if you do not want it. I think that its
> cost/benefit is reasonable (one small option, small code changes, some
> benefit for people who want to measure performance in various cases).

I personally think this isn't worth the price. But I'm just one guy.

Greetings,

Andres Freund
On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
>> >it seems to make more sense to split -i into two. One to create the
>> >tables, and another to fill them. That'd allow to do manual stuff
>> >in between.
>>
>> Hmmm. This would mean much more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.
>
>>: more options (2 parts init + compatibility with the previous
>> case), splitting the "init" function, having a dependency and new error
>> cases to check (you must have the table to fill them), some options apply to
>> first part while other apply to second part, which would lead in any case to
>> a significantly more complicated documentation... a lot of trouble for my use
>> case to answer Josh pertinent comments, and to be able to test the "tuple
>> size" factor easily. Moreover, I would reject it myself as too much trouble
>> for a small benefit.
>
> Well, it's something more generic, because it allows you to do more...
>
>> Feel free to reject the patch if you do not want it. I think that its
>> cost/benefit is reasonable (one small option, small code changes, some
>> benefit for people who want to measure performance in various cases).
>
> I personally think this isn't worth the price. But I'm just one guy.

I also don't like this feature. The benefit of this option seems too small. If we apply this, we might want to support other options: for example, an option to change the data type of each column, an option to create a new index using "minmax", an option to change the fillfactor of each table, etc. There are countless such options, and I'm afraid that it's really hard to support so many of them.

Regards,

--
Fujii Masao
>> Hmmm. This would mean much more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.

Indeed. I'm not too proud of that very point either :-) You are right that it deserves at a minimum a clear comment. Putting the varying size in the DDL string would mean vsprintf and splitting the query building some more, which I do not find desirable.

> [...]
> Well, it's something more generic, because it allows you to do more...

Apart from the fact that I do not need it (at least right now), and that it is more work, my opinion is that it would be rejected. Not a strong incentive to spend time in that direction.

-- Fabien.
>>> The custom initialization is to run a manual ALTER after the
>>> initialization.
>>
>> Sure, it can be done this way.
>>
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case.

After some testing and laughing, my conclusion is "not fine at all".

The "filler" attributes in "pgbench" are by default "EXTENDED", which means possibly compressed... As the default value is '', the compression, when tried for large sizes, performs very well, and the performance is the same as with a (declared) smaller tuple :-) Probably not the intention of the benchmark designer.

Conclusion: I need an ALTER TABLE anyway to change the STORAGE. Or maybe pgbench should always do it anyway...

Conclusion 2: I've noted the submission as "rejected", as both you and Fujii don't like it. Although I found it useful, I can do without it quite easily.

-- Fabien.
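[Editor's note] To illustrate why an all-blank filler can hide the declared tuple size, here is a rough sketch that uses Python's zlib as a stand-in compressor. PostgreSQL's TOAST mechanism uses its own algorithm (pglz) and only attempts compression once a row grows beyond roughly 2 kB, so the exact numbers will differ; only the order of magnitude matters here. The 3000-byte width is an arbitrary example value.

```python
import os
import zlib

WIDTH = 3000  # arbitrary example width for a char(n) filler

# char(n) pads values with spaces, so an empty filler is WIDTH blanks.
blank_filler = b" " * WIDTH
# Random bytes stand in for data that cannot be compressed.
random_filler = os.urandom(WIDTH)

# zlib is only a stand-in for PostgreSQL's pglz compressor.
blank_compressed = len(zlib.compress(blank_filler))
random_compressed = len(zlib.compress(random_filler))

print(f"blank filler:  {WIDTH} -> {blank_compressed} bytes")
print(f"random filler: {WIDTH} -> {random_compressed} bytes")
```

The blank filler shrinks to a few dozen bytes while the random one does not shrink at all, which matches the observation above. One way to avoid the effect would be a statement such as `ALTER TABLE pgbench_accounts ALTER COLUMN filler SET STORAGE PLAIN;` issued before the rows are loaded, since SET STORAGE only affects rows written afterwards.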