Thread: pgbench --tuple-size option

pgbench --tuple-size option

From
Fabien COELHO
Date:
After publishing some test results with pgbench on SSD with varying page 
size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, 
and that results may be different with other tuple sizes.

This patch adds an option to change the default tuple size, so that this 
can be tested easily.

-- 
Fabien.

Re: pgbench --tuple-size option

From
Andres Freund
Date:
On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:
> 
> After publishing some test results with pgbench on SSD with varying page
> size, Josh Berkus pointed out that pgbench uses small 100-byte tuples, and
> that results may be different with other tuple sizes.
> 
> This patch adds an option to change the default tuple size, so that this can
> be tested easily.

I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgbench --tuple-size option

From
Fabien COELHO
Date:
Hello Andres,

>> This patch adds an option to change the default tuple size, so that this can
>> be tested easily.
>
> I don't think it's beneficial to put this into pgbench. There really
> isn't a relevant benefit over using a custom script here.

The scripts to run are the standard ones. The difference is in the 
*initialization* phase (-i), namely the filler attribute size. There is no 
custom script for initialization in pgbench, so ISTM that this argument 
does not apply here.

-- 
Fabien.



Re: pgbench --tuple-size option

From
Andres Freund
Date:
On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> >>This patch adds an option to change the default tuple size, so that this can
> >>be tested easily.
> >
> >I don't think it's beneficial to put this into pgbench. There really
> >isn't a relevant benefit over using a custom script here.
> 
> The scripts to run are the standard ones. The difference is in the
> *initialization* phase (-i), namely the filler attribute size. There is no
> custom script for initialization in pgbench, so ISTM that this argument does
> not apply here.

The custom initialization is to run a manual ALTER after the
initialization.
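
As a rough illustration of the manual-ALTER route suggested here, the following Python sketch only builds the SQL text one might run after a standard "pgbench -i". The helper name widen_filler_sql and the 1000-character width are assumptions made for this example; the table and column names are the ones pgbench creates by default.

```python
# Hypothetical helper: build the ALTER statements that widen pgbench's
# filler columns after a standard "pgbench -i" run.
# Table names are pgbench's defaults; the target width is an assumption.

PGBENCH_TABLES = ("pgbench_accounts", "pgbench_branches", "pgbench_tellers")


def widen_filler_sql(width, tables=PGBENCH_TABLES):
    """Return one ALTER TABLE statement per table, setting filler to char(width)."""
    if width < 1:
        raise ValueError("filler width must be a positive number of characters")
    return [
        f"ALTER TABLE {table} ALTER COLUMN filler TYPE char({width});"
        for table in tables
    ]


if __name__ == "__main__":
    # Print the statements so they can be piped into psql.
    for statement in widen_filler_sql(1000):
        print(statement)
```

The printed statements could be fed to psql against the benchmark database. Whether the resulting on-disk layout matches a table created directly at that width is the question raised in the replies.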

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgbench --tuple-size option

From
Fabien COELHO
Date:
>>> I don't think it's beneficial to put this into pgbench. There really
>>> isn't a relevant benefit over using a custom script here.
>>
>> The scripts to run are the standard ones. The difference is in the
>> *initialization* phase (-i), namely the filler attribute size. There is no
>> custom script for initialization in pgbench, so ISTM that this argument does
>> not apply here.
>
> The custom initialization is to run a manual ALTER after the
> initialization.

Sure, it can be done this way.

I'm not sure about the implication of ALTER on the table storage, thus I 
prefer all benchmarks to run exactly the same straightforward way in all 
cases so as to avoid unwanted effects on what I'm trying to measure, which 
is already noisy and unstable enough.

-- 
Fabien.



Re: pgbench --tuple-size option

From
Andres Freund
Date:
On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:
> 
> >>>I don't think it's beneficial to put this into pgbench. There really
> >>>isn't a relevant benefit over using a custom script here.
> >>
> >>The scripts to run are the standard ones. The difference is in the
> >>*initialization* phase (-i), namely the filler attribute size. There is no
> >>custom script for initialization in pgbench, so ISTM that this argument does
> >>not apply here.
> >
> >The custom initialization is to run a manual ALTER after the
> >initialization.
> 
> Sure, it can be done this way.
> 
> I'm not sure about the implication of ALTER on the table storage,

Should be fine in this case. But if that's what you're concerned about -
understandably - it seems to make more sense to split -i into two steps: one
to create the tables, and another to fill them. That'd allow doing
manual stuff in between.
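
The proposed split can be pictured as three phases: create the tables, apply any manual DDL, then load the data. The Python sketch below is only an illustration. The function split_init_plan is hypothetical, and the "-I" step letters assume a later pgbench that offers an init-steps option, which the pgbench discussed in this thread does not have.

```python
import shlex

# Hypothetical three-phase initialization plan.
# Assumption: a pgbench that supports "-I" (init steps), where "dt" drops
# and creates the tables and "gvp" generates data, vacuums, and builds
# primary keys.


def split_init_plan(dbname, scale, manual_sql):
    """Return argv lists for: create tables, manual DDL, fill tables."""
    create = ["pgbench", "-i", "-I", "dt", "-s", str(scale), dbname]
    manual = [["psql", "-d", dbname, "-c", sql] for sql in manual_sql]
    fill = ["pgbench", "-i", "-I", "gvp", "-s", str(scale), dbname]
    return [create, *manual, fill]


if __name__ == "__main__":
    plan = split_init_plan(
        "bench",
        10,
        ["ALTER TABLE pgbench_accounts ALTER COLUMN filler TYPE char(1000);"],
    )
    # Print shell-quoted commands; nothing is executed here.
    for argv in plan:
        print(shlex.join(argv))
```

Because the DDL runs while the tables are still empty, the data is loaded directly into the modified definition, which avoids altering already-populated tables.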

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgbench --tuple-size option

From
Fabien COELHO
Date:
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case. But if that's what you're concerned about -
> understandably -

Indeed, my (long) experience with benchmarks is that it is much more 
complicated than it looks if you want to really understand what you are 
getting, and to get anything meaningful.

> it seems to make more sense to split -i into two. One to create the 
> tables, and another to fill them. That'd allow to do manual stuff 
> in between.

Hmmm. This would mean many more changes than the pretty trivial patch I 
submitted: more options (a two-part init plus compatibility with the 
previous case), splitting the "init" function, a new dependency and new 
error cases to check (the tables must exist before they can be filled), 
and some options that apply to the first part while others apply to the 
second, which would in any case lead to significantly more complicated 
documentation... a lot of trouble for my use case, which is to answer 
Josh's pertinent comments and to be able to test the "tuple size" factor 
easily. Moreover, I would reject it myself as too much trouble for a small 
benefit.

Feel free to reject the patch if you do not want it. I think that its 
cost/benefit is reasonable (one small option, small code changes, some 
benefit for people who want to measure performance in various cases).

-- 
Fabien.



Re: pgbench --tuple-size option

From
Andres Freund
Date:
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
> >it seems to make more sense to split -i into two. One to create the
> >tables, and another to fill them. That'd allow to do manual stuff
> >in between.
> 
> Hmmm. This would mean much more changes than the pretty trivial patch I
> submitted

FWIW, I find that patch really ugly. Adding the filler's width in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.

>: more options (2 parts init + compatibility with the previous
> case), splitting the "init" function, having a dependency and new error
> cases to check (you must have the table to fill them), some options apply to
> first part while other apply to second part, which would lead in any case to
> a significantly more complicated documentation... a lot of trouble for my use
> case to answer Josh pertinent comments, and to be able to test the "tuple
> size" factor easily. Moreover, I would reject it myself as too much trouble
> for a small benefit.

Well, it's something more generic, because it allows you to do more...

> Feel free to reject the patch if you do not want it. I think that its
> cost/benefit is reasonable (one small option, small code changes, some
> benefit for people who want to measure performance in various cases).

I personally think this isn't worth the price. But I'm just one guy.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgbench --tuple-size option

From
Fujii Masao
Date:
On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
>> >it seems to make more sense to split -i into two. One to create the
>> >tables, and another to fill them. That'd allow to do manual stuff
>> >in between.
>>
>> Hmmm. This would mean much more changes than the pretty trivial patch I
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.
>
>>: more options (2 parts init + compatibility with the previous
>> case), splitting the "init" function, having a dependency and new error
>> cases to check (you must have the table to fill them), some options apply to
>> first part while other apply to second part, which would lead in any case to
>> a significantly more complicated documentation... a lot of trouble for my use
>> case to answer Josh pertinent comments, and to be able to test the "tuple
>> size" factor easily. Moreover, I would reject it myself as too much trouble
>> for a small benefit.
>
> Well, it's something more generic, because it allows you to do more...
>
>> Feel free to reject the patch if you do not want it. I think that its
>> cost/benefit is reasonable (one small option, small code changes, some
>> benefit for people who want to measure performance in various cases).
>
> I personally think this isn't worth the price. But I'm just one guy.

I also don't like this feature. The benefit of this option seems too small.
If we apply this, we might want to support other options, for example,
an option to change the data type of each column, an option to create a new
index using "minmax", an option to change the fillfactor of each table, etc.
There are countless such options, and I'm afraid that it would be really
hard to support so many of them.

Regards,

-- 
Fujii Masao



Re: pgbench --tuple-size option

From
Fabien COELHO
Date:
>> Hmmm. This would mean much more changes than the pretty trivial patch I 
>> submitted
>
> FWIW, I find that patch really ugly. Adding the filler's width in a
> printf, after the actual DDL declaration. Without so much as a
> comment. Brr.

Indeed. I'm not too proud of that very point either:-) You are right that 
it deserves at the very least a clear comment. Putting the varying size in 
the DDL string would mean using vsprintf and splitting the query building 
some more, which I do not find desirable.

> [...]
> Well, it's something more generic, because it allows you to do more...

Apart from the fact that I do not need it (at least right now) and that it 
is more work, my opinion is that it would be rejected. Not a strong 
incentive to spend time in that direction.

-- 
Fabien.



Re: pgbench --tuple-size option

From
Fabien COELHO
Date:
>>> The custom initialization is to run a manual ALTER after the
>>> initialization.
>>
>> Sure, it can be done this way.
>>
>> I'm not sure about the implication of ALTER on the table storage,
>
> Should be fine in this case.

After some testing and laughing, my conclusion is "not fine at all". The 
"filler" attributes in "pgbench" are by default "EXTENDED", which means 
possibly compressed... As the default value is '', the compression, 
when tried for large sizes, performs very well, and the performance is the 
same as with a (declared) smaller tuple:-) Probably not the intention of 
the benchmark designer. Conclusion: I need an ALTER TABLE anyway to change 
the STORAGE. Or maybe pgbench should always do it anyway...
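
The effect described above is easy to picture: a blank-padded filler is a long run of identical bytes, which any general-purpose compressor shrinks to almost nothing. The Python sketch below uses zlib purely as a stand-in, since PostgreSQL's own TOAST compression is a different algorithm; the widths and the function name compressed_size are assumptions chosen for illustration.

```python
import os
import zlib


def compressed_size(payload: bytes) -> int:
    """Size in bytes of payload after zlib compression (stand-in for TOAST)."""
    return len(zlib.compress(payload))


if __name__ == "__main__":
    for width in (100, 1000, 4000):
        blank = b" " * width       # what a char(width) column holds for ''
        noise = os.urandom(width)  # incompressible data of the same width
        # Columns: declared width, compressed blanks, compressed random bytes.
        print(width, compressed_size(blank), compressed_size(noise))
```

In PostgreSQL itself the corresponding knob is the column's STORAGE setting, for example "ALTER TABLE pgbench_accounts ALTER COLUMN filler SET STORAGE EXTERNAL;", which permits out-of-line storage but not compression.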

Conclusion 2: I've noted the submission as "rejected", as both you and 
Fujii don't like it; although I found it useful, I can do without it 
quite easily.

-- 
Fabien.