Thread: PATCH: pgbench - aggregation of info written into log
Hi, this patch adds support for aggregation of info written into the log. Instead of info about each transaction, a summary for time intervals (with custom length) is written into the log. All you need to do is add "-A seconds", e.g. $ pgbench -T 3600 -A 10 -l db which will produce log with 10-second summaries, containing interval start timestamp, number of transactions, sum of latencies, sum of 2nd power of latencies, min and max latency (it's done this way to allow handling of multiple logs produced by "-j" option). This patch is a bit less polished (and more complex) than the other pgbench patch I've sent a while back, and I'm not sure how to handle the Windows branch. That needs to be fixed during the commit fest. kind regards Tomas
Attachment
On 24 Srpen 2012, 23:25, Tomas Vondra wrote: > Hi, > > this patch adds support for aggregation of info written into the log. > Instead of info about each transaction, a summary for time intervals (with > custom length) is written into the log. All you need to do is add "-A > seconds", e.g. > > $ pgbench -T 3600 -A 10 -l db > > which will produce log with 10-second summaries, containing interval start > timestamp, number of transactions, sum of latencies, sum of 2nd power of > latencies, min and max latency (it's done this way to allow handling of > multiple logs produced by "-j" option). I've forgot to mention that just like the other patch, this is meant for cases when you really don't need the raw per-transaction info, but per-second summary is just enough. This is helpful especially when the test produces a lot of transaction, thus huge log files etc. So you can either log all the transactions and then post-process the huge files, or completely skip that and just log the summaries. Tomas
On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > This patch is a bit less polished (and more complex) than the other > pgbench patch I've sent a while back, and I'm not sure how to handle the > Windows branch. That needs to be fixed during the commit fest. What's the problem with the Windows branch? Could you un-cuddle your brances to follow the PostgreSQL style, omit braces around single-statement blocks, and remove the spurious whitespace changes? Instead of having both use_log_agg and naggseconds, I think you can get by with just having a single variable that is zero if aggregation is not in use and is otherwise the aggregation period. On a related note, you can't specify -A without an associated value, so there is no point in documenting a "default". As with your other patch, I suggest using a long option name like --latency-aggregate-interval and then making the name of the variable in the code match the option name, with s/-/_/g, for clarity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 30 Srpen 2012, 18:02, Robert Haas wrote: > On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> This patch is a bit less polished (and more complex) than the other >> pgbench patch I've sent a while back, and I'm not sure how to handle the >> Windows branch. That needs to be fixed during the commit fest. > > What's the problem with the Windows branch? Well, there are comments about how timestamp does not work on Windows (filling 0), and I'm not sure how that affects the patch (e.g. determining the aggregation interval). I have no Windows workstation available so I can't actually try that. > Could you un-cuddle your brances to follow the PostgreSQL style, omit > braces around single-statement blocks, and remove the spurious > whitespace changes? OK, will do. > Instead of having both use_log_agg and naggseconds, I think you can > get by with just having a single variable that is zero if aggregation > is not in use and is otherwise the aggregation period. On a related > note, you can't specify -A without an associated value, so there is no > point in documenting a "default". As with your other patch, I suggest > using a long option name like --latency-aggregate-interval and then > making the name of the variable in the code match the option name, > with s/-/_/g, for clarity. Why --latency-aggregate-interval? It has nothing to do with latencies, it's merely number of transactions per interval. Tomas
On 30 Srpen 2012, 23:47, Robert Haas wrote: > On Thu, Aug 30, 2012 at 3:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> On 30 Srpen 2012, 18:02, Robert Haas wrote: >>> On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >>>> This patch is a bit less polished (and more complex) than the other >>>> pgbench patch I've sent a while back, and I'm not sure how to handle >>>> the >>>> Windows branch. That needs to be fixed during the commit fest. >>> >>> What's the problem with the Windows branch? >> >> Well, there are comments about how timestamp does not work on Windows >> (filling 0), and I'm not sure how that affects the patch (e.g. >> determining >> the aggregation interval). I have no Windows workstation available so I >> can't actually try that. > > Hmm. That seems like it might be something we need to fix first... > >>> Could you un-cuddle your brances to follow the PostgreSQL style, omit >>> braces around single-statement blocks, and remove the spurious >>> whitespace changes? >> >> OK, will do. >> >>> Instead of having both use_log_agg and naggseconds, I think you can >>> get by with just having a single variable that is zero if aggregation >>> is not in use and is otherwise the aggregation period. On a related >>> note, you can't specify -A without an associated value, so there is no >>> point in documenting a "default". As with your other patch, I suggest >>> using a long option name like --latency-aggregate-interval and then >>> making the name of the variable in the code match the option name, >>> with s/-/_/g, for clarity. >> >> Why --latency-aggregate-interval? It has nothing to do with latencies, >> it's merely number of transactions per interval. > > Oh, I thought it was modifying the behavior of -l. It does, but AFAIK the "-l" means logging. I suppose "--aggregate-interval" would be a good option name, I don't see a reason to put there the additional word when there are other aggregated values (e.g. num of transactions). T.
On Thu, Aug 30, 2012 at 3:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > On 30 Srpen 2012, 18:02, Robert Haas wrote: >> On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >>> This patch is a bit less polished (and more complex) than the other >>> pgbench patch I've sent a while back, and I'm not sure how to handle the >>> Windows branch. That needs to be fixed during the commit fest. >> >> What's the problem with the Windows branch? > > Well, there are comments about how timestamp does not work on Windows > (filling 0), and I'm not sure how that affects the patch (e.g. determining > the aggregation interval). I have no Windows workstation available so I > can't actually try that. Hmm. That seems like it might be something we need to fix first... >> Could you un-cuddle your brances to follow the PostgreSQL style, omit >> braces around single-statement blocks, and remove the spurious >> whitespace changes? > > OK, will do. > >> Instead of having both use_log_agg and naggseconds, I think you can >> get by with just having a single variable that is zero if aggregation >> is not in use and is otherwise the aggregation period. On a related >> note, you can't specify -A without an associated value, so there is no >> point in documenting a "default". As with your other patch, I suggest >> using a long option name like --latency-aggregate-interval and then >> making the name of the variable in the code match the option name, >> with s/-/_/g, for clarity. > > Why --latency-aggregate-interval? It has nothing to do with latencies, > it's merely number of transactions per interval. Oh, I thought it was modifying the behavior of -l. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 30, 2012 at 5:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > It does, but AFAIK the "-l" means logging. I suppose > "--aggregate-interval" would be a good option name, I don't see a reason > to put there the additional word when there are other aggregated values > (e.g. num of transactions). Oh, I was thinking that the l was for latency, but maybe not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Dne 30.08.2012 18:02, Robert Haas napsal: > On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> This patch is a bit less polished (and more complex) than the other >> pgbench patch I've sent a while back, and I'm not sure how to handle >> the >> Windows branch. That needs to be fixed during the commit fest. > > What's the problem with the Windows branch? > > Could you un-cuddle your brances to follow the PostgreSQL style, omit > braces around single-statement blocks, and remove the spurious > whitespace changes? Done, or at least I don't see other formatting issues. Let me know if I missed something. > Instead of having both use_log_agg and naggseconds, I think you can > get by with just having a single variable that is zero if aggregation > is not in use and is otherwise the aggregation period. On a related > note, you can't specify -A without an associated value, so there is > no > point in documenting a "default". As with your other patch, I > suggest > using a long option name like --latency-aggregate-interval and then > making the name of the variable in the code match the option name, > with s/-/_/g, for clarity. Fixed. I've kept use_log_agg only and I've removed the default. Also I've added one more check (that the total duration is a multiple of the aggregation interval). And just as with the sampling patch, I believe the "-l" should not be enabled by default, but required. But if more people ask to enable it whenever the aggregation or sampling is used, I'm fine with it. Tomas
Attachment
On Sun, Sep 2, 2012 at 3:46 PM, Tomas Vondra <tv@fuzzy.cz> wrote: > > Fixed. I've kept use_log_agg only and I've removed the default. Also > I've added one more check (that the total duration is a multiple of > the aggregation interval). Hi Tomas, In the docs, -l is an option, not an application: "<application>-l</application>" Also, the docs still refer to <option>-A</option>, rather than to <option>--aggregate-interval</option>, in a few places. I think a section in the docs that points out that transactions run by specifying mulitple -f will be mingled together into an interval would be a good idea, as that could easily be overlooked (as I did earlier). The docs do not mention anywhere what the units for the latencies are. I think the format of the logfile should somehow make it unmistakably different from the regular -l logfile, for example by prefixing every line with "Aggregate: ". Or maybe there is some other solution, but I think that having two log formats, both of which consist of nothing but 6 integers, but yet mean very different things, is a recipe for confusion. Is it unavoidable that the interval be an integral number of seconds? I found the units in the original code confusing, so maybe there is some reason it needs to be like that that I don't understand yet. I'll look into it some more. Thanks, Jeff
On 3.9.2012 01:28, Jeff Janes wrote: > On Sun, Sep 2, 2012 at 3:46 PM, Tomas Vondra <tv@fuzzy.cz> wrote: >> >> Fixed. I've kept use_log_agg only and I've removed the default. Also >> I've added one more check (that the total duration is a multiple of >> the aggregation interval). > > Hi Tomas, > > In the docs, -l is an option, not an application: > "<application>-l</application>" > > Also, the docs still refer to <option>-A</option>, rather than to > <option>--aggregate-interval</option>, in a few places. Fixed. > I think a section in the docs that points out that transactions run by > specifying mulitple -f will be mingled together into an interval would > be a good idea, as that could easily be overlooked (as I did earlier). Fixed, added a paragraph mentioning this. > The docs do not mention anywhere what the units for the latencies are. > > I think the format of the logfile should somehow make it unmistakably > different from the regular -l logfile, for example by prefixing every > line with "Aggregate: ". Or maybe there is some other solution, but > I think that having two log formats, both of which consist of nothing > but 6 integers, but yet mean very different things, is a recipe for > confusion. Not sure how to do that, I'd say it's a responsibility of the user. > Is it unavoidable that the interval be an integral number of seconds? > I found the units in the original code confusing, so maybe there is > some reason it needs to be like that that I don't understand yet. > I'll look into it some more. Not really, but I don't see a real use case for such intervals, so I'm keeping the integers for now. Also, I've realized the last interval may not be logged at all - I'll take a look into this in the next version of the patch. Tomas
Attachment
Tomas Vondra wrote: > Also, I've realized the last interval may not be logged at all - I'll > take a look into this in the next version of the patch. I didn't see any later version of this patch posted anywhere. I guess it'll have to wait until the next commitfest. Please fix the remaining issues and resubmit. Thanks to Robert Haas, Jeff Janes and Pavel Stehule for the reviews. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services