Re: PATCH: pgbench - merging transaction logs - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: PATCH: pgbench - merging transaction logs |
Date | |
Msg-id | alpine.DEB.2.10.1503151935160.4015@sto Whole thread Raw |
In response to | Re: PATCH: pgbench - merging transaction logs (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: PATCH: pgbench - merging transaction logs
Re: PATCH: pgbench - merging transaction logs |
List | pgsql-hackers |
> Firstly, the fact that pgbench produces one file per thread is awkward. I agree, but I think it is due to the multi process thread emulation: if you have real threads, you can do a simple fprintf, possibly with some mutex, and you're done. There is really nothing to do to implement this feature. > Secondly, a separate tool (even if provided with pgbench) would require > passing detailed info about the log format - whether it's aggregated or > not, throttling or not, .... Hmmm. >> It could belong to pgbench if pgbench was *generating* the merged >> log file directly. [...] > > I considered this approach, i.e. adding another 'collector' thread > receiving results from all the other threads, but decided not to do > that. That would require a fair amount of inter-process communication, > locking etc. and might affect the measurements I agree that inter-process stuff should be avoided. This is not what I had in mind. I was thinking of "fprintf" on the same file handler by different threads. >> Another issue raised by your patch is that the log format may be >> improved, say by providing a one-field timestamp at the beginning of >> the line. > > I don't see how this is relevant to this patch? For the simple log format, all the parsing needed would be for the timestamp, and the remainder would just be text to pass along, no need to %d %f... whatever. >> The current IO complexity is in p²n where it should be simply pn... > > Maybe. Implementing a 2-way merge sort was the simplest solution at the > moment, and I don't think this really matters because the I/O generated > by the benchmark itself is usually much higher than this. If you do not do a n-way merge, you could do a 2-way merge on a binary tree so that the IO complexity would be p.log(p).n (I think), and not p²n. >> For aggregates, some data in the output may be special values >> "NaN/-/...", [...] > You mean the aggregated log? I can't think of a way to get there such > values - can you provide a plausible scenario how that could happen? Possibly I'm wrong. Ok, I tried it: sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log sh> cat pgbench_log.12671 1426445236 1 5034 25341156 5034 5034 687471969 687 687 1426445237 0 0 0 0 0 0 0 0 0 1426445238 0 0 0 0 0 0 0 0 0 1426445239 1 8708 75829264 8708 8708 2063 42559692063 2063 ... Good news, I could not generate strange values. Just when there are 0 transactions possibly some care should be taken when combining values. >> It seems that system function calls are not tested for errors. > > That's true, but that's how the rest of pgbench code is written. Hmmm.... This is not entirely true, there are *some* checks if you look carefully:-) if ((fd = fopen(filename, "r")) == NULL) ... if ((fp = popen(command, "r")) == NULL) ... nsocks = select(...) if(nsocks < 0) ... >>> So integrating this into pgbench directly seems like a better approach, >>> and the attached patch implements that. >> >> You guessed that I disagree. Note that this is only my own opinion. >> >> In summary, I think that: >> >> (a) providing a clean script would be nice, >> >> (b) if we could get rid of the "thread emulation", pgbench could >> generate the merged logs directly and simply, and the option could >> be provided then. > > That however is not the goal of this patch. Sure. My point is that the amount of code you write to implement this merge stuff is due to this feature. Without it, the patch would probably need 10 lines of code. Moreover, the way it is implement requires scanning and reprinting, which means more work in many places to update the format later. > The thread emulation is there for a reason, My opinion is that it *was* there for a reason. Whether it makes sense today to still have it, maintain it, and have to write such heavy code for a trivial feature just because of it is another matter. > and I certainly am not going to work on eliminating it > (not sure that's even possible). I wish it will be:-) I would suggest this that I would support: implement this feature the simple way (aka fprintf, maybe a mutex) when compiled with threads, and generate an error "feature not available with process-based thread emulation" when compiled with processes. This way we avoid a, lot of heavy code to maintain in the future, and you still get the feature within pgbench. There are already some things which are not the same with thread emulation because it would have been tiring to implement for it for very little benefit. -- Fabien.
pgsql-hackers by date: