Thread: Ragged latency log data in multi-threaded pgbench
Just noticed a problem popping up sometimes with the new multi-threaded pgbench. This is on a Linux RPM build (the alpha4 set) compiled with '--disable-thread-safety'. Still trying to nail down whether that's a requirement for this problem to appear or not. I did most of my review of this feature with it turned on, and haven't been seeing this problem on other systems that are thread safe. Not sure yet if that's cause and effect or coincidence yet. Here's a sample invocation that produces ragged output for me on my one system: pgbench -S -T 5 -c 4 -j 4 -l pgbench The log file produced by this (pgbench_log.<pid>) is supposed to consist of a series of lines in the following format: client,trans,latency,filenum,sec,usec It looks like the switch between clients running on separate workers can lead to a mix of their respective lines showing up though. Here's a couple of typical samples, with the bad line in the middle of each set: 1 138 178 0 1268665788 607559 1 139 182 0 1268665788 607751 1 1402 0 2491 0 1268665788 586135 2 1 264 0 1268665788 586463 2 2 192 0 1268665788 586665 1 274 160 0 1268665788 632966 1 275 178 0 1268665788 633154 1 276 184 0 126866578 178 0 1268665788 614015 2 141 190 0 1268665788 614252 2 142 169 0 1268665788 614430 2 274 178 0 1268665788 639218 2 275 175 0 1268665788 639402 2 276 169 0 126866578 171 0 1268665788 626933 0 141 185 0 1268665788 627165 0 142 202 0 1268665788 627377 Looks like sometimes a client is only getting part of its line written out before getting stomped on by the next one. I think one of the assumptions being made about how to safely write to this log file may be broken by the multi-process implementation, which is what you get when thread-safety is not available. Since there should only be 6 fields here, I think you can find whether a given log file has this problem or not like this: cat pgbench_log.xxxxx | cut -d " " -f 7 | sort | uniq If anything comes out of that, the latency log file has at least one bad line in it. Similarly, this: cat pgbench_log.xxxxx | cut -d " " -f 1 | sort | uniq Should only show the client numbers; here there's some first columns with much bigger numbers too. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith wrote: > Just noticed a problem popping up sometimes with the new > multi-threaded pgbench. This is on a Linux RPM build (the alpha4 set) > compiled with '--disable-thread-safety'. Still trying to nail down > whether that's a requirement for this problem to appear or not. I did > most of my review of this feature with it turned on, and haven't been > seeing this problem on other systems that are thread safe. Not sure > yet if that's cause and effect or coincidence yet. > > We had to turn handsprings to prevent this sort of effect with the logging collector, which was a requirement of being able to implement CSV logging sanely. So I'm not surprised by this report. cheers andrew
Greg Smith <greg@2ndquadrant.com> writes: > Looks like sometimes a client is only getting part of its line written > out before getting stomped on by the next one. I think one of the > assumptions being made about how to safely write to this log file may be > broken by the multi-process implementation, which is what you get when > thread-safety is not available. pgbench doesn't make any effort at all to avoid interleaved writes on that file. I don't think there is anything much that can be done about it when you are using the forked-processes implementation. It's probably possible for it to show up on the multi-threads version too, depending on how hard libc tries to interlock stdio calls. regards, tom lane
On 3/15/10 8:41 AM, Greg Smith wrote: > Just noticed a problem popping up sometimes with the new multi-threaded > pgbench. This is on a Linux RPM build (the alpha4 set) compiled with > '--disable-thread-safety'. Still trying to nail down whether that's a > requirement for this problem to appear or not. I did most of my review > of this feature with it turned on, and haven't been seeing this problem > on other systems that are thread safe. Not sure yet if that's cause and > effect or coincidence yet. For my part, telling people that multi-thread pgbench doesn't work correctly on systems which are not thread-safe seems perfectly OK. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Greg Smith <greg@2ndquadrant.com> wrote: > It looks like the switch between clients running on separate workers can > lead to a mix of their respective lines showing up though. Oops. There might be two solutions for the issue: 1. Use explicit locks. The lock primitive will be pthread_mutex for multi-threaded implementations or semaphore for multi-threaded ones. 2. Use per-thread log files. File names would be"pgbench_log.<main-process-id>.<thread-id>". Which is better, or another idea? Regards, --- Takahiro Itagaki NTT Open Source Software Center
Takahiro Itagaki wrote: > 1. Use explicit locks. The lock primitive will be pthread_mutex for > multi-threaded implementations or semaphore for multi-threaded ones. > 2. Use per-thread log files. > File names would be "pgbench_log.<main-process-id>.<thread-id>". > I'm concerned that the locking itself will turn into a new pgbench bottleneck, just as we're clearing the point where it's not for the first time in a while. And that sounds like it has its own potential risks/complexity involved. I could live with per-thread log files. I think my pgbench-tools is the main consumer of these latency logs floating around right now, I just pushed a 9.0 update to handle the multiple workers option today that discovered this). It doesn't make any difference to what I'm doing how many file I have to process. Just a few lines of extra shell code for me to pull the rest into the import. That seems like the simplest solution that's guaranteed to work, just push the problem onto the client side instead where it's easier to deal with. Unless someone feels strongly that these have to be interleaved into one file, based on Andrew's suggestion that this is a hard problem to get right and Tom's suggestion that this might even extend into the proper threaded version too, I think a log file per worker is the easiest way out of this. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > Greg Smith <greg@2ndquadrant.com> wrote: >> It looks like the switch between clients running on separate workers can >> lead to a mix of their respective lines showing up though. > Oops. There might be two solutions for the issue: > 1. Use explicit locks. The lock primitive will be pthread_mutex for > multi-threaded implementations or semaphore for multi-threaded ones. > 2. Use per-thread log files. > File names would be "pgbench_log.<main-process-id>.<thread-id>". I think #1 is out of the question, as the synchronization overhead will do serious damage to the whole point of having a multithreaded pgbench. #2 might be a reasonable idea. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes: > > Oops. There might be two solutions for the issue: > > 1. Use explicit locks. The lock primitive will be pthread_mutex for > > multi-threaded implementations or semaphore for multi-threaded ones. > > 2. Use per-thread log files. > > File names would be "pgbench_log.<main-process-id>.<thread-id>". > > I think #1 is out of the question, as the synchronization overhead will > do serious damage to the whole point of having a multithreaded pgbench. > #2 might be a reasonable idea. Ok, I'll go for #2. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: > > > 2. Use per-thread log files. > > > File names would be "pgbench_log.<main-process-id>.<thread-id>". Here is a patch to implement per-thread log files for pgbench -l. The log filenames are "pgbench_log.<main-process-id>.<thread-serial-number>" for each thread, but the first thread (including single-threaded) still uses "pgbench_log.<main-process-id>" for the name because of compatibility. Example: $ pgbench -c16 -j4 -l $ ls pgbench_log.2196 pgbench_log.2196.1 pgbench_log.2196.2 pgbench_log.2196.3 Comments and suggenstions welcome. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Attachment
Takahiro Itagaki wrote: > The log filenames are "pgbench_log.<main-process-id>.<thread-serial-number>" > for each thread, but the first thread (including single-threaded) still uses > "pgbench_log.<main-process-id>" for the name because of compatibility. > Attached is an updated version that I think is ready to commit. Only changes are docs--I rewrote those to improve the wording some. The code looked and tested fine to me. I just added support for the new format to pgbench-tools and am back to happily running large batches of tests using it again. I confirmed a few things: -On my CentOS system, the original problem is masked if you have "--enable-thread-safety" on; the multi-threaded output shows up without any broken lines into the single file. As I suspected it's only the multi-process implementation that shows the issue here. Since Tom points out that's luck rather than something that should be relied upon, I don't think that actually changes what to do here, it just explains why this wasn't obvious in earlier testing--normally I have thread safety on nowadays. -Patch corrects the problem. I took a build without thread safety on, demonstrated the issue with its pgbench. Apply the patch, rebuild just pgbench, run again; new multiple log files have no issue. -It's easy to convert existing scripts to utilize the new multiple log format. Right now the current idiom you're forced into using when running pgbench scripts is to track the PID it's run as, then use something like: mv pgbench_log.${PID} pgbench.log To convert to a stable filename for later processing. Now you just use something like this instead: cat pgbench_log.${PID}* > pgbench.log rm -f pgbench_log.${PID}* And that works fine. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 0019db4..28a8c84 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -131,11 +131,9 @@ int fillfactor = 100; #define ntellers 10 #define naccounts 100000 -FILE *LOGFILE = NULL; - bool use_log; /* log transaction latencies to a file */ - -int is_connect; /* establish connection for each transaction */ +bool is_connect; /* establish connection for each transaction */ +int main_pid; /* main process id used in log filename */ char *pghost = ""; char *pgport = ""; @@ -183,6 +181,7 @@ typedef struct */ typedef struct { + int tid; /* thread id */ pthread_t thread; /* thread handle */ CState *state; /* array of CState */ int nstate; /* length of state[] */ @@ -741,7 +740,7 @@ clientDone(CState *st, bool ok) /* return false iff client should be disconnected */ static bool -doCustom(CState *st, instr_time *conn_time) +doCustom(CState *st, instr_time *conn_time, FILE *log_file) { PGresult *res; Command **commands; @@ -778,7 +777,7 @@ top: /* * transaction finished: record the time it took in the log */ - if (use_log && commands[st->state + 1] == NULL) + if (log_file && commands[st->state + 1] == NULL) { instr_time now; instr_time diff; @@ -791,12 +790,12 @@ top: #ifndef WIN32 /* This is more than we really ought to know about instr_time */ - fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n", + fprintf(log_file, "%d %d %.0f %d %ld %ld\n", st->id, st->cnt, usec, st->use_file, (long) now.tv_sec, (long) now.tv_usec); #else /* On Windows, instr_time doesn't provide a timestamp anyway */ - fprintf(LOGFILE, "%d %d %.0f %d 0 0\n", + fprintf(log_file, "%d %d %.0f %d 0 0\n", st->id, st->cnt, usec, st->use_file); #endif } @@ -857,7 +856,7 @@ top: INSTR_TIME_ACCUM_DIFF(*conn_time, end, start); } - if (use_log && st->state == 0) + if (log_file && st->state == 0) INSTR_TIME_SET_CURRENT(st->txn_begin); if (commands[st->state]->type == SQL_COMMAND) @@ -1833,7 +1832,7 @@ main(int argc, char **argv) } break; case 'C': - is_connect = 1; + is_connect = true; break; case 's': scale_given = true; @@ -1955,6 +1954,12 @@ main(int argc, char **argv) exit(1); } + /* + * save main process id in the global variable because process id will be + * changed after fork. + */ + main_pid = (int) getpid(); + if (nclients > 1) { state = (CState *) realloc(state, sizeof(CState) * nclients); @@ -1980,20 +1985,6 @@ main(int argc, char **argv) } } - if (use_log) - { - char logpath[64]; - - snprintf(logpath, 64, "pgbench_log.%d", (int) getpid()); - LOGFILE = fopen(logpath, "w"); - - if (LOGFILE == NULL) - { - fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno)); - exit(1); - } - } - if (debug) { if (duration <= 0) @@ -2111,6 +2102,7 @@ main(int argc, char **argv) threads = (TState *) malloc(sizeof(TState) * nthreads); for (i = 0; i < nthreads; i++) { + threads[i].tid = i; threads[i].state = &state[nclients / nthreads * i]; threads[i].nstate = nclients / nthreads; INSTR_TIME_SET_CURRENT(threads[i].start_time); @@ -2159,8 +2151,6 @@ main(int argc, char **argv) INSTR_TIME_SET_CURRENT(total_time); INSTR_TIME_SUBTRACT(total_time, start_time); printResults(ttype, total_xacts, nclients, nthreads, total_time, conn_total_time); - if (LOGFILE) - fclose(LOGFILE); return 0; } @@ -2171,6 +2161,7 @@ threadRun(void *arg) TState *thread = (TState *) arg; CState *state = thread->state; TResult *result; + FILE *log_file = NULL; /* per-thread log file */ instr_time start, end; int nstate = thread->nstate; @@ -2180,7 +2171,25 @@ threadRun(void *arg) result = malloc(sizeof(TResult)); INSTR_TIME_SET_ZERO(result->conn_time); - if (is_connect == 0) + /* open log file if requested */ + if (use_log) + { + char logpath[64]; + + if (thread->tid == 0) + snprintf(logpath, 64, "pgbench_log.%d", main_pid); + else + snprintf(logpath, 64, "pgbench_log.%d.%d", main_pid, thread->tid); + log_file = fopen(logpath, "w"); + + if (log_file == NULL) + { + fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno)); + goto done; + } + } + + if (!is_connect) { /* make connections to the database */ for (i = 0; i < nstate; i++) @@ -2202,7 +2211,7 @@ threadRun(void *arg) int prev_ecnt = st->ecnt; st->use_file = getrand(0, num_files - 1); - if (!doCustom(st, &result->conn_time)) + if (!doCustom(st, &result->conn_time, log_file)) remains--; /* I've aborted */ if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND) @@ -2304,7 +2313,7 @@ threadRun(void *arg) if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask) || commands[st->state]->type == META_COMMAND)) { - if (!doCustom(st, &result->conn_time)) + if (!doCustom(st, &result->conn_time, log_file)) remains--; /* I've aborted */ } @@ -2326,6 +2335,8 @@ done: result->xacts += state[i].cnt; INSTR_TIME_SET_CURRENT(end); INSTR_TIME_ACCUM_DIFF(result->conn_time, end, start); + if (log_file) + fclose(log_file); return result; } diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml index 72fb0e5..a141395 100644 --- a/doc/src/sgml/pgbench.sgml +++ b/doc/src/sgml/pgbench.sgml @@ -551,6 +551,16 @@ END; taken by each transaction to a logfile. The logfile will be named <filename>pgbench_log.<replaceable>nnn</></filename>, where <replaceable>nnn</> is the PID of the pgbench process. + If the <literal>-j</> option is 2 or higher, creating multiple worker + threads, each will have its own log file. The first worker will use the + the same name for its log file as in the standard single worker case. + The additional log files for the other workers will be named + <filename>pgbench_log.<replaceable>nnn</>.<replaceable>mmm</></filename>, + where <replaceable>mmm</> is a sequential number for each worker starting + with 1. + </para> + + <para> The format of the log is: <programlisting>
Greg Smith <greg@2ndquadrant.com> wrote: > Attached is an updated version that I think is ready to commit. Only > changes are docs--I rewrote those to improve the wording some. Thanks for the correction. Applied. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Takahiro Itagaki wrote: > Greg Smith <greg@2ndquadrant.com> wrote: > > >> Attached is an updated version that I think is ready to commit. Only >> changes are docs--I rewrote those to improve the wording some. >> > > Thanks for the correction. Applied. > By the way: the pgbench.sgml that you committed looks like it passed through a system that added a CR to every line in it. Probably not the way you intended to commit that. So far I've done over 40 hours of pgbench runtime worth of testing (>500 runs) using the patched version without any issues, the code itself continues to act fine. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Greg Smith <greg@2ndquadrant.com> wrote: > By the way: the pgbench.sgml that you committed looks like it passed > through a system that added a CR to every line in it. Probably not the > way you intended to commit that. Oops, fixed. Thanks. Regards, --- Takahiro Itagaki NTT Open Source Software Center
On Tue, Mar 23, 2010 at 05:10, Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote: > > Greg Smith <greg@2ndquadrant.com> wrote: > >> By the way: the pgbench.sgml that you committed looks like it passed >> through a system that added a CR to every line in it. Probably not the >> way you intended to commit that. > > Oops, fixed. Thanks. My guess is that this happened because you committed from Windows? I learned the hard way that this is a bad idea. (Luckily I learned it on other CVS projects, before I started committing to PostgreSQL). There are settings to make it not do that, but they are not reliable. I'd strongly suggest that you always just do a cvs diff on windows and then use a staging machine running linux or bsd or something to apply it through. And then you *always* run those patches through something like "fromdos". It's a bunch of extra steps, but it's really the only way to do it reliably. If that's not at all what happened, then well, it's still good advice I think, even if it doesn't apply :-) -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/