Thread: Ragged latency log data in multi-threaded pgbench

Ragged latency log data in multi-threaded pgbench

From

Greg Smith

Date:

15 March 2010, 12:41:31

Just noticed a problem popping up sometimes with the new multi-threaded 
pgbench.  This is on a Linux RPM build (the alpha4 set) compiled with 
'--disable-thread-safety'.  Still trying to nail down whether that's a 
requirement for this problem to appear or not.  I did most of my review 
of this feature with it turned on, and haven't been seeing this problem 
on other systems that are thread safe.  Not sure yet if that's cause and 
effect or coincidence yet.

Here's a sample invocation that produces ragged output for me on my one 
system:

pgbench -S -T 5 -c 4 -j 4 -l pgbench

The log file produced by this (pgbench_log.<pid>) is supposed to consist 
of a series of lines in the following format:

client,trans,latency,filenum,sec,usec

It looks like the switch between clients running on separate workers can 
lead to a mix of their respective lines showing up though.  Here's a 
couple of typical samples, with the bad line in the middle of each set:

1 138 178 0 1268665788 607559
1 139 182 0 1268665788 607751
1 1402 0 2491 0 1268665788 586135
2 1 264 0 1268665788 586463
2 2 192 0 1268665788 586665

1 274 160 0 1268665788 632966
1 275 178 0 1268665788 633154
1 276 184 0 126866578 178 0 1268665788 614015
2 141 190 0 1268665788 614252
2 142 169 0 1268665788 614430

2 274 178 0 1268665788 639218
2 275 175 0 1268665788 639402
2 276 169 0 126866578 171 0 1268665788 626933
0 141 185 0 1268665788 627165
0 142 202 0 1268665788 627377

Looks like sometimes a client is only getting part of its line written 
out before getting stomped on by the next one.  I think one of the 
assumptions being made about how to safely write to this log file may be 
broken by the multi-process implementation, which is what you get when 
thread-safety is not available.

Since there should only be 6 fields here, I think you can find whether a 
given log file has this problem or not like this:

cat pgbench_log.xxxxx | cut -d " " -f 7 | sort | uniq

If anything comes out of that, the latency log file has at least one bad 
line in it.

Similarly, this:

cat pgbench_log.xxxxx | cut -d " " -f 1 | sort | uniq

Should only show the client numbers; here there's some first columns 
with much bigger numbers too.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Ragged latency log data in multi-threaded pgbench

From

Andrew Dunstan

Date:

15 March 2010, 13:50:39


Greg Smith wrote:
> Just noticed a problem popping up sometimes with the new 
> multi-threaded pgbench.  This is on a Linux RPM build (the alpha4 set) 
> compiled with '--disable-thread-safety'.  Still trying to nail down 
> whether that's a requirement for this problem to appear or not.  I did 
> most of my review of this feature with it turned on, and haven't been 
> seeing this problem on other systems that are thread safe.  Not sure 
> yet if that's cause and effect or coincidence yet.
>
>

We had to turn handsprings to prevent this sort of effect with the 
logging collector, which was a requirement of being able to implement 
CSV logging sanely. So I'm not surprised by this report.

cheers

andrew

Re: Ragged latency log data in multi-threaded pgbench

From

Tom Lane

Date:

15 March 2010, 13:52:03

Greg Smith <greg@2ndquadrant.com> writes:
> Looks like sometimes a client is only getting part of its line written 
> out before getting stomped on by the next one.  I think one of the 
> assumptions being made about how to safely write to this log file may be 
> broken by the multi-process implementation, which is what you get when 
> thread-safety is not available.

pgbench doesn't make any effort at all to avoid interleaved writes on
that file.  I don't think there is anything much that can be done about
it when you are using the forked-processes implementation.  It's
probably possible for it to show up on the multi-threads version too,
depending on how hard libc tries to interlock stdio calls.
        regards, tom lane

Re: Ragged latency log data in multi-threaded pgbench

From

Josh Berkus

Date:

15 March 2010, 19:48:03

On 3/15/10 8:41 AM, Greg Smith wrote:
> Just noticed a problem popping up sometimes with the new multi-threaded
> pgbench.  This is on a Linux RPM build (the alpha4 set) compiled with
> '--disable-thread-safety'.  Still trying to nail down whether that's a
> requirement for this problem to appear or not.  I did most of my review
> of this feature with it turned on, and haven't been seeing this problem
> on other systems that are thread safe.  Not sure yet if that's cause and
> effect or coincidence yet.

For my part, telling people that multi-thread pgbench doesn't work
correctly on systems which are not thread-safe seems perfectly OK.


--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: Ragged latency log data in multi-threaded pgbench

From

Takahiro Itagaki

Date:

15 March 2010, 21:46:55

Greg Smith <greg@2ndquadrant.com> wrote:

> It looks like the switch between clients running on separate workers can 
> lead to a mix of their respective lines showing up though.

Oops. There might be two solutions for the issue: 1. Use explicit locks. The lock primitive will be pthread_mutex for
multi-threaded implementations or semaphore for multi-threaded ones. 2. Use per-thread log files.    File names would
be"pgbench_log.<main-process-id>.<thread-id>".

Which is better, or another idea?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Ragged latency log data in multi-threaded pgbench

From

Greg Smith

Date:

15 March 2010, 22:59:29

Takahiro Itagaki wrote:
>   1. Use explicit locks. The lock primitive will be pthread_mutex for
>      multi-threaded implementations or semaphore for multi-threaded ones.
>   2. Use per-thread log files.
>      File names would be "pgbench_log.<main-process-id>.<thread-id>".
>   

I'm concerned that the locking itself will turn into a new pgbench 
bottleneck, just as we're clearing the point where it's not for the 
first time in a while.  And that sounds like it has its own potential 
risks/complexity involved.

I could live with per-thread log files.  I think my pgbench-tools is the 
main consumer of these latency logs floating around right now, I just 
pushed a 9.0 update to handle the multiple workers option today that 
discovered this).  It doesn't make any difference to what I'm doing how 
many file I have to process.  Just a few lines of extra shell code for 
me to pull the rest into the import.  That seems like the simplest 
solution that's guaranteed to work, just push the problem onto the 
client side instead where it's easier to deal with.

Unless someone feels strongly that these have to be interleaved into one 
file, based on Andrew's suggestion that this is a hard problem to get 
right and Tom's suggestion that this might even extend into the proper 
threaded version too, I think a log file per worker is the easiest way 
out of this.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Ragged latency log data in multi-threaded pgbench

From

Tom Lane

Date:

15 March 2010, 23:02:31

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> Greg Smith <greg@2ndquadrant.com> wrote:
>> It looks like the switch between clients running on separate workers can 
>> lead to a mix of their respective lines showing up though.

> Oops. There might be two solutions for the issue:
>   1. Use explicit locks. The lock primitive will be pthread_mutex for
>      multi-threaded implementations or semaphore for multi-threaded ones.
>   2. Use per-thread log files.
>      File names would be "pgbench_log.<main-process-id>.<thread-id>".

I think #1 is out of the question, as the synchronization overhead will
do serious damage to the whole point of having a multithreaded pgbench.
#2 might be a reasonable idea.
        regards, tom lane

Re: Ragged latency log data in multi-threaded pgbench

From

Takahiro Itagaki

Date:

15 March 2010, 23:13:43

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> > Oops. There might be two solutions for the issue:
> >   1. Use explicit locks. The lock primitive will be pthread_mutex for
> >      multi-threaded implementations or semaphore for multi-threaded ones.
> >   2. Use per-thread log files.
> >      File names would be "pgbench_log.<main-process-id>.<thread-id>".
> 
> I think #1 is out of the question, as the synchronization overhead will
> do serious damage to the whole point of having a multithreaded pgbench.
> #2 might be a reasonable idea.

Ok, I'll go for #2.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Ragged latency log data in multi-threaded pgbench

From

Takahiro Itagaki

Date:

16 March 2010, 00:14:48

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote:

> > >   2. Use per-thread log files.
> > >      File names would be "pgbench_log.<main-process-id>.<thread-id>".

Here is a patch to implement per-thread log files for pgbench -l.

The log filenames are "pgbench_log.<main-process-id>.<thread-serial-number>"
for each thread, but the first thread (including single-threaded) still uses
"pgbench_log.<main-process-id>" for the name because of compatibility.

Example:
  $ pgbench -c16 -j4 -l
  $ ls
  pgbench_log.2196  pgbench_log.2196.1  pgbench_log.2196.2  pgbench_log.2196.3

Comments and suggenstions welcome.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment

pgbench_log_20100316.patch

Re: Ragged latency log data in multi-threaded pgbench

From

Greg Smith

Date:

18 March 2010, 15:49:52

Takahiro Itagaki wrote:
> The log filenames are "pgbench_log.<main-process-id>.<thread-serial-number>"
> for each thread, but the first thread (including single-threaded) still uses
> "pgbench_log.<main-process-id>" for the name because of compatibility.
>

Attached is an updated version that I think is ready to commit.  Only
changes are docs--I rewrote those to improve the wording some.  The code
looked and tested fine to me.  I just added support for the new format
to pgbench-tools and am back to happily running large batches of tests
using it again.

I confirmed a few things:

-On my CentOS system, the original problem is masked if you have
"--enable-thread-safety" on; the multi-threaded output shows up without
any broken lines into the single file.  As I suspected it's only the
multi-process implementation that shows the issue here.  Since Tom
points out that's luck rather than something that should be relied upon,
I don't think that actually changes what to do here, it just explains
why this wasn't obvious in earlier testing--normally I have thread
safety on nowadays.

-Patch corrects the problem.  I took a build without thread safety on,
demonstrated the issue with its pgbench.  Apply the patch, rebuild just
pgbench, run again; new multiple log files have no issue.

-It's easy to convert existing scripts to utilize the new multiple log
format.  Right now the current idiom you're forced into using when
running pgbench scripts is to track the PID it's run as, then use
something like:

mv pgbench_log.${PID} pgbench.log

To convert to a stable filename for later processing.  Now you just use
something like this instead:

cat pgbench_log.${PID}* > pgbench.log
rm -f pgbench_log.${PID}*

And that works fine.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 0019db4..28a8c84 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -131,11 +131,9 @@ int            fillfactor = 100;
 #define ntellers    10
 #define naccounts    100000

-FILE       *LOGFILE = NULL;
-
 bool        use_log;            /* log transaction latencies to a file */
-
-int            is_connect;            /* establish connection for each transaction */
+bool        is_connect;            /* establish connection for each transaction */
+int            main_pid;            /* main process id used in log filename */

 char       *pghost = "";
 char       *pgport = "";
@@ -183,6 +181,7 @@ typedef struct
  */
 typedef struct
 {
+    int            tid;            /* thread id */
     pthread_t    thread;            /* thread handle */
     CState       *state;            /* array of CState */
     int            nstate;            /* length of state[] */
@@ -741,7 +740,7 @@ clientDone(CState *st, bool ok)

 /* return false iff client should be disconnected */
 static bool
-doCustom(CState *st, instr_time *conn_time)
+doCustom(CState *st, instr_time *conn_time, FILE *log_file)
 {
     PGresult   *res;
     Command   **commands;
@@ -778,7 +777,7 @@ top:
         /*
          * transaction finished: record the time it took in the log
          */
-        if (use_log && commands[st->state + 1] == NULL)
+        if (log_file && commands[st->state + 1] == NULL)
         {
             instr_time    now;
             instr_time    diff;
@@ -791,12 +790,12 @@ top:

 #ifndef WIN32
             /* This is more than we really ought to know about instr_time */
-            fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n",
+            fprintf(log_file, "%d %d %.0f %d %ld %ld\n",
                     st->id, st->cnt, usec, st->use_file,
                     (long) now.tv_sec, (long) now.tv_usec);
 #else
             /* On Windows, instr_time doesn't provide a timestamp anyway */
-            fprintf(LOGFILE, "%d %d %.0f %d 0 0\n",
+            fprintf(log_file, "%d %d %.0f %d 0 0\n",
                     st->id, st->cnt, usec, st->use_file);
 #endif
         }
@@ -857,7 +856,7 @@ top:
         INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
     }

-    if (use_log && st->state == 0)
+    if (log_file && st->state == 0)
         INSTR_TIME_SET_CURRENT(st->txn_begin);

     if (commands[st->state]->type == SQL_COMMAND)
@@ -1833,7 +1832,7 @@ main(int argc, char **argv)
                 }
                 break;
             case 'C':
-                is_connect = 1;
+                is_connect = true;
                 break;
             case 's':
                 scale_given = true;
@@ -1955,6 +1954,12 @@ main(int argc, char **argv)
         exit(1);
     }

+    /*
+     * save main process id in the global variable because process id will be
+     * changed after fork.
+     */
+    main_pid = (int) getpid();
+
     if (nclients > 1)
     {
         state = (CState *) realloc(state, sizeof(CState) * nclients);
@@ -1980,20 +1985,6 @@ main(int argc, char **argv)
         }
     }

-    if (use_log)
-    {
-        char        logpath[64];
-
-        snprintf(logpath, 64, "pgbench_log.%d", (int) getpid());
-        LOGFILE = fopen(logpath, "w");
-
-        if (LOGFILE == NULL)
-        {
-            fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno));
-            exit(1);
-        }
-    }
-
     if (debug)
     {
         if (duration <= 0)
@@ -2111,6 +2102,7 @@ main(int argc, char **argv)
     threads = (TState *) malloc(sizeof(TState) * nthreads);
     for (i = 0; i < nthreads; i++)
     {
+        threads[i].tid = i;
         threads[i].state = &state[nclients / nthreads * i];
         threads[i].nstate = nclients / nthreads;
         INSTR_TIME_SET_CURRENT(threads[i].start_time);
@@ -2159,8 +2151,6 @@ main(int argc, char **argv)
     INSTR_TIME_SET_CURRENT(total_time);
     INSTR_TIME_SUBTRACT(total_time, start_time);
     printResults(ttype, total_xacts, nclients, nthreads, total_time, conn_total_time);
-    if (LOGFILE)
-        fclose(LOGFILE);

     return 0;
 }
@@ -2171,6 +2161,7 @@ threadRun(void *arg)
     TState       *thread = (TState *) arg;
     CState       *state = thread->state;
     TResult    *result;
+    FILE       *log_file = NULL;        /* per-thread log file */
     instr_time    start,
                 end;
     int            nstate = thread->nstate;
@@ -2180,7 +2171,25 @@ threadRun(void *arg)
     result = malloc(sizeof(TResult));
     INSTR_TIME_SET_ZERO(result->conn_time);

-    if (is_connect == 0)
+    /* open log file if requested */
+    if (use_log)
+    {
+        char        logpath[64];
+
+        if (thread->tid == 0)
+            snprintf(logpath, 64, "pgbench_log.%d", main_pid);
+        else
+            snprintf(logpath, 64, "pgbench_log.%d.%d", main_pid, thread->tid);
+        log_file = fopen(logpath, "w");
+
+        if (log_file == NULL)
+        {
+            fprintf(stderr, "Couldn't open logfile \"%s\": %s", logpath, strerror(errno));
+            goto done;
+        }
+    }
+
+    if (!is_connect)
     {
         /* make connections to the database */
         for (i = 0; i < nstate; i++)
@@ -2202,7 +2211,7 @@ threadRun(void *arg)
         int            prev_ecnt = st->ecnt;

         st->use_file = getrand(0, num_files - 1);
-        if (!doCustom(st, &result->conn_time))
+        if (!doCustom(st, &result->conn_time, log_file))
             remains--;            /* I've aborted */

         if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND)
@@ -2304,7 +2313,7 @@ threadRun(void *arg)
             if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask)
                             || commands[st->state]->type == META_COMMAND))
             {
-                if (!doCustom(st, &result->conn_time))
+                if (!doCustom(st, &result->conn_time, log_file))
                     remains--;    /* I've aborted */
             }

@@ -2326,6 +2335,8 @@ done:
         result->xacts += state[i].cnt;
     INSTR_TIME_SET_CURRENT(end);
     INSTR_TIME_ACCUM_DIFF(result->conn_time, end, start);
+    if (log_file)
+        fclose(log_file);
     return result;
 }

diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 72fb0e5..a141395 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -551,6 +551,16 @@ END;
    taken by each transaction to a logfile.  The logfile will be named
    <filename>pgbench_log.<replaceable>nnn</></filename>, where
    <replaceable>nnn</> is the PID of the pgbench process.
+   If the <literal>-j</> option is 2 or higher, creating multiple worker
+   threads, each will have its own log file. The first worker will use the
+   the same name for its log file as in the standard single worker case.
+   The additional log files for the other workers will be named
+   <filename>pgbench_log.<replaceable>nnn</>.<replaceable>mmm</></filename>,
+   where <replaceable>mmm</> is a sequential number for each worker starting
+   with 1.
+  </para>
+
+  <para>
    The format of the log is:

    <programlisting>

Re: Ragged latency log data in multi-threaded pgbench

From

Takahiro Itagaki

Date:

22 March 2010, 22:38:34

Greg Smith <greg@2ndquadrant.com> wrote:

> Attached is an updated version that I think is ready to commit.  Only 
> changes are docs--I rewrote those to improve the wording some.

Thanks for the correction. Applied.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Ragged latency log data in multi-threaded pgbench

From

Greg Smith

Date:

23 March 2010, 01:00:39

Takahiro Itagaki wrote:
> Greg Smith <greg@2ndquadrant.com> wrote:
>
>   
>> Attached is an updated version that I think is ready to commit.  Only 
>> changes are docs--I rewrote those to improve the wording some.
>>     
>
> Thanks for the correction. Applied.
>   

By the way: the pgbench.sgml that you committed looks like it passed 
through a system that added a CR to every line in it.  Probably not the 
way you intended to commit that.

So far I've done over 40 hours of pgbench runtime worth of testing (>500 
runs) using the patched version without any issues, the code itself 
continues to act fine.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Ragged latency log data in multi-threaded pgbench

From

Takahiro Itagaki

Date:

23 March 2010, 01:10:09

Greg Smith <greg@2ndquadrant.com> wrote:

> By the way: the pgbench.sgml that you committed looks like it passed 
> through a system that added a CR to every line in it.  Probably not the 
> way you intended to commit that.

Oops, fixed. Thanks.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Re: Ragged latency log data in multi-threaded pgbench

From

Magnus Hagander

Date:

23 March 2010, 05:37:46

On Tue, Mar 23, 2010 at 05:10, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Greg Smith <greg@2ndquadrant.com> wrote:
>
>> By the way: the pgbench.sgml that you committed looks like it passed
>> through a system that added a CR to every line in it.  Probably not the
>> way you intended to commit that.
>
> Oops, fixed. Thanks.

My guess is that this happened because you committed from Windows?

I learned the hard way that this is a bad idea. (Luckily I learned it
on other CVS projects, before I started committing to PostgreSQL).
There are settings to make it not do that, but they are not reliable.
I'd strongly suggest that you always just do a cvs diff on windows and
then use a staging machine running linux or bsd or something to apply
it through. And then you *always* run those patches through something
like "fromdos". It's a bunch of extra steps, but it's really the only
way to do it reliably.

If that's not at all what happened, then well, it's still good advice
I think, even if it doesn't apply :-)

-- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/