pgbench bug candidate: negative "initial connection time" - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject pgbench bug candidate: negative "initial connection time"
Date
Msg-id TYCPR01MB5870057375ACA8A73099C649F5349@TYCPR01MB5870.jpnprd01.prod.outlook.com
Whole thread Raw
Responses Re: pgbench bug candidate: negative "initial connection time"
Re: pgbench bug candidate: negative "initial connection time"
List pgsql-hackers
Hi Hackers,

I played pgbench with wrong parameters, and I found bug-candidate.
The latest commit in my source is 3a09d75.

1. Do initdb and start.
2. Initialize schema and data with "scale factor" = 1.
3. execute following command many times:

$ pgbench -c 101 -j 10 postgres

Then, sometimes the negative " initial connection time" was returned.
Lateyncy average is also strange.

```
$ pgbench -c 101 -j 10 postgres
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  sorry, too many clients already
pgbench (PostgreSQL) 14.0
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 101
number of threads: 10
number of transactions per client: 10
number of transactions actually processed: 910/1010
latency average = 41387686.662 ms
initial connection time = -372896921.586 ms
tps = 0.002440 (without initial connection time)
```

I sought pgbench.c and found a reason.
When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun().
And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero),
the following if-statement sets bench_start to zero.

```
   6494                 /* first recorded benchmarking start time */
   6495                 if (bench_start == 0 || thread->bench_start < bench_start)
   6496                         bench_start = thread->bench_start;
```

The wrong bench_start propagates to printResult() and then some invalid values are appered.

```
   6509         printResults(&stats, pg_time_now() - bench_start, conn_total_duration,
   6510                                  bench_start - start_time, latency_late);
```

I cannot distinguish whether we have to fix it, but I attache the patch.
This simply ignores a result when therad->bench_start is zero.

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Dilip Kumar
Date:
Subject: Re: Race condition in recovery?