Thread: pgbench bug candidate: negative "initial connection time"
Hi Hackers, I played pgbench with wrong parameters, and I found bug-candidate. The latest commit in my source is 3a09d75. 1. Do initdb and start. 2. Initialize schema and data with "scale factor" = 1. 3. execute following command many times: $ pgbench -c 101 -j 10 postgres Then, sometimes the negative " initial connection time" was returned. Lateyncy average is also strange. ``` $ pgbench -c 101 -j 10 postgres starting vacuum...end. pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already pgbench (PostgreSQL) 14.0 transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 101 number of threads: 10 number of transactions per client: 10 number of transactions actually processed: 910/1010 latency average = 41387686.662 ms initial connection time = -372896921.586 ms tps = 0.002440 (without initial connection time) ``` I sought pgbench.c and found a reason. When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun(). And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero), the following if-statement sets bench_start to zero. ``` 6494 /* first recorded benchmarking start time */ 6495 if (bench_start == 0 || thread->bench_start < bench_start) 6496 bench_start = thread->bench_start; ``` The wrong bench_start propagates to printResult() and then some invalid values are appered. ``` 6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration, 6510 bench_start - start_time, latency_late); ``` I cannot distinguish whether we have to fix it, but I attache the patch. This simply ignores a result when therad->bench_start is zero. How do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Hello Hayato-san, > I played pgbench with wrong parameters, That's good:-) > and I found bug-candidate. > > 1. Do initdb and start. > 2. Initialize schema and data with "scale factor" = 1. > 3. execute following command many times: > > $ pgbench -c 101 -j 10 postgres > > Then, sometimes the negative " initial connection time" was returned. > Lateyncy average is also strange. > > ``` > $ pgbench -c 101 -j 10 postgres > starting vacuum...end. > pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already Hmmm. AFAICR there was a decision to generate a report even if something went very wrong, in this case some client could not connect, so some values are not initialized, hence the absurd figures, as you show below. Maybe we should revisit this decision. > initial connection time = -372896921.586 ms > I sought pgbench.c and found a reason. > When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun(). > And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero), > the following if-statement sets bench_start to zero. > I cannot distinguish whether we have to fix it, but I attache the patch. > This simply ignores a result when therad->bench_start is zero. > How do you think? Hmmm. Possibly. Another option could be not to report anything after some errors. I'm not sure, because it would depend on the use case. I guess the command returned an error status as well. I'm going to give it some thoughts. -- Fabien.
RE: pgbench bug candidate: negative "initial connection time"
From
"kuroda.hayato@fujitsu.com"
Date:
Dear Fabien, Thank you for replying! > Hmmm. Possibly. Another option could be not to report anything after some > errors. I'm not sure, because it would depend on the use case. I guess the > command returned an error status as well. I did not know any use cases and decisions , but I vote to report nothing when error occurs. Best Regards, Hayato Kuroda FUJITSU LIMITED
Hello Kuroda-san, On Fri, 11 Jun 2021 08:58:45 +0000 "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote: > Hi Hackers, > > I played pgbench with wrong parameters, and I found bug-candidate. > The latest commit in my source is 3a09d75. > > 1. Do initdb and start. > 2. Initialize schema and data with "scale factor" = 1. > 3. execute following command many times: > > $ pgbench -c 101 -j 10 postgres > > Then, sometimes the negative " initial connection time" was returned. > Lateyncy average is also strange. > > ``` > $ pgbench -c 101 -j 10 postgres > starting vacuum...end. > pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already > pgbench (PostgreSQL) 14.0 > transaction type: <builtin: TPC-B (sort of)> > scaling factor: 1 > query mode: simple > number of clients: 101 > number of threads: 10 > number of transactions per client: 10 > number of transactions actually processed: 910/1010 > latency average = 41387686.662 ms > initial connection time = -372896921.586 ms > tps = 0.002440 (without initial connection time) > ``` > > I sought pgbench.c and found a reason. > When a thread failed to get some connections, they do not fill any values to thread->bench_start in threadRun(). > And if the failure is caused in the final thread (this means threads[nthreads - 1]->bench_start is zero), > the following if-statement sets bench_start to zero. > > ``` > 6494 /* first recorded benchmarking start time */ > 6495 if (bench_start == 0 || thread->bench_start < bench_start) > 6496 bench_start = thread->bench_start; > ``` > > The wrong bench_start propagates to printResult() and then some invalid values are appered. > > ``` > 6509 printResults(&stats, pg_time_now() - bench_start, conn_total_duration, > 6510 bench_start - start_time, latency_late); > ``` > > I cannot distinguish whether we have to fix it, but I attache the patch. > This simply ignores a result when therad->bench_start is zero. + /* skip if the thread faild to get connection */ + if (thread->bench_start == 0) + continue; It detects if a thread failed to get the initial connection by thread->bench_start == 0, but this assumes the initial valueis zero. For ensuring this, I think it is better to initialize it in an early state, for example like this. @@ -6419,6 +6419,7 @@ main(int argc, char **argv) initRandomState(&thread->ts_throttle_rs); initRandomState(&thread->ts_sample_rs); thread->logfile = NULL; /* filled in later */ + thread->bench_start = 0; thread->latency_late = 0; initStats(&thread->stats, 0) typo: faild -> failed Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
On Mon, 14 Jun 2021 00:42:12 +0000 "kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com> wrote: > Dear Fabien, > > Thank you for replying! > > > Hmmm. Possibly. Another option could be not to report anything after some > > errors. I'm not sure, because it would depend on the use case. I guess the > > command returned an error status as well. > > I did not know any use cases and decisions , but I vote to report nothing when error occurs. I would prefer to abort the thread whose connection got an error and report results for other threads, as handled when doConnect fails in CSTATE_START_TX state. In this case, we have to set the state to CSTATE_ABORT before going to 'done' as fixed in the attached patch, in order to ensure that exit status is 2 and the result reports "pgbench: fatal: Run was aborted; the above results are incomplete." Otherwise, if we want pgbench to exit immediately when a connection error occurs, we have tocall exit(1) to ensure the exit code is 1, of course. Anyway, it is wrong that thecurrent pgbench exit successfully with exit code 0 when doConnnect fails. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
>>> Hmmm. Possibly. Another option could be not to report anything after some >>> errors. I'm not sure, because it would depend on the use case. I guess the >>> command returned an error status as well. >> >> I did not know any use cases and decisions , but I vote to report nothing when error occurs. > > I would prefer to abort the thread whose connection got an error and report > results for other threads, as handled when doConnect fails in CSTATE_START_TX > state. It is unclear to me whether it makes much sense to report performance when things go wrong. At least when a one connection per client bench is run ISTM that it should not proceed, because the bench could not even start as prescribe. When connection break while the bench has already started, maybe it makes more sense to proceed, although I guess that maybe reattempting connections would make also sense in such case. > In this case, we have to set the state to CSTATE_ABORT before going to 'done' > as fixed in the attached patch, in order to ensure that exit status is 2 and the > result reports "pgbench: fatal: Run was aborted; the above results are incomplete." Hmmm. I agree that at least reporting that there was an issue is a good idea. > Otherwise, if we want pgbench to exit immediately when a connection error occurs, > we have tocall exit(1) to ensure the exit code is 1, of course. Anyway, it is wrong > that thecurrent pgbench exit successfully with exit code 0 when doConnnect fails. Indeed, I can only agree on this one. -- Fabien.
Hello Fabien, On Mon, 14 Jun 2021 11:30:14 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >>> Hmmm. Possibly. Another option could be not to report anything after some > >>> errors. I'm not sure, because it would depend on the use case. I guess the > >>> command returned an error status as well. > >> > >> I did not know any use cases and decisions , but I vote to report nothing when error occurs. > > > > I would prefer to abort the thread whose connection got an error and report > > results for other threads, as handled when doConnect fails in CSTATE_START_TX > > state. > > It is unclear to me whether it makes much sense to report performance when > things go wrong. At least when a one connection per client bench is run > ISTM that it should not proceed, because the bench could not even start > as prescribe. I agreed that when an initial connections fails we cannot start a bench in the condition that the user wants and that we should stop early to let the user know it and check the conf. I attached a patch, which is a fusion of my previous patch that changes the state to CSTATE_ABORT when the socket get failure during the bench, and a part of your patch attached in [1] that exits for initial failures. [1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2106141011100.1338009%40pseudo > When connection break while the bench has already started, > maybe it makes more sense to proceed, The result would be incomplete also in this case. However, the reason why it is worth to proceed is that such information is still useful for users, or we don't want to waste the bench that has already started? > although I guess that maybe > reattempting connections would make also sense in such case. This might become possible after pgbench gets the feature to retry in deadlock or serialization errors. I am working on rebase of the patch [2] and I will submit this in a few days. [2] https://www.postgresql.org/message-id/20210524112910.444fbfdfbff747bd3b9720ee@sraoss.co.jp -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
Hello Yugo-san, >> When connection break while the bench has already started, >> maybe it makes more sense to proceed, > > The result would be incomplete also in this case. However, the reason why > it is worth to proceed is that such information is still useful for users, > or we don't want to waste the bench that has already started? Hmmm. It depends on what the user is testing. If one is interested in client resilience under errors, the bench should probably attempt a reconnect. If one is interested in best performance when all is well, then clearly something is amiss and there is no point to go on. >> although I guess that maybe reattempting connections would make also >> sense in such case. > > This might become possible after pgbench gets the feature to retry in deadlock > or serialization errors. Yes, I agree that part of the needed infrastructure would be in place for that. As reconnecting is already in place under -c, so possibly it is just a matter of switching between states with some care. > I am working on rebase of the patch [2] and I will submit this in a few > days. Ok. Very good, I look forward to your submission! I'll be sure to look at it. -- Fabien.
On Wed, 16 Jun 2021 20:25:31 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Hello Yugo-san, > > >> When connection break while the bench has already started, > >> maybe it makes more sense to proceed, > > > > The result would be incomplete also in this case. However, the reason why > > it is worth to proceed is that such information is still useful for users, > > or we don't want to waste the bench that has already started? > > Hmmm. It depends on what the user is testing. If one is interested in > client resilience under errors, the bench should probably attempt a > reconnect. If one is interested in best performance when all is well, > then clearly something is amiss and there is no point to go on. Agreed. After --max-tries options is implemented on pgbench, we would be able to add a new feature to allow users to choose if we try to reconnect or not. However, we don't have it yet for now, so we should just abort the client and report the abortion at the end of the bench when a connection or socket error occurs during the bench, as same the existing behaviour. By the way, the issue of inital connection erros reported in this thread will be fixed by the patch attached in my previous post (a major part are written by you :-) ). Is this acceptable for you? Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello Yugo-san, > By the way, the issue of inital connection erros reported in this thread > will be fixed by the patch attached in my previous post (a major part are > written by you :-) That does not, on its own, ensure that it is bug free:-) > ). Is this acceptable for you? I disagree on two counts: First, thread[0] should not appear. Second, currently the *only* function to change the client state is advanceConnectionState, so it can be checked there and any bug is only there. We had issues before when several functions where doing updates, and it was a mess to understand what was going on. I really want that it stays that way, so I disagree with setting the state to ABORTED from threadRun. Moreover I do not see that it brings a feature, so ISTM that it is not an actual issue not to do it? -- Fabien.
Hello Fabien, On Thu, 17 Jun 2021 10:37:05 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > ). Is this acceptable for you? > > I disagree on two counts: > > First, thread[0] should not appear. > > Second, currently the *only* function to change the client state is > advanceConnectionState, so it can be checked there and any bug is only > there. We had issues before when several functions where doing updates, > and it was a mess to understand what was going on. I really want that it > stays that way, so I disagree with setting the state to ABORTED from > threadRun. Moreover I do not see that it brings a feature, so ISTM that it > is not an actual issue not to do it? Ok. I gave up to change the state in threadRun. Instead, I changed the condition at the end of bench, which enables to report abortion due to socket errors. +@@ -6480,7 +6490,7 @@ main(int argc, char **argv) + #endif /* ENABLE_THREAD_SAFETY */ + + for (int j = 0; j < thread->nstate; j++) +- if (thread->state[j].state == CSTATE_ABORTED) ++ if (thread->state[j].state != CSTATE_FINISHED) + exit_code = 2; + + /* aggregate thread level stats */ Does this make sense? -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
>> Second, currently the *only* function to change the client state is >> advanceConnectionState, so it can be checked there and any bug is only >> there. We had issues before when several functions where doing updates, >> and it was a mess to understand what was going on. I really want that it >> stays that way, so I disagree with setting the state to ABORTED from >> threadRun. Moreover I do not see that it brings a feature, so ISTM that it >> is not an actual issue not to do it? > > Ok. I gave up to change the state in threadRun. Instead, I changed the > condition at the end of bench, which enables to report abortion due to > socket errors. > > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv) > + #endif /* ENABLE_THREAD_SAFETY */ > + > + for (int j = 0; j < thread->nstate; j++) > +- if (thread->state[j].state == CSTATE_ABORTED) > ++ if (thread->state[j].state != CSTATE_FINISHED) > + exit_code = 2; > + > + /* aggregate thread level stats */ > > Does this make sense? Yes, definitely. -- Fabien.
At Thu, 17 Jun 2021 11:52:04 +0200 (CEST), Fabien COELHO <coelho@cri.ensmp.fr> wrote in > > Ok. I gave up to change the state in threadRun. Instead, I changed the > > condition at the end of bench, which enables to report abortion due to > > socket errors. > > > > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv) > > + #endif /* ENABLE_THREAD_SAFETY */ > > + > > + for (int j = 0; j < thread->nstate; j++) > > +- if (thread->state[j].state == CSTATE_ABORTED) > > ++ if (thread->state[j].state != CSTATE_FINISHED) > > + exit_code = 2; > > + > > + /* aggregate thread level stats */ > > > > Does this make sense? > > Yes, definitely. I sought for a simple way to enforce all client finishes with the states abort or finished but I didn't find. So +1 for the change. However, as a matter of style. if we touch the code maybe we want to enclose the if statement. Doing this means we regard any state other than CSTATE_FINISHED as aborted. So, the current goto's to done in threadRun are effectively aborting a part or the all clients running on the thread. So for example the following place: pgbench.c:6713 > /* must be something wrong */ > pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); > goto done; Should say such like "thread %d aborted: %s() failed: ...". I'm not sure what is the consensus here about the case where aborted client can recoonect to the same server. This patch doesn't that. However, I think causing reconnection needs more workthan accepted as a fix while beta. ==== + /* as the bench is already running, we do not abort */ pg_log_error("client %d aborted while establishing connection", st->id); st->state = CSTATE_ABORTED; The comment looks strange that it is saying "we don't abort" while setting the state to CSTATE_ABORT then showing "client %d aborted". ==== if ((con = doConnect()) == NULL) + { + pg_log_fatal("connection for initialization failed"); exit(1); doConnect() prints an error emssage given from libpq. So The additional messaget is redundant. ==== errno = THREAD_BARRIER_INIT(&barrier, nthreads); if (errno != 0) + { pg_log_fatal("could not initialize barrier: %m"); + exit(1); This is a run-time error. Maybe we should return 2 in that case. === if (thread->logfile == NULL) { pg_log_fatal("could not open logfile \"%s\": %m", logpath); - goto done; + exit(1); Maybe we should exit with 2 this case. If we exit this case, we might also want to exit when fclose() fails. (Currently the error of fclose() is ignored.) === + /* coldly abort on connection failure */ + pg_log_fatal("cannot create connection for thread %d client %d", + thread->tid, i); + exit(1); It seems to me that the "thread %d client %d(not clinent id but the client index within the thread)" doesn't make sense to users. Even if we showed a message like that, it should show only the global clinent id (cstate->id). I think that we should return with 2 here but we return with 1 in another place for the same reason.. === /* must be something wrong */ - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD); + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); goto done; Why doesn't a fatal error cause an immediate exit? (And if we change this to fatal, we also need to change similar errors in the same function to fatal.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, > Doing this means we regard any state other than CSTATE_FINISHED as > aborted. So, the current goto's to done in threadRun are effectively > aborting a part or the all clients running on the thread. So for > example the following place: > > pgbench.c:6713 >> /* must be something wrong */ >> pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); >> goto done; > > Should say such like "thread %d aborted: %s() failed: ...". Yep, possibly. > I'm not sure what is the consensus here about the case where aborted > client can recoonect to the same server. This patch doesn't that. Non trivial future work. > However, I think causing reconnection needs more work than accepted as > a fix while beta. It is an entire project which requires some thinking about. > + /* as the bench is already running, we do not abort */ > pg_log_error("client %d aborted while establishing connection", st->id); > st->state = CSTATE_ABORTED; > > The comment looks strange that it is saying "we don't abort" while > setting the state to CSTATE_ABORT then showing "client %d aborted". Indeed. There is abort from the client, which just means that it stops sending transaction, and abort for the process, which is basically "exit(1)". > ==== > if ((con = doConnect()) == NULL) > + { > + pg_log_fatal("connection for initialization failed"); > exit(1); > > doConnect() prints an error emssage given from libpq. So The > additional messaget is redundant. This is not the same for me: doConnect may fail but we may decide to go retry the connection later, or just one client may be disconnected but others are going on, which is different from deciding to stop the whole program, which deserves a message on its own. > ==== > errno = THREAD_BARRIER_INIT(&barrier, nthreads); > if (errno != 0) > + { > pg_log_fatal("could not initialize barrier: %m"); > + exit(1); > > This is a run-time error. Maybe we should return 2 in that case. Hmmm. Yep. > === > if (thread->logfile == NULL) > { > pg_log_fatal("could not open logfile \"%s\": %m", logpath); > - goto done; > + exit(1); > > Maybe we should exit with 2 this case. Yep. > If we exit this case, we might also want to exit when fclose() fails. > (Currently the error of fclose() is ignored.) Not sure. I'd let it at that for now. > === > + /* coldly abort on connection failure */ > + pg_log_fatal("cannot create connection for thread %d client %d", > + thread->tid, i); > + exit(1); > > It seems to me that the "thread %d client %d(not clinent id but the > client index within the thread)" doesn't make sense to users. Even if > we showed a message like that, it should show only the global clinent > id (cstate->id). This is not obvious to me. I think that we should be homogeneous with what is already done around. > I think that we should return with 2 here but we return with 1 > in another place for the same reason.. Possibly. > /* must be something wrong */ > - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD); > + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); > goto done; > > Why doesn't a fatal error cause an immediate exit? Good point. I do not know, but I would expect it to be the case, and AFAICR it does not. > (And if we change this to fatal, we also need to change similar errors > in the same function to fatal.) Possibly. I'll look into it over the week-end. Thanks for the feedback! -- Fabien.
>>> /* must be something wrong */ >>> pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); >>> goto done; >> >> Should say such like "thread %d aborted: %s() failed: ...". After having a lookg, there are already plenty such cases. I'd say not to change anything for beta, and think of it for the next round. >> ==== >> errno = THREAD_BARRIER_INIT(&barrier, nthreads); >> if (errno != 0) >> + { >> pg_log_fatal("could not initialize barrier: %m"); >> + exit(1); >> >> This is a run-time error. Maybe we should return 2 in that case. I think that you are right, but there are plenty such places where exit should be 2 instead of 1 if the doc is followed: """Errors during the run such as database errors or problems in the script will result in exit status 2.""" My beta take is to let these as they are, i.e. pretty inconsistent all over pgbench, and schedule a cleanup on the next round. >> === >> if (thread->logfile == NULL) >> { >> pg_log_fatal("could not open logfile \"%s\": %m", logpath); >> - goto done; >> + exit(1); >> >> Maybe we should exit with 2 this case. > > Yep. The bench is not even started, this is not really run time yet, 1 seems ok. The failure may be due to a typo in the path which comes from the user. >> If we exit this case, we might also want to exit when fclose() fails. >> (Currently the error of fclose() is ignored.) > > Not sure. I'd let it at that for now. I stand by this. >> + /* coldly abort on connection failure */ >> + pg_log_fatal("cannot create connection for thread %d client %d", >> + thread->tid, i); >> + exit(1); >> >> It seems to me that the "thread %d client %d(not clinent id but the >> client index within the thread)" doesn't make sense to users. Even if >> we showed a message like that, it should show only the global clinent >> id (cstate->id). > > This is not obvious to me. I think that we should be homogeneous with what is > already done around. ok for only giving the global client id. >> I think that we should return with 2 here but we return with 1 >> in another place for the same reason.. > > Possibly. Again for this one, the bench has not really started, so 1 seems fine. >> /* must be something wrong */ >> - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD); >> + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); >> goto done; >> >> Why doesn't a fatal error cause an immediate exit? > > Good point. I do not know, but I would expect it to be the case, and AFAICR > it does not. > >> (And if we change this to fatal, we also need to change similar errors in >> the same function to fatal.) > > Possibly. On second look, I think that error is fine, indeed we do not stop the process, so "fatal" it is not; Attached Yugo-san patch with some updates discussed in the previous mails, so as to move things along. -- Fabien.
Attachment
Hello Horiguchi-san, Fabien, On Fri, 18 Jun 2021 15:58:48 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > >>> /* must be something wrong */ > >>> pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); > >>> goto done; > >> > >> Should say such like "thread %d aborted: %s() failed: ...". > > After having a lookg, there are already plenty such cases. I'd say not to > change anything for beta, and think of it for the next round. Agreed. Basically, I think the existing message should be left as is. > >> ==== > >> errno = THREAD_BARRIER_INIT(&barrier, nthreads); > >> if (errno != 0) > >> + { > >> pg_log_fatal("could not initialize barrier: %m"); > >> + exit(1); > >> > >> This is a run-time error. Maybe we should return 2 in that case. > > I think that you are right, but there are plenty such places where exit > should be 2 instead of 1 if the doc is followed: > > """Errors during the run such as database errors or problems in the script > will result in exit status 2.""" > > My beta take is to let these as they are, i.e. pretty inconsistent all > over pgbench, and schedule a cleanup on the next round. As same as the below Fabian's comment about thread->logfile, > >> === > >> if (thread->logfile == NULL) > >> { > >> pg_log_fatal("could not open logfile \"%s\": %m", logpath); > >> - goto done; > >> + exit(1); > >> > >> Maybe we should exit with 2 this case. > > > > Yep. > > The bench is not even started, this is not really run time yet, 1 seems > ok. The failure may be due to a typo in the path which comes from the > user. the bench is not started at THREAD_BARRIER_INIT, so I think exit(1) is ok. > > >> /* must be something wrong */ > >> - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD); > >> + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD); > >> goto done; > >> > >> Why doesn't a fatal error cause an immediate exit? > > > > Good point. I do not know, but I would expect it to be the case, and AFAICR > > it does not. > > > >> (And if we change this to fatal, we also need to change similar errors in > >> the same function to fatal.) > > > > Possibly. > > On second look, I think that error is fine, indeed we do not stop the > process, so "fatal" it is not; I replaced this 'fatal' with 'error' because we are aborting the client instead of exit(1). When pgbench was rewritten to use common logging API by the commit 30a3e772b40, somehow pg_log_fatal was used, but I am wondering it should have be pg_log_error. > Attached Yugo-san patch with some updates discussed in the previous mails, > so as to move things along. Thank you for update. I agree with this fix. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Hello, On Fri, 18 Jun 2021 15:58:48 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Attached Yugo-san patch with some updates discussed in the previous mails, > so as to move things along. I attached the patch rebased to a change due to 856de3b39cf. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On 2021/07/29 13:23, Yugo NAGATA wrote: > Hello, > > On Fri, 18 Jun 2021 15:58:48 +0200 (CEST) > Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> Attached Yugo-san patch with some updates discussed in the previous mails, >> so as to move things along. > > I attached the patch rebased to a change due to 856de3b39cf. + pg_log_fatal("connection for initialization failed"); + pg_log_fatal("setup connection failed"); + pg_log_fatal("cannot create connection for client %d", These fatal messages output when doConnect() fails should be a bit more consistent each other? For example, could not create connection for initialization could not create connection for setup could not create connection for client %d I'm not sure, but *if* "xxx failed" is more proper for pgbench, what about connection for initialization failed connection for setup failed connection for client %d failed > Exit status 1 indicates static problems such as invalid command-line options. > Errors during the run such as database errors or problems in the script will > result in exit status 2. While reading the code and docs related to the patch, I found these descriptions in pgbench docs. The first description needs to be updated? Because even database error (e.g., failure of connection for setup) can result in exit status 1 if it happens before the benchmark actually runs. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hello Fujii-san, On Tue, 7 Sep 2021 02:34:17 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > On 2021/07/29 13:23, Yugo NAGATA wrote: > > Hello, > > > > On Fri, 18 Jun 2021 15:58:48 +0200 (CEST) > > Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > > >> Attached Yugo-san patch with some updates discussed in the previous mails, > >> so as to move things along. > > > > I attached the patch rebased to a change due to 856de3b39cf. > > + pg_log_fatal("connection for initialization failed"); > + pg_log_fatal("setup connection failed"); > + pg_log_fatal("cannot create connection for client %d", > > These fatal messages output when doConnect() fails should be a bit more consistent each other? For example, > > could not create connection for initialization > could not create connection for setup > could not create connection for client %d Ok. I fixed as your suggestion. > > Exit status 1 indicates static problems such as invalid command-line options. > > Errors during the run such as database errors or problems in the script will > > result in exit status 2. > > While reading the code and docs related to the patch, I found > these descriptions in pgbench docs. The first description needs to be > updated? Because even database error (e.g., failure of connection for setup) > can result in exit status 1 if it happens before the benchmark actually runs. That makes sense. Failures of setup connection or initial connection doesn't seem 'static problems'. I rewrote this description to explain exit status 1 indicates also interal errors and early errors. Exit status 1 indicates static problems such as invalid command-line options or internal errors which are supposed to never occur. Early errors that occur when starting benchmark such as initial connection failures also exit with status 1. Regards, Yugo Nagata -- Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
On 2021/09/24 7:26, Yugo NAGATA wrote: > That makes sense. Failures of setup connection or initial connection doesn't > seem 'static problems'. I rewrote this description to explain exit status 1 > indicates also interal errors and early errors. > > Exit status 1 indicates static problems such as invalid command-line options > or internal errors which are supposed to never occur. Early errors that occur > when starting benchmark such as initial connection failures also exit with > status 1. LGTM. Barring any objection, I will commit the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/09/24 11:26, Fujii Masao wrote: > > > On 2021/09/24 7:26, Yugo NAGATA wrote: >> That makes sense. Failures of setup connection or initial connection doesn't >> seem 'static problems'. I rewrote this description to explain exit status 1 >> indicates also interal errors and early errors. >> >> Exit status 1 indicates static problems such as invalid command-line options >> or internal errors which are supposed to never occur. Early errors that occur >> when starting benchmark such as initial connection failures also exit with >> status 1. > > LGTM. Barring any objection, I will commit the patch. I extracted two changes from the patch and pushed (also back-patched) them. The remainings are the changes of handling of initial connection or logfile open failures. I agree to push them at least for the master. But I'm not sure if they should be back-patched. Without these changes, even when those failures happen, pgbench proceeds the benchmark and reports the result. But with the changes, pgbench exits immediately in that case. I'm not sure if there are people who expect this behavior, but if there are, maybe we should not change it at least at stable branches. Thought? BTW, when logfile fails to be opened, pgbench gets stuck due to commit aeb57af8e6. So even if we decided not to back-patch those changes, we should improve the handling of logfile open failure, to fix the issue. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote: > BTW, when logfile fails to be opened, pgbench gets stuck due to commit > aeb57af8e6. So even if we decided not to back-patch those changes, > we should improve the handling of logfile open failure, to fix the issue. There is an entry in the CF for this thread: https://commitfest.postgresql.org/34/3219/ I have moved that to the next one as some pieces are missing. If you are planning to handle the rest, could you register your name as a committer? -- Michael
Attachment
On 2021/10/01 15:27, Michael Paquier wrote: > On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote: >> BTW, when logfile fails to be opened, pgbench gets stuck due to commit >> aeb57af8e6. So even if we decided not to back-patch those changes, >> we should improve the handling of logfile open failure, to fix the issue. > > There is an entry in the CF for this thread: > https://commitfest.postgresql.org/34/3219/ > > I have moved that to the next one as some pieces are missing. If you > are planning to handle the rest, could you register your name as a > committer? Thanks for letting me know that! I registered myself as a committer of the patch again. pg_time_usec_t conn_duration; /* cumulated connection and deconnection * delays */ BTW, while reading the patch, I found the above comment in pgbench.c. "deconnection" seems a valid word in French (?), but isn't it better to replace it with "disconnection"? Patch attached. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2021/09/29 22:11, Fujii Masao wrote: > > > On 2021/09/24 11:26, Fujii Masao wrote: >> >> >> On 2021/09/24 7:26, Yugo NAGATA wrote: >>> That makes sense. Failures of setup connection or initial connection doesn't >>> seem 'static problems'. I rewrote this description to explain exit status 1 >>> indicates also interal errors and early errors. >>> >>> Exit status 1 indicates static problems such as invalid command-line options >>> or internal errors which are supposed to never occur. Early errors that occur >>> when starting benchmark such as initial connection failures also exit with >>> status 1. >> >> LGTM. Barring any objection, I will commit the patch. > > I extracted two changes from the patch and pushed (also back-patched) them. > > The remainings are the changes of handling of initial connection or > logfile open failures. I agree to push them at least for the master. > But I'm not sure if they should be back-patched. Without these changes, > even when those failures happen, pgbench proceeds the benchmark and > reports the result. But with the changes, pgbench exits immediately in > that case. I'm not sure if there are people who expect this behavior, > but if there are, maybe we should not change it at least at stable branches. > Thought? The current behavior should be improved, but not a bug. So I don't think that the patch needs to be back-patched. Barring any objection, I will push the attached patch to the master. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2021/10/09 0:41, Fujii Masao wrote: > > > On 2021/10/01 15:27, Michael Paquier wrote: >> On Wed, Sep 29, 2021 at 10:11:53PM +0900, Fujii Masao wrote: >>> BTW, when logfile fails to be opened, pgbench gets stuck due to commit >>> aeb57af8e6. So even if we decided not to back-patch those changes, >>> we should improve the handling of logfile open failure, to fix the issue. >> >> There is an entry in the CF for this thread: >> https://commitfest.postgresql.org/34/3219/ >> >> I have moved that to the next one as some pieces are missing. If you >> are planning to handle the rest, could you register your name as a >> committer? > > Thanks for letting me know that! > I registered myself as a committer of the patch again. > > > pg_time_usec_t conn_duration; /* cumulated connection and deconnection > * delays */ > > BTW, while reading the patch, I found the above comment in pgbench.c. > "deconnection" seems a valid word in French (?), but isn't it better to > replace it with "disconnection"? Patch attached. Barring any objection, I will push this patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/11/01 23:01, Fujii Masao wrote: >> The remainings are the changes of handling of initial connection or >> logfile open failures. I agree to push them at least for the master. >> But I'm not sure if they should be back-patched. Without these changes, >> even when those failures happen, pgbench proceeds the benchmark and >> reports the result. But with the changes, pgbench exits immediately in >> that case. I'm not sure if there are people who expect this behavior, >> but if there are, maybe we should not change it at least at stable branches. >> Thought? > > The current behavior should be improved, but not a bug. > So I don't think that the patch needs to be back-patched. > Barring any objection, I will push the attached patch to the master. Pushed. Thanks! I also pushed the typo-fix patch that I proposed upthread. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, 2 Nov 2021 23:11:39 +0900 Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > On 2021/11/01 23:01, Fujii Masao wrote: > >> The remainings are the changes of handling of initial connection or > >> logfile open failures. I agree to push them at least for the master. > >> But I'm not sure if they should be back-patched. Without these changes, > >> even when those failures happen, pgbench proceeds the benchmark and > >> reports the result. But with the changes, pgbench exits immediately in > >> that case. I'm not sure if there are people who expect this behavior, > >> but if there are, maybe we should not change it at least at stable branches. > >> Thought? > > > > The current behavior should be improved, but not a bug. > > So I don't think that the patch needs to be back-patched. > > Barring any objection, I will push the attached patch to the master. > > Pushed. Thanks! Thanks! > > I also pushed the typo-fix patch that I proposed upthread. > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION > > -- Yugo NAGATA <nagata@sraoss.co.jp>