Thread: pgbench logging broken by time logic changes
Back on March 10, Thomas Munro committed multiple reworks of the pgbench code from Fabien and crew, after wrestling with them. I'm looking forward to testing the startup synchronization feature now that I have a packaged beta. Variations on that problem have bitten me so many times that I added code last year to my pgbench processing pipeline to just throw out the first and last 10% of every data set.
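The trim itself is nothing fancy. Here's a minimal standalone sketch of the idea (not my actual pipeline code):

/* Hypothetical sketch, not my actual pgbench-tools code: buffer all log
 * lines from stdin, then emit only the middle 80% of them, dropping the
 * warmup and shutdown ramps at each end of the run. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char **lines = NULL;
    size_t n = 0, cap = 0;
    char buf[1024];

    while (fgets(buf, sizeof(buf), stdin))
    {
        if (n == cap)
        {
            cap = cap ? cap * 2 : 1024;
            lines = realloc(lines, cap * sizeof(char *));
        }
        lines[n++] = strdup(buf);
    }

    /* drop the first and last 10% of the samples */
    for (size_t i = n / 10; i + n / 10 < n; i++)
        fputs(lines[i], stdout);

    return 0;
}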
Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve time logic": https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org
A lot of things are timed in pgbench now, so I appreciate the idea. Y'all started that whole big thread about sync on my birthday, though, and I didn't follow the details of what it was reviewed against. For the logging use case I suspect it's just broken everywhere. The two platforms I tested were the PGDG Ubuntu beta1 apt install and a Mac git build. Example:
$ createdb pgbench
$ pgbench -i -s 1 pgbench
$ pgbench -S -T 1 -l pgbench
$ head pgbench_log.*
0 1 1730 0 1537380 70911
0 2 541 0 1537380 71474
The epoch time is the 5th column in the output, and this week it should look like this:
0 1 1411 0 1623767029 732926
0 2 711 0 1623767029 733660
If you're not an epoch guru who recognizes what's wrong already, you might grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970 to see it:
$ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
1970-01-18 14:03:00.070911,0,1.73,1,local
1970-01-18 14:03:00.071474,0,0.541,1,local
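To see the arithmetic: 1537380 seconds is only about 17.8 days, so reading it as seconds since the Unix epoch lands on 1970-01-18. A throwaway sketch (not pgbench code) that reproduces the rendering:

/* Throwaway demo: interpret the broken log's 5th column as an epoch time.
 * 1537380 seconds is only ~17.8 days, which lands on 1970-01-18. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t bogus = 1537380;     /* 5th column from the broken output above */
    char buf[64];

    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", gmtime(&bogus));
    printf("%s UTC\n", buf);    /* 1970-01-18 19:03:00 UTC; 14:03 US Eastern */
    return 0;
}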
I have a lot of community-oriented work backed up behind this right now, so I'm gonna be really honest. This time rework commit in its current form makes me uncomfortable at this point in the release schedule. The commit has already fought through two rounds of platform-specific bug fixes, and since the buildfarm doesn't test the logging feature, that whole process is suspect.
My take on the PostgreSQL way to proceed: this bug exposes that pgbench logging is a feature we finally need to design testing for. We need a new buildfarm test and then a march through a full release phase to see how it goes. Only then should we start messing with the time logic. Even if we fixed the source today on both my test platforms, I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification. And that's cutting things a little close.
The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back to review. I understand the intent and value; I appreciate the work so far. The big industry architecture shift from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already have resources planned around this area.
# PG15 Plans
I didn't intend to roll back in after time away and go right to a revert review. But I really don't want to start my public PG14 story by documenting that I had to use PG13's pgbench to generate my examples, either. I can't fight much with this logging problem while also doing my planned public performance testing of PG14. I already had to push back a solid bit of Beta 1 PR from this week, some "community PG is great!" promotional blogging.
Let me offer what I can commit to from Crunchy corporate. I'm about to submit multiple pgbench feature changes to the open CF starting July, with David Christiansen. We and the rest of Crunchy will happily help re-review this time change idea, its logging issues, testing, rejoin the study of platform time call overhead, and bash the whole mess into shape for PG15. I personally am looking forward to it.
The commit made a functional change to the way connection time is displayed; that I can take or leave as committed, though I'm not sure it can be decoupled from the rest of the changes. It did cause a small break in my pgbench output parsing, but that's just a trivial regex adjustment, and the fix would fit in fine with my upcoming round of submissions.
--
Greg Smith greg.smith@crunchydata.com
Director of Open Source Strategy, Crunchy Data
On Wed, 16 Jun 2021 10:49:36 -0400 Gregory Smith <gregsmithpgsql@gmail.com> wrote:

> A lot of things are timed in pgbench now so I appreciate the idea. Y'all
> started that whole big thread about sync on my birthday though and I didn't
> follow the details of what that was reviewed against. For the logging use
> case I suspect it's just broken everywhere. The two platforms I tested
> were PGDG Ubuntu beta1 apt install and Mac git build. Example:
>
> $ createdb pgbench
> $ pgbench -i -s 1 pgbench
> $ pgbench -S -T 1 -l pgbench
> $ head pgbench_log.*
> 0 1 1730 0 1537380 70911
> 0 2 541 0 1537380 71474
>
> The epoch time is the 5th column in the output, and this week it should
> look like this:
>
> 0 1 1411 0 1623767029 732926
> 0 2 711 0 1623767029 733660
>
> If you're not an epoch guru who recognizes what's wrong already, you might
> grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970
> to see it:
>
> $ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
> 1970-01-18 14:03:00.070911,0,1.73,1,local
> 1970-01-18 14:03:00.071474,0,0.541,1,local

After the commit, pgbench tries to get the current timestamp by calling pg_time_now(). This uses INSTR_TIME_SET_CURRENT in it, but that macro can call clock_gettime(CLOCK_MONOTONIC[_RAW], ...) instead of gettimeofday() or clock_gettime(CLOCK_REALTIME, ...). When CLOCK_MONOTONIC[_RAW] is used, clock_gettime() doesn't return epoch time. So while we can use INSTR_TIME_SET_CURRENT to calculate a duration, we should not have used it to get the current timestamp.

I think we can fix this issue by using gettimeofday for logging, as before this was changed. I attached the patch.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>
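To illustrate the difference between the clocks, here is a minimal standalone sketch (not pgbench code):

/* Minimal standalone demo: CLOCK_MONOTONIC counts from an arbitrary origin
 * (often boot time), while CLOCK_REALTIME and gettimeofday() count from the
 * Unix epoch. Only the latter two suit timestamps meant for log correlation. */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timespec mono, real;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &mono);
    clock_gettime(CLOCK_REALTIME, &real);
    gettimeofday(&tv, NULL);

    printf("monotonic:    %lld s (arbitrary origin)\n", (long long) mono.tv_sec);
    printf("realtime:     %lld s (since 1970-01-01)\n", (long long) real.tv_sec);
    printf("gettimeofday: %lld s (since 1970-01-01)\n", (long long) tv.tv_sec);
    return 0;
}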
Attachment
Hello Greg,

> I have a lot of community-oriented work backed up behind this right now, so
> I'm gonna be really honest. This time rework commit in its current form
> makes me uncomfortable at this point in the release schedule. The commit
> has already fought through two rounds of platform-specific bug fixes. But
> since the buildfarm doesn't test the logging feature, that whole process is
> suspect.

Logging is, or should be, going to be fixed.

> My take on the PostgreSQL way to proceed: this bug exposes that pgbench
> logging is a feature we finally need to design testing for.

Sure. The key feedback for me is the usual one: what is not tested does not work. Wow :-)

I'm unhappy because I already added tap tests for time-sensitive features (-T and others, maybe logging aggregates, I cannot remember), which were removed because they could fail under some circumstances (eg very very very very slow hosts), or required some special handling (a few lines of code) in pgbench, and the net result of this is that there is not a single test in place for some features :-(

There is no problem with proposing tests; the problem is getting them accepted, and then, if they are accepted, ensuring they are not removed at the first small issue but rather fixed, or their limitations accepted, because testing time-sensitive features is not as simple as testing functional features.

Note that currently there is not a single test of psql with autocommit off or with "on error rollback". Last time I submitted tap tests to raise psql test coverage from 50% to 90%, they were rejected. I'll admit that I'm tired of arguing that more testing is required, and I'm very happy if someone else is ready to try again. Good luck! :-)

> We need a new buildfarm test and then a march through a full release
> phase to see how it goes.

> Only then should we start messing with the time logic. Even if we fixed
> the source today on both my test platforms, I'd still be nervous that
> beta 2 could ship and more performance testing could fall over from this
> modification. And that's cutting things a little close.

Well, the point of beta is to discover bugs not caught by reviews and dev tests, fix them, and possibly add tests which would have caught them.

If you revert a whole feature on the first issue in a corner case and put it back in the queue, then I do not see why the review and dev tests will be much better on the next round, so it does not really help move things forward.

IMHO, the pragmatic approach is to look at fixing first, and maybe revert if the problems are deep. I'm not sure this is obviously the case here.

--
Fabien.
> pg_time_now(). This uses INSTR_TIME_SET_CURRENT in it, but that macro
> can call clock_gettime(CLOCK_MONOTONIC[_RAW], ...) instead of gettimeofday()
> or clock_gettime(CLOCK_REALTIME, ...). When CLOCK_MONOTONIC[_RAW] is used,
> clock_gettime() doesn't return epoch time. So while we can use
> INSTR_TIME_SET_CURRENT to calculate a duration, we should not have used it
> to get the current timestamp.
>
> I think we can fix this issue by using gettimeofday for logging, as before
> this was changed. I attached the patch.

I cannot say that I'm thrilled by having multiple tv stuff back in several places. I can be okay with one, though. What about the attached? Does it make sense?

--
Fabien.
Attachment
On 6/16/21 2:59 PM, Fabien COELHO wrote:
>
> Hello Greg,
>
>> I have a lot of community-oriented work backed up behind this right now, so
>> I'm gonna be really honest. This time rework commit in its current form
>> makes me uncomfortable at this point in the release schedule. The commit
>> has already fought through two rounds of platform-specific bug fixes. But
>> since the buildfarm doesn't test the logging feature, that whole process is
>> suspect.
>
> Logging is, or should be, going to be fixed.
>
>> My take on the PostgreSQL way to proceed: this bug exposes that pgbench
>> logging is a feature we finally need to design testing for.
>
> Sure.
>
> The key feedback for me is the usual one: what is not tested does not
> work. Wow :-)

Agreed.

> I'm unhappy because I already added tap tests for time-sensitive
> features (-T and others, maybe logging aggregates, I cannot remember),
> which were removed because they could fail under some circumstances
> (eg very very very very slow hosts), or required some special handling
> (a few lines of code) in pgbench, and the net result of this is that
> there is not a single test in place for some features :-(

I'm not familiar with exactly what happened in this case, but tests need to be resilient over a wide range of performance characteristics. One way around this issue might be to have a way of detecting that it's on a slow platform, and if so either skipping tests (Test::More provides plenty of support for this) or expecting different results.

> There is no problem with proposing tests; the problem is getting them
> accepted, and then, if they are accepted, ensuring they are not removed at
> the first small issue but rather fixed, or their limitations accepted,
> because testing time-sensitive features is not as simple as testing
> functional features.
>
> Note that currently there is not a single test of psql with autocommit
> off or with "on error rollback". Last time I submitted tap tests to
> raise psql test coverage from 50% to 90%, they were rejected. I'll admit
> that I'm tired of arguing that more testing is required, and I'm very
> happy if someone else is ready to try again. Good luck! :-)

:-(

>> We need a new buildfarm test and then a march through a full release
>> phase to see how it goes.
>
>> Only then should we start messing with the time logic. Even if we
>> fixed the source today on both my test platforms, I'd still be
>> nervous that beta 2 could ship and more performance testing could
>> fall over from this modification. And that's cutting things a little
>> close.
>
> Well, the point of beta is to discover bugs not caught by reviews and dev
> tests, fix them, and possibly add tests which would have caught them.
>
> If you revert a whole feature on the first issue in a corner case and put
> it back in the queue, then I do not see why the review and dev tests
> will be much better on the next round, so it does not really help
> move things forward.
>
> IMHO, the pragmatic approach is to look at fixing first, and maybe
> revert if the problems are deep. I'm not sure this is obviously the
> case here.

It does look like the submitted fix basically reverts the changes w.r.t. this timestamp logging.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi Greg,

On Thu, Jun 17, 2021 at 2:49 AM Gregory Smith <gregsmithpgsql@gmail.com> wrote:
> Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew. The feature to synchronize startup I'm looking forward to testing now that I have a packaged beta. Variations on that problem have bitten me so many times I added code last year to my pgbench processing pipeline to just throw out the first and last 10% of every data set.

Yeah, commit aeb57af8 is a nice improvement and was the main thing I wanted to get into the tree for 14 in this area, because it was measuring the wrong thing.

> Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve time logic": https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

It does suck that we broke the logging and that it took 3 months for anyone to notice and report it to the list. Seems like it should be straightforward to fix, though, with fixes already proposed (though I haven't studied them yet, will do).

> I have a lot of community-oriented work backed up behind this right now, so I'm gonna be really honest. This time rework commit in its current form makes me uncomfortable at this point in the release schedule. The commit has already fought through two rounds of platform-specific bug fixes. But since the buildfarm doesn't test the logging feature, that whole process is suspect.

It's true that this work produced a few rounds of small portability follow-ups: c427de42 (work around strange hacks elsewhere in the tree for AIX), 68b34b23 (missing calling convention specifier on Windows), and de91c3b9 (adjust pthread missing-function code for threadless builds). These were problems that didn't show up on developer or CI systems (including threadless and Windows), and IMHO are typical sorts of problems you expect to have to work through when stuff hits the build farm, especially when using new system interfaces. So I don't think any of that, on its own, supports reverting anything here.

> My take on the PostgreSQL way to proceed: this bug exposes that pgbench logging is a feature we finally need to design testing for. We need a new buildfarm test and then a march through a full release phase to see how it goes. Only then should we start messing with the time logic. Even if we fixed the source today on both my test platforms, I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification. And that's cutting things a little close.
>
> The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back to review. I understand the intent and value; I appreciate the work so far. The big industry architecture shift from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already have resources planned around this area.

Let me study the proposed fixes on this and the other thread about pgbench logging for a bit. Glad to hear that you're working on this area. I guess you might be researching stuff along the same sorts of lines as in the thread "Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?" (though that's about the executor). As I already expressed in that thread, if the backend's instrumentation code is improved as proposed there, we'll probably want to rip some of these pgbench changes out anyway and go back to common instrumentation code.
For that reason, I'm not super attached to that new pg_time_usec_t stuff at all, and wouldn't be sad if we reverted that piece. I am moderately attached to the sync changes, though. pgbench 13 is objectively producing incorrect results in that respect.
On Thu, Jun 17, 2021 at 12:36:10PM +1200, Thomas Munro wrote:
> For that reason, I'm not super attached to that new pg_time_usec_t
> stuff at all, and wouldn't be sad if we reverted that piece. I am
> moderately attached to the sync changes, though. pgbench 13 is
> objectively producing incorrect results in that respect.

There is another item in this area where pgbench uses incorrect maths when aggregating the stats of transactions mid-run and at the end of a thread, an issue caused by 547f04e as this code path forgot to handle the s <-> us conversion:
https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6+dFF2jObSdd9+iVK+vHJg@mail.gmail.com

Wouldn't it be better to put all those fixes into the same bag? If you drop the business with pg_time_usec_t, it looks like we don't really need to do anything there.

--
Michael
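For readers following along, the shape of that bug is roughly the following (a schematic sketch, not the actual pgbench code):

/* Schematic sketch of the s <-> us pitfall, not the actual pgbench code.
 * The aggregation interval is specified in seconds, but after 547f04e the
 * timestamps are microseconds, so the comparison must scale the interval. */
typedef long long pg_time_usec_t;

static int
interval_elapsed(pg_time_usec_t start_time, pg_time_usec_t now, int agg_interval)
{
    /* broken: mixes seconds with microseconds */
    /* return start_time + agg_interval <= now; */

    /* fixed: convert the interval to microseconds first */
    return start_time + agg_interval * 1000000LL <= now;
}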
Attachment
On Wed, 16 Jun 2021 21:11:41 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> > pg_time_now(). This uses INSTR_TIME_SET_CURRENT in it, but that macro
> > can call clock_gettime(CLOCK_MONOTONIC[_RAW], ...) instead of gettimeofday()
> > or clock_gettime(CLOCK_REALTIME, ...). When CLOCK_MONOTONIC[_RAW] is used,
> > clock_gettime() doesn't return epoch time. So while we can use
> > INSTR_TIME_SET_CURRENT to calculate a duration, we should not have used it
> > to get the current timestamp.
> >
> > I think we can fix this issue by using gettimeofday for logging, as before
> > this was changed. I attached the patch.
>
> I cannot say that I'm thrilled by having multiple tv stuff back in several
> places. I can be okay with one, though. What about the attached? Does it
> make sense?

At first, I also thought of fixing pg_time_now() to use gettimeofday() instead of INSTR_TIME_SET_CURRENT, but I noticed that INSTR_TIME_SET_CURRENT is the proper way to measure a time interval. I mean, this macro uses clock_gettime(CLOCK_MONOTONIC, ...) if available, which gives reliable interval timing even in the face of changes to the system clock due to NTP.

The commit 547f04e7 changed all of INSTR_TIME_SET_CURRENT, gettimeofday(), and time() to pg_time_now(), which calls INSTR_TIME_SET_CURRENT in it. So, my patch intended to revert these changes only for gettimeofday() and time(), and keep the changes based on INSTR_TIME_SET_CURRENT. I attached the updated patch because I forgot to revert pg_time_now() to time(NULL).

Another idea to fix this is adding a 'use_epoch' flag to pg_time_now(), like below:

pg_time_now(bool use_epoch)
{
    if (use_epoch)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000 + tv.tv_usec;
    }
    else
    {
        instr_time now;

        INSTR_TIME_SET_CURRENT(now);
        return (pg_time_usec_t) INSTR_TIME_GET_MICROSEC(now);
    }
}

or making an additional function that returns epoch time.

By the way, there is another advantage of using clock_gettime(): it returns precise results in nanoseconds. However, we cannot benefit from that because pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So, if we would like more precise statistics in pgbench, it might be better to use INSTR_TIME_GET_MICROSEC instead of pg_time_now() in other places.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
At Thu, 17 Jun 2021 12:23:42 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
> On Wed, 16 Jun 2021 21:11:41 +0200 (CEST)
> Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > I cannot say that I'm thrilled by having multiple tv stuff back in several
> > places. I can be okay with one, though. What about the attached? Does it
> > make sense?

+1. The patch rounds down sd->start_time from ms to s, though, which seems to me a degradation.

> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
> of INSTR_TIME_SET_CURRENT, but I noticed that INSTR_TIME_SET_CURRENT is the
> proper way to measure a time interval. I mean, this macro uses
> clock_gettime(CLOCK_MONOTONIC, ...) if available, which gives reliable interval
> timing even in the face of changes to the system clock due to NTP.

If I understand the OP correctly, the problem here is that the time values in the pgbench log file are based on a bogus epoch. If that's the only issue here, and if we just want to show the time based on the Unix epoch, just recording the difference would work, as I sketched in the attached. (Precisely, the epoch would move if we set the system clock, but I don't think that matters :p)

> By the way, there is another advantage of using clock_gettime(): it returns
> precise results in nanoseconds. However, we cannot benefit from that because
> pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> if we would like more precise statistics in pgbench, it might be better to use
> INSTR_TIME_GET_MICROSEC instead of pg_time_now() in other places.

I'm not sure we have transactions lasting for such a short time that nanoseconds matter.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..0ce9315a2b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -342,6 +342,8 @@ typedef struct StatsData
 	SimpleStats lag;
 } StatsData;
 
+pg_time_usec_t epoch_shift;
+
 /*
  * Struct to keep random state.
  */
@@ -3803,7 +3805,7 @@ doLog(TState *thread, CState *st,
 	{
 		/* print aggregated report to logfile */
 		fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-				agg->start_time,
+				agg->start_time + epoch_shift,
 				agg->cnt,
 				agg->latency.sum,
 				agg->latency.sum2,
@@ -3834,12 +3836,12 @@ doLog(TState *thread, CState *st,
 		if (skipped)
 			fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " " INT64_FORMAT,
-					st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
+					st->id, st->cnt, st->use_file, (now + epoch_shift) / 1000000, (now + epoch_shift) % 1000000);
 		else
 			fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " " INT64_FORMAT,
 					st->id, st->cnt, latency, st->use_file,
-					now / 1000000, now % 1000000);
+					(now + epoch_shift) / 1000000, (now + epoch_shift) % 1000000);
 		if (throttle_delay)
 			fprintf(logfile, " %.0f", lag);
 		fputc('\n', logfile);
@@ -5775,7 +5777,12 @@ main(int argc, char **argv)
 	char	   *env;
 	int			exit_code = 0;
+	struct timeval tv;
+	gettimeofday(&tv, NULL);
+
+	epoch_shift = tv.tv_sec * 1000000 + tv.tv_usec - pg_time_now();
+
 	pg_logging_init(argv[0]);
 	progname = get_progname(argv[0]);
On Thu, 17 Jun 2021 14:17:56 +0900 (JST) Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> At Thu, 17 Jun 2021 12:23:42 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
> > On Wed, 16 Jun 2021 21:11:41 +0200 (CEST)
> > Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > > I cannot say that I'm thrilled by having multiple tv stuff back in several
> > > places. I can be okay with one, though. What about the attached? Does it
> > > make sense?
>
> +1. The patch rounds down sd->start_time from ms to s, though, which seems to
> me a degradation.

I don't think that matters, because sd->start_time is used only for log aggregation and aggregate-interval is specified in seconds anyway.

> If I understand the OP correctly, the problem here is that the time values
> in the pgbench log file are based on a bogus epoch. If that's the only issue
> here, and if we just want to show the time based on the Unix epoch, just
> recording the difference would work, as I sketched in the attached.
> (Precisely, the epoch would move if we set the system clock, but I don't
> think that matters :p)

That makes sense. If the system clock is shifted due to NTP (for example), it would not affect the measurement, although timestamps in logs could be shifted because gettimeofday is called only once.

If we fix it in this way, we should also fix printProgressReport():

     if (progress_timestamp)
     {
-        snprintf(tbuf, sizeof(tbuf), "%.3f s", PG_TIME_GET_DOUBLE(now));
+        snprintf(tbuf, sizeof(tbuf), "%.3f s", PG_TIME_GET_DOUBLE(now + epoch_shift));
     }

> > By the way, there is another advantage of using clock_gettime(): it returns
> > precise results in nanoseconds. However, we cannot benefit from that because
> > pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> > if we would like more precise statistics in pgbench, it might be better to use
> > INSTR_TIME_GET_MICROSEC instead of pg_time_now() in other places.
>
> I'm not sure we have transactions lasting for such a short time that
> nanoseconds matter.

I thought it might affect the accuracy when statistics are accumulated over a huge number of transactions, but I am fine with it if no one cares.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>
Hello Yugo-san,

>>> I think we can fix this issue by using gettimeofday for logging, as before
>>> this was changed. I attached the patch.
>>
>> I cannot say that I'm thrilled by having multiple tv stuff back in several
>> places. I can be okay with one, though. What about the attached? Does it
>> make sense?
>
> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
> of INSTR_TIME_SET_CURRENT, but I noticed that INSTR_TIME_SET_CURRENT is the
> proper way to measure a time interval. I mean, this macro uses
> clock_gettime(CLOCK_MONOTONIC, ...) if available, which gives reliable interval
> timing even in the face of changes to the system clock due to NTP.

Ok, I was thinking that this kind of implication was possible. Now, does it matter that much if a few transactions are skewed by NTP from time to time? Having to deal with different time functions in the same file seems painful.

> The commit 547f04e7 changed all of INSTR_TIME_SET_CURRENT, gettimeofday(), and
> time() to pg_time_now(), which calls INSTR_TIME_SET_CURRENT in it. So, my patch
> intended to revert these changes only for gettimeofday() and time(), and keep
> the changes based on INSTR_TIME_SET_CURRENT.

Hmmm.

> pg_time_now(bool use_epoch)
> {
>     if (use_epoch)
>     {
>         struct timeval tv;
>         gettimeofday(&tv, NULL);
>         return tv.tv_sec * 1000000 + tv.tv_usec;
>     }
>     else
>     {
>         instr_time now;
>         INSTR_TIME_SET_CURRENT(now);
>         return (pg_time_usec_t) INSTR_TIME_GET_MICROSEC(now);
>     }
> }
>
> or making an additional function that returns epoch time.

Yes, but when to call which version? How to avoid confusion?

After giving it some thought, ISTM that the best short-term decision is just to have epoch everywhere, i.e. having now() rely on gettimeofday, because:

- at least one user is unhappy with not having epoch in the log file, and indeed it makes sense to be unhappy about that if they want to correlate logs. So I agree to undo that, or provide an option to undo it.

- having different times with different origins at different points in the code makes it really hard to understand and maintain, and if we trade maintainability for precision it should really be worth it; I'm not sure that working around NTP adjustments is worth it right now.

In the not-so-short term, I'd say that the best approach would be to use relative time internally and just offset it with a global epoch start time when displaying a timestamp.

> By the way, there is another advantage of using clock_gettime(): it returns
> precise results in nanoseconds. However, we cannot benefit from that because
> pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> if we would like more precise statistics in pgbench, it might be better to use
> INSTR_TIME_GET_MICROSEC instead of pg_time_now() in other places.

The INSTR_TIME macros are pretty ugly and inefficient, especially when time arithmetic is involved, because they re-implement 64-bit operations based on 32-bit ints. I really wanted to get rid of that as much as possible. From a database benchmarking perspective, ISTM that µs is the right smallest unit, given that a transaction implies significant delays such as network communication, parsing, and so on. So I do not think we should ever need nanos.

In conclusion, ISTM that it is enough to simply change now() to call gettimeofday to fix the issue raised by Greg. This is patch v1 on the thread.

--
Fabien.
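In code terms, a schematic sketch of that longer-term approach (the idea only, not a proposed patch):

/* Schematic sketch only: keep monotonic microseconds internally, and add a
 * once-computed offset whenever a wall-clock timestamp must be displayed. */
static pg_time_usec_t epoch_shift;    /* wall-clock "now" minus monotonic "now" */

static void
init_epoch_shift(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    epoch_shift = tv.tv_sec * 1000000 + tv.tv_usec - pg_time_now();
}

/* use only when printing, never for interval arithmetic */
static pg_time_usec_t
display_usec(pg_time_usec_t monotonic_usec)
{
    return monotonic_usec + epoch_shift;
}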
Hello,

>>> I cannot say that I'm thrilled by having multiple tv stuff back in several
>>> places. I can be okay with one, though. What about the attached? Does it
>>> make sense?
>
> +1. The patch rounds down sd->start_time from ms to s, though, which seems to
> me a degradation.

Yes, please: we should not use time().

>> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
>> of INSTR_TIME_SET_CURRENT, but I noticed that INSTR_TIME_SET_CURRENT is the
>> proper way to measure a time interval. I mean, this macro uses
>> clock_gettime(CLOCK_MONOTONIC, ...) if available, which gives reliable interval
>> timing even in the face of changes to the system clock due to NTP.
>
> If I understand the OP correctly, the problem here is that the time values
> in the pgbench log file are based on a bogus epoch.

It is not "bogus", but it is not necessarily epoch time, depending on the underlying function called behind the INSTR_TIME macros, and people are entitled to expect epoch time for log correlation.

> If that's the only issue here, and if we just want to show the time based on
> the Unix epoch, just recording the difference would work, as I sketched in
> the attached. (Precisely, the epoch would move if we set the system clock,
> but I don't think that matters :p)

I do like the approach. I'm hesitant to promote it for fixing the beta, but the code impact is small enough, so I'd say yes. Maybe there is a similar issue with progress, which should probably use the same approach. I think that aligning the implementations can wait for pg15.

The patch has whitespace issues. Attached is an updated version which fixes that, adds comments, and simplifies the code a little bit.

> I'm not sure we have transactions lasting for such a short time that
> nanoseconds matter.

Indeed.

--
Fabien.
Attachment
Hello Thomas,

>> Before I could get to startup timing I noticed the pgbench logging
>> output was broken via commit 547f04e7 "Improve time logic":
>> https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org
>
> It does suck that we broke the logging and that it took 3 months for
> anyone to notice and report it to the list.

Indeed. Well, it also demonstrates that betas are useful.

> Seems like it should be straightforward to fix, though, with fixes
> already proposed (though I haven't studied them yet, will do).

I think that fixing logging is simple enough, thus a revert is not necessary.

>> I have a lot of community-oriented work backed up behind this right
>> now, so I'm gonna be really honest. This time rework commit in its
>> current form makes me uncomfortable at this point in the release
>> schedule. The commit has already fought through two rounds of platform-
>> specific bug fixes. But since the buildfarm doesn't test the logging
>> feature, that whole process is suspect.
>
> It's true that this work produced a few rounds of small portability
> follow-ups: c427de42 (work around strange hacks elsewhere in the tree
> for AIX), 68b34b23 (missing calling convention specifier on Windows),
> and de91c3b9 (adjust pthread missing-function code for threadless
> builds). These were problems that didn't show up on developer or CI
> systems (including threadless and Windows), and IMHO are typical sorts
> of problems you expect to have to work through when stuff hits the
> build farm, especially when using new system interfaces. So I don't
> think any of that, on its own, supports reverting anything here.

Yep, the buildfarm is here to catch portability issues, and it does its job :-)

There is no doubt that logging has been broken because of a lack of tests in this area; shame on us. I think it is easy to fix.

> [...] For that reason, I'm not super attached to that new pg_time_usec_t
> stuff at all, and wouldn't be sad if we reverted that piece.

Well, I was sooo happy to get rid of the ugly and inefficient INSTR_TIME macros in pgbench… so anything looks better to me.

Note that Michaël is having a look at fixing the pgbench logging issues.

--
Fabien.
> Wouldn't it be better to put all those fixes into the same bag?

Attached.

--
Fabien.
>> Wouldn't it be better to put all those fixes into the same bag?
>
> Attached.

Even better if the patch is not empty.

--
Fabien.
Attachment
On Thu, 17 Jun 2021 10:18:03 +0200 (CEST) Fabien COELHO <coelho@cri.ensmp.fr> wrote:

>>> Wouldn't it be better to put all those fixes into the same bag?
>>
>> Attached.
>
> Even better if the patch is not empty.

I found you forgot to fix printProgressReport(). Also, according to the documentation, interval_start in Aggregated Logging should be printed in seconds, not ms:

 <para>
  Here is some example output:
 <screen>
 1345828501 5601 1542744 483552416 61 2573
 1345828503 7884 1979812 565806736 60 1479
 1345828505 7208 1979422 567277552 59 1391
 1345828507 7685 1980268 569784714 60 1398
 1345828509 7073 1979779 573489941 236 1411
 </screen></para>

If we obey the documentation and keep backward compatibility, we should fix logAgg(). The attached patch includes these fixes.

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachment
> I found you forgot to fix printProgressReport().

Indeed.

> Also, according to the documentation, interval_start in Aggregated Logging
> should be printed in seconds, not ms.

Indeed. I'm unsure about what we should really want there, but for a beta bug fix I agree that it must simply comply with the old documented behavior.

> The attached patch includes these fixes.

Thanks. Works for me.

--
Fabien.
On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> Seems like it should be straightforward to fix, though, with fixes
>> already proposed (though I haven't studied them yet, will do).
>
> I think that fixing logging is simple enough, thus a revert is not
> necessary.

I prepared a draft revert patch for discussion, just in case it comes in handy. This reverts "pgbench: Improve time logic.", but "pgbench: Synchronize client threads." remains (slightly rearranged).

I'm on the fence TBH. I can see that these are really small things, and it seems we have the patches, but it's late, late enough that benchmarking gurus are showing up to benchmark with it for real, and it's not great to be getting in the way of that with what is mostly refactoring work, so I don't think it would be a bad thing if we agreed to try again in 15. (A number of arguments for and against moving pgbench out of the postgresql source tree and release cycle occur to me, but I guess that's a topic for another thread.)

>> [...] For that reason, I'm not super attached to that new pg_time_usec_t
>> stuff at all, and wouldn't be sad if we reverted that piece.
>
> Well, I was sooo happy to get rid of the ugly and inefficient INSTR_TIME
> macros in pgbench… so anything looks better to me.

I don't love it either, in this code or the executor. (I know you also don't like the THREAD_CREATE etc macros. I have something to propose to improve that for 15....)

> Note that Michaël is having a look at fixing the pgbench logging issues.

Yeah, I've been catching up with these threads.
Attachment
On 6/17/21 8:49 AM, Thomas Munro wrote:
> On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> Seems like it should be straightforward to fix, though, with fixes
>>> already proposed (though I haven't studied them yet, will do).
>> I think that fixing logging is simple enough, thus a revert is not
>> necessary.
> I prepared a draft revert patch for discussion, just in case it comes
> in handy. This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).
>
> I'm on the fence TBH. I can see that these are really small things, and
> it seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15.

Is there an identified issue beyond the concrete example Greg gave of the timestamps? We are still fixing a few things with potentially far more impact than anything in pgbench, so fixing this wouldn't bother me that much, as long as we get it done for Beta2.

> (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)

Indeed.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Wed, Jun 16, 2021 at 2:59 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I'm unhappy because I already added tap tests for time-sensitive features
> (-T and others, maybe logging aggregates, cannot remember), which have
> been removed because they could fail under some circumstances (eg very
> very very very slow hosts), or required some special handling (a few lines
> of code) in pgbench, and the net result of this is there is not a single
> test in place for some features :-(
I understand your struggle and I hope I was clear about two things:
- Doing better in this messy area takes a team that goes from development to release management, and I had no right to complain unless I brought resources to improve in the specific areas of the process that I want to be better.
I think the only thing you and I disagree on is that you see a "first issue in a corner case" where I see a process failure that is absolutely vital for me to improve. Since the reality is that I might be the best positioned person to actually move said process forward in a meaningful long-term way, I have every intention of applying pressure to the area you're frustrated at. Crunchy has a whole parallel review team to the community one now focused on what our corporate and government customers need for software process control and procedure compliance. The primary business problem I'm working on now is how to include performance review in that mix.
I already know I need to re-engage with you over how I need min/max numbers in the aggregate logging output to accomplish some valuable goals. When I get around to that this summer, I'd really enjoy talking with you a bit, video call or something, about really any community topic you're frustrated with. I have a lot riding now on the productivity of the PostgreSQL hacker community and I want everyone to succeed at the best goals.
> There is no problem with proposing tests, the problem is that they are
> accepted, or if they are accepted then that they are not removed at the
> first small issue but rather fixed, or their limitations accepted, because
> testing time-sensitive features is not as simple as testing functional
> features.
For 2020 Crunchy gave me a sort of sabbatical year to research community oriented benchmarking topics. Having a self contained project in my home turned out to be the perfect way to spend *that* wreck of a year.
I made significant progress toward the idea of having a performance farm for PostgreSQL. On my laptop today is a 14GB database with 1s resolution latency traces for 663 days of pgbench time running 4 workloads across a small bare metal farm of various operating systems and hardware classes. I can answer questions like "how long does a typical SSD take to execute an INSERT commit?" across my farm with SQL. It's at the "works for me!" stage of development, and I thought this was the right time in the development cycle to start sharing improvement ideas from my work; thus the other submissions in progress I alluded to.
The logging feature is in an intermediate spot where validating it requires light custom tooling that compares its output against known variables like the system time. It doesn't quite have a performance component to it. Since this time logic detail is a well known portability minefield, I thought demanding that particular test was a pretty easy sell.
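As a sketch of what I mean by light custom tooling (hypothetical code, not something that exists in my pipeline or the tree), the core check is just parsing the epoch column and comparing it against the system clock:

/* Hypothetical validation sketch, not existing tooling: read per-transaction
 * pgbench log lines on stdin and check that the epoch column stays within a
 * minute of the system clock. This is the shape of the check a test needs. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    char buf[1024];
    long long lat, sec, usec;
    int client, txno, file;
    long long now = (long long) time(NULL);

    while (fgets(buf, sizeof(buf), stdin))
    {
        if (sscanf(buf, "%d %d %lld %d %lld %lld",
                   &client, &txno, &lat, &file, &sec, &usec) != 6)
            continue;           /* skipped transactions, aggregates, etc. */
        if (llabs(sec - now) > 60)
        {
            fprintf(stderr, "bogus timestamp: %lld (system clock: %lld)\n", sec, now);
            return 1;
        }
    }
    return 0;
}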
That you in particular are frustrated here makes perfect sense to me. I am fresh and ready to carry this forward some distance, and I hope the outcome makes you happy.
> Is there an identified issue beyond the concrete example Greg gave of
> the timestamps?

AFAICS, there is a patch which fixes all known issues linked to pgbench logging. Other issues may well exist, but the "broken" area was quite specific. There are also some TAP tests on pgbench which do catch issues.

--
Fabien.
Hello Greg,

> I think the only thing you and I disagree on is that you see a "first
> issue in a corner case" where I see a process failure that is absolutely
> vital for me to improve.

Hmmm. I agree that improvements are needed, but for me there are simply a few missing (removed) tap tests which should/could have caught these issues, which are AFAICS limited to the untested area.

Given the speed of the process and the energy and patience needed to move things forward, reverting means that the patch is probably dead for at least a year, possibly an eon, and that is too bad because IMHO it was an improvement (my eyes are watering when I see INSTR_TIME macros), so I'd prefer a fix rather than a revert if it is possible, which in this case I think it could be.

> Since the reality is that I might be the best positioned person

Good for you :-)

> to actually move said process forward in a meaningful long-term way, I
> have every intention of applying pressure to the area you're frustrated
> at. Crunchy has a whole parallel review team to the community one now
> focused on what our corporate and government customers need for software
> process control and procedure compliance. The primary business problem
> I'm working on now is how to include performance review in that mix.

This idea has been around for some time now. It is quite a task, and a working and possibly extended pgbench is just one part of the overall software, infrastructure and procedures needed to have that.

> I already know I need to re-engage with you over how I need min/max numbers
> in the aggregate logging output to accomplish some valuable goals.

I do try to review every patch submitted about pgbench. Feel free to fire!

> When I get around to that this summer, I'd really enjoy talking with you
> a bit, video call or something, about really any community topic you're
> frustrated with.

"Frustrated" may be a strong word. I'm somewhat annoyed, and unlikely to ever submit many test improvements in the future.

>> There is no problem with proposing tests, the problem is that they are
>> accepted, or if they are accepted then that they are not removed at the
>> first small issue but rather fixed, or their limitations accepted, because
>> testing time-sensitive features is not as simple as testing functional
>> features.
>
> For 2020 Crunchy gave me a sort of sabbatical year to research community
> oriented benchmarking topics. Having a self contained project in my home
> turned out to be the perfect way to spend *that* wreck of a year.

Yep.

> I made significant progress toward the idea of having a performance farm
> for PostgreSQL. On my laptop today is a 14GB database with 1s resolution
> latency traces for 663 days of pgbench time running 4 workloads across a
> small bare metal farm of various operating systems and hardware classes.

Wow.

> I can answer questions like "how long does a typical SSD take to execute
> an INSERT commit?" across my farm with SQL.

So, what is the answer? :-)

> It's at the "works for me!" stage of development, and I thought this was
> the right time in the development cycle to start sharing improvement
> ideas from my work; thus the other submissions in progress I alluded to.
>
> The logging feature is in an intermediate spot where validating it requires
> light custom tooling that compares its output against known variables like
> the system time.

Sure.

> It doesn't quite have a performance component to it.

Hmmm, if you log all transactions it can become the performance bottleneck quite quickly :-)

> Since this time logic detail is a well known portability minefield, I
> thought demanding that particular test was a pretty easy sell.

The test I recalled was removed in ad51c6f. Ok, it would not have caught the issue about timestamps (although it could have been improved to do so), but it would have caught the trivial one about the catchup loop in aggregate intervals generating too many lines because of a forgotten conversion to µs.

--
Fabien.
Hello Thomas,

> I prepared a draft revert patch for discussion, just in case it comes
> in handy. This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).

I had a quick look. I had forgotten that this patch also fixed the long-running brain-damaged tps computation that has been bothering me for years, so that one sane performance figure is now reported instead of two not-clear-to-interpret take-your-pick figures. It would be a real loss if this user-facing fix is removed in the process :-(

--
Fabien.
On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
> I'm on the fence TBH. I can see that these are really small things, and
> it seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15. (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)

I may be missing something of course, but I don't see any strong reason why we need to do a revert here if we have patches to discuss first.

>> Note that Michaël is having a look at fixing the pgbench logging issues.
>
> Yeah, I've been catching up with these threads.

Thomas, do you want me to look more at this issue? I don't feel comfortable with the idea of doing anything if you are planning to look at this thread; you are the owner here, so that should be your call.

From what I can see, we have the same area getting patched across two threads, so it seems better to give up the other thread and just focus on the discussion here, where v7 has been sent:
https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520324@sraoss.co.jp
https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com

--
Michael
Attachment
On Wed, Jun 16, 2021 at 03:13:30PM -0400, Andrew Dunstan wrote:
> On 6/16/21 2:59 PM, Fabien COELHO wrote:
>> The key feedback for me is the usual one: what is not tested does not
>> work. Wow :-)
>
> Agreed.
>
>> I'm unhappy because I already added tap tests for time-sensitive
>> features (-T and others, maybe logging aggregates, cannot remember),
>> which have been removed because they could fail under some
>> circumstances (eg very very very very slow hosts), or required some
>> special handling (a few lines of code) in pgbench, and the net result
>> of this is there is not a single test in place for some features :-(
>
> I'm not familiar with exactly what happened in this case, but tests need
> to be resilient over a wide range of performance characteristics. One
> way around this issue might be to have a way of detecting that it's on a
> slow platform and if so either skipping tests (Test::More provides
> plenty of support for this) or expecting different results.

Detection would need the host to be consistently slow, like running under Valgrind or on a 20-year-old CPU. We also test on systems having highly variable performance due to other processes competing for the same hardware. I'd perhaps add a "./configure --enable-realtime-tests" option that enables the affected tests. Testers should use the option whenever the execution environment has sufficient reserved CPU.
On Fri, Jun 18, 2021 at 12:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
>> Yeah, I've been catching up with these threads.
>
> Thomas, do you want me to look more at this issue? I don't feel
> comfortable with the idea of doing anything if you are planning to
> look at this thread; you are the owner here, so that should be your
> call.
>
> From what I can see, we have the same area getting patched across two
> threads, so it seems better to give up the other thread and just focus
> on the discussion here, where v7 has been sent:
> https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520324@sraoss.co.jp
> https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com

Thanks for looking so far. It's the weekend here and I need to unplug, but I'll test these changes and if all looks good push on Monday.
On Sat, Jun 19, 2021 at 11:59:16AM +1200, Thomas Munro wrote:
> Thanks for looking so far. It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Thanks for the update. Have a good weekend.

--
Michael
Attachment
On 2021-Jun-19, Thomas Munro wrote:
> Thanks for looking so far. It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Surely not the same day as the beta stamp...

--
Álvaro Herrera
Valdivia, Chile
"Always assume the user will do much worse than the stupidest thing you can imagine." (Julien PUYDT)
On Sun, Jun 20, 2021 at 3:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Jun-19, Thomas Munro wrote:
>> Thanks for looking so far. It's the weekend here and I need to
>> unplug, but I'll test these changes and if all looks good push on
>> Monday.
>
> Surely not the same day as the beta stamp...

Because of time zones, that'll be Sunday in the Americas. Still too close?
On Sun, Jun 20, 2021 at 4:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jun 20, 2021 at 3:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> On 2021-Jun-19, Thomas Munro wrote:
>>> Thanks for looking so far. It's the weekend here and I need to
>>> unplug, but I'll test these changes and if all looks good push on
>>> Monday.
>>
>> Surely not the same day as the beta stamp...
>
> Because of time zones, that'll be Sunday in the Americas. Still too close?

Upon reflection, that amounts to the same thing really, so yeah, scratch that plan. I'll defer until after that (and then I'll be leaning more towards the revert option).
> Upon reflection, that amounts to the same thing really, so yeah,
> scratch that plan. I'll defer until after that (and then I'll be
> leaning more towards the revert option).

Sigh. I do not understand anything about the decision process. If you do revert, please consider NOT reverting the tps computation changes intermixed in the patch.

--
Fabien.
On 6/20/21 5:02 AM, Fabien COELHO wrote:
>
>> Upon reflection, that amounts to the same thing really, so yeah,
>> scratch that plan. I'll defer until after that (and then I'll be
>> leaning more towards the revert option).
>
> Sigh. I do not understand anything about the decision process.

Yes, sometimes it passeth all understanding. There will certainly be a BETA3, and in every recent year except last year there has been a BETA4. If this were core server code threatening data integrity I would be inclined to be more strict, but after all pgbench is a utility program, and I think we can allow a little more latitude.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Sun, Jun 20, 2021 at 10:15:55AM -0400, Andrew Dunstan wrote:
> If this were core server code threatening data integrity I would be
> inclined to be more strict, but after all pgbench is a utility program,
> and I think we can allow a little more latitude.

+1. Let's be flexible here. It looks better to not rush a fix, and we still have some time ahead.

--
Michael
Attachment
Bonjour Michaël,

>> If this were core server code threatening data integrity I would be
>> inclined to be more strict, but after all pgbench is a utility program,
>> and I think we can allow a little more latitude.
>
> +1. Let's be flexible here. It looks better to not rush a fix, and
> we still have some time ahead.

Attached is an updated v8 patch which adds (reinstates) an improved TAP test which would have caught the various regressions in the logs. Given that such tests were removed once before, I'm unsure whether they will be acceptable, even though their usefulness has been clearly demonstrated. At least it is on the record. Sigh :-(

--
Fabien.
On Tue, Jun 22, 2021 at 12:06:45PM +0200, Fabien COELHO wrote: > Attached is an updated v8 patch which adds (reinstates) an improved TAP test > which would have caught the various regressions in the logs. > Given that such tests were removed once before, I'm unsure whether they will > be acceptable, even though their usefulness has been clearly demonstrated. > At least it is there for the record. Sigh:-( Thanks! This v8 combines the fix for the epoch with the adjustments for the aggregate reports in the logs. The maths look right after a read-through and some tests. +# note: this test is time sensitive, and may fail on a very +# loaded host. +# note: --progress-timestamp is not tested +my $delay = pgbench( + '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2' + . ' --rate=20 --latency-limit=1000 -j ' . $nthreads + . ' -c 3 -r', + 0, + [ qr{type: multiple}, + qr{clients: 3}, + qr{threads: $nthreads}, + qr{duration: 2 s}, + qr{script 1: .* select only}, + qr{script 2: .* select only}, + qr{statement latencies in milliseconds}, + qr{FROM pgbench_accounts} ], + [ qr{vacuum}, qr{progress: 1\b} ], + 'pgbench progress', undef, + "--log-prefix=$bdir/001_pgbench_log_1"); Could we make that shorter at 1s? That will shorten the duration of the test run. It is easy to miss that this test is for --aggregate-interval (aka the logAgg() path) without a comment. +# cool check that we are around 2 seconds +# The rate may result in an unlucky schedule which triggers +# an early exit, hence the loose bound. +ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds"); The command itself would not fail, but we would just fail on a machine where the difference in start/stop time is higher than 4 seconds, right? On RPI-level machines, this could fail reliably. I am not completely sure what additional value we can get from that extra test, to be honest. +# $nthreads threads, 2 seconds, but due to timing imprecision we might get +# only 1 or as many as 3 progress reports per thread. +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, + qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$}); + Now this one is good and actually stable thanks to the fact that we'd get at least the final logs, and the main complaint discussed on this thread was the format of the logs. I would say to give up on the first test, and keep the second. But, is this regex correct? Shouldn't we check for six integer fields only with the first one having a minimal number of digits for the epoch? -- Michael
Hello, > +# note: this test is time sensitive, and may fail on a very > +# loaded host. > +# note: --progress-timestamp is not tested > +my $delay = pgbench( > + '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2' > + . ' --rate=20 --latency-limit=1000 -j ' . $nthreads > + . ' -c 3 -r', > + 0, > + [ qr{type: multiple}, > + qr{clients: 3}, > + qr{threads: $nthreads}, > + qr{duration: 2 s}, > + qr{script 1: .* select only}, > + qr{script 2: .* select only}, > + qr{statement latencies in milliseconds}, > + qr{FROM pgbench_accounts} ], > + [ qr{vacuum}, qr{progress: 1\b} ], > + 'pgbench progress', undef, > + "--log-prefix=$bdir/001_pgbench_log_1"); > Could we make that shorter at 1s? That will shorten the duration of > the test run. It is easy to miss that this test is for > --aggregate-interval (aka the logAgg() path) without a comment. It is for -T, -P and --aggregate-interval. The unit for all of these is seconds, and the minimum is 1; I put 2 so that we are pretty sure to get some output. We could try 1, but I'm less confident that output is ensured in all cases on a very slow host, which may decide to shorten the run before showing any progress, for instance. > +# cool check that we are around 2 seconds > +# The rate may result in an unlucky schedule which triggers > +# an early exit, hence the loose bound. > +ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds"); > > The command itself would not fail, but we would just fail on a machine > where the difference in start/stop time is higher than 4 seconds, > right? Yep. It could detect a stuck pgbench process, which was one of the reported issues. Maybe a timeout should be added. > On RPI-level machines, this could fail reliably. Dunno, not sure what RPI means. Probably not "Retail Price Index"… maybe Raspberry Pi? In that case, the 0-4 second leeway is intended to be loose enough to accommodate such hosts, but I cannot test that. > I am not completely sure what additional value we can get from > that extra test, to be honest. This would be to detect a stuck process, somehow. It could be looser if necessary. Or removed, or preferably commented out, or enabled with some options (eg an environment variable? configure option?). Such control could also skip all 3 calls, in which case the 2 seconds is not an issue. > +# $nthreads threads, 2 seconds, but due to timing imprecision we might get > +# only 1 or as many as 3 progress reports per thread. > +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, > + qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$}); > + > Now this one is good and actually stable thanks to the fact that we'd > get at least the final logs, and the main complaint discussed on this > thread was the format of the logs. Yep, this test would have probably detected the epoch issue reported by Greg. > I would say to give up on the first test, and keep the second. You mean remove the time check and keep the log check. I'd like to keep a time check, or at least have it in a comment so that I can re-enable it simply. > But, is this regex correct? Shouldn't we check for six integer fields > only with the first one having a minimal number of digits for the epoch? Epoch (seconds since 1970-01-01?) is currently 10 digits. Not sure how well it would work if some host has a different zero start date. Given the options of the benchmark run, there are that many fields in the log output; I'm not sure why we would want to check for fewer? -- Fabien.
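For reference, the "10 digits" figure is easy to sanity-check from a shell; this one-liner is illustrative only and not part of any patch in the thread:
$ perl -e 'print length(time), "\n"'
10
Unix epoch seconds have had 10 digits since 2001-09-09 (epoch 1000000000) and will keep them until 2286-11-20 (9999999999), so \d{10,} is a safe lower bound on any realistic clock.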
On Wed, Jun 23, 2021 at 08:26:32AM +0200, Fabien COELHO wrote: >> Could we make that shorter at 1s? That will shorten the duration of >> the test run. It is easy to miss that this test is for >> --aggregate-interval (aka the logAgg() path) without a comment. > > It is for -T, -P and --aggregate-interval. The unit for all of these is > seconds, and the minimum is 1; I put 2 so that we are pretty sure to get > some output. We could try 1, but I'm less confident that output is ensured > in all cases on a very slow host, which may decide to shorten the run before > showing any progress, for instance. Could it be possible to document the intention of the test and its coverage then? With the current patch, one has to guess the intention behind this case. >> +# $nthreads threads, 2 seconds, but due to timing imprecision we might get >> +# only 1 or as many as 3 progress reports per thread. >> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, >> + qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$}); >> + >> Now this one is good and actually stable thanks to the fact that we'd >> get at least the final logs, and the main complaint discussed on this >> thread was the format of the logs. > > Yep, this test would have probably detected the epoch issue reported by > Greg. (Sorry, I missed the use of throttle_delay, which generates extra fields in the log file.) Hm... Could it be possible to tighten the regex used here a bit, then? I was playing with it and it is not really picky about which patterns are allowed or rejected. The follow-up checks in check_pgbench_logs() could be a bit more restrictive as well, but my regex-fu is bad. >> I would say to give up on the first test, and keep the second. > > You mean remove the time check and keep the log check. I'd like to keep a > time check, or at least have it in a comment so that I can re-enable it > simply. I'd rather avoid tests that tend to fail on slow machines. There is a flotilla of them in the buildfarm. -- Michael
Bonjour Michaël, > Could it be possible to document the intention of the test and its > coverage then? With the current patch, one has to guess the intention > behind this case. Ok, see attached. >>> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, >>> + qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$}); > > Hm... Could it be possible to tighten the regex used here a bit, then? > I was playing with it and it is not really picky about which patterns > are allowed or rejected. > The follow-up checks in check_pgbench_logs() could be a bit more > restrictive as well, but my regex-fu is bad. Given the probabilistic nature of a --rate run and the variability of hosts which may run the tests, it is hard to be more specific than \d+ for actual performance data. The run may try 0 or 50 transactions within a second (both with pretty low probabilities), so the test mostly checks the format and some basic sanity of the output and logs. >>> I would say to give up on the first test, and keep the second. >> >> You mean remove the time check and keep the log check. I'd like to keep a >> time check, or at least have it in a comment so that I can re-enable it >> simply. > > I'd rather avoid tests that tend to fail on slow machines. There is a > flotilla of them in the buildfarm. Commented out in attached v9. -- Fabien.
On 2021-Jun-23, Fabien COELHO wrote: > +# cool check that we are around 2 seconds > +# The rate may result in an unlucky schedule which triggers > +# an early exit, hence the loose bound. > +# > +# THIS TEST IS COMMENTED OUT BUT PLEASE LEAVE IT THERE SO THAT > +# IT CAN BE ENABLED EASILY. > +# > +## ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds"); I think you should use Test::More's "skip" for this, perhaps something like this: SKIP: { skip "This test is unreliable"; # explain why ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds"); } ... or, actually, even better would be to use a TODO block, so that the test is run and reports its status, but if it happens not to succeed it will not cause the whole test to fail. That way you'll accumulate some evidence that may serve to improve the test in the future until it works fully: TODO: { local $TODO = "This test is unreliable"; ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds"); } -- Álvaro Herrera Valdivia, Chile "Machinism was proscribed under penalty of tickling to death" (Ijon Tichy in Viajes, Stanislaw Lem)
Ola Álvaro, > ... or, actually, even better would be to use a TODO block, so that the > test is run and reports its status, but if it happens not to succeed it > will not cause the whole test to fail. That way you'll accumulate some > evidence that may serve to improve the test in the future until it > works fully: > > TODO: { > local $TODO = "This test is unreliable"; > > ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds"); > } Thanks for the hint! Why not; having the ability to collect data is a good thing, so the attached v10 does that. If something goes wrong, the TODO section could be extended around all the calls. -- Fabien.
On Wed, Jun 23, 2021 at 10:01:28PM +0200, Fabien COELHO wrote: > Thanks for the hint! Why not; having the ability to collect data is a good > thing, so the attached v10 does that. If something goes wrong, the TODO > section could be extended around all the calls. +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, + qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$}); FWIW, I am still seeing problems with the regex pattern you are using here, because it fails to check the number of fields we should have generated, for one. If you are not convinced, just run your new test and extend or reduce the amount of data generated by logAgg() in your patch: the tests will still pass. So I have investigated this stuff in detail. The regular expressions are correctly shaped, but the use of grep() in check_pgbench_logs() seems to be incorrect. For example, let's take an aggregate report generated by your new test: "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123" Here are some extra ones, shorter and longer: "1624498086 13 27632 60597490 1683 2853 3227 8831"; "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123 123"; Using grep() with "$re" results in all three lines matching. Using "/$re/" in grep() instead, like list_files() does, matches only the first one, which is correct. Please see attached a small script, perl_grep.pl, that shows my point. With this issue fixed, I have bumped into what looks like a different bug in the tests. 001_pgbench_log_2 uses pgbench with 2 clients, but expects only log patterns where the first column value is 0. With two clients, that first value can be either 0 or 1, because it is the client ID. It seems to me that we had better fix this issue and back-patch it to where it was introduced, so that we have exact-match checks on the log formats, no? Please see the attached. Thoughts? -- Michael
Bonjour Michaël, > Using grep() with "$re" results in all three lines matching. Using > "/$re/" in grep() instead, like list_files() does, matches only > the first one, which is correct. Ok, good catch. Perl is kind of a strange language. > With this issue fixed, I have bumped into what looks like a different > bug in the tests. 001_pgbench_log_2 uses pgbench with 2 clients, but > expects only log patterns where the first column value is 0. With two > clients, that first value can be either 0 or 1, because it is the client ID. Indeed. The test passes because the number of expected lines is quite loose. > It seems to me that we had better fix this issue and back-patch it to > where it was introduced, so that we have exact-match checks on the log > formats, no? Please see the attached. Ok, however the regex should be "^[01] ...". Attached v11 with your fixes + the above regex fix. -- Fabien.
On 6/24/21 2:46 AM, Fabien COELHO wrote: > > Bonjour Michaël, > >> Using grep() with "$re" results in all three lines matching. Using >> "/$re/" in grep() instead, like list_files() does, matches only >> the first one, which is correct. > > Ok, good catch. Perl is kind of a strange language. Not really, the explanation is fairly simple: grep returns the values for which the test is true. grep ("$re",@values) doesn't perform a regex test against the values, it tests the truth of "$re" for each value, i.e. it's more or less the same as grep (1, @values), which will always return the whole @values list. By contrast grep (/$re/, @values) returns those elements of @values that match the regex. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
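Michael's perl_grep.pl is not reproduced in the archive; a minimal standalone sketch of the behaviour he and Andrew describe could look like this, reusing the three sample log lines quoted above (only the field counts matter):

use strict;
use warnings;
my $re = qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$};
my @lines = (
    "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123",     # 11 fields: should match
    "1624498086 13 27632 60597490 1683 2853 3227 8831",                   # too short: should not
    "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123 123", # too long: should not
);
# Broken: "$re" stringifies the compiled pattern, and a non-empty string
# is always true, so grep keeps every line no matter what it contains.
print scalar(grep("$re", @lines)), "\n";    # prints 3
# Correct: /$re/ matches each line (aliased to $_) against the pattern.
print scalar(grep(/$re/, @lines)), "\n";    # prints 1

Running it prints 3 and then 1, which is exactly the difference between the buggy and the fixed check_pgbench_logs().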
On Thu, Jun 24, 2021 at 08:03:27AM -0400, Andrew Dunstan wrote: > On 6/24/21 2:46 AM, Fabien COELHO wrote: >>> Using grep() with "$re" results in all three lines matching. Using >>> "/$re/" in grep() instead, like list_files() does, matches only >>> the first one, which is correct. >> >> Ok, good catch. Perl is kind of a strange language. Okay, I have extracted this part from your patch, and back-patched this fix down to 11. The comments were a good addition, so I have kept them. I have also made the second regex of check_pgbench_logs() pickier with the client ID value expected, as it can only be 0. > By contrast grep (/$re/, @values) returns those elements of @values that > match the regex. Thanks for the details here. -- Michael
Bonjour Michaël, > Okay, I have extracted this part from your patch, and back-patched > this fix down to 11. The comments were a good addition, so I have > kept them. I have also made the second regex of check_pgbench_logs() > pickier with the client ID value expected, as it can only be 0. Attached the remaining part of the patch to fix known issues on pgbench logging. I've added an entry on the open item on the wiki. I'm unsure about who the owner should be. -- Fabien.
On Wed, Jun 30, 2021 at 09:45:47AM +0200, Fabien COELHO wrote: > Attached the remaining part of the patch to fix known issues on pgbench > logging. > > I've added an entry on the open item on the wiki. I'm unsure about who the > owner should be. There is already an item: "Incorrect time maths in pgbench". -- Michael
On Wed, Jun 30, 2021 at 8:05 PM Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Jun 30, 2021 at 09:45:47AM +0200, Fabien COELHO wrote: > > Attached the remaining part of the patch to fix known issues on pgbench > > logging. > > > > I've added an entry on the open item on the wiki. I'm unsure about who the > > owner should be. > > There is already an item: "Incorrect time maths in pgbench". Fabien, thanks for the updated patch, I'm looking at it. I removed the duplicate item. More soon.
Hello Thomas, >>> I've added an entry on the open item on the wiki. I'm unsure about who the >>> owner should be. >> >> There is already an item: "Incorrect time maths in pgbench". Argh *shoot*, I went over the list too quickly, looking for "log" as a keyword. > Fabien, thanks for the updated patch, I'm looking at it. I removed > the duplicate item. More soon. Thanks. Sorry for the noise. -- Fabien.
>> Fabien, thanks for the updated patch, I'm looking at it. After looking at it again, here is an update which ensures 64 bits in the epoch_shift computation. -- Fabien.
On Wed, Jun 30, 2021 at 9:55 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> Fabien, thanks for the updated patch, I'm looking at it. > > After looking at it again, here is an update which ensures 64 bits in > the epoch_shift computation. Hi Fabien, The code in pgbench 13 aggregates into buckets that begin on the boundaries of wall clock seconds, because it is triggered by changes in time_t. In the current patch, we aggregate data into buckets that begin on the boundaries of whole seconds since start_time. Those boundaries are not aligned with wall clock seconds, and yet we print out the times rounded to wall clock seconds. With the following temporary hack: static void logAgg(FILE *logfile, StatsData *agg) { - fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f", - (agg->start_time + epoch_shift) / 1000000, + fprintf(logfile, /*INT64_FORMAT*/ "%f " INT64_FORMAT " %.0f %.0f %.0f %.0f", + (agg->start_time + epoch_shift) / 1000000.0, ... you can see what I mean: 1625115080.840406 325 999256 3197232764 1450 6846 Perhaps we should round the start time of the first aggregate down to the nearest wall clock second? That would mean that the first aggregate misses a part of a second (as it does in pgbench 13), but all later aggregates begin at the time we write in the log (as it does in pgbench 13). That is, if we log 1625115080 we mean "all results >= 1625115080.000000". It's a small detail, but it could be important for someone trying to correlate the log with other data. What do you think?
Hello Thomas, >> After looking at it again, here is an update which ensures 64 bits in >> the epoch_shift computation. > > The code in pgbench 13 aggregates into buckets that begin on the > boundaries of wall clock seconds, because it is triggered by changes > in time_t. In the current patch, we aggregate data into buckets that > begin on the boundaries of whole seconds since start_time. Those > boundaries are not aligned with wall clock seconds, and yet we print > out the times rounded to wall clock seconds. Yes, I noticed this small change, and did not feel it was an issue at the time. I thought of doing something like the format change you are suggesting. However, whether people would like it would need to be discussed, hence it stayed that way… People have scripts that process log files and do not like format changes, basically. > Perhaps we should round the start time of the first aggregate down to > the nearest wall clock second? Yep, but that requires a common start point for all threads. Why not. > That would mean that the first aggregate misses a part of a second (as > it does in pgbench 13), but all later aggregates begin at the time we > write in the log (as it does in pgbench 13). That is, if we log > 1625115080 we mean "all results >= 1625115080.000000". It's a small > detail, but it could be important for someone trying to correlate the > log with other data. What do you think? I think that you are right. The simplest way is to align on whole seconds, which is easier than changing the format and getting complaints about that, or not aligning and getting complaints about the timestamp being rounded. Attached a v14 in that spirit. -- Fabien.
On Thu, Jul 1, 2021 at 8:50 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Attached a v14 in that spirit. Thanks! This doesn't seem to address the complaint, though. Don't you need to do something like this? (See also attached.) + initStats(&aggs, start - (start + epoch_shift) % 1000000); That should reproduce what pgbench 13 does implicitly when it uses time(NULL). Namely, it rewinds to the start of the current *wall clock* second, so that all future aggregates also start at round number wall clock times, at the cost of making the first aggregate miss out on a fraction of a second. I wonder if some of the confusion on the other thread about the final aggregate[1] was due to this difference. By rounding down, we get a "head start" (because the first aggregate is short), so we usually manage to record the expected number of aggregates before time runs out. It's a race though. Your non-rounding version was more likely to lose the race and finish before the final expected aggregate was logged, so you added code to force a final aggregate to be logged. Do I have this right? I'm not entirely sure how useful a partial final aggregate is (it's probably one you have to throw away, like the first one, no? Isn't it better if we only have to throw away the first one?). I'm not sure, but if we keep that change, a couple of very minor nits: I found the "tx" parameter name a little confusing. Do you think it's clearer if we change it to "final" (with inverted sense)? For the final aggregate, shouldn't we call doLog() only if agg->cnt > 0? I think I'd be inclined to take that change back out though, making this patch very small and net behaviour like pgbench 13, if you agree with my explanation for why you had to add it and why it's not actually necessary with the fixed rounding shown above. (And perhaps in v15 we might consider other ideas like using hi-res times in the log and not rounding, etc, a topic for later.) I don't really see the value in the test that checks that $delay falls in the range 1.5s - 2.5s and then ignores the result. If it hangs forever, we'll find out about it, and otherwise no human or machine will ever care about that test. I removed it from this version. Were you really attached to it? I made some very minor language tweaks in comments (we don't usually shorten "benchmark" to "bench" in English, "series" keeps the -s in singular (blame the Romans), etc). I think we should make it clear when we mean the *Unix* epoch (a comment "switch to epoch" isn't meaningful on its own, to me at least), so I changed that in a few places. [1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2106102323310.3698412%40pseudo
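To see concretely what that expression does, here is a toy recomputation in plain Perl; the numbers are invented for illustration (they only need to be microsecond counts) and are not taken from a real run:

use strict;
use warnings;
# pgbench keeps times in microseconds relative to an arbitrary origin;
# epoch_shift is what converts them to Unix-epoch microseconds.
my $epoch_shift = 1_625_115_075_000_000;
my $start       = 5_123_456;    # i.e. wall clock 1625115080.123456
# Rewind start to the beginning of the current wall-clock second:
my $aligned = $start - ($start + $epoch_shift) % 1_000_000;
printf "%.6f\n", ($aligned + $epoch_shift) / 1e6;    # 1625115080.000000

Every later aggregate then starts on an XXX.000000 boundary, so the integer timestamps written to the log name the true start of each bucket.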
On Thu, Jun 17, 2021 at 7:18 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > I'm not sure we have transactions lasting such a short time that > nanoseconds matter. > Nanoseconds may not matter yet, but they could be handy when, for example, we want to determine the order of parallel query executions. We are less than an order of magnitude away from being able to do 1M inserts/updates/deletes per second, so microseconds already are not always 100% reliable. We could possibly move to using LSNs fetched as part of the queries for this case, but this will surely introduce more problems than it solves :) Cheers ----- Hannu Krosing Google Cloud - We have a long list of planned contributions and we are hiring. Contact me if interested.
Hello Thomas, > Thanks! This doesn't seem to address the complaint, though. Don't > you need to do something like this? (See also attached.) > > + initStats(&aggs, start - (start + epoch_shift) % 1000000); ISTM that this is: (start + epoch_shift) / 1000000 * 1000000 > That should reproduce what pgbench 13 does implicitly when it uses > time(NULL). I understand that you are shifting the aggregate internal start time to the epoch; however, ISTM that other points in the program are not shifted consistently with this, e.g. the while comparison in doLog? Also, if the start time is shifted for the log, then it should not be shifted again when printed (in logAgg). The attached version tries to be consistent. > Namely, it rewinds to the start of the current *wall clock* second, so > that all future aggregates also start at round number wall clock times, > at the cost of making the first aggregate miss out on a fraction of a > second. ISTM that it was already wall clock time, but not epoch wall clock. I'm okay with realigning aggregates on full seconds. > I wonder if some of the confusion on the other thread about the final > aggregate[1] was due to this difference. Dunno. The parallel execution with threads is a pain when handling details. > By rounding down, we get a "head start" (because the first aggregate is > short), so we usually manage to record the expected number of aggregates > before time runs out. Fine with me if everything is consistent. > It's a race though. Your non-rounding version was more likely > to lose the race and finish before the final expected aggregate was > logged, so you added code to force a final aggregate to be logged. ISTM that we always want to force it, because some modes can have low tps, and the aggregates should be "full". > Do I have this right? I'm not entirely sure how useful a partial final > aggregate is If you ask for a 10 second run with 1 aggregate per second, you expect to see (at least, about) 10 lines, and I want to ensure that; otherwise people will ask questions, tools will have to look for special cases, missing rows, whatever, and it will be a pain. We want to produce something simple, consistent, and reliable that tools can depend on. > (it's probably one you have to throw away, like the first one, no? > Isn't it better if we only have to throw away the first one?). This should be the user's decision to drop it or not, not the producing tool's, IMO. > I'm not sure, but if we keep that change, a couple of very minor nits: > I found the "tx" parameter name a little confusing. Do you think it's > clearer if we change it to "final" (with inverted sense)? I agree that tx is not a very good name, but the inversion does not look right to me. The "normal" behavior is > For the final aggregate, shouldn't we call doLog() only if agg->cnt > 0? No, I think that we should want to have all aggregates, even with zeros, so that the user can expect a deterministic number of lines. > I think I'd be inclined to take that change back out though, making this > patch very small and net behaviour like pgbench 13, if you agree with my > explanation for why you had to add it and why it's not actually > necessary with the fixed rounding shown above. (And perhaps in v15 we > might consider other ideas like using hi-res times in the log and not > rounding, etc, a topic for later.) I think that I'm mostly okay. > I don't really see the value in the test that checks that $delay falls > in the range 1.5s - 2.5s and then ignores the result. If it hangs > forever, we'll find out about it, and otherwise no human or machine > will ever care about that test. I removed it from this version. Were > you really attached to it? YES, REALLY! It would have caught quite a few of the regressions we are trying to address here. I want it there even if ignored, because I'll look for it to avoid regressions in the future. If the test is actually removed, recreating it is a pain. If you really want to deactivate it, use if(0), but PLEASE leave it there so that it can be reactivated for tests very simply, rather than maintaining some test outside of the tree. Also, if farm logs show that it is okay on all animals, it can be switched on by removing the ignore trick. > I made some very minor language tweaks in comments (we don't usually > shorten "benchmark" to "bench" in English, "series" keeps the -s in > singular (blame the Romans), etc). Thanks! My English is kind of fuzzy in the details:-) > I think we should make it clear when we mean the *Unix* epoch (a > comment "switch to epoch" isn't meaningful on its own, to me at > least), so I changed that in a few places. Ok. Attached v16: - tries to be consistent wrt epoch & aggregates, aligning to the Unix epoch as you suggested. - renames tx as accumulate, but does not invert it. - always shows aggregates so that the user can depend on the output, even if the stats are zero, because ISTM that cleverness must be avoided. - puts the tests back, even if ignored, because I really want them available easily. When/if you get to commit this patch, eventually, do not forget that I'm pushing forward fixes contributed by others, including Kyotaro Horiguchi and Yugo Nagata. -- Fabien.
Hello Hannu, >> I'm not sure we have transactions lasting such a short time that >> nanoseconds matter. > > Nanoseconds may not matter yet, but they could be handy when, for > example, we want to determine the order of parallel query executions. > > We are less than an order of magnitude away from being able to do 1M > inserts/updates/deletes per second, so microseconds already are not > always 100% reliable. ISTM that 1M tps would require a whole lot of parallel clients, so the latency of each would be quite measurable, and µs would still make sense for measuring their performance? If an actual network is involved, the network latency is already 100-200 µs even before executing any code. -- Fabien.
On Fri, Jul 9, 2021 at 5:15 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > > Thanks! This doesn't seem to address the complaint, though. Don't > > you need to do something like this? (See also attached.) > > > > + initStats(&aggs, start - (start + epoch_shift) % 1000000); > > ISTM that this is: (start + epoch_shift) / 1000000 * 1000000 Sure, it's equivalent if you also change doLog() to match the change in epoch, as your v16 did. Your version seems fine to me. The key thing is that we only start new aggregates when the Unix epoch time crosses over a XXX.000000 boundary, since we're only logging the XXX part. That's much like(*) pgbench13, and I'm happy however you want to code that. Case closed on that point. Thanks! > > Isn't it better if we only have to throw away the first one?). > > This should be the user's decision to drop it or not, not the producing > tool's, IMO. Let me try this complaint again. It's admittedly a minor detail, but my goal in this thread is to match pgbench13's treatment of aggregate boundaries and partial aggregates, so that we can close the open item for 14 with a minimal fix that doesn't make any unnecessary changes. Discussions about improvements or pre-existing problems can wait. First, let me demonstrate that pgbench13 throws away final partial aggregates. I hacked REL_13_STABLE like so: if (agg_interval > 0) { /* log aggregated but not yet reported transactions */ +fprintf(thread->logfile, "XXX log aggregated but not yet reported transactions: aggs.cnt = %ld\n", aggs.cnt); doLog(thread, state, &aggs, false, 0, 0); } fclose(thread->logfile); I ran pgbench -T6 --aggregate-interval 1 -l -S postgres, and it produced a log file containing: ===BOF=== 1625782245 7974 428574 24998642 49 683 1625782246 19165 998755 53217669 49 310 1625782247 19657 998868 51577774 47 189 1625782248 19707 998898 51660408 47 189 1625782249 19969 998867 50454999 48 156 1625782250 19845 998877 51071013 47 210 XXX log aggregated but not yet reported transactions: aggs.cnt = 10988 ===EOF=== We can see three interesting things: 1. The first aggregate is partial (only ~7k transactions, because it started partway through a second). Users have to throw away that first aggregate because its statistics are noise. That is the price to pay to have later aggregates start at the time they say. 2. We get 5 more full aggregates (~19k transactions). That's a total of 6 aggregates, which makes intuitive sense with -T6. 3. At the end, the extra call to doLog() did nothing, and yet cnt = 10988. That's data we're throwing away, because Unix epoch time has not advanced far enough to reach a new aggregate start time (it's not impossible, but probability is very low). Checking the commit log, I see that the code that claims to log the final aggregate came from commit b6037664960 (2016); apparently it doesn't do what the comments seem to think it does (did that ever work? Should perhaps be cleaned up, but separately, it's not an open item for 14). Now, in your recent patches you force that final partial aggregate to be logged in that case with that new flag mechanism, as we can see: ===BOF=== 1625783726 11823 609143 32170321 48 549 1625783727 19530 998995 52383115 47 210 1625783728 19468 999026 52208898 46 181 1625783729 19826 999001 51238427 46 185 1625783730 19195 999110 52841674 49 172 1625783731 18572 998992 56028876 48 318 1625783732 7484 388620 20951100 48 316 ===EOF=== 1. We get a partial initial aggregate just like 13.
That sacrificial aggregate helps us synchronize the rest of the aggregates with the logged timestamps. Good. 2. We get 5 full aggregates (~19k transactions) just like 13. As in 13, that's quite likely, because the first one was "short" so we almost always reach the end of the 6th one before -T6 runs out of sand. Good. 3. We get a new partial aggregate at the end. Users would have to throw that one away too. This is not a big deal, but it's a change in behaviour that should be discussed. Given that that last one is a waste of pixels and a (so far) unjustified change in behaviour, I propose, this time with a little more force and an updated patch, that we abandon that part of the change. I submit that you only added that because your earlier patches didn't have the partial aggregate at the start, so then it often didn't produce the 6th line of output. So, you forced it to log whatever it had left, even though the full time hadn't elapsed yet. Now we don't need that. The patch and resulting code are simpler, and the user experience matches 13. See attached. > > I don't really see the value in the test that checks that $delay falls > > in the range 1.5s - 2.5s and then ignores the result. If it hangs > > forever, we'll find out about it, and otherwise no human or machine > > will ever care about that test. I removed it from this version. Were > > you really attached to it? > > YES, REALLY! It would have caught quite a few of the regressions we > are trying to address here. I want it there even if ignored, because I'll > look for it to avoid regressions in the future. If the test is actually > removed, recreating it is a pain. If you really want to deactivate it, > use if(0), but PLEASE leave it there so that it can be reactivated for tests > very simply, rather than maintaining some test outside of the tree. Ok, you win :-) > When/if you get to commit this patch, eventually, do not forget that I'm > pushing forward fixes contributed by others, including Kyotaro Horiguchi > and Yugo Nagata. Fixed, thanks. * I say "much like" and not "exactly like"; of course there may be a subtle difference if ntpd adjusts your clock while a benchmark is running. Something must give, and 14's coding prefers to keep the duration of aggregates stable at exactly X seconds according to the high precision timer, so that the statistics it reports are meaningful, but 13 prefers to keep the timestamps it logs in sync with other software using gettimeofday() and will give you a weird short or long aggregate to achieve that (producing bad statistics). I can see arguments for both but I'm OK with that change and I see that it is in line with your general goal of switching to modern accurate time interfaces.
Hello Thomas, >>> Isn't it better if we only have to throw away the first one?). >> >> This should be the user's decision to drop it or not, not the producing >> tool's, IMO. > Let me try this complaint again. [...] I understand your point. For me, silently removing the last bucket is not right, because then the sum of the aggregates does not match the final grand total. The change is intentional and borders on a bug fix, which is what this patch was doing, even if it is also a small behavioral change: we should want the detailed and final reports to be in agreement. I do agree that users should probably ignore the first and last lines. > See attached. Works for me: patch applies, global and local check ok. I'm fine with it. If it were me, I'd still show the last bucket, but it does not matter much. -- Fabien.
On Fri, Jul 9, 2021 at 5:14 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Works for me: patch applies, global and local check ok. I'm fine with it. I hoped we were done here but I realised that your check for 1-3 log lines will not survive the harsh environment of the build farm. Adding sleep(2) before the final doLog() confirms that. I had two ideas: 1. Give up and expect 1-180 lines. (180s is the current timeout tolerance used elsewhere to deal with swamped/valgrind/swapping/microsd computers, after a few rounds of inflation, so if you need an arbitrarily large number to avoid buildfarm measles that's my suggestion....) 2. Change doLog() to give up after end_time. But then I think you'd need to make it 0-3 :-( I think the logging could be tightened up to work the way you expected in future work, though. Perhaps we could change all logging to work with transaction start time instead of log-writing time, which doLog() should receive. If you never start a transaction after end_time, then there can never be an aggregate that begins after that, and the whole thing becomes more deterministic. That kind of alignment of aggregate timing with whole-run timing could also get rid of those partial aggregates completely. But that's an idea for 15. So I think we should do 1 for now. Objections or better ideas?
>> Works for me: patch applies, global and local check ok. I'm fine with it. > > I hoped we were done here but I realised that your check for 1-3 log > lines will not survive the harsh environment of the build farm. > Adding sleep(2) before the final doLog() confirms that. I had two > ideas: > > 1. Give up and expect 1-180 lines. (180s is the current timeout > tolerance used elsewhere to deal with > swamped/valgrind/swapping/microsd computers, after a few rounds of > inflation, so if you need an arbitrarily large number to avoid buildfarm > measles that's my suggestion....) > 2. Change doLog() to give up after end_time. But then I think you'd > need to make it 0-3 :-( > > I think the logging could be tightened up to work the way you expected > in future work, though. Perhaps we could change all logging to work > with transaction start time instead of log-writing time, which doLog() > should receive. If you never start a transaction after end_time, then > there can never be an aggregate that begins after that, and the whole > thing becomes more deterministic. That kind of alignment of aggregate > timing with whole-run timing could also get rid of those partial > aggregates completely. But that's an idea for 15. > > So I think we should do 1 for now. Objections or better ideas? At least, we now know that it is too much. What about moving the test as-is into the TODO section with a comment, next to the other one, for now? I hesitated to suggest that before because of the above risks, but I was naively optimistic that it might pass because the test is not that demanding. Someone suggested having a "real-time" configure switch to enable/disable time-sensitive tests. -- Fabien.
Hello again, >> I hoped we were done here but I realised that your check for 1-3 log >> lines will not survive the harsh environment of the build farm. >> Adding sleep(2) before the final doLog() confirms that. I had two >> ideas: >> So I think we should do 1 for now. Objections or better ideas? > At least, we now know that it is too much. I misread your point. You think that it should fail, but it has not been tried yet. I'm rather optimistic that it should not fail, but I'm okay with averting the risk anyway. > What about moving the test as-is into the TODO section with a comment, next to > the other one, for now? I stand by this solution, which should allow us to get some data from the field; v18 attached. If all is green then the TODO could be removed later. -- Fabien.
Hi Fabien, I committed the code change without the new TAP tests, because I didn't want to leave the open item hanging any longer. As for the test, ... On Sat, Jul 10, 2021 at 9:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> I hoped we were done here but I realised that your check for 1-3 log > >> lines will not survive the harsh environment of the build farm. > >> Adding sleep(2) before the final doLog() confirms that. I had two > >> ideas: > I misread your point. You think that it should fail, but it has not been > tried yet. I'm rather optimistic that it should not fail, but I'm okay > with averting the risk anyway. ... I know it can fail, and your v18 didn't fix that, because... +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, ... this range (1 to 3) can be exceeded. That's because extra aggregations are created based on doLog() reading the clock after a transaction is finished, entirely independently of the -T mechanism deciding when to stop the benchmark, and potentially many seconds later in adverse conditions. As I mentioned, you can see it fail with your own eyes if you hack the code like so: if (agg_interval > 0) { + /* + * XXX: simulate an overloaded raspberry pi swapping to a microsd + * card or other random delays as we can expect in the build farm + */ + sleep(3); /* log aggregated but not yet reported transactions */ doLog(thread, state, &aggs, false, 0, 0); } > I stand by this solution, which should allow us to get some data from the > field; v18 attached. If all is green then the TODO could be removed > later. I suspect the number of aggregates could be made deterministic, as I described in an earlier message. What do you think about doing something like that first for the next release, before trying to add assertions about the number of aggregates? I'm with you on the importance of testing, but it seems better to start by making the thing more testable.
Hello Thomas, > I committed the code change without the new TAP tests, because I > didn't want to leave the open item hanging any longer. Ok. Good. > As for the test, ... [...] Argh, so there are no tests that would have caught the regressions:-( > ... I know it can fail, and your v18 didn't fix that, because... > > +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3, > ... this range (1 to 3) can be exceeded. Indeed. I meant to move that one into the TODO section as well, not just the previous call, so that all time-sensitive tests are fully ignored but still reported, which would be enough for me. > I suspect the number of aggregates could be made deterministic, as I > described in an earlier message. What do you think about doing > something like that first for the next release, before trying to add > assertions about the number of aggregates? I think that the last time I did something to get more deterministic results in pgbench, which involved a few lines of hocus-pocus in pgbench, the patch got rejected:-) An "ignored" test looked like a good compromise: it lets us check how things are going in the farm and do more non-regression checking when developing pgbench, without introducing behavioral changes. > I'm with you on the importance of testing, but it seems better to start > by making the thing more testable. I'm used to my test patches being rejected, including ones modifying pgbench behavior to make it more testable. Am I mad enough to retry? Maybe, maybe not. Attached the fully "ignored" version of the time features test as a patch. -- Fabien.
> On 11 Jul 2021, at 15:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > Attached the fully "ignored" version of the time features test as a patch. This version of the patch is failing to apply on top of HEAD, can you please submit a rebased version? -- Daniel Gustafsson https://vmware.com/
> On 4 Nov 2021, at 13:38, Daniel Gustafsson <daniel@yesql.se> wrote: > >> On 11 Jul 2021, at 15:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote: > >> Attached the fully "ignored" version of the time features test as a patch. > > This version of the patch is failing to apply on top of HEAD, can you please > submit a rebased version? I'm marking this patch Returned with Feedback, please feel free to resubmit this when there is an updated version of the patch available. -- Daniel Gustafsson https://vmware.com/