Re: pgbench logging broken by time logic changes - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: pgbench logging broken by time logic changes
Msg-id: CA+hUKGKCQ4DUo1diCg6SPhnpaYWF37uEPbHhy=FKv4Cnuqn8_g@mail.gmail.com
In response to: Re: pgbench logging broken by time logic changes (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: pgbench logging broken by time logic changes
List: pgsql-hackers
On Fri, Jul 9, 2021 at 5:15 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > Thanks!  This doesn't seem to address the complaint, though.  Don't
> > you need to do something like this?  (See also attached.)
> >
> > +    initStats(&aggs, start - (start + epoch_shift) % 1000000);
>
> ISTM that this is: (start + epoch_shift) / 1000000 * 1000000

Sure, it's equivalent if you also change doLog() to match the change in epoch, as your v16 did.  Your version seems fine to me.  The key thing is that we only start new aggregates when the Unix epoch time crosses over a XXX.000000 boundary, since we're only logging the XXX part.  That's much like(*) pgbench 13, and I'm happy however you want to code that.  Case closed on that point.  Thanks!

> > Isn't it better if we only have to throw away the first one?
>
> This should be the user decision to drop it or not, not the tool
> producing it, IMO.

Let me try this complaint again.  It's admittedly a minor detail, but my goal in this thread is to match pgbench 13's treatment of aggregate boundaries and partial aggregates, so that we can close the open item for 14 with a minimal fix that doesn't make any unnecessary changes.  Discussions about improvements or pre-existing problems can wait.

First, let me demonstrate that pgbench 13 throws away final partial aggregates.
I hacked REL_13_STABLE like so:

     if (agg_interval > 0)
     {
         /* log aggregated but not yet reported transactions */
+        fprintf(thread->logfile,
+                "XXX log aggregated but not yet reported transactions: aggs.cnt = %ld\n",
+                aggs.cnt);
         doLog(thread, state, &aggs, false, 0, 0);
     }
     fclose(thread->logfile);

I ran "pgbench -T6 --aggregate-interval 1 -l -S postgres", and it produced a log file containing:

===BOF===
1625782245 7974 428574 24998642 49 683
1625782246 19165 998755 53217669 49 310
1625782247 19657 998868 51577774 47 189
1625782248 19707 998898 51660408 47 189
1625782249 19969 998867 50454999 48 156
1625782250 19845 998877 51071013 47 210
XXX log aggregated but not yet reported transactions: aggs.cnt = 10988
===EOF===

We can see three interesting things:

1.  The first aggregate is partial (only ~7k transactions, because it started partway through a second).  Users have to throw away that first aggregate because its statistics are noise.  That is the price to pay to have later aggregates start at the time they say.

2.  We get 5 more full aggregates (~19k transactions).  That's a total of 6 aggregates, which makes intuitive sense with -T6.

3.  At the end, the extra call to doLog() did nothing, and yet cnt = 10988.  That's data we're throwing away, because Unix epoch time has not advanced far enough to reach a new aggregate start time (it's not impossible, but the probability is very low).  Checking the commit log, I see that the code that claims to log the final aggregate came from commit b6037664960 (2016); apparently it doesn't do what the comments seem to think it does (did that ever work?  It should perhaps be cleaned up, but separately; it's not an open item for 14).
Now, in your recent patches you force that final partial aggregate to be logged in that case with that new flag mechanism, as we can see:

===BOF===
1625783726 11823 609143 32170321 48 549
1625783727 19530 998995 52383115 47 210
1625783728 19468 999026 52208898 46 181
1625783729 19826 999001 51238427 46 185
1625783730 19195 999110 52841674 49 172
1625783731 18572 998992 56028876 48 318
1625783732 7484 388620 20951100 48 316
===EOF===

1.  We get a partial initial aggregate, just like 13.  That sacrificial aggregate helps us synchronize the rest of the aggregates with the logged timestamps.  Good.

2.  We get 5 full aggregates (~19k transactions), just like 13.  As in 13, that's quite likely, because the first one was "short", so we almost always reach the end of the 6th one before -T6 runs out of sand.  Good.

3.  We get a new partial aggregate at the end.  Users would have to throw that one away too.  This is not a big deal, but it's a change in behaviour that should be discussed.

Given that that last one is a waste of pixels and a (so far) unjustified change in behaviour, I propose, this time with a little more force and an updated patch, that we abandon that part of the change.  I submit that you only added it because your earlier patches didn't have the partial aggregate at the start, so they often didn't produce the 6th line of output.  So you forced it to log whatever it had left, even though the full time hadn't elapsed yet.  Now we don't need that.  The patch and resulting code are simpler, and the user experience matches 13.  See attached.

> > I don't really see the value in the test that checks that $delay falls
> > in the range 1.5s - 2.5s and then ignores the result.  If it hangs
> > forever, we'll find out about it, and otherwise no human or machine
> > will ever care about that test.  I removed it from this version.  Were
> > you really attached to it?
>
> YES, REALLY!  It would just have caught quite a few of the regressions
> we are trying to address here.
> I want it there even if ignored, because I'll look for it to avoid
> regressions in the future.  If the test is actually removed, recreating
> it is a pain.  If you really want to deactivate it, use if(0), but
> PLEASE leave it there so that it can be reactivated for tests very
> simply, rather than maintaining some test outside of the tree.

Ok, you win :-)

> When/if you get to commit this patch, eventually, do not forget that
> I'm pushing forward fixes contributed by others, including Kyotaro
> Horiguchi and Yugo Nagata.

Fixed, thanks.

(*) I say "much like" and not "exactly like"; of course there may be a subtle difference if ntpd adjusts your clock while a benchmark is running.  Something must give: 14's coding prefers to keep the duration of aggregates stable at exactly X seconds according to the high-precision timer, so that the statistics it reports are meaningful, while 13 prefers to keep the timestamps it logs in sync with other software using gettimeofday(), and will give you a weird short or long aggregate to achieve that (producing bad statistics).  I can see arguments for both, but I'm OK with that change, and I see that it is in line with your general goal of switching to modern, accurate time interfaces.