Thread: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
From
Alexey Kondratov
Date:
Hi Hackers, Today I have accidentally noticed that autoprewarm feature of pg_prewarm used TimestampDifference()'s results in a wrong way. First, it used *seconds* result from it as a *milliseconds*. It was causing it to make dump file autoprewarm.blocks ~every second with default setting of autoprewarm_interval = 300s. Here is a log part with debug output in this case: ``` 2020-11-09 19:09:00.162 MSK [85328] LOG: dumping autoprewarm.blocks 2020-11-09 19:09:01.161 MSK [85328] LOG: dumping autoprewarm.blocks 2020-11-09 19:09:02.160 MSK [85328] LOG: dumping autoprewarm.blocks 2020-11-09 19:09:03.159 MSK [85328] LOG: dumping autoprewarm.blocks ``` After fixing this issue I have noticed that it still dumps blocks twice at each timeout (here I set autoprewarm_interval to 15s): ``` 2020-11-09 19:18:59.692 MSK [85662] LOG: dumping autoprewarm.blocks 2020-11-09 19:18:59.700 MSK [85662] LOG: dumping autoprewarm.blocks 2020-11-09 19:19:14.694 MSK [85662] LOG: dumping autoprewarm.blocks 2020-11-09 19:19:14.704 MSK [85662] LOG: dumping autoprewarm.blocks ``` This happens because at timeout time we were using continue, but actually we still have to wait the entire autoprewarm_interval after successful dumping. I have fixed both issues in the attached patches and also added a minimalistic tap test as a first one to verify that this automatic damping still works after refactoring. I put Robert into CC, since he is an author of this feature. What do you think? Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
From
Jeevan Ladhe
Date:
Hi,
On Mon, Nov 9, 2020 at 10:15 PM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:
Hi Hackers,
Today I have accidentally noticed that autoprewarm feature of pg_prewarm
used TimestampDifference()'s results in a wrong way.
First, it used *seconds* result from it as a *milliseconds*. It was
causing it to make dump file autoprewarm.blocks ~every second with
default setting of autoprewarm_interval = 300s.
I had a look at this, and I agree that this is an issue. I also had a look at
the patch 0002, and the patch looks good to me.
In patch 0003 there is a typo:
+ /* We have to sleep even after a successfull dump */
s/successfull/successful
Regards,
Jeevan Ladhe
Alexey Kondratov <a.kondratov@postgrespro.ru> writes: > After fixing this issue I have noticed that it still dumps blocks twice > at each timeout (here I set autoprewarm_interval to 15s): > ... > This happens because at timeout time we were using continue, but > actually we still have to wait the entire autoprewarm_interval after > successful dumping. I don't think your 0001 is correct. It would be okay if apw_dump_now() could be counted on to take negligible time, but we shouldn't assume that should we? I agree that the "continue" seems a bit bogus, because it's skipping the ResetLatch call at the bottom of the loop; it's not quite clear to me whether that's a good thing or not. But the general idea of the existing code seems to be to loop around and make a fresh calculation of how-long-to-wait, and that doesn't seem wrong. 0002 seems like a pretty clear bug fix, though I wonder if this is exactly what we want to do going forward. It seems like a very large fraction of the callers of TimestampDifference would like to have the value in msec, which means we're doing a whole lot of expensive and error-prone arithmetic to break down the difference to sec/usec and then put it back together again. Let's get rid of that by inventing, say TimestampDifferenceMilliseconds(...). BTW, I see another bug of a related ilk. Look what postgres_fdw/connection.c is doing: TimestampDifference(now, endtime, &secs, µsecs); /* To protect against clock skew, limit sleep to one minute. */ cur_timeout = Min(60000, secs * USECS_PER_SEC + microsecs); /* Sleep until there's something to do */ wc = WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, PQsocket(conn), cur_timeout, PG_WAIT_EXTENSION); WaitLatchOrSocket's timeout is measured in msec not usec. I think the comment about "clock skew" is complete BS, and the Min() calculation was put in as a workaround by somebody observing that the sleep waited too long, but not understanding why. regards, tom lane
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
From
Alexey Kondratov
Date:
On 2020-11-09 21:53, Tom Lane wrote: > Alexey Kondratov <a.kondratov@postgrespro.ru> writes: >> After fixing this issue I have noticed that it still dumps blocks >> twice >> at each timeout (here I set autoprewarm_interval to 15s): >> ... >> This happens because at timeout time we were using continue, but >> actually we still have to wait the entire autoprewarm_interval after >> successful dumping. > > I don't think your 0001 is correct. It would be okay if apw_dump_now() > could be counted on to take negligible time, but we shouldn't assume > that should we? > Yes, it seems so, if I understand you correctly. I had a doubt about possibility of pg_ctl to exit earlier than a dumping process. Now I added an explicit wait for dump file into test. > I agree that the "continue" seems a bit bogus, because it's skipping > the ResetLatch call at the bottom of the loop; it's not quite clear > to me whether that's a good thing or not. But the general idea of > the existing code seems to be to loop around and make a fresh > calculation > of how-long-to-wait, and that doesn't seem wrong. I have left the last patch intact, since it resolves the 'double dump' issue, but I agree with нщгк point about existing logic of the code, although it is a bit broken. So I have to think more about how to fix it in a better way. > 0002 seems like a pretty clear bug fix, though I wonder if this is > exactly > what we want to do going forward. It seems like a very large fraction > of > the callers of TimestampDifference would like to have the value in > msec, > which means we're doing a whole lot of expensive and error-prone > arithmetic to break down the difference to sec/usec and then put it > back together again. Let's get rid of that by inventing, say > TimestampDifferenceMilliseconds(...). Yeah, I get into this problem after a bug in another extension — pg_wait_sampling. I have attached 0002, which implements TimestampDifferenceMilliseconds(), so 0003 just uses this new function to solve the initial issues. If it looks good to you, then we can switch all similar callers to it. > BTW, I see another bug of a related ilk. Look what > postgres_fdw/connection.c is doing: > > TimestampDifference(now, endtime, &secs, µsecs); > > /* To protect against clock skew, limit sleep to one > minute. */ > cur_timeout = Min(60000, secs * USECS_PER_SEC + > microsecs); > > /* Sleep until there's something to do */ > wc = WaitLatchOrSocket(MyLatch, > WL_LATCH_SET | > WL_SOCKET_READABLE | > WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH, > PQsocket(conn), > cur_timeout, PG_WAIT_EXTENSION); > > WaitLatchOrSocket's timeout is measured in msec not usec. I think the > comment about "clock skew" is complete BS, and the Min() calculation > was > put in as a workaround by somebody observing that the sleep waited too > long, but not understanding why. I wonder how many troubles one can get with all these unit conversions. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Alexey Kondratov <a.kondratov@postgrespro.ru> writes: > On 2020-11-09 21:53, Tom Lane wrote: >> 0002 seems like a pretty clear bug fix, though I wonder if this is exactly >> what we want to do going forward. It seems like a very large fraction of >> the callers of TimestampDifference would like to have the value in msec, >> which means we're doing a whole lot of expensive and error-prone >> arithmetic to break down the difference to sec/usec and then put it >> back together again. Let's get rid of that by inventing, say >> TimestampDifferenceMilliseconds(...). > Yeah, I get into this problem after a bug in another extension — > pg_wait_sampling. I have attached 0002, which implements > TimestampDifferenceMilliseconds(), so 0003 just uses this new function > to solve the initial issues. If it looks good to you, then we can switch > all similar callers to it. Yeah, let's move forward with that --- in fact, I'm inclined to back-patch it. (Not till the current release cycle is done, though. I don't find this important enough to justify a last-moment patch.) BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds round any fractional millisecond up rather than down. Rounding down seems to create a hazard of uselessly waking just before the delay is completed. Better to wake just after. I still think your 0001 is fishy, but don't have time today to stare at it more closely. regards, tom lane
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
From
Alexey Kondratov
Date:
On 2020-11-09 23:25, Tom Lane wrote: > Alexey Kondratov <a.kondratov@postgrespro.ru> writes: >> On 2020-11-09 21:53, Tom Lane wrote: >>> 0002 seems like a pretty clear bug fix, though I wonder if this is >>> exactly >>> what we want to do going forward. It seems like a very large >>> fraction of >>> the callers of TimestampDifference would like to have the value in >>> msec, >>> which means we're doing a whole lot of expensive and error-prone >>> arithmetic to break down the difference to sec/usec and then put it >>> back together again. Let's get rid of that by inventing, say >>> TimestampDifferenceMilliseconds(...). > >> Yeah, I get into this problem after a bug in another extension — >> pg_wait_sampling. I have attached 0002, which implements >> TimestampDifferenceMilliseconds(), so 0003 just uses this new function >> to solve the initial issues. If it looks good to you, then we can >> switch >> all similar callers to it. > > Yeah, let's move forward with that --- in fact, I'm inclined to > back-patch it. (Not till the current release cycle is done, though. > I don't find this important enough to justify a last-moment patch.) > > BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds > round any fractional millisecond up rather than down. Rounding down > seems to create a hazard of uselessly waking just before the delay is > completed. Better to wake just after. > Yes, it make sense. I have changed TimestampDifferenceMilliseconds() to round result up if there is a reminder. After looking on the autoprewarm code more closely I have realised that this 'double dump' issues was not an issues at all. I have just misplaced a debug elog(), so its second output in the log was only indicating that we calculated delay_in_ms one more time. Actually, even with wrong calculation of delay_in_ms the only problem was that we were busy looping with ~1 second interval instead of waiting on latch. It is still a buggy behaviour, but much less harmful than I have originally thought. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
Attachment
Alexey Kondratov <a.kondratov@postgrespro.ru> writes: > After looking on the autoprewarm code more closely I have realised that > this 'double dump' issues was not an issues at all. I have just > misplaced a debug elog(), so its second output in the log was only > indicating that we calculated delay_in_ms one more time. Ah --- that explains why I couldn't see a problem. I've pushed 0001+0002 plus some followup work to fix other places that could usefully use TimestampDifferenceMilliseconds(). I have not done anything with 0003 (the TAP test for pg_prewarm), and will leave that to the judgment of somebody who's worked with pg_prewarm before. To me it looks like it's not really testing things very carefully at all; on the other hand, we have exactly zero test coverage of that module today, so maybe something is better than nothing. regards, tom lane
Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm
From
Alexey Kondratov
Date:
On 2020-11-11 06:59, Tom Lane wrote: > Alexey Kondratov <a.kondratov@postgrespro.ru> writes: >> After looking on the autoprewarm code more closely I have realised >> that >> this 'double dump' issues was not an issues at all. I have just >> misplaced a debug elog(), so its second output in the log was only >> indicating that we calculated delay_in_ms one more time. > > Ah --- that explains why I couldn't see a problem. > > I've pushed 0001+0002 plus some followup work to fix other places > that could usefully use TimestampDifferenceMilliseconds(). I have > not done anything with 0003 (the TAP test for pg_prewarm), and will > leave that to the judgment of somebody who's worked with pg_prewarm > before. To me it looks like it's not really testing things very > carefully at all; on the other hand, we have exactly zero test > coverage of that module today, so maybe something is better than > nothing. > Great, thank you for generalisation of the issue and working on it. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company