Thread: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From
Alexey Kondratov
Date:
Hi Hackers,

Today I have accidentally noticed that autoprewarm feature of pg_prewarm 
used TimestampDifference()'s results in a wrong way.

First, it used *seconds* result from it as a *milliseconds*. It was 
causing it to make dump file autoprewarm.blocks ~every second with 
default setting of autoprewarm_interval = 300s.

Here is a log part with debug output in this case:

```
2020-11-09 19:09:00.162 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:01.161 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:02.160 MSK [85328] LOG:  dumping autoprewarm.blocks
2020-11-09 19:09:03.159 MSK [85328] LOG:  dumping autoprewarm.blocks
```

After fixing this issue I have noticed that it still dumps blocks twice 
at each timeout (here I set autoprewarm_interval to 15s):

```
2020-11-09 19:18:59.692 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:18:59.700 MSK [85662] LOG:  dumping autoprewarm.blocks

2020-11-09 19:19:14.694 MSK [85662] LOG:  dumping autoprewarm.blocks
2020-11-09 19:19:14.704 MSK [85662] LOG:  dumping autoprewarm.blocks
```

This happens because at timeout time we were using continue, but 
actually we still have to wait the entire autoprewarm_interval after 
successful dumping.

I have fixed both issues in the attached patches and also added a 
minimalistic tap test as a first one to verify that this automatic 
damping still works after refactoring. I put Robert into CC, since he is 
an author of this feature.

What do you think?


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment

Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From
Jeevan Ladhe
Date:
Hi,

On Mon, Nov 9, 2020 at 10:15 PM Alexey Kondratov <a.kondratov@postgrespro.ru> wrote:
Hi Hackers,

Today I have accidentally noticed that autoprewarm feature of pg_prewarm
used TimestampDifference()'s results in a wrong way.

First, it used *seconds* result from it as a *milliseconds*. It was
causing it to make dump file autoprewarm.blocks ~every second with
default setting of autoprewarm_interval = 300s.


I had a look at this, and I agree that this is an issue. I also had a look at
the patch 0002, and the patch looks good to me.

In patch 0003 there is a typo:

+ /* We have to sleep even after a successfull dump */

s/successfull/successful


Regards,
Jeevan Ladhe
Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
> After fixing this issue I have noticed that it still dumps blocks twice
> at each timeout (here I set autoprewarm_interval to 15s):
> ...
> This happens because at timeout time we were using continue, but
> actually we still have to wait the entire autoprewarm_interval after
> successful dumping.

I don't think your 0001 is correct.  It would be okay if apw_dump_now()
could be counted on to take negligible time, but we shouldn't assume
that should we?

I agree that the "continue" seems a bit bogus, because it's skipping
the ResetLatch call at the bottom of the loop; it's not quite clear
to me whether that's a good thing or not.  But the general idea of
the existing code seems to be to loop around and make a fresh calculation
of how-long-to-wait, and that doesn't seem wrong.

0002 seems like a pretty clear bug fix, though I wonder if this is exactly
what we want to do going forward.  It seems like a very large fraction of
the callers of TimestampDifference would like to have the value in msec,
which means we're doing a whole lot of expensive and error-prone
arithmetic to break down the difference to sec/usec and then put it
back together again.  Let's get rid of that by inventing, say
TimestampDifferenceMilliseconds(...).

BTW, I see another bug of a related ilk.  Look what
postgres_fdw/connection.c is doing:

                TimestampDifference(now, endtime, &secs, µsecs);

                /* To protect against clock skew, limit sleep to one minute. */
                cur_timeout = Min(60000, secs * USECS_PER_SEC + microsecs);

                /* Sleep until there's something to do */
                wc = WaitLatchOrSocket(MyLatch,
                                       WL_LATCH_SET | WL_SOCKET_READABLE |
                                       WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                                       PQsocket(conn),
                                       cur_timeout, PG_WAIT_EXTENSION);

WaitLatchOrSocket's timeout is measured in msec not usec.  I think the
comment about "clock skew" is complete BS, and the Min() calculation was
put in as a workaround by somebody observing that the sleep waited too
long, but not understanding why.

            regards, tom lane



Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From
Alexey Kondratov
Date:
On 2020-11-09 21:53, Tom Lane wrote:
> Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
>> After fixing this issue I have noticed that it still dumps blocks 
>> twice
>> at each timeout (here I set autoprewarm_interval to 15s):
>> ...
>> This happens because at timeout time we were using continue, but
>> actually we still have to wait the entire autoprewarm_interval after
>> successful dumping.
> 
> I don't think your 0001 is correct.  It would be okay if apw_dump_now()
> could be counted on to take negligible time, but we shouldn't assume
> that should we?
> 

Yes, it seems so, if I understand you correctly. I had a doubt about 
possibility of pg_ctl to exit earlier than a dumping process. Now I 
added an explicit wait for dump file into test.

> I agree that the "continue" seems a bit bogus, because it's skipping
> the ResetLatch call at the bottom of the loop; it's not quite clear
> to me whether that's a good thing or not.  But the general idea of
> the existing code seems to be to loop around and make a fresh 
> calculation
> of how-long-to-wait, and that doesn't seem wrong.

I have left the last patch intact, since it resolves the 'double dump' 
issue, but I agree with нщгк point about existing logic of the code, 
although it is a bit broken. So I have to think more about how to fix it 
in a better way.

> 0002 seems like a pretty clear bug fix, though I wonder if this is 
> exactly
> what we want to do going forward.  It seems like a very large fraction 
> of
> the callers of TimestampDifference would like to have the value in 
> msec,
> which means we're doing a whole lot of expensive and error-prone
> arithmetic to break down the difference to sec/usec and then put it
> back together again.  Let's get rid of that by inventing, say
> TimestampDifferenceMilliseconds(...).

Yeah, I get into this problem after a bug in another extension — 
pg_wait_sampling. I have attached 0002, which implements 
TimestampDifferenceMilliseconds(), so 0003 just uses this new function 
to solve the initial issues. If it looks good to you, then we can switch 
all similar callers to it.

> BTW, I see another bug of a related ilk.  Look what
> postgres_fdw/connection.c is doing:
> 
>                 TimestampDifference(now, endtime, &secs, µsecs);
> 
>                 /* To protect against clock skew, limit sleep to one 
> minute. */
>                 cur_timeout = Min(60000, secs * USECS_PER_SEC + 
> microsecs);
> 
>                 /* Sleep until there's something to do */
>                 wc = WaitLatchOrSocket(MyLatch,
>                                        WL_LATCH_SET | 
> WL_SOCKET_READABLE |
>                                        WL_TIMEOUT | 
> WL_EXIT_ON_PM_DEATH,
>                                        PQsocket(conn),
>                                        cur_timeout, PG_WAIT_EXTENSION);
> 
> WaitLatchOrSocket's timeout is measured in msec not usec.  I think the
> comment about "clock skew" is complete BS, and the Min() calculation 
> was
> put in as a workaround by somebody observing that the sleep waited too
> long, but not understanding why.

I wonder how many troubles one can get with all these unit conversions.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment
Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
> On 2020-11-09 21:53, Tom Lane wrote:
>> 0002 seems like a pretty clear bug fix, though I wonder if this is exactly
>> what we want to do going forward.  It seems like a very large fraction of
>> the callers of TimestampDifference would like to have the value in msec,
>> which means we're doing a whole lot of expensive and error-prone
>> arithmetic to break down the difference to sec/usec and then put it
>> back together again.  Let's get rid of that by inventing, say
>> TimestampDifferenceMilliseconds(...).

> Yeah, I get into this problem after a bug in another extension — 
> pg_wait_sampling. I have attached 0002, which implements 
> TimestampDifferenceMilliseconds(), so 0003 just uses this new function 
> to solve the initial issues. If it looks good to you, then we can switch 
> all similar callers to it.

Yeah, let's move forward with that --- in fact, I'm inclined to
back-patch it.  (Not till the current release cycle is done, though.
I don't find this important enough to justify a last-moment patch.)

BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
round any fractional millisecond up rather than down.  Rounding down
seems to create a hazard of uselessly waking just before the delay is
completed.  Better to wake just after.

I still think your 0001 is fishy, but don't have time today to stare at
it more closely.

            regards, tom lane



Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From
Alexey Kondratov
Date:
On 2020-11-09 23:25, Tom Lane wrote:
> Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
>> On 2020-11-09 21:53, Tom Lane wrote:
>>> 0002 seems like a pretty clear bug fix, though I wonder if this is 
>>> exactly
>>> what we want to do going forward.  It seems like a very large 
>>> fraction of
>>> the callers of TimestampDifference would like to have the value in 
>>> msec,
>>> which means we're doing a whole lot of expensive and error-prone
>>> arithmetic to break down the difference to sec/usec and then put it
>>> back together again.  Let's get rid of that by inventing, say
>>> TimestampDifferenceMilliseconds(...).
> 
>> Yeah, I get into this problem after a bug in another extension —
>> pg_wait_sampling. I have attached 0002, which implements
>> TimestampDifferenceMilliseconds(), so 0003 just uses this new function
>> to solve the initial issues. If it looks good to you, then we can 
>> switch
>> all similar callers to it.
> 
> Yeah, let's move forward with that --- in fact, I'm inclined to
> back-patch it.  (Not till the current release cycle is done, though.
> I don't find this important enough to justify a last-moment patch.)
> 
> BTW, I wonder if we shouldn't make TimestampDifferenceMilliseconds
> round any fractional millisecond up rather than down.  Rounding down
> seems to create a hazard of uselessly waking just before the delay is
> completed.  Better to wake just after.
> 

Yes, it make sense. I have changed TimestampDifferenceMilliseconds() to 
round result up if there is a reminder.

After looking on the autoprewarm code more closely I have realised that 
this 'double dump' issues was not an issues at all. I have just 
misplaced a debug elog(), so its second output in the log was only 
indicating that we calculated delay_in_ms one more time. Actually, even 
with wrong calculation of delay_in_ms the only problem was that we were 
busy looping with ~1 second interval instead of waiting on latch.

It is still a buggy behaviour, but much less harmful than I have 
originally thought.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
Attachment
Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
> After looking on the autoprewarm code more closely I have realised that 
> this 'double dump' issues was not an issues at all. I have just 
> misplaced a debug elog(), so its second output in the log was only 
> indicating that we calculated delay_in_ms one more time.

Ah --- that explains why I couldn't see a problem.

I've pushed 0001+0002 plus some followup work to fix other places
that could usefully use TimestampDifferenceMilliseconds().  I have
not done anything with 0003 (the TAP test for pg_prewarm), and will
leave that to the judgment of somebody who's worked with pg_prewarm
before.  To me it looks like it's not really testing things very
carefully at all; on the other hand, we have exactly zero test
coverage of that module today, so maybe something is better than
nothing.

            regards, tom lane



Re: Misuse of TimestampDifference() in the autoprewarm feature of pg_prewarm

From
Alexey Kondratov
Date:
On 2020-11-11 06:59, Tom Lane wrote:
> Alexey Kondratov <a.kondratov@postgrespro.ru> writes:
>> After looking on the autoprewarm code more closely I have realised 
>> that
>> this 'double dump' issues was not an issues at all. I have just
>> misplaced a debug elog(), so its second output in the log was only
>> indicating that we calculated delay_in_ms one more time.
> 
> Ah --- that explains why I couldn't see a problem.
> 
> I've pushed 0001+0002 plus some followup work to fix other places
> that could usefully use TimestampDifferenceMilliseconds().  I have
> not done anything with 0003 (the TAP test for pg_prewarm), and will
> leave that to the judgment of somebody who's worked with pg_prewarm
> before.  To me it looks like it's not really testing things very
> carefully at all; on the other hand, we have exactly zero test
> coverage of that module today, so maybe something is better than
> nothing.
> 

Great, thank you for generalisation of the issue and working on it.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company