Thread: Re: Address the bug in 041_checkpoint_at_promote.pl
> Removing the wakeup makes the test > complete, but it should wait in its lookup loop. Thank you for confirming. Besides fixing the if condition as done in the patch, do you think any other changes are necessary? Best Regards, Nitin Jadhav Azure Database for PostgreSQL Microsoft On Wed, Feb 12, 2025 at 1:54 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Feb 12, 2025 at 01:28:55PM +0530, Nitin Jadhav wrote: > > The code is intended to wait for the restart point to complete before > > proceeding. However, it doesn't actually wait. Regardless of whether > > the restart point completes, the loop exits after the first iteration > > because the if condition always evaluates to true. This happens > > because $logstart is not passed as an argument to log_contains() by > > mistake. If the restart point operation is quick, this issue might not > > be noticeable, which is often the case. > > > > I've attached a patch to fix this issue. Please review and share your feedback. > > Oops, you're right. I am measuring 2ms or so in average between the wakeup > and the restartpoint complete. Removing the wakeup makes the test > complete, but it should wait in its lookup loop. > > Will fix down to v17 where this error has been introduced. > -- > Michael
> > Removing the wakeup makes the test > > complete, but it should wait in its lookup loop. > > Thank you for confirming. Besides fixing the if condition as done in > the patch, do you think any other changes are necessary? I see that it's already been committed and understand that no other changes are needed. Thank you! Best Regards, Nitin Jadhav Azure Database for PostgreSQL Microsoft On Wed, Feb 12, 2025 at 3:29 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > Removing the wakeup makes the test > > complete, but it should wait in its lookup loop. > > Thank you for confirming. Besides fixing the if condition as done in > the patch, do you think any other changes are necessary? > > Best Regards, > Nitin Jadhav > Azure Database for PostgreSQL > Microsoft > > On Wed, Feb 12, 2025 at 1:54 PM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Wed, Feb 12, 2025 at 01:28:55PM +0530, Nitin Jadhav wrote: > > > The code is intended to wait for the restart point to complete before > > > proceeding. However, it doesn't actually wait. Regardless of whether > > > the restart point completes, the loop exits after the first iteration > > > because the if condition always evaluates to true. This happens > > > because $logstart is not passed as an argument to log_contains() by > > > mistake. If the restart point operation is quick, this issue might not > > > be noticeable, which is often the case. > > > > > > I've attached a patch to fix this issue. Please review and share your feedback. > > > > Oops, you're right. I am measuring 2ms or so in average between the wakeup > > and the restartpoint complete. Removing the wakeup makes the test > > complete, but it should wait in its lookup loop. > > > > Will fix down to v17 where this error has been introduced. > > -- > > Michael
On Thu, Feb 13, 2025 at 5:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > Anyway, how did you find that? Did you see a pattern when running the > test on a very slow machine or just from a read? That was a good > catch. +1. I was wondering the same. -- Best Wishes, Ashutosh Bapat
> > Anyway, how did you find that? Did you see a pattern when running the > > test on a very slow machine or just from a read? That was a good > > catch. > +1. I was wondering the same. I was writing a TAP test to reproduce a crash recovery issue and used parts of 041_checkpoint_at_promote.pl. Unfortunately, my test wasn't waiting for the desired message to appear in the log. I then realized there was a mistake in log_contains(), which I had copied from the existing test. After testing 041_checkpoint_at_promote.pl multiple times to see if it worked as expected, I noticed differences in some iterations. Best Regards, Nitin Jadhav Azure Database for PostgreSQL Microsoft On Thu, Feb 13, 2025 at 11:18 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Feb 13, 2025 at 5:08 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > Anyway, how did you find that? Did you see a pattern when running the > > test on a very slow machine or just from a read? That was a good > > catch. > +1. I was wondering the same. > > > -- > Best Wishes, > Ashutosh Bapat