Thread: Ignore heap rewrites for materialized views in logical replication
Hi,
While investigating an internal report, I concluded that it is a bug. The
reproducible test case is simple (check 0002) and it consists of a FOR ALL
TABLES publication and a non-empty materialized view on publisher. After the
setup, if you refresh the MV, you got the following message on the subscriber:
ERROR: logical replication target relation "public.pg_temp_NNNNN" does not exist
That's because the commit 1a499c2520 (that fixes the heap rewrite for tables)
forgot to consider that materialized views can also create transient heaps and
they should also be skipped. The affected version is only 10 because 11
contains a different solution (commit 325f2ec555) that provides a proper fix
for the heap rewrite handling in logical decoding.
0001 is a patch to skip MV too. I attached 0002 to demonstrate the issue but it
doesn't seem appropriate to be included. The test was written to detect the
error and bail out. After this fix, it takes a considerable amount of time to
finish the test because it waits for a message that never arrives. Since nobody
reports this bug in 5 years and considering that version 10 will be EOL in 6
months, I don't think an additional test is crucial here.
Attachment
On Sat, May 28, 2022 at 2:44 AM Euler Taveira <euler@eulerto.com> wrote: > > While investigating an internal report, I concluded that it is a bug. The > reproducible test case is simple (check 0002) and it consists of a FOR ALL > TABLES publication and a non-empty materialized view on publisher. After the > setup, if you refresh the MV, you got the following message on the subscriber: > > ERROR: logical replication target relation "public.pg_temp_NNNNN" does not exist > > That's because the commit 1a499c2520 (that fixes the heap rewrite for tables) > forgot to consider that materialized views can also create transient heaps and > they should also be skipped. The affected version is only 10 because 11 > contains a different solution (commit 325f2ec555) that provides a proper fix > for the heap rewrite handling in logical decoding. > > 0001 is a patch to skip MV too. > I agree with your analysis and the fix looks correct to me. > I attached 0002 to demonstrate the issue but it > doesn't seem appropriate to be included. The test was written to detect the > error and bail out. After this fix, it takes a considerable amount of time to > finish the test because it waits for a message that never arrives. > Instead of waiting for an error, we can try to insert into a new table created by the test case after the 'Refresh ..' command and wait for the change to be replicated by using wait_for_caught_up. > Since nobody > reports this bug in 5 years and considering that version 10 will be EOL in 6 > months, I don't think an additional test is crucial here. > Let's try to see if we can simplify the test so that it can be committed along with a fix. If we are not able to find any reasonable way then we can think of skipping it. -- With Regards, Amit Kapila.
On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote:
I agree with your analysis and the fix looks correct to me.
Thanks for checking.
Instead of waiting for an error, we can try to insert into a new tablecreated by the test case after the 'Refresh ..' command and wait forthe change to be replicated by using wait_for_caught_up.
That's a good idea. [modifying the test...] I used the same table. Whenever the
new row arrives on the subscriber or it reads that error message, it bails out.
Let's try to see if we can simplify the test so that it can becommitted along with a fix. If we are not able to find any reasonableway then we can think of skipping it.
The new test is attached.
Attachment
On Tue, May 31, 2022 at 6:27 AM Euler Taveira <euler@eulerto.com> wrote: > > On Sat, May 28, 2022, at 7:07 AM, Amit Kapila wrote: > > I agree with your analysis and the fix looks correct to me. > > Thanks for checking. > > Instead of waiting for an error, we can try to insert into a new table > created by the test case after the 'Refresh ..' command and wait for > the change to be replicated by using wait_for_caught_up. > > That's a good idea. [modifying the test...] I used the same table. Whenever the > new row arrives on the subscriber or it reads that error message, it bails out. > I think we don't need the retry logical to check error, a simple wait_for_caught_up should be sufficient as we are doing in other tests. See attached. I have slightly modified the commit message as well. Kindly let me know what you think? -- With Regards, Amit Kapila.
Attachment
On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote:
I think we don't need the retry logical to check error, a simplewait_for_caught_up should be sufficient as we are doing in othertests. See attached. I have slightly modified the commit message aswell. Kindly let me know what you think?
Your modification will hang until the test timeout without the patch. That's
why I avoided to use wait_for_caught_up and used a loop for fast exit on success
or failure. I'm fine with a simple test case like you proposed.
On Tue, May 31, 2022 at 8:28 PM Euler Taveira <euler@eulerto.com> wrote: > > On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote: > > I think we don't need the retry logical to check error, a simple > wait_for_caught_up should be sufficient as we are doing in other > tests. See attached. I have slightly modified the commit message as > well. Kindly let me know what you think? > > Your modification will hang until the test timeout without the patch. That's > why I avoided to use wait_for_caught_up and used a loop for fast exit on success > or failure. > Right, but that is true for other tests as well and we are not expecting to face this/other errors. I think keeping it simple and similar to other tests seems enough for this case. > I'm fine with a simple test case like you proposed. > Thanks, I'll push this in a day or two unless I see any other suggestions/comments. Note to others: this is v10 fix only. As mentioned by Euler in his initial email, this is not required from v11 onwards due to a different solution for this problem via commit 325f2ec555. -- With Regards, Amit Kapila.
On Wed, Jun 1, 2022 at 10:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, May 31, 2022 at 8:28 PM Euler Taveira <euler@eulerto.com> wrote: > > > > On Tue, May 31, 2022, at 11:13 AM, Amit Kapila wrote: > > > > I think we don't need the retry logical to check error, a simple > > wait_for_caught_up should be sufficient as we are doing in other > > tests. See attached. I have slightly modified the commit message as > > well. Kindly let me know what you think? > > > > Your modification will hang until the test timeout without the patch. That's > > why I avoided to use wait_for_caught_up and used a loop for fast exit on success > > or failure. > > > > Right, but that is true for other tests as well and we are not > expecting to face this/other errors. I think keeping it simple and > similar to other tests seems enough for this case. > > > I'm fine with a simple test case like you proposed. > > > > Thanks, I'll push this in a day or two unless I see any other > suggestions/comments. > Pushed. -- With Regards, Amit Kapila.