Thread: [BUG] Logical replica crash if there was an error in a function.

[BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

21 August 2022, 14:33:57

Hello!

Here is a fix for the bug first described in:
https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru

Reproduction:
1) On master with 'wal_level = logical' execute mascmd.sql attached.

2) On replica substitute the correct port in repcmd.sql and execute it.

3) On master execute command:
INSERT INTO rul_rule_set VALUES ('1', 'name','1','age','true');

Replica will crash with:
FailedAssertion("ActivePortal && ActivePortal->status == PORTAL_ACTIVE", File: "pg_proc.c", Line: 1038, PID: 42894)
in infinite loop.

After applying this patch replica will give the correct error message instead of assertion:

2022-08-21 17:08:39.935 MSK [143171] ERROR:  relation "rul_rule_set" does not exist at character 172
2022-08-21 17:08:39.935 MSK [143171] QUERY:
    -- Last modified: 2022-08-21 17:08:39.930842+03
    with parameters as (
<<--- skipped by me --- >>>
             )
2022-08-21 17:08:39.935 MSK [143171] CONTEXT:  SQL statement "create or replace function public.rule_set_selector(
<<--- skipped by me --- >>>
    SQL statement "call public.rebuild_rule_set_selector()"
    PL/pgSQL function public.rul_rule_set_trg() line 4 at CALL
    processing remote data for replication origin "pg_16401" during "INSERT"
         for replication target relation "public.rul_rule_set" in transaction 741 finished at 0/17BE180

With best regards,
-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

21 August 2022, 15:32:36

Hello!

On 21.08.2022 17:33, Anton A. Melnikov wrote:
> Hello!
> 
> Here is a fix for the bug first described in:
> https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
> 

Sorry, there was a wrong patch in the first letter.
Here is a right version.


With best regards,
-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v2-0001-Fix-logical-replica-assert-on-func-error.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

30 August 2022, 07:09:04

Hello!

The patch was rebased on current master.
And here is a simplified crash reproduction:
1) On primary with 'wal_level = logical' execute:
  CREATE TABLE public.test (id int NOT NULL, val integer);
  CREATE PUBLICATION test_pub FOR TABLE test;

2) On replica replace XXXX in the repcmd.sql  attached with primary port and execute it:
psql -f repcmd.sql

3) On master execute command:
INSERT INTO test VALUES ('1');

With best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

08 September 2022, 08:47:06

Hello!

Added a TAP test for this case.

On 30.08.2022 10:09, Anton A. Melnikov wrote:
> Hello!
> 
> The patch was rebased on current master.
> And here is a simplified crash reproduction:
> 1) On primary with 'wal_level = logical' execute:
>   CREATE TABLE public.test (id int NOT NULL, val integer);
>   CREATE PUBLICATION test_pub FOR TABLE test;
> 
> 2) On replica replace XXXX in the repcmd.sql  attached with primary port and execute it:
> psql -f repcmd.sql
> 
> 3) On master execute command:
> INSERT INTO test VALUES ('1');
> 
  
With best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

24 September 2022, 17:27:23

"Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
> [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ]

I took a quick look at this.  I think you're solving the
problem in the wrong place.  The real issue is why are
we not setting up ActivePortal correctly when running
user-defined code in a logrep worker?  There is other code
that expects that to be set, eg EnsurePortalSnapshotExists.

            regards, tom lane

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

09 October 2022, 09:24:23

Hello!

Thanks for reply!

On 24.09.2022 20:27, Tom Lane wrote:
> I think you're solving the
> problem in the wrong place.  The real issue is why are
> we not setting up ActivePortal correctly when running
> user-defined code in a logrep worker?
During a common query from the backend ActivePortal becomes defined
in the PortalRun and then AfterTriggerEndQuery starts with
non-NULL ActivePortal after ExecutorFinish.
When the logrep worker is applying messages there are neither
PortalStart nor PortalRun calls. And AfterTriggerEndQuery starts
with undefined ActivePortal after finish-edata().
May be it's normal behavior?

> There is other code
> that expects that to be set, eg EnsurePortalSnapshotExists.

When the logrep worker is applying message it doesn't have to use
ActivePortal in EnsurePortalSnapshotExists because ActiveSnapshot is already installed.
It is set at the beginning of each transaction in the begin_replication_step call.

On the other hand, function_parse_error_transpose() tries to get
the original query text  (INSERT INTO test VALUES ('1') in our case) from
the ActivePortal to clarify the location of the error.
But in the logrep worker there is no way to restore original query text
from the logrep message. There is only 'zipped' query equivalent to the original.
So any function_parse_error_transpose() call seems to be useless
in the logrep worker.

And it looks like we can simply omit match_prosrc_to_query() call there.
The attached patch does it.

Best wishes,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v5-0001-Fix-logical-replica-assert-on-func-error.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

Alvaro Herrera

Date:

10 October 2022, 10:06:45

On 2022-Sep-24, Tom Lane wrote:

> "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
> > [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ]
> 
> I took a quick look at this.  I think you're solving the
> problem in the wrong place.  The real issue is why are
> we not setting up ActivePortal correctly when running
> user-defined code in a logrep worker?  There is other code
> that expects that to be set, eg EnsurePortalSnapshotExists.

Right ... mostly, the logical replication *does* attempt to set up a
transaction and active snapshot when replaying actions (c.f.
begin_replication_step()).  Is this firing at an inappropriate time,
perhaps?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Re: [BUG] Logical replica crash if there was an error in a function.

From

Michael Paquier

Date:

12 October 2022, 05:24:29

On Sun, Oct 09, 2022 at 12:24:23PM +0300, Anton A. Melnikov wrote:
> On the other hand, function_parse_error_transpose() tries to get
> the original query text  (INSERT INTO test VALUES ('1') in our case) from
> the ActivePortal to clarify the location of the error.
> But in the logrep worker there is no way to restore original query text
> from the logrep message. There is only 'zipped' query equivalent to the original.
> So any function_parse_error_transpose() call seems to be useless
> in the logrep worker.

Yeah, the query string is not available in this context, but it surely
looks wrong to me to assume that something as low-level as
function_parse_error_transpose() needs to be updated for the sake of a
logical worker, while we have other areas that would expect a portal
to be set up.
--
Michael

Attachment

signature.asc

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

02 November 2022, 18:02:28

Michael Paquier <michael@paquier.xyz> writes:
> Yeah, the query string is not available in this context, but it surely
> looks wrong to me to assume that something as low-level as
> function_parse_error_transpose() needs to be updated for the sake of a
> logical worker, while we have other areas that would expect a portal
> to be set up.

After thinking about this some more, I'm withdrawing my opposition to
fixing it by making function_parse_error_transpose() cope with not
having an active portal.  I have a few reasons:

* A Portal is intended to contain an executor state.  While worker.c
does fake up an EState, there's certainly no plan tree or planstate tree,
and I doubt it'd be sane to create dummy ones.  So even if we made a
Portal, it'd be lacking a lot of the stuff one would expect to find there.
I fear that moving the cause of this sort of problem from "there's no
ActivePortal" to "there's an ActivePortal but it lacks field X" wouldn't
be an improvement.

* There is actually just one other consumer of ActivePortal,
namely EnsurePortalSnapshotExists, and that doesn't offer a lot of
support for the idea that ActivePortal must always be set.  It says

     * Nothing to do if a snapshot is set.  (We take it on faith that the
     * outermost active snapshot belongs to some Portal; or if there is no
     * Portal, it's somebody else's responsibility to manage things.)

and "it's somebody else's responsibility" summarizes the situation
here pretty perfectly.  worker.c *does* set up an active snapshot.

* The comment in function_parse_error_transpose() freely admits that
looking at the ActivePortal is a hack.  It works, more or less, for
the intended case of reporting a function-body syntax error nicely
during CREATE FUNCTION.  But it's capable of getting false-positive
matches, so maybe someday we should replace it with something more
bulletproof.

* There isn't any strong reason why function_parse_error_transpose()
has to succeed at finding the original query text.  Its fallback
approach of treating the syntax error position as internal to the
function body text is fine, in fact it's just what we want here.

So I'm now good with the idea of just not failing.  I don't like
the patch as presented though.  First, the cfbot is quite rightly
complaining about the "uninitialized variable" warning it draws.
Second, I don't see a good reason to tie the change to logical
replication in any way.  Let's just change the Assert to an if(),
as attached.

            regards, tom lane

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..e03b98bcd2 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -1017,7 +1017,6 @@ function_parse_error_transpose(const char *prosrc)
 {
     int            origerrposition;
     int            newerrposition;
-    const char *queryText;

     /*
      * Nothing to do unless we are dealing with a syntax error that has a
@@ -1035,11 +1034,22 @@ function_parse_error_transpose(const char *prosrc)
     }

     /* We can get the original query text from the active portal (hack...) */
-    Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-    queryText = ActivePortal->sourceText;
+    if (ActivePortal && ActivePortal->status == PORTAL_ACTIVE)
+    {
+        const char *queryText = ActivePortal->sourceText;

-    /* Try to locate the prosrc in the original text */
-    newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+        /* Try to locate the prosrc in the original text */
+        newerrposition = match_prosrc_to_query(prosrc, queryText,
+                                               origerrposition);
+    }
+    else
+    {
+        /*
+         * Quietly give up if no ActivePortal.  This is an unusual situation
+         * but it can happen in, e.g., logical replication workers.
+         */
+        newerrposition = -1;
+    }

     if (newerrposition > 0)
     {

Re: [BUG] Logical replica crash if there was an error in a function.

From

Amit Kapila

Date:

03 November 2022, 04:11:46

On Wed, Nov 2, 2022 at 11:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> So I'm now good with the idea of just not failing.  I don't like
> the patch as presented though.  First, the cfbot is quite rightly
> complaining about the "uninitialized variable" warning it draws.
> Second, I don't see a good reason to tie the change to logical
> replication in any way.  Let's just change the Assert to an if(),
> as attached.
>

LGTM. I don't know if it is a good idea to omit the test case for this
scenario. If required, we can reuse the test case from Sawada-San's
patch in the email [1].

[1] - https://www.postgresql.org/message-id/CAD21AoDKA%2BMB4M9BOnct_%3DZj5bNHbkYn6oKZ2aOQp8m%3D3x2GhQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

03 November 2022, 15:29:04

Amit Kapila <amit.kapila16@gmail.com> writes:
> LGTM. I don't know if it is a good idea to omit the test case for this
> scenario. If required, we can reuse the test case from Sawada-San's
> patch in the email [1].

I don't think the cost of that test case is justified by the tiny
probability that it'd ever catch anything.  If we were just adding a
query or two to an existing scenario, that could be okay; but spinning
up and syncing a whole new primary and standby database is *expensive*
when you multiply it by the number of times developers and buildfarm
animals are going to run the tests in the future.

There's also the little issue that I'm not sure it would actually
detect a problem if we had one.  The case is going to fail, and
what we want to know is just how messily it fails, and I think the
TAP infrastructure isn't very sensitive to that ... especially
if the test isn't even checking for specific error messages.

            regards, tom lane

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

15 November 2022, 01:39:53

Hello!

On 02.11.2022 21:02, Tom Lane wrote:
> So I'm now good with the idea of just not failing.  I don't like
> the patch as presented though.  First, the cfbot is quite rightly
> complaining about the "uninitialized variable" warning it draws.
> Second, I don't see a good reason to tie the change to logical
> replication in any way.  Let's just change the Assert to an if(),
> as attached.

Fully agreed that is the most optimal solution for that case. Thanks!
Surely it's very rare one but there was a real segfault at production server.
Someone came up with the idea to modify function like public.test_selector()
in repcmd.sql (see above) on the fly with adding to it :last_modification:
field from current time and some other parameters with the help of replace()
inside the creation of the rebuild_test() procedure.

On 03.11.2022 18:29, Tom Lane wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> LGTM. I don't know if it is a good idea to omit the test case for this
>> scenario. If required, we can reuse the test case from Sawada-San's
>> patch in the email [1].
> 
> I don't think the cost of that test case is justified by the tiny
> probability that it'd ever catch anything.  If we were just adding a
> query or two to an existing scenario, that could be okay; but spinning
> up and syncing a whole new primary and standby database is *expensive*
> when you multiply it by the number of times developers and buildfarm
> animals are going to run the tests in the future.
> 
> There's also the little issue that I'm not sure it would actually
> detect a problem if we had one.  The case is going to fail, and
> what we want to know is just how messily it fails, and I think the
> TAP infrastructure isn't very sensitive to that ... especially
> if the test isn't even checking for specific error messages.
Seems it is possible to do a test without these remarks. The attached
test uses existing nodes and checks the specific error message. Additionally
i've tried to reduce overall number of nodes previously
used in this test in a similar way.

Would be glad for comments and remarks.

With best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v1-0001-Test-for-func-error-in-logrep-worker.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

15 November 2022, 01:59:42

"Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
> On 02.11.2022 21:02, Tom Lane wrote:
>> I don't think the cost of that test case is justified by the tiny
>> probability that it'd ever catch anything.

> Seems it is possible to do a test without these remarks. The attached
> test uses existing nodes and checks the specific error message.

My opinion remains unchanged: this would be a very poor use of test
cycles.

> Additionally
> i've tried to reduce overall number of nodes previously
> used in this test in a similar way.

Optimizing existing tests isn't an answer to that.  We could
install those optimizations without adding a new test case.

            regards, tom lane

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

16 November 2022, 14:52:50

Thanks a lot for the fast reply!

On 03.11.2022 18:29, Tom Lane wrote:
> If we were just adding a
> query or two to an existing scenario, that could be okay; but spinning
> up and syncing a whole new primary and standby database is *expensive*
> when you multiply it by the number of times developers and buildfarm
> animals are going to run the tests in the future.
>

On 15.11.2022 04:59, Tom Lane wrote:
> "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
>> On 02.11.2022 21:02, Tom Lane wrote:
>>> I don't think the cost of that test case is justified by the tiny
>>> probability that it'd ever catch anything.
>
>> Seems it is possible to do a test without these remarks. The attached
>> test uses existing nodes and checks the specific error message.
>
> My opinion remains unchanged: this would be a very poor use of test
> cycles.

Sorry, i didn't fully understand what is required and
added some functions to the test that spend extra cpu time. But i found
that it is possible to make a test according to previous remarks by adding
only a few extra queries to an existent test without any additional syncing.

Experimentally, i could not observe any significant difference in
CPU usage due to this test addition.
The difference in the CPU usage was less than standard error.
See 100_bugs-CPU-time.txt attached.

>> Additionally
>> i've tried to reduce overall number of nodes previously
>> used in this test in a similar way.
>
> Optimizing existing tests isn't an answer to that.  We could
> install those optimizations without adding a new test case.

Oh sure! Here is a separate patch for this optimization:
https://www.postgresql.org/message-id/eb7aa992-c2d7-6ce7-4942-0c784231a362%40inbox.ru
On the contrary with previous case in that one the difference in the CPU usage
during 100_bugs.pl is essential; it decreases approximately by 30%.

With the best wishes!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [BUG] Logical replica crash if there was an error in a function.

From

Andres Freund

Date:

07 December 2022, 18:03:26

Hi,

On 2022-11-16 17:52:50 +0300, Anton A. Melnikov wrote:
> Sorry, i didn't fully understand what is required and
> added some functions to the test that spend extra cpu time. But i found
> that it is possible to make a test according to previous remarks by adding
> only a few extra queries to an existent test without any additional syncing.
> 
> Experimentally, i could not observe any significant difference in
> CPU usage due to this test addition.
> The difference in the CPU usage was less than standard error.
> See 100_bugs-CPU-time.txt attached.
> 
> > > Additionally
> > > i've tried to reduce overall number of nodes previously
> > > used in this test in a similar way.
> > 
> > Optimizing existing tests isn't an answer to that.  We could
> > install those optimizations without adding a new test case.
> 
> Oh sure! Here is a separate patch for this optimization:
> https://www.postgresql.org/message-id/eb7aa992-c2d7-6ce7-4942-0c784231a362%40inbox.ru
> On the contrary with previous case in that one the difference in the CPU usage
> during 100_bugs.pl is essential; it decreases approximately by 30%.

This CF entry causes tests to fail on all platforms:
https://cirrus-ci.com/build/5755408111894528

E.g.

https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs

#### Begin standard error
psql:<stdin>:1: NOTICE:  dropped replication slot "sub1" on publisher
#### End standard error
timed out waiting for match: ERROR:  relation "error_name" does not exist at character at
/tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.plline 115.
 

Greetings,

Andres Freund

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

11 December 2022, 03:50:49

On 07.12.2022 21:03, Andres Freund wrote:

> 
> This CF entry causes tests to fail on all platforms:
> https://cirrus-ci.com/build/5755408111894528
> 
> E.g.
>
https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs
> 
> #### Begin standard error
> psql:<stdin>:1: NOTICE:  dropped replication slot "sub1" on publisher
> #### End standard error
> timed out waiting for match: ERROR:  relation "error_name" does not exist at character at
/tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.plline 115.
 
> 
> Greetings,
> 
> Andres Freund

Thank you for reminding!

There was a conflict when applying v2 on current master.
Rebased v3 is attached.

Best wishes!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v3-0001-Test-for-func-error-in-logrep-worker.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

vignesh C

Date:

07 January 2023, 12:27:18

On Sun, 11 Dec 2022 at 09:21, Anton A. Melnikov <aamelnikov@inbox.ru> wrote:
>
>
> On 07.12.2022 21:03, Andres Freund wrote:
>
> >
> > This CF entry causes tests to fail on all platforms:
> > https://cirrus-ci.com/build/5755408111894528
> >
> > E.g.
> >
https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs
> >
> > #### Begin standard error
> > psql:<stdin>:1: NOTICE:  dropped replication slot "sub1" on publisher
> > #### End standard error
> > timed out waiting for match: ERROR:  relation "error_name" does not exist at character at
/tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.plline 115.
 
> >
> > Greetings,
> >
> > Andres Freund
>
> Thank you for reminding!
>
> There was a conflict when applying v2 on current master.
> Rebased v3 is attached.

Few suggestions:
1) There is a warning:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+       "ERROR:  relation \"error_name\" does not exist at character"
+);

 "my" variable $result masks earlier declaration in same scope at
t/100_bugs.pl line 400.

You can change:
my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
to
$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");

2) Now that the crash is fixed, you could change it to a better message:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+       "ERROR:  relation \"error_name\" does not exist at character"
+);

Regards,
Vignesh

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

08 January 2023, 06:02:33

Thanks for your remarks.

On 07.01.2023 15:27, vignesh C wrote:
> 
> Few suggestions:
> 1) There is a warning:
> +# This would crash on the subscriber if not fixed
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
> +
> +my $result = $node_subscriber->wait_for_log(
> +       "ERROR:  relation \"error_name\" does not exist at character"
> +);
> 
>   "my" variable $result masks earlier declaration in same scope at
> t/100_bugs.pl line 400.
> 
> You can change:
> my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
> to
> $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");

The reason is that the patch fell behind the master.
Fixed in v4 together with rebasing on current master.

> 2) Now that the crash is fixed, you could change it to a better message:
> +# This would crash on the subscriber if not fixed
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
> +
> +my $result = $node_subscriber->wait_for_log(
> +       "ERROR:  relation \"error_name\" does not exist at character"
> +);
> 

Tried to make this comment more clear.

Best wishes for the new year!


-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v4-0001-Test-for-func-error-in-logrep-worker.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

16 March 2023, 13:14:00

Hello!

On 15.03.2023 21:29, Gregory Stark (as CFM) wrote:

> These patches that are "Needs Review" and have received no comments at
> all since before March 1st are these. If your patch is amongst this
> list I would suggest any of:
> 
> 1) Move it yourself to the next CF (or withdraw it)
> 2) Post to the list with any pending questions asking for specific
> feedback -- it's much more likely to get feedback than just a generic
> "here's a patch plz review"...
> 3) Mark it Ready for Committer and possibly post explaining the
> resolution to any earlier questions to make it easier for a committer
> to understand the state
>

There were some remarks:
1) very poor use of test cycles (by Tom Lane)
Solved in v2 by adding few extra queries to an existent test without any additional syncing.
2) the patch-tester fails on all platforms (by Andres Freund)
Fixed in v3.
3) warning with "my" variable $result and suggestion to correct comment (by vignesh C)
Both fixed in v4.

Now there are no any pending questions, so moved it to RFC.

With the best regards!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

v4-0001-Test-for-func-error-in-logrep-worker.patch

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

03 April 2023, 18:49:17

"Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
> Now there are no any pending questions, so moved it to RFC.

I did not think this case was worth memorializing in a test before,
and I still do not.  I'm inclined to reject this patch.

            regards, tom lane

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

05 April 2023, 14:04:39

Hello!

On 03.04.2023 21:49, Tom Lane wrote:
> "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
>> Now there are no any pending questions, so moved it to RFC.
> 
> I did not think this case was worth memorializing in a test before,
> and I still do not.  I'm inclined to reject this patch.

Earlier, when discussing this test, there was a suggestion like this:

> If we were just adding a
> query or two to an existing scenario, that could be okay;

The current version of the test seems to be satisfies this condition.
The queries added do not affect the total test time within the measurement error.
This case is rare, of cause, but it really took place in practice.

So either there are some more reasons why this test should not be accepted that
i do not understand, or i misunderstood something from the above.

Could you help me to figure out, please.

Would be very grateful.

Sincerely yours,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [BUG] Logical replica crash if there was an error in a function.

From

Tom Lane

Date:

05 April 2023, 14:35:59

"Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
> On 03.04.2023 21:49, Tom Lane wrote:
>> I did not think this case was worth memorializing in a test before,
>> and I still do not.  I'm inclined to reject this patch.

> Could you help me to figure out, please.

The problem was an Assert that was speculative when it went in,
and which we eventually found was wrong in the context of logical
replication.  We removed the Assert.  I don't think we need a test
case to keep us from putting back the Assert.  That line of thinking
leads to test suites that run for fourteen hours and are near useless
because developers can't run them easily.

            regards, tom lane

Re: [BUG] Logical replica crash if there was an error in a function.

From

"Anton A. Melnikov"

Date:

06 April 2023, 10:24:21

On 05.04.2023 17:35, Tom Lane wrote:
> "Anton A. Melnikov" <aamelnikov@inbox.ru> writes:
>> On 03.04.2023 21:49, Tom Lane wrote:
>>> I did not think this case was worth memorializing in a test before,
>>> and I still do not.  I'm inclined to reject this patch.
> 
>> Could you help me to figure out, please.
> 
> The problem was an Assert that was speculative when it went in,
> and which we eventually found was wrong in the context of logical
> replication.  We removed the Assert.  I don't think we need a test
> case to keep us from putting back the Assert.  That line of thinking
> leads to test suites that run for fourteen hours and are near useless
> because developers can't run them easily.
> 
>             regards, tom lane

Ok, i understand! Thanks a lot for the clarification. A rather important point,
i'll take it into account for the future.
Let's do that. Revoked the patch.

With the best wishes!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company