RE: Disabled logical replication origin session causes primary key errors - Mailing list pgsql-bugs

From Hayato Kuroda (Fujitsu)
Subject RE: Disabled logical replication origin session causes primary key errors
Date
Msg-id OSCPR01MB149668436272E7ADE05202A1AF5BA2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Disabled logical replication origin session causes primary key errors  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Disabled logical replication origin session causes primary key errors
List pgsql-bugs
Dear Sawada-san,

Thanks for giving comments!

> +# The bug was that the replication origin wasn’t updated whe
> +# apply_error_callback() was called with elevel >= ERROR, and the apply
> worker
> +# continued running afterward.
> 
> I think it would be better to mention the fact that the problem
> happened when an error was caught for instance by a plpgsql function.
> How about rewriting it as follows?
> 
> # The bug was that when an ERROR was caught, for instance by a
> PL/pgSQL function,
> # the apply worker reset the replication origin but continued processing
> # subsequent changes. This behavior resulted in a failure to update
> the replication
> # origin during further apply operations.

I tried to describe the internal reasons of bugs, but yours did reported facts.
+1, replaced.

> ---
> +# Define an after-trigger function for the table insert. It can be fired even
> +# by the apply worker and always raises an exception. This situation allows
> +# worker continue after apply_error_callback() is called with elevel = ERROR.
> +$node_subscriber->safe_psql(
> +        'postgres', q{
> +CREATE FUNCTION handle_exception_trigger()
> +RETURNS TRIGGER AS $$
> +BEGIN
> +        BEGIN
> +                -- Raise an exception
> +                RAISE EXCEPTION 'This is a test exception';
> +        EXCEPTION
> +                WHEN OTHERS THEN
> +                        RETURN NEW;
> +        END;
> +
> +        RETURN NEW;
> +END;
> +$$ LANGUAGE plpgsql;
> +});
> +
> +$node_subscriber->safe_psql(
> +        'postgres', q{
> +CREATE TRIGGER silent_exception_trigger
> +AFTER INSERT OR UPDATE ON t1
> +FOR EACH ROW
> +EXECUTE FUNCTION handle_exception_trigger();
> +
> +ALTER TABLE t1 ENABLE ALWAYS TRIGGER silent_exception_trigger;
> +});
> 
> How about rewriting the comment as follows?
> 
> # Create an AFTER INSERT trigger on the table that raises and subsequently
> # handles an exception. Subsequent insertions will trigger this exception,
> # causing the apply worker to invoke its error callback with an ERROR. However,
> # since the error is caught within the trigger, the apply worker will continue
> # processing changes.

Fixed.

> And can we execute these queries in one safe_psql() call?

Yes possible. I intentionally separated to make it clearer, but I did not have
strong opinions. Fixed.

> ---
> +# Obtain current remote_lsn value to check its advancement later
> +my $remote_lsn = $node_subscriber->safe_psql('postgres',
> +        "SELECT remote_lsn FROM
> pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
> +);
> 
> It seems to make sense to me to get the remote_lsn value just before
> executing INSERT after creating the trigger.

Moved.

> Is it a conventional way to always use schema-qualified catalogs names
> in regression tests? Looking at other tests in src/test/subscription,
> there are only three cases:
> 
> % git grep pg_catalog src/test/subscription/t
> src/test/subscription/t/001_rep_changes.pl:       FROM
> pg_catalog.pg_stat_io
> src/test/subscription/t/020_messages.pl:    "SELECT COUNT(*) FROM
> pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND
> active='f'",
> src/test/subscription/t/029_on_error.pl:    "SELECT subenabled = false
> FROM pg_catalog.pg_subscription WHERE subname = 'sub'"
> 
> ISTM that we don't necessarily need to make the catalog name schema-qualified.

I referred parts of Cluster.pm and 040_standby_failover_slots_sync.pl, and they
had "pg_catalog" prefix. After considering more, instances would be created within
the test and we ensure to connect to them - we can assume they are safe, OK to
remove it.

> ---
> We might want to stop both the publisher and the subscriber at the end
> of the tests.

Opps, added.

Attached patch could pass tests on ,my env, and pgperltidy said OK.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Command order bug in pg_dump
Next
From: "Euler Taveira"
Date:
Subject: Re: BUG #18897: Logical replication conflict after using pg_createsubscriber under heavy load