Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Support logical replication of DDLs
Date
Msg-id CAD21AoCXCAQ5QyXu9-xs30ViUHtUxQMmf-818d8GX--5pTmZ7g@mail.gmail.com
Whole thread Raw
In response to RE: Support logical replication of DDLs  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Support logical replication of DDLs  (Michael Paquier <michael@paquier.xyz>)
RE: Support logical replication of DDLs  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Jul 11, 2023 at 8:01 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> We have been researching how to create a test that detects failures resulting
> from future syntax changes, where the deparser fails to update properly.
>
> The basic idea comes from what Robert Haas suggested in [1]: when running the
> regression test(tests in parallel_schedule), we replace the executing ddl
> statement with the its deparsed version and execute the deparsed statement, so
> that we can run all the regression with the deparsed statement and can expect
> the output to be the same as the existing expected/*.out. As developers
> typically add new regression tests to test new syntax, so we expect this test
> can automatically identify most of the new syntax changes.
>
> One thing to note is that when entering the event trigger(where the deparsing
> happen), the ddl have already been executed. So we need to get the deparsed
> statement before the execution and replace the current statement with it. To
> achieve this, the current approach is to create another database with deparser
> trigger and in the ProcessUtility hook(e.g. tdeparser_ProcessUtility in the
> patch) we redirect the local statement to that remote database, then the
> statment will be deparsed and stored somewhere, we can query the remote
> database to get the deparsed statement and use it to replace the original
> statment.
>
> The process looks like:
> 1) create another database and deparser event trigger in it before running the regression test.
> 2) run the regression test (the statement will be redirect to the remote database and get the deparsed statement)
> 3) compare the output file with the expected ones.
>
> Attach the POC patch(0004) for this approach. Note that, the new test module only
> test test_setup.sql, create_index.sql, create_table.sql and alter_table.sql as
> we only support deparsing TABLE related command for now. And the current test
> cannot pass because of some bugs in deparser, we will fix these bugs soon.
>
>
> TO IMPROVE:
>
> 1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs from
> the remote database. Currently it will directly rethrow any ERRORs encountered
> in the remote database. However, for WARNINGs and NOTICEs, we will only rethrow
> them along with the ERRORs. This is done to prevent duplicate messages in the
> output file during local statement execution, which would be inconsistent with
> the existing expected output. Note that this approach may potentially miss some
> bugs, as there could be additional WARNINGs or NOTICEs caused by the deparser
> in the remote database.
>
> 2. The variable reference and assignment (xxx /gset and select :var_name) will
> not be sent to the server(only the qualified value will be sent), but it's
> possible the variable in another session should be set to a different value, so
> this can cause inconsistent output.
>
> 3 .CREATE INDEX CONCURRENTLY will create an invalid index internally even if it
> reports an ERROR later. But since we will directly rethrow the remote ERROR in
> the main database, we won't execute the "CREATE INDEX CONCURRENTLY" in the main
> database. This means that we cannot see the invalid index in the main database.
>
> To improve the above points, another variety is: run the regression test twice.
> The first run is solely intended to collect all the deparsed statements. We can
> dump these statements from the database and then reload them in the second
> regression run. This allows us to utilize the deparsed statements to replace
> the local statements in the second regression run. This approach does not need
> to handle any remote messages and client variable stuff during execution,
> although it could take more time to finsh the test.
>

I've considered some alternative approaches but I prefer the second
approach. A long test time could not be a big problem unless we run it
by default. We can prepare a buildfarm animal that is configured to
run the DDL deparse tests.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: logicalrep_message_type throws an error
Next
From: Michael Paquier
Date:
Subject: Re: Add TOAST support for more system tables