Hi,
We have been researching how to create a test that detects failures resulting
from future syntax changes, where the deparser fails to update properly.
The basic idea comes from what Robert Haas suggested in [1]: when running the
regression test(tests in parallel_schedule), we replace the executing ddl
statement with the its deparsed version and execute the deparsed statement, so
that we can run all the regression with the deparsed statement and can expect
the output to be the same as the existing expected/*.out. As developers
typically add new regression tests to test new syntax, so we expect this test
can automatically identify most of the new syntax changes.
One thing to note is that when entering the event trigger(where the deparsing
happen), the ddl have already been executed. So we need to get the deparsed
statement before the execution and replace the current statement with it. To
achieve this, the current approach is to create another database with deparser
trigger and in the ProcessUtility hook(e.g. tdeparser_ProcessUtility in the
patch) we redirect the local statement to that remote database, then the
statment will be deparsed and stored somewhere, we can query the remote
database to get the deparsed statement and use it to replace the original
statment.
The process looks like:
1) create another database and deparser event trigger in it before running the regression test.
2) run the regression test (the statement will be redirect to the remote database and get the deparsed statement)
3) compare the output file with the expected ones.
Attach the POC patch(0004) for this approach. Note that, the new test module only
test test_setup.sql, create_index.sql, create_table.sql and alter_table.sql as
we only support deparsing TABLE related command for now. And the current test
cannot pass because of some bugs in deparser, we will fix these bugs soon.
TO IMPROVE:
1. The current approach needs to handle the ERRORs, WARNINGs, and NOTICEs from
the remote database. Currently it will directly rethrow any ERRORs encountered
in the remote database. However, for WARNINGs and NOTICEs, we will only rethrow
them along with the ERRORs. This is done to prevent duplicate messages in the
output file during local statement execution, which would be inconsistent with
the existing expected output. Note that this approach may potentially miss some
bugs, as there could be additional WARNINGs or NOTICEs caused by the deparser
in the remote database.
2. The variable reference and assignment (xxx /gset and select :var_name) will
not be sent to the server(only the qualified value will be sent), but it's
possible the variable in another session should be set to a different value, so
this can cause inconsistent output.
3 .CREATE INDEX CONCURRENTLY will create an invalid index internally even if it
reports an ERROR later. But since we will directly rethrow the remote ERROR in
the main database, we won't execute the "CREATE INDEX CONCURRENTLY" in the main
database. This means that we cannot see the invalid index in the main database.
To improve the above points, another variety is: run the regression test twice.
The first run is solely intended to collect all the deparsed statements. We can
dump these statements from the database and then reload them in the second
regression run. This allows us to utilize the deparsed statements to replace
the local statements in the second regression run. This approach does not need
to handle any remote messages and client variable stuff during execution,
although it could take more time to finsh the test.
[1] https://www.postgresql.org/message-id/CA%2BTgmoZ%3DvZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA%40mail.gmail.com
Best Regards,
Hou zj