Thread: Testing DDL deparsing support
DDL deparsing is a feature that allows collection of DDL commands as they are
executed in a server, in some flexible, complete, and fully-contained format
that allows manipulation, storage, and transmission. This feature has several
use cases; the two best known ones are DDL replication and DDL auditing.

We have come up with a design that uses a JSON structure to store commands.
It is similar to the C sprintf() call in spirit: there is a base format
string, which is a generic template for each command type, and contains
placeholders that represent the variable parts of the command. The values for
the placeholders in each specific command are members of the JSON object. A
helper function is provided that expands the format string, replaces the
placeholders with the values, and returns the SQL command as text. This
design lets the user operate on the JSON structure in either a read-only
fashion (for example, to block table creation if the names don't satisfy a
certain condition), or by modifying it (for example, to change the schema name
so that tables are created in different schemas when they are replicated to
some remote server).

This design is mostly accepted by the community. The one sticking point is
testing: how do we ensure that the JSON representation we have created
correctly deparses back into a command that has the same effect as the
original command? This was expressed by Robert Haas:
http://www.postgresql.org/message-id/CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com

The problem cannot be solved by a standard regression test module which runs a
bunch of previously-defined commands and verifies the output. We need to
check not only the output for the commands as they exist today, but also to
ensure that this does not get broken as future patches modify existing
commands or create completely new ones.
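To illustrate the spirit of the design only -- the placeholder syntax, JSON layout, and function name below are made up for illustration and do not match the actual deparse-branch code -- a format string with %{name} placeholders can be expanded against the members of the JSON object roughly like this:

```python
import json
import re

def expand_fmt(node):
    """Recursively expand a deparse node: "fmt" is the template for the
    command; every %{name} placeholder is replaced by the member of the
    same name (which may itself be a nested node or a list of nodes)."""
    def repl(match):
        val = node[match.group(1)]
        if isinstance(val, dict):          # nested node with its own "fmt"
            return expand_fmt(val)
        if isinstance(val, list):          # e.g. a column list
            return ", ".join(expand_fmt(v) if isinstance(v, dict) else str(v)
                             for v in val)
        return str(val)
    return re.sub(r"%\{(\w+)\}", repl, node["fmt"])

# A read-only consumer could merely inspect the JSON; a rewriting consumer
# could, say, replace the schema in "identity" before expanding.
cmd = json.loads("""{
  "fmt": "CREATE TABLE %{identity} (%{columns})",
  "identity": "public.foo",
  "columns": [{"fmt": "%{name} %{type}", "name": "a", "type": "integer"}]
}""")
print(expand_fmt(cmd))   # CREATE TABLE public.foo (a integer)
```

The point of keeping the format string separate from the values is exactly the one made above: a consumer can modify individual members (the schema name, say) without having to parse SQL text.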
The challenge here is to create a new testing framework that ensures the DDL
deparsing module will be maintained by future hackers as the DDL grammar is
modified.

What and How to Test
--------------------

Our goal should be that patch authors run "make check-world" in their patched
copies and notice that the DDL deparse test is failing; they can then modify
deparse_utility.c to add support for the new commands, which should in general
be pretty straightforward. This way, maintaining deparsing code would be part
of new patches, just as we require pg_dump support and documentation for new
features.

It would not work to require patch authors to add their new commands to a new
pg_regress test file, because most would not be aware of the need, or they
would simply forget to do it, and patches would be submitted and possibly even
committed without any realization of the breakage caused.

There are two things we can rely on: standard regression tests, and pg_dump.

Standard regression tests are helpful because patch authors include new test
cases in their patches that exercise their new options or commands. This new
test framework needs to be something that internally runs the regression tests
and exercises DDL deparsing as the regression tests execute DDL. That would
mean that new commands and options would automatically be deparse-tested by
our new framework as soon as patch authors add standard regress support.

One level is first-grade testing: ensuring that the deparsed version of a DDL
command can be executed at all, for if the deparsed version throws an error,
it's immediately obvious that the deparse code is bogus. This is because we
know the original command did not throw an error: otherwise, the deparse code
would not have run at all, because ddl_command_end triggers are only executed
once the original command has completed execution. So first-grade testing
ensures that no trivial bugs are present.
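The first-grade check amounts to a replay loop: execute every captured deparsed command against a second database and flag any that error. A toy sketch of that loop (using an in-memory SQLite database purely as a stand-in for a connection to the second PostgreSQL database, and a hypothetical helper name):

```python
import sqlite3

def first_grade_check(deparsed_cmds):
    """Replay each deparsed command; any execution error means the deparse
    code emitted broken SQL, since we know the original command succeeded
    (the ddl_command_end trigger would not have fired otherwise)."""
    conn = sqlite3.connect(":memory:")
    failures = []
    for cmd in deparsed_cmds:
        try:
            conn.execute(cmd)
        except sqlite3.Error as exc:
            failures.append((cmd, str(exc)))
    conn.close()
    return failures

cmds = [
    "CREATE TABLE t (a INTEGER)",
    "CREATE TABLE broken (a INTEGER",   # missing paren: a deparse bug
]
for cmd, err in first_grade_check(cmds):
    print("deparse bug:", cmd, "--", err)
```

In the proof-of-concept described below, this replay happens by importing the captured query table into the shadow database rather than through a driver loop like this.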
But there's second-grade verification as well: is the object produced by the
deparsed version identical to the one produced by the original command? One
trivial but incomplete approach is to run the command, then save the deparsed
version; run the deparsed version, and deparse that one too; then compare
both. The problem with this approach is that if the deparse code is omitting
some clause (say it omits IN TABLESPACE in a CREATE TABLE command), then both
deparsed versions would contain the same bug, yet they would compare equal.
Therefore this approach is not good enough.

The best idea we have so far to attack second-grade testing is to trust
pg_dump to expose differences: accumulate commands as they run in the
regression database, then run the deparsed versions in a different database;
then pg_dump both databases and compare the dumped outputs.

Proof-of-concept
----------------

We have now implemented this as a proof-of-concept; the code is available in
the deparse branch at:

http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git

A diff is attached for reference, but it relies on the deparsing
functionality available in the deparse branch.

To implement the DDL deparsing, a pseudo-test is executed first, which
creates an event trigger and a table in which to store queries captured by
the event trigger. Following conclusion of all regression tests, a further
test is executed which exports the query table, imports it into the second
database and runs pg_dump; the output of this is then compared against the
expected output. This test can fail either at the import stage, if the
deparsed commands are syntactically incorrect, or at the comparison stage, if
a deparsed command is valid but syntactically different to the original.

To facilitate this, some minimal changes to pg_regress itself have been
necessary.
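The comparison stage boils down to diffing two pg_dump outputs: any line that differs points at a clause the deparse code dropped or mangled. A sketch of that final step with a hypothetical helper (the real test drives pg_dump itself and compares the dump files):

```python
import difflib

def dumps_match(original_dump, shadow_dump):
    """Compare two pg_dump outputs line by line; returns (ok, diff).
    Identical dumps mean the deparsed DDL reproduced the same objects."""
    diff = list(difflib.unified_diff(
        original_dump.splitlines(), shadow_dump.splitlines(),
        fromfile="regression.dump", tofile="shadow.dump", lineterm=""))
    return len(diff) == 0, diff

# The dropped clause that a naive deparse/re-deparse round trip would miss
# is caught here, because the dump of the original database still shows it:
orig = "CREATE TABLE foo (a integer);\nALTER TABLE foo SET TABLESPACE ts;\n"
shadow = "CREATE TABLE foo (a integer);\n"
ok, diff = dumps_match(orig, shadow)
print(ok)   # False: the shadow database is missing the tablespace clause
```

This is why pg_dump is a stronger oracle than comparing two deparsed strings: the original database's dump is produced without the deparse code ever running, so a clause the deparser silently omits still shows up on one side of the diff.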
In the current proof-of-concept it automatically creates (and where
appropriate drops) the "shadow" database used to load the deparsed commands;
and also provides a couple of additional tokens to the .source files to
provide information otherwise unavailable to the SQL scripts, such as the
location of pg_dump and the name of the "shadow" database.

A simple schedule to demonstrate this is available; execute from the
src/test/regress/ directory like this:

    ./pg_regress \
        --temp-install=./tmp_check \
        --top-builddir=../../.. \
        --dlpath=. \
        --schedule=./schedule_ddl_deparse_demo

Regards

Ian Barwick

-- 
Ian Barwick                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick <ian@2ndquadrant.com> wrote:
> [...]
>
> A simple schedule to demonstrate this is available; execute from the
> src/test/regress/ directory like this:
>
>     ./pg_regress \
>         --temp-install=./tmp_check \
>         --top-builddir=../../.. \
>         --dlpath=. \
>         --schedule=./schedule_ddl_deparse_demo

I haven't read the code, but this concept seems good to me. It has the
unfortunate weakness that a difference could exist during the *middle* of the
regression test run that is gone by the *end* of the run, but our existing
pg_upgrade testing has the same weakness, so I guess we can view this as one
more reason not to be too aggressive about having regression tests drop the
unshared objects they create.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick <ian@2ndquadrant.com> wrote:
> > [...]
>
> I haven't read the code, but this concept seems good to me.

Excellent, thanks.

> It has the unfortunate weakness that a difference could exist during
> the *middle* of the regression test run that is gone by the *end* of
> the run, but our existing pg_upgrade testing has the same weakness, so
> I guess we can view this as one more reason not to be too aggressive
> about having regression tests drop the unshared objects they create.

Agreed. Not dropping objects also helps test pg_dump itself; the normal
procedure there is to run the regression tests, then pg_dump the regression
database. Objects that are dropped never exercise their corresponding pg_dump
support code, which I think is a bad thing. I think we should institute a
policy that regression tests must keep the objects they create; maybe not all
of them, but at least a sample large enough to cover all interesting
possibilities.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
> Standard regression tests are helpful because patch authors include new test
> cases in the patches that stress their new options or commands. This new
> test framework needs to be something that internally runs the regression
> tests and exercises DDL deparsing as the regression tests execute DDL. That
> would mean that new commands and options would automatically be
> deparse-tested by our new framework as soon as patch authors add standard
> regress support.

Are you saying every time a new option is added to a command that a new
regression test needs to be added? We don't normally do that, and it could
easily bloat the regression tests over time.

In summary, this testing will help, but it will not be fully reliable.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
Bruce Momjian wrote:
> On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
> > [...]
>
> Are you saying every time a new option is added to a command that a new
> regression test needs to be added?

Not necessarily -- an existing test could be modified, as well.

> We don't normally do that,

I sure hope we do have all options covered by tests.

> and it could easily bloat the regression tests over time.

We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this qualify
as bloat?

> In summary, this testing will help, but it will not be fully reliable.

No testing is ever fully reliable. If it were, there would never be bugs.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Are you saying every time a new option is added to a command that a new
> > regression test needs to be added?
>
> Not necessarily -- an existing test could be modified, as well.
>
> > We don't normally do that,
>
> I sure hope we do have all options covered by tests.

Are you saying that every combination of ALTER options is tested? We have
rejected simple regression test additions on the basis that the syntax works
and is unlikely to break once tested once by the developer.

> > and it could easily bloat the regression tests over time.
>
> We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this
> qualify as bloat?

No, that seems fine. I am worried about having to have a test for every
syntax change, which we currently don't do. Was that issue not clear in my
first email?

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
Bruce Momjian wrote:
> On Fri, Dec 5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > Are you saying every time a new option is added to a command that a new
> > > regression test needs to be added?
> >
> > Not necessarily -- an existing test could be modified, as well.
> >
> > > We don't normally do that,
> >
> > I sure hope we do have all options covered by tests.
>
> Are you saying that every combination of ALTER options is tested?

Well, ALTER TABLE is special: you can give several subcommands, and each
subcommand can be one of a rather long list of possible subcommands. Testing
every combination would mean a combinatorial explosion, which would indeed be
too large. But surely we want a small bunch of tests to prove that having
several subcommands works fine, and also at least one test for every possible
subcommand.

> We have rejected simple regression test additions on the basis that
> the syntax works and is unlikely to break once tested once by the
> developer.

This rationale doesn't sound so good to me. Something might work fine the
minute it is committed, but someone else might break it inadvertently later;
this has actually happened. Having no tests at all for a feature isn't good.

I know we have recently rejected patches that added tests only to improve the
coverage percentage, for instance in CREATE DATABASE, because the runtime of
the tests got too large. Are there other examples of rejected tests?

> > > and it could easily bloat the regression tests over time.
> >
> > We had 103 regression tests in 8.2 and we have 145 in 9.4. Does this
> > qualify as bloat?
>
> No, that seems fine. I am worried about having to have a test for every
> syntax change, which we currently don't do. Was that issue not clear in
> my first email?

Well, if by "every syntax change" you mean "every feature addition", then I
think we should have at least one test for each, yes. It's not like we add
new syntax every day anyway.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 5, 2014 at 04:10:12PM -0300, Alvaro Herrera wrote:
> Well, ALTER TABLE is special: you can give several subcommands, and each
> subcommand can be one of a rather long list of possible subcommands.
> Testing every combination would mean a combinatorial explosion, which
> would indeed be too large. But surely we want a small bunch of tests to
> prove that having several subcommands works fine, and also at least one
> test for every possible subcommand.
>
> This rationale doesn't sound so good to me. Something might work fine
> the minute it is committed, but someone else might break it
> inadvertently later; this has actually happened. Having no tests at all
> for a feature isn't good.
>
> I know we have recently rejected patches that added tests only to
> improve the coverage percentage, for instance in CREATE DATABASE, because
> the runtime of the tests got too large. Are there other examples of
> rejected tests?

Yes, there are many cases where we have added options or keywords but didn't
add a regression test.

> Well, if by "every syntax change" you mean "every feature addition",
> then I think we should have at least one test for each, yes. It's not
> like we add new syntax every day anyway.

Well, my point is that this is a new requirement we would have to adopt, at
least to test the logical DDL behavior --- I suppose the tests could be
removed after testing.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
On Tue, Dec 2, 2014 at 03:13:07PM -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick <ian@2ndquadrant.com> wrote:
> > > [...]
> >
> > I haven't read the code, but this concept seems good to me.
>
> Excellent, thanks.
>
> > It has the unfortunate weakness that a difference could exist during
> > the *middle* of the regression test run that is gone by the *end* of
> > the run, but our existing pg_upgrade testing has the same weakness, so
> > I guess we can view this as one more reason not to be too aggressive
> > about having regression tests drop the unshared objects they create.
>
> Agreed. Not dropping objects also helps test pg_dump itself; the normal
> procedure there is to run the regression tests, then pg_dump the regression
> database. Objects that are dropped never exercise their corresponding
> pg_dump support code, which I think is a bad thing. I think we should
> institute a policy that regression tests must keep the objects they
> create; maybe not all of them, but at least a sample large enough to
> cover all interesting possibilities.

This ensures creation DDL is checked if it is used in the regression
database, but what about ALTER and DROP? pg_dump doesn't issue those, except
in special cases like inheritance.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
On 14/12/07 12:43, Bruce Momjian wrote:
> On Tue, Dec 2, 2014 at 03:13:07PM -0300, Alvaro Herrera wrote:
> > [...]
> >
> > Agreed. Not dropping objects also helps test pg_dump itself; the normal
> > procedure there is to run the regression tests, then pg_dump the
> > regression database. Objects that are dropped never exercise their
> > corresponding pg_dump support code, which I think is a bad thing. I think
> > we should institute a policy that regression tests must keep the objects
> > they create; maybe not all of them, but at least a sample large enough to
> > cover all interesting possibilities.
>
> This ensures creation DDL is checked if it is used in the regression
> database, but what about ALTER and DROP? pg_dump doesn't issue those,
> except in special cases like inheritance.

Sure, pg_dump won't contain ALTER/DROP DDL; we are using pg_dump after
replaying the DDL commands to compare the actual state of the database with
the expected state.

As I'm in the middle of writing these tests, before I go any further: do you
accept that the tests need to be included?
Regards

Ian Barwick

-- 
Ian Barwick                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Dec 6, 2014 at 10:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
> This ensures creation DDL is checked if it is used in the regression
> database, but what about ALTER and DROP? pg_dump doesn't issue those,
> except in special cases like inheritance.

The proposed testing mechanism should cover any ALTER commands that are in
the regression tests, provided that those objects are not subsequently
dropped -- because if the ALTER commands aren't replayed properly, then the
later pg_dump won't produce the same output.

There probably are some gaps in our current regression tests in this area,
but that's probably a good thing to fix regardless of this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Dec 8, 2014 at 12:43:36PM -0500, Robert Haas wrote:
> The proposed testing mechanism should cover any ALTER commands that
> are in the regression tests, provided that those objects are not
> subsequently dropped -- because if the ALTER commands aren't replayed
> properly, then the later pg_dump won't produce the same output.
>
> There probably are some gaps in our current regression tests in this
> area, but that's probably a good thing to fix regardless of this.

OK, I understand now that the ALTERs are being passed to the slave and we can
then test that against pg_dump --- sounds good.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +