Thread: Extending USING [heap | mytam | yourtam] grammar and behavior
Hackers, I have extended the grammar to allow "USING NOT method [, ...]" to exclude one or more TAMs in a CREATE TABLE statement. This may sound like a weird thing to do, but it is surprisingly useful when developing new Table Access Methods,particularly when you are developing two or more, not just one. To explain: Developing a new TAM takes an awful lot of testing, and much of it is duplicative of the existing core regression test suite. Leveraging the existing tests saves an awful lot of test development. When developing just one TAM, leveraging the existing tests isn't too hard. Without much work*, you can set default_table_access_method=mytamfor the duration of the check-world. You'll get a few test failures this way. Some willbe in tests that probe the catalogs to verify that /heap/ is stored there, and instead /mytam/ is found. Others willbe tests that are sensitive to the number of rows that fit per page, etc. But a surprising number of tests just pass,at least after you get the TAM itself debugged. When developing two or more TAMs, this falls apart. Some tests may be worth fixing up (perhaps with alternate output files)for "mytam", but not for "columnar_tam". That might be because the test is checking fundamentally row-store-ish propertiesof the table, which has no applicability to your column-store-ish TAM. In that case, "USING NOT columnar_tam"fixes the test failure when columnar is the default, without preventing the test from testing "mytam" when ithappens to be the default. Once you have enough TAMs developed and deployed, this USING NOT business becomes useful in production. You might have differentdefaults on different servers, or for different customers, etc., and for a given piece of DDL that you want to releaseyou only want to say which TAMs not to use, not to nail down which TAM must be used. Thoughts? I'll hold off posting a patch until the general idea is debated. [*] It takes some extra work to get the TAP tests to play along. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 15, 2022 at 06:16:21PM -0700, Mark Dilger wrote: > When developing two or more TAMs, this falls apart. Some tests may > be worth fixing up (perhaps with alternate output files) for > "mytam", but not for "columnar_tam". That might be because the test > is checking fundamentally row-store-ish properties of the table, > which has no applicability to your column-store-ish TAM. In that > case, "USING NOT columnar_tam" fixes the test failure when columnar > is the default, without preventing the test from testing "mytam" > when it happens to be the default. I think that it is very important for the in-core test suite to remain transparent in terms of options used for table AMs (or compression), and this has improved a lot over the last years with options like HIDE_TABLEAM and HIDE_TOAST_COMPRESSION. Things could have actually more ORDER BY clauses to ensure more ordering of the results, as long as the tests don't want to stress a specific planning path. However, your problem is basically that you develop multiple AMs, but you want to have regression tests that do checks across more than one table AM at the same time. Am I getting that right? Why is a grammar extension necessary for what looks like a test structure problem when there are interdependencies across multiple AMs developped? > Once you have enough TAMs developed and deployed, this USING NOT > business becomes useful in production. You might have different > defaults on different servers, or for different customers, etc., and > for a given piece of DDL that you want to release you only want to > say which TAMs not to use, not to nail down which TAM must be used. I am not sure to see why this would be something users would actually use in prod. That means to pick up something else than what the server thinks is the best default AM but where somebody does not want to trust the default, while generating an error if specifying the default AM in the USING NOT clause. On top of that default_table_access_method is user-settable. -- Michael
Attachment
On Wed, Jun 15, 2022 at 8:51 PM Michael Paquier <michael@paquier.xyz> wrote:
On top of that
default_table_access_method is user-settable.
FWIW this proposal acknowledges that and basically leverages it to the hilt, turning it into something like search_path. I strongly dislike the idea of any workflow that depends on a GUC in this manner. The fact that it is user-settable is, IMO, a flaw, not a feature, at least as far as production settings are concerned.
It is a novel API for PostgreSQL to rely upon setting a GUC then attaching "unless" configurations to individual objects to ignore it. And what would be chosen (ultimately fallback is heap?), or whether it would simply error, is presently, as you say, undefined.
In production this general behavior becomes useful only under the condition that among the various named access methods some of them don't even exist on the server in question, but that a fallback option would be acceptable in that case. But that suggests extending "USING" to accept multiple names, not inventing a "NOT USING".
That all said, I can understand that testing presents its own special needs. But testing is probably where GUCs shine. So why not implement this capability as a GUC that is set just before the table is created instead of extending the grammar for it? Add it to "developer options" and call it a day. Dump/Restore no longer has to care about it, and its value once the table exists is basically zero anyway.
David J.
> On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote: > > However, > your problem is basically that you develop multiple AMs, but you want > to have regression tests that do checks across more than one table AM > at the same time. It is true that I test multiple table AMs at the same time, but that's a somewhat different concern. > Am I getting that right? Not exactly. > Why is a grammar > extension necessary for what looks like a test structure problem when > there are interdependencies across multiple AMs developped? Ok, I didn't want to get into my exact process, because it involves other changes that I don't expect -hackers to want. But basically what I do is: ./configure --with-default-tam=chicago && make && make check-world That fails for a few tests, and I manually change the create table statements in tests that are not chicago-compatible to"using not chicago". Then ./configure --with-default-tam=detroit && make && make check-world That fails for some other set of tests, but note that the tests with "using not chicago" are still using detroit in thissecond run. That wouldn't be true if I'd fixed up the tests in the first run "using heap". Then I can also add my own tests which might make some chicago backed tables plus some detroit backed tables and see howthey interact. But that's superfluous to the issue of just trying to leverage the existing tests as much as I can withouthaving to reinvent tests to cover "chicago", and then reinvent again to cover "detroit", and so forth. If you develop enough TAMs in parallel, and go with the "using heap" solution, you eventually have zero coverage for anyof the TAMs, because you'll eventually be "using heap" in all the tables of all the tests. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote: > > I am not sure to see why this would be something users would actually > use in prod. That means to pick up something else than what the > server thinks is the best default AM but where somebody does not want > to trust the default, while generating an error if specifying the > default AM in the USING NOT clause. Sorry for the lack of clarity. I do not suggest raising an error. If you say "USING NOT foo", and foo is the default tableaccess method, then you get the same behavior as a "USING heap" would have gotten you, otherwise, you get the same behavioras not providing any USING clause at all. In future, we might want to create a list of fallback tams rather than just hardcoding "heap" as the one and only fallback,but I haven't run into an actual need for that. If you're wondering what "USING NOT heap" falls back to, I thinkthat could error, or it could just use heap anyway. Whatever. That's why I'm still soliciting for comments at thisphase rather than posting a patch. > On top of that > default_table_access_method is user-settable. Yeah, but specifying a "USING foo" clause is also open to any user, so I don't see why this matters. "USING NOT foo" isjust shorthand for checking the current default_table_access_method, and then either appending a "USING heap" clause orappending no clause. Since the user can do this anyway, what's the security implication in some syntactic sugar? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company