Thread: Extending USING [heap | mytam | yourtam] grammar and behavior

Extending USING [heap | mytam | yourtam] grammar and behavior

From
Mark Dilger
Date:
Hackers,

I have extended the grammar to allow "USING NOT method [, ...]" to exclude one or more TAMs in a CREATE TABLE
statement. This may sound like a weird thing to do, but it is surprisingly useful when developing new Table Access
Methods, particularly when you are developing two or more, not just one.  To explain:

Developing a new TAM takes an awful lot of testing, and much of it is duplicative of the existing core regression test
suite. Leveraging the existing tests saves an awful lot of test development. 

When developing just one TAM, leveraging the existing tests isn't too hard.  Without much work*, you can set
default_table_access_method=mytam for the duration of the check-world.  You'll get a few test failures this way.  Some
will be in tests that probe the catalogs to verify that /heap/ is stored there, and instead /mytam/ is found.  Others
will be tests that are sensitive to the number of rows that fit per page, etc.  But a surprising number of tests just
pass, at least after you get the TAM itself debugged.

When developing two or more TAMs, this falls apart.  Some tests may be worth fixing up (perhaps with alternate output
files) for "mytam", but not for "columnar_tam".  That might be because the test is checking fundamentally row-store-ish
properties of the table, which has no applicability to your column-store-ish TAM.  In that case, "USING NOT
columnar_tam" fixes the test failure when columnar is the default, without preventing the test from testing "mytam" when
it happens to be the default.

Once you have enough TAMs developed and deployed, this USING NOT business becomes useful in production.  You might have
different defaults on different servers, or for different customers, etc., and for a given piece of DDL that you want to
release you only want to say which TAMs not to use, not to nail down which TAM must be used.

Thoughts?  I'll hold off posting a patch until the general idea is debated.


[*] It takes some extra work to get the TAP tests to play along.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Extending USING [heap | mytam | yourtam] grammar and behavior

From
Michael Paquier
Date:
On Wed, Jun 15, 2022 at 06:16:21PM -0700, Mark Dilger wrote:
> When developing two or more TAMs, this falls apart.  Some tests may
> be worth fixing up (perhaps with alternate output files) for
> "mytam", but not for "columnar_tam".  That might be because the test
> is checking fundamentally row-store-ish properties of the table,
> which has no applicability to your column-store-ish TAM.  In that
> case, "USING NOT columnar_tam" fixes the test failure when columnar
> is the default, without preventing the test from testing "mytam"
> when it happens to be the default.

I think that it is very important for the in-core test suite to remain
transparent in terms of options used for table AMs (or compression),
and this has improved a lot over the last years with options like
HIDE_TABLEAM and HIDE_TOAST_COMPRESSION.  Things could have actually
more ORDER BY clauses to ensure more ordering of the results, as long
as the tests don't want to stress a specific planning path.  However,
your problem is basically that you develop multiple AMs, but you want
to have regression tests that do checks across more than one table AM
at the same time.  Am I getting that right?  Why is a grammar
extension necessary for what looks like a test structure problem when
there are interdependencies across multiple AMs developed?

> Once you have enough TAMs developed and deployed, this USING NOT
> business becomes useful in production.  You might have different
> defaults on different servers, or for different customers, etc., and
> for a given piece of DDL that you want to release you only want to
> say which TAMs not to use, not to nail down which TAM must be used.

I am not sure to see why this would be something users would actually
use in prod.  That means to pick up something else than what the
server thinks is the best default AM but where somebody does not want
to trust the default, while generating an error if specifying the
default AM in the USING NOT clause.  On top of that
default_table_access_method is user-settable.
--
Michael

Re: Extending USING [heap | mytam | yourtam] grammar and behavior

From
"David G. Johnston"
Date:
On Wed, Jun 15, 2022 at 8:51 PM Michael Paquier <michael@paquier.xyz> wrote:
> On top of that
> default_table_access_method is user-settable.

FWIW this proposal acknowledges that and basically leverages it to the hilt, turning it into something like search_path.  I strongly dislike the idea of any workflow that depends on a GUC in this manner.  The fact that it is user-settable is, IMO, a flaw, not a feature, at least as far as production settings are concerned.

It is a novel API for PostgreSQL to rely upon setting a GUC then attaching "unless" configurations to individual objects to ignore it.  And what would be chosen (ultimately fallback is heap?), or whether it would simply error, is presently, as you say, undefined.

In production this general behavior becomes useful only under the condition that among the various named access methods some of them don't even exist on the server in question, but that a fallback option would be acceptable in that case.  But that suggests extending "USING" to accept multiple names, not inventing a "NOT USING".
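
For illustration only, a minimal Python sketch of that alternative: a USING clause that names several access methods and settles on the first one the server actually has.  The function name, the first-match ordering, and the error when nothing matches are all assumptions made for this sketch; nothing in the thread specifies them, and this is not PostgreSQL code.

```python
from typing import Iterable, Sequence


def pick_first_available(preferred: Sequence[str], installed: Iterable[str]) -> str:
    """Return the first name in `preferred` that is installed on this server.

    Hypothetical model of a multi-name "USING a, b, c" clause.
    """
    available = set(installed)
    for name in preferred:
        if name in available:
            return name
    # Assumption: if no acceptable access method exists, the DDL fails.
    raise LookupError(f"none of {list(preferred)} is installed")


# A server without columnar_tam falls through to the next acceptable name.
print(pick_first_available(["columnar_tam", "mytam", "heap"], ["heap", "mytam"]))
```

Under these assumptions the DDL lists what is acceptable, in order, rather than what is unacceptable, and the server-wide default never enters the decision.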

That all said, I can understand that testing presents its own special needs.  But testing is probably where GUCs shine.  So why not implement this capability as a GUC that is set just before the table is created instead of extending the grammar for it?  Add it to "developer options" and call it a day.  Dump/Restore no longer has to care about it, and its value once the table exists is basically zero anyway.

David J.

Re: Extending USING [heap | mytam | yourtam] grammar and behavior

From
Mark Dilger
Date:

> On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote:
>
> However,
> your problem is basically that you develop multiple AMs, but you want
> to have regression tests that do checks across more than one table AM
> at the same time.

It is true that I test multiple table AMs at the same time, but that's a somewhat different concern.

>  Am I getting that right?

Not exactly.

> Why is a grammar
> extension necessary for what looks like a test structure problem when
> there are interdependencies across multiple AMs developed?

Ok, I didn't want to get into my exact process, because it involves other changes that I don't expect -hackers to want.
But basically what I do is: 

./configure --with-default-tam=chicago && make && make check-world

That fails for a few tests, and I manually change the create table statements in tests that are not chicago-compatible
to "using not chicago".  Then

./configure --with-default-tam=detroit && make && make check-world

That fails for some other set of tests, but note that the tests with "using not chicago" are still using detroit in
this second run.  That wouldn't be true if I'd fixed up the tests in the first run "using heap".

Then I can also add my own tests which might make some chicago backed tables plus some detroit backed tables and see
how they interact.  But that's superfluous to the issue of just trying to leverage the existing tests as much as I can
without having to reinvent tests to cover "chicago", and then reinvent again to cover "detroit", and so forth.

If you develop enough TAMs in parallel, and go with the "using heap" solution, you eventually have zero coverage for
any of the TAMs, because you'll eventually be "using heap" in all the tables of all the tests.
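
The coverage argument can be made concrete with a small Python model.  This is only a sketch: the test names and the incompatibility table are invented for illustration, and "chicago" and "detroit" are just the placeholder TAM names used above.  It compares which access method each test ends up exercising under the two fix-up styles.

```python
from typing import Dict, Set

# Hypothetical tests, each mapped to the TAMs it is known not to work with.
INCOMPATIBLE: Dict[str, Set[str]] = {
    "test_a": {"chicago"},
    "test_b": {"detroit"},
    "test_c": set(),
}


def coverage_using_not(default_am: str) -> Dict[str, str]:
    """Fix-ups written as "using not X": fall back to heap only when X is the default."""
    return {
        test: ("heap" if default_am in excluded else default_am)
        for test, excluded in INCOMPATIBLE.items()
    }


def coverage_using_heap(default_am: str) -> Dict[str, str]:
    """Fix-ups written as "using heap": a test that ever failed is pinned to heap."""
    return {
        test: ("heap" if excluded else default_am)
        for test, excluded in INCOMPATIBLE.items()
    }


for default_am in ("chicago", "detroit"):
    print(default_am, "using not :", coverage_using_not(default_am))
    print(default_am, "using heap:", coverage_using_heap(default_am))
```

In this model test_a still runs against detroit when it is fixed up with "using not chicago", but never runs against anything except heap when it is fixed up with "using heap".  As more TAMs are added, more entries in the table become non-empty, and the "using heap" column tends toward all heap.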

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Extending USING [heap | mytam | yourtam] grammar and behavior

From
Mark Dilger
Date:

> On Jun 15, 2022, at 8:51 PM, Michael Paquier <michael@paquier.xyz> wrote:
>
> I am not sure to see why this would be something users would actually
> use in prod.  That means to pick up something else than what the
> server thinks is the best default AM but where somebody does not want
> to trust the default, while generating an error if specifying the
> default AM in the USING NOT clause.

Sorry for the lack of clarity.  I do not suggest raising an error.  If you say "USING NOT foo", and foo is the default
table access method, then you get the same behavior as a "USING heap" would have gotten you; otherwise, you get the same
behavior as not providing any USING clause at all.
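
As a rough Python sketch of that rule (the function names are hypothetical, this is not the patch, and the case where heap itself is excluded is deliberately not handled):

```python
from typing import Collection, Optional

FALLBACK_AM = "heap"  # the one hardcoded fallback described above


def resolve_using_not(default_am: str, excluded: Collection[str]) -> Optional[str]:
    """Model "USING NOT a [, ...]".

    Returns the access method to name in an implied USING clause,
    or None when the statement should behave as if it had no USING clause.
    """
    if default_am in excluded:
        return FALLBACK_AM
    return None


def effective_am(default_am: str, excluded: Collection[str]) -> str:
    """Access method the new table ends up with under this model."""
    return resolve_using_not(default_am, excluded) or default_am


print(effective_am("foo", ["foo"]))  # default is excluded, so heap is used
print(effective_am("bar", ["foo"]))  # default is not excluded, so it is kept
```

Here `default_am` stands in for the current value of default_table_access_method at the time the CREATE TABLE runs.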

In future, we might want to create a list of fallback TAMs rather than just hardcoding "heap" as the one and only
fallback, but I haven't run into an actual need for that.  If you're wondering what "USING NOT heap" falls back to, I
think that could error, or it could just use heap anyway.  Whatever.  That's why I'm still soliciting comments at
this phase rather than posting a patch.

>  On top of that
> default_table_access_method is user-settable.

Yeah, but specifying a "USING foo" clause is also open to any user, so I don't see why this matters.  "USING NOT foo"
is just shorthand for checking the current default_table_access_method, and then either appending a "USING heap" clause
or appending no clause.  Since the user can do this anyway, what's the security implication in some syntactic sugar?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company