Thread: Displaying and dumping of table access methods
Hi, Over in [1] we're discussing the development of the pluggable storage patchset, which allows different ways of storing table data. One thing I'd like to discuss with a wider audience than the implementation details is psql and pg_dump handling of table access methods. Currently the patchset neither dumps nor displays table access methods . That's clearly not right. The reason for that however is not that it's hard to dump/display code-wise, but that to me the correct behaviour is not obvious. The reason to make table storage pluggable is after all that the table access method can be changed, and part of developing new table access methods is being able to run the regression tests. A patch at [2] adds display of a table's access method to \d+ - but that means that running the tests with a different default table access method (e.g. using PGOPTIONS='-c default_table_access_method=...) there'll be a significant number of test failures, even though the test results did not meaningfully differ. Similarly, if pg_dump starts to dump table access methods either unconditionally, or for all non-heap AMS, the pg_dump tests fail due to unimportant differences. A third issue, less important in my opinion, is that specifying the table access method means that it's harder to dump/restore into a table with a different AM. One way to solve this would be for psql/pg_dump to only define the table access methods for tables that differ from the currently configured default_table_access_method. That'd mean that only a few tests that intentionally test AMs would display/dump the access method. On the other hand that seems like it's a bit too much magic. While I don't like that option, I haven't really come up with something better. Having alternative outputs for nearly every test file for e.g. zheap if/when we merge it, seems unmaintainable. It's less insane for the pg_dump tests. An alternative approach based on that would be to hack pg_regress to magically ignore "Access method: ..." type differences, but that seems like a bad idea to me. Greetings, Andres Freund [1] https://postgr.es/m/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de [2] https://postgr.es/m/CA+q6zcWMHSbLkKO7eq95t15m3R1X9FCcm0kT3dGS2gGSRO9kKw@mail.gmail.com [3] https://postgr.es/m/20181215193700.nov7bphxyge4qkez@alap3.anarazel.de
Greetings, * Andres Freund (andres@anarazel.de) wrote: > Over in [1] we're discussing the development of the pluggable storage > patchset, which allows different ways of storing table data. > > One thing I'd like to discuss with a wider audience than the > implementation details is psql and pg_dump handling of table access > methods. > > Currently the patchset neither dumps nor displays table access > methods . That's clearly not right. Agreed. > The reason for that however is not that it's hard to dump/display > code-wise, but that to me the correct behaviour is not obvious. While it might be a lot of changes to the regression output results, I tend to feel that showng the access method is a very important aspect of the relation and therefore should be in \d output. > The reason to make table storage pluggable is after all that the table > access method can be changed, and part of developing new table access > methods is being able to run the regression tests. We certainly want people to be able to run the regression tests, but it feels like we will need more regression tests in the future as we wish to cover both the built-in heap AM and the new zheap AM, so we should really have those both run independently. I don't think we'll be able to have just one set of regression tests that cover everything interesting for both, sadly. Perhaps there's a way to have a set of regression tests which are run for both, and another set that's run for each, but building all of that logic is a fair bit of work and I'm not sure how much it's really saving us. > A patch at [2] adds display of a table's access method to \d+ - but that > means that running the tests with a different default table access > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > there'll be a significant number of test failures, even though the test > results did not meaningfully differ. Yeah, I'm not really thrilled with this approach. > Similarly, if pg_dump starts to dump table access methods either > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > unimportant differences. In reality, pg_dump already depends on quite a few defaults to be in place, so I don't see a particular issue with adding this into that set. New tests would need to have new pg_dump checks, of course, but that's generally the case as well. > A third issue, less important in my opinion, is that specifying the > table access method means that it's harder to dump/restore into a table > with a different AM. I understand this concern but I view it as an independent consideration. There are a lot of transformations which one might wish for when dumping and restoring data, a number of which are handled through various options (--no-owner, --no-acls, etc) and it seems like we could do something similar here. > One way to solve this would be for psql/pg_dump to only define the table > access methods for tables that differ from the currently configured > default_table_access_method. That'd mean that only a few tests that > intentionally test AMs would display/dump the access method. On the > other hand that seems like it's a bit too much magic. I'm not a fan of depending on the currently set default_table_access_method.. When would that be set in the pg_restore process? Or in the SQL script that's created? Really though, that does strike me as quite a bit of magic. > While I don't like that option, I haven't really come up with something > better. Having alternative outputs for nearly every test file for > e.g. zheap if/when we merge it, seems unmaintainable. It's less insane > for the pg_dump tests. I'm thinking less of alternative output files and more of additional tests for zheap cases... > An alternative approach based on that would be to hack pg_regress to > magically ignore "Access method: ..." type differences, but that seems > like a bad idea to me. I agree, that's not a good idea. Thanks! Stephen
Attachment
Hi, On 2019-01-07 19:19:46 -0500, Stephen Frost wrote: > Greetings, > > * Andres Freund (andres@anarazel.de) wrote: > > Over in [1] we're discussing the development of the pluggable storage > > patchset, which allows different ways of storing table data. > > > > One thing I'd like to discuss with a wider audience than the > > implementation details is psql and pg_dump handling of table access > > methods. > > > > Currently the patchset neither dumps nor displays table access > > methods . That's clearly not right. > > Agreed. > > > The reason for that however is not that it's hard to dump/display > > code-wise, but that to me the correct behaviour is not obvious. > > While it might be a lot of changes to the regression output results, I > tend to feel that showng the access method is a very important aspect of > the relation and therefore should be in \d output. I don't see how that'd be feasible. Imo it is/was absolutely crucial for zheap development to be able to use the existing postgres tests. > > The reason to make table storage pluggable is after all that the table > > access method can be changed, and part of developing new table access > > methods is being able to run the regression tests. > > We certainly want people to be able to run the regression tests, but it > feels like we will need more regression tests in the future as we wish > to cover both the built-in heap AM and the new zheap AM, so we should > really have those both run independently. I don't think we'll be able > to have just one set of regression tests that cover everything > interesting for both, sadly. Perhaps there's a way to have a set of > regression tests which are run for both, and another set that's run for > each, but building all of that logic is a fair bit of work and I'm not > sure how much it's really saving us. I don't think there's any sort of contradiction here. I don't think it's feasible to have tests tests for every feature duplicated for each supported AM, we have enough difficulty maintaining our current tests. But that doesn't mean it's a problem to have individual test [files] run with an explicitly assigned AM - the test can just do a SET default_table_access_method = zheap; or explicitly say USING zheap. > > A patch at [2] adds display of a table's access method to \d+ - but that > > means that running the tests with a different default table access > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > there'll be a significant number of test failures, even though the test > > results did not meaningfully differ. > > Yeah, I'm not really thrilled with this approach. > > > Similarly, if pg_dump starts to dump table access methods either > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > unimportant differences. > > In reality, pg_dump already depends on quite a few defaults to be in > place, so I don't see a particular issue with adding this into that set. > New tests would need to have new pg_dump checks, of course, but that's > generally the case as well. I am not sure what you mean here? Are you suggesting that there'd be a separate set of pg_dump test for zheap and every other possible future AM? To me the approach you're suggesting is going to lead to an explosion of redundant tests, which are really hard to maintain, especially for out-of-tree AMs. Out of tree AMs with the setup I propose can just install the AM into the template database and set PGOPTIONS, and they have pretty good coverage. Greetings, Andres Freund
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > Over in [1] we're discussing the development of the pluggable storage > > > patchset, which allows different ways of storing table data. > > > > > > One thing I'd like to discuss with a wider audience than the > > > implementation details is psql and pg_dump handling of table access > > > methods. > > > > > > Currently the patchset neither dumps nor displays table access > > > methods . That's clearly not right. > > > > Agreed. > > > > > The reason for that however is not that it's hard to dump/display > > > code-wise, but that to me the correct behaviour is not obvious. > > > > While it might be a lot of changes to the regression output results, I > > tend to feel that showng the access method is a very important aspect of > > the relation and therefore should be in \d output. > > I don't see how that'd be feasible. Imo it is/was absolutely crucial > for zheap development to be able to use the existing postgres tests. I don't agree with the general assumption that "we did this for development and therefore it should be done that way forever". Instead, I would look at adding tests where there's a difference between the two, or possibly some difference, and make sure that there isn't, and make sure that the code paths are covered. > > > The reason to make table storage pluggable is after all that the table > > > access method can be changed, and part of developing new table access > > > methods is being able to run the regression tests. > > > > We certainly want people to be able to run the regression tests, but it > > feels like we will need more regression tests in the future as we wish > > to cover both the built-in heap AM and the new zheap AM, so we should > > really have those both run independently. I don't think we'll be able > > to have just one set of regression tests that cover everything > > interesting for both, sadly. Perhaps there's a way to have a set of > > regression tests which are run for both, and another set that's run for > > each, but building all of that logic is a fair bit of work and I'm not > > sure how much it's really saving us. > > I don't think there's any sort of contradiction here. I don't think it's > feasible to have tests tests for every feature duplicated for each > supported AM, we have enough difficulty maintaining our current > tests. But that doesn't mean it's a problem to have individual test > [files] run with an explicitly assigned AM - the test can just do a SET > default_table_access_method = zheap; or explicitly say USING zheap. I don't mean to suggest that there's a contradiction. I don't have any problem with new tests being added which set the default AM to zheap, as long as it's clear that such is happening for downstream tests. > > > A patch at [2] adds display of a table's access method to \d+ - but that > > > means that running the tests with a different default table access > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > > there'll be a significant number of test failures, even though the test > > > results did not meaningfully differ. > > > > Yeah, I'm not really thrilled with this approach. > > > > > Similarly, if pg_dump starts to dump table access methods either > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > > unimportant differences. > > > > In reality, pg_dump already depends on quite a few defaults to be in > > place, so I don't see a particular issue with adding this into that set. > > New tests would need to have new pg_dump checks, of course, but that's > > generally the case as well. > > I am not sure what you mean here? Are you suggesting that there'd be a > separate set of pg_dump test for zheap and every other possible future > AM? I'm suggesting that pg_dump would have additional tests for zheap, in addition to the existing tests we already have. No more, no less, really. > To me the approach you're suggesting is going to lead to an explosion of > redundant tests, which are really hard to maintain, especially for > out-of-tree AMs. Out of tree AMs with the setup I propose can just > install the AM into the template database and set PGOPTIONS, and they > have pretty good coverage. I'm frankly much less interested in out-of-tree AMs as we aren't going to have in-core regression tests for them anyway, because we can't as they aren't in our tree and, ultimately, I don't find them to have anywhere near the same value that in-core AMs have. I don't have any problem with out-of-tree AMs hacking things up as they see fit and then running whatever tests they want, but it is, and always will be, a very different discussion and ultimately a different result when we're talking about what will be incorporated and supported as part of core. Thanks! Stephen
Attachment
Hi, On 2019-01-07 20:30:13 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote: > > > * Andres Freund (andres@anarazel.de) wrote: > > > > Over in [1] we're discussing the development of the pluggable storage > > > > patchset, which allows different ways of storing table data. > > > > > > > > One thing I'd like to discuss with a wider audience than the > > > > implementation details is psql and pg_dump handling of table access > > > > methods. > > > > > > > > Currently the patchset neither dumps nor displays table access > > > > methods . That's clearly not right. > > > > > > Agreed. > > > > > > > The reason for that however is not that it's hard to dump/display > > > > code-wise, but that to me the correct behaviour is not obvious. > > > > > > While it might be a lot of changes to the regression output results, I > > > tend to feel that showng the access method is a very important aspect of > > > the relation and therefore should be in \d output. > > > > I don't see how that'd be feasible. Imo it is/was absolutely crucial > > for zheap development to be able to use the existing postgres tests. > > I don't agree with the general assumption that "we did this for > development and therefore it should be done that way forever". > > Instead, I would look at adding tests where there's a difference > between the two, or possibly some difference, and make sure that there > isn't, and make sure that the code paths are covered. I think this approach makes no sense whatsover. It's entirely possible to encounter bugs in table AM relevant code in places one would not think so. But even if one were, foolishly, to exclude those, the pieces of code we know are highly affected by the way the AM works are a a significant (at the very least 10-20% of tests) percentage of our tests. Duplicating them even just between heap and zheap would be a major technical debt. DML, ON CONFLICT, just about all isolationtester test, etc all are absolutely crucial to test different AMs for correctness. > > To me the approach you're suggesting is going to lead to an explosion of > > redundant tests, which are really hard to maintain, especially for > > out-of-tree AMs. Out of tree AMs with the setup I propose can just > > install the AM into the template database and set PGOPTIONS, and they > > have pretty good coverage. > > I'm frankly much less interested in out-of-tree AMs as we aren't going > to have in-core regression tests for them anyway, because we can't as > they aren't in our tree and, ultimately, I don't find them to have > anywhere near the same value that in-core AMs have. I think you must be missing my point: Adding spurious differences due to "Table Access Method: heap" vs "Table Access Method: blarg" makes it unnecessarily hard to reuse the in-core tests for any additional AM, be it in-core or out of core. I fail to see what us not having explicit tests for such AMs in core has to do with my point. Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly* better from a maintainability POV than including the AM in the output. Andres
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-01-07 20:30:13 -0500, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote: > > > > * Andres Freund (andres@anarazel.de) wrote: > > > > > Over in [1] we're discussing the development of the pluggable storage > > > > > patchset, which allows different ways of storing table data. > > > > > > > > > > One thing I'd like to discuss with a wider audience than the > > > > > implementation details is psql and pg_dump handling of table access > > > > > methods. > > > > > > > > > > Currently the patchset neither dumps nor displays table access > > > > > methods . That's clearly not right. > > > > > > > > Agreed. > > > > > > > > > The reason for that however is not that it's hard to dump/display > > > > > code-wise, but that to me the correct behaviour is not obvious. > > > > > > > > While it might be a lot of changes to the regression output results, I > > > > tend to feel that showng the access method is a very important aspect of > > > > the relation and therefore should be in \d output. > > > > > > I don't see how that'd be feasible. Imo it is/was absolutely crucial > > > for zheap development to be able to use the existing postgres tests. > > > > I don't agree with the general assumption that "we did this for > > development and therefore it should be done that way forever". > > > > Instead, I would look at adding tests where there's a difference > > between the two, or possibly some difference, and make sure that there > > isn't, and make sure that the code paths are covered. > > I think this approach makes no sense whatsover. It's entirely possible > to encounter bugs in table AM relevant code in places one would not > think so. But even if one were, foolishly, to exclude those, the pieces > of code we know are highly affected by the way the AM works are a a > significant (at the very least 10-20% of tests) percentage of our > tests. Duplicating them even just between heap and zheap would be a > major technical debt. DML, ON CONFLICT, just about all isolationtester > test, etc all are absolutely crucial to test different AMs for > correctness. I am generally on board with minimizing the amount of duplicate code that we have, but we must run those tests independently, so it's really a question of if we build a system where we can parametize a set of tests to run and then run them for every AM and compare the output to either the generalized output or the per-AM output, or if we build on the existing system and simply have an independent set of tests. It's not clear to me, at the current point, which will be the lower level of ongoing effort, but when it comes to the effort required today, it seems pretty clear to me that whacking around the current test environment to rerun tests is a larger amount of effort. If that's the requirement, then so be it and I'm on-board, but I'm also open to considering a lesser requirement for a completely new feature. > > > To me the approach you're suggesting is going to lead to an explosion of > > > redundant tests, which are really hard to maintain, especially for > > > out-of-tree AMs. Out of tree AMs with the setup I propose can just > > > install the AM into the template database and set PGOPTIONS, and they > > > have pretty good coverage. > > > > I'm frankly much less interested in out-of-tree AMs as we aren't going > > to have in-core regression tests for them anyway, because we can't as > > they aren't in our tree and, ultimately, I don't find them to have > > anywhere near the same value that in-core AMs have. > > I think you must be missing my point: Adding spurious differences due to > "Table Access Method: heap" vs "Table Access Method: blarg" makes it > unnecessarily hard to reuse the in-core tests for any additional AM, be > it in-core or out of core. I fail to see what us not having explicit > tests for such AMs in core has to do with my point. I don't think I'm missing your point. If you believe that we should be swayed by this argument into agreeing to change what we believe the user-facing psql \d output should be, then I am very hopeful that you're completely wrong. The psql \d output should be driven by what will be best for our users, not by what's best by external AMs or, really, anything else. > Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or > HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly* > better from a maintainability POV than including the AM in the output. I'm pretty sure I said in my last reply that I'm alright with psql and pg_dump not outputting a result for the default value, provided the default is understood to always really be the default, but, again, what we should be concerned about here is what the end user is dealing with and I'm not particularly incensed to support something different even if it's around a variable of some kind for external AMs, or external *whatevers*. I'm also a bit confused as to why we are spending a good bit of time arguing about external AMs without any discussion or definition of what they are or what their requirements are. If such things seriously exist, then let us talk about them and try to come up with solutions for them; if they don't, then we can talk about them when they come up. Thanks! Stephen
Attachment
Hi, On 2019-01-07 21:08:58 -0500, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > On 2019-01-07 20:30:13 -0500, Stephen Frost wrote: > > > I don't agree with the general assumption that "we did this for > > > development and therefore it should be done that way forever". > > > > > > Instead, I would look at adding tests where there's a difference > > > between the two, or possibly some difference, and make sure that there > > > isn't, and make sure that the code paths are covered. > > > > I think this approach makes no sense whatsover. It's entirely possible > > to encounter bugs in table AM relevant code in places one would not > > think so. But even if one were, foolishly, to exclude those, the pieces > > of code we know are highly affected by the way the AM works are a a > > significant (at the very least 10-20% of tests) percentage of our > > tests. Duplicating them even just between heap and zheap would be a > > major technical debt. DML, ON CONFLICT, just about all isolationtester > > test, etc all are absolutely crucial to test different AMs for > > correctness. > > I am generally on board with minimizing the amount of duplicate code > that we have, but we must run those tests independently, so it's really > a question of if we build a system where we can parametize a set of > tests to run and then run them for every AM and compare the output to > either the generalized output or the per-AM output, or if we build on > the existing system and simply have an independent set of tests. It's > not clear to me, at the current point, which will be the lower level of > ongoing effort Huh? It's absolutely *trivial* from a buildsystem POV to run the tests again with a different default AM. That's precisely why I'm talking about this. Just setting PGOPTIONS='-c default_table_access_method=zheap' in the new makefile target (the ms run scripts are similar) is sufficient. And we don't need to force everyone to constantly run tests with e.g. both heap and zheap, it's sufficient to do so on a few buildfarm machines, and whenever changing AM level code. Rerunning all the tests with a different AM is just setting the same environment variable, but running check-world as the target. Obviously that does not preclude a few tests that explicitly test features specific to an AM. E.g. the zheap branch's tests have an explicit zheap regression file and a few zheap specific isolationtester spec files that a) test zheap specific beheaviour b) make sure that the most basic zheap behaviour is tested even when not running the tests with zheap as the default AM. And even if you were to successfully argue that it's sufficient during normal development to only have a few zheap specific additional tests, we'd certainly want to make it possible to occasionally explicitly run the rest of the tests under zheap to see whether additional stuff has been broken - and that's much harder to sift through if there's a lot of spurious test failures due to \d[+] outputting additional/differing data. > ..., but when it comes to the effort required today, it seems pretty > clear to me that whacking around the current test environment to rerun > tests is a larger amount of effort. How did you come to that conclusion? Adding a makefile and vcregress.pl target is pretty trivial. > > Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or > > HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly* > > better from a maintainability POV than including the AM in the output. > > I'm pretty sure I said in my last reply that I'm alright with psql and > pg_dump not outputting a result for the default value I don't see that anywhere in your replies. Are you referring to: > I don't have any problem with new tests being added which set the > default AM to zheap, as long as it's clear that such is happening for > downstream tests. ? If so, that's not addressing the reason why I'm suggesting to have something like HIDE_TABLE_AMS. The point is that that'd allow us to cater the default psql output to humans, while still choosing not to display the AM for regression tests (thereby allowing to run the tests). > provided the default is understood to always really be the default What do you mean by that? Are you arguing that it should be impossible in test scenarios to override default_table_access_method? Or that pg_dump/psql should check for a hardcoded 'heap' AM (via a macro or whatnot)? > I'm also a bit confused as to why we are spending a good bit of time > arguing about external AMs without any discussion or definition of what > they are or what their requirements are. If such things seriously > exist, then let us talk about them and try to come up with solutions for > them; if they don't, then we can talk about them when they come up. We are working seriously hard on making AMs pluggable. Zheap is not yet, and won't be that soon, part of core. The concerns for an in-core zheap (which needs to maintain the test infrastructure during the remainder of its out-of-core development!) and out-of-core AMs are pretty aligned. I don't get your confusion. Greetings, Andres Freund
On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote: > Huh? It's absolutely *trivial* from a buildsystem POV to run the tests > again with a different default AM. That's precisely why I'm talking > about this. Just setting PGOPTIONS='-c > default_table_access_method=zheap' in the new makefile target (the ms > run scripts are similar) is sufficient. And we don't need to force > everyone to constantly run tests with e.g. both heap and zheap, it's > sufficient to do so on a few buildfarm machines, and whenever changing > AM level code. Rerunning all the tests with a different AM is just > setting the same environment variable, but running check-world as the > target. Another point is that having default_table_access_method facilitates the restore of tables across AMs similarly to tablespaces, so CREATE TABLE dumps should not include the AM part. > And even if you were to successfully argue that it's sufficient during > normal development to only have a few zheap specific additional tests, > we'd certainly want to make it possible to occasionally explicitly run > the rest of the tests under zheap to see whether additional stuff has > been broken - and that's much harder to sift through if there's a lot of > spurious test failures due to \d[+] outputting additional/differing > data. The specific-heap tests could be included as an extra module in src/test/modules easily, so removing from the main tests what is not completely transparent may make sense. Most users use make-check to test a patch quickly, so we could miss some bugs because of that during review. Still, people seem to be better-educated lately in the fact that they need to do an actual check-world when checking a patch at full. So personally I can live with a split where it makes sense. Being able to easily validate an AM implementation would be nice. Isolation tests may be another deal though for DMLs. > We are working seriously hard on making AMs pluggable. Zheap is not yet, > and won't be that soon, part of core. The concerns for an in-core zheap > (which needs to maintain the test infrastructure during the remainder of > its out-of-core development!) and out-of-core AMs are pretty aligned. I > don't get your confusion. I would imagine that a full-fledged AM should be able to maintain compatibility with the full set of queries that heap is able to support, so if you can make the tests transparent enough so as they can be run for any AMs without alternate input in the core tree, then that's a goal worth it. Don't you have plan inconsistencies as well with zheap? In short, improving portability incrementally is good for the long-term prospective. From that point of view adding the AM to \d+ output may be a bad idea, as there are modules out of core which rely on psql meta-commands, and it would be nice to be able to test those tests as well for those plugins with different types of AMs. -- Michael
Attachment
On Tue, Jan 8, 2019 at 3:02 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote:
> Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
> again with a different default AM. That's precisely why I'm talking
> about this. Just setting PGOPTIONS='-c
> default_table_access_method=zheap' in the new makefile target (the ms
> run scripts are similar) is sufficient. And we don't need to force
> everyone to constantly run tests with e.g. both heap and zheap, it's
> sufficient to do so on a few buildfarm machines, and whenever changing
> AM level code. Rerunning all the tests with a different AM is just
> setting the same environment variable, but running check-world as the
> target.
PGOPTIONS or any similar options are good for the AM development
to test their AM's with all the existing PostgreSQL features.
Another point is that having default_table_access_method facilitates
the restore of tables across AMs similarly to tablespaces, so CREATE
TABLE dumps should not include the AM part.
+1 to the above approach to dump "set default_table_access_method".
> And even if you were to successfully argue that it's sufficient during
> normal development to only have a few zheap specific additional tests,
> we'd certainly want to make it possible to occasionally explicitly run
> the rest of the tests under zheap to see whether additional stuff has
> been broken - and that's much harder to sift through if there's a lot of
> spurious test failures due to \d[+] outputting additional/differing
> data.
The specific-heap tests could be included as an extra module in
src/test/modules easily, so removing from the main tests what is not
completely transparent may make sense. Most users use make-check to
test a patch quickly, so we could miss some bugs because of that
during review. Still, people seem to be better-educated lately in the
fact that they need to do an actual check-world when checking a patch
at full. So personally I can live with a split where it makes sense.
Being able to easily validate an AM implementation would be nice.
Isolation tests may be another deal though for DMLs.
> We are working seriously hard on making AMs pluggable. Zheap is not yet,
> and won't be that soon, part of core. The concerns for an in-core zheap
> (which needs to maintain the test infrastructure during the remainder of
> its out-of-core development!) and out-of-core AMs are pretty aligned. I
> don't get your confusion.
I would imagine that a full-fledged AM should be able to maintain
compatibility with the full set of queries that heap is able to
support, so if you can make the tests transparent enough so as they
can be run for any AMs without alternate input in the core tree, then
that's a goal worth it. Don't you have plan inconsistencies as well
with zheap?
In short, improving portability incrementally is good for the
long-term prospective. From that point of view adding the AM to \d+
output may be a bad idea, as there are modules out of core which
rely on psql meta-commands, and it would be nice to be able to test
those tests as well for those plugins with different types of AMs.
failures. Currently \d+ also doesn't show all the details of the relation like
owner and etc.
Regards,
Haribabu Kommi
Fujitsu Australia
On 08/01/2019 00:56, Andres Freund wrote: > A patch at [2] adds display of a table's access method to \d+ - but that > means that running the tests with a different default table access > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > there'll be a significant number of test failures, even though the test > results did not meaningfully differ. For psql, a variable that hides the access method if it's the default. > Similarly, if pg_dump starts to dump table access methods either > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > unimportant differences. For pg_dump, track and set the default_table_access_method setting throughout the dump (similar to how default_with_oids was handled, I believe). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-01-08 13:02:00 +0900, Michael Paquier wrote: > On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote: > > Huh? It's absolutely *trivial* from a buildsystem POV to run the tests > > again with a different default AM. That's precisely why I'm talking > > about this. Just setting PGOPTIONS='-c > > default_table_access_method=zheap' in the new makefile target (the ms > > run scripts are similar) is sufficient. And we don't need to force > > everyone to constantly run tests with e.g. both heap and zheap, it's > > sufficient to do so on a few buildfarm machines, and whenever changing > > AM level code. Rerunning all the tests with a different AM is just > > setting the same environment variable, but running check-world as the > > target. > > Another point is that having default_table_access_method facilitates > the restore of tables across AMs similarly to tablespaces, so CREATE > TABLE dumps should not include the AM part. That's what I suggested in the first message in this thread, or did I miss a difference? > > And even if you were to successfully argue that it's sufficient during > > normal development to only have a few zheap specific additional tests, > > we'd certainly want to make it possible to occasionally explicitly run > > the rest of the tests under zheap to see whether additional stuff has > > been broken - and that's much harder to sift through if there's a lot of > > spurious test failures due to \d[+] outputting additional/differing > > data. > > The specific-heap tests could be included as an extra module in > src/test/modules easily, so removing from the main tests what is not > completely transparent may make sense. Why does it need to be an extra module, rather than just exta regression files / sections of files that just explicitly specify the AM? Seems a lot easier and faster. > > We are working seriously hard on making AMs pluggable. Zheap is not yet, > > and won't be that soon, part of core. The concerns for an in-core zheap > > (which needs to maintain the test infrastructure during the remainder of > > its out-of-core development!) and out-of-core AMs are pretty aligned. I > > don't get your confusion. > > I would imagine that a full-fledged AM should be able to maintain > compatibility with the full set of queries that heap is able to > support, so if you can make the tests transparent enough so as they > can be run for any AMs without alternate input in the core tree, then > that's a goal worth it. Don't you have plan inconsistencies as well > with zheap? In the core tests there's a fair number of things that can be cured by adding an ORDER BY to the tests, which seems sensible. We've added a lot of those over the years anyway. There's additionally a number of plans that change, which currently is handled by alternatives output files, but I think we should move to reduce those differences, that's probably not too hard. Greetings, Andres Freund
Hi, On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote: > On 08/01/2019 00:56, Andres Freund wrote: > > A patch at [2] adds display of a table's access method to \d+ - but that > > means that running the tests with a different default table access > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > there'll be a significant number of test failures, even though the test > > results did not meaningfully differ. > > For psql, a variable that hides the access method if it's the default. Yea, I think that seems the least contentious solution. Don't like it too much, but it seems better than the alternative. I wonder if we want one for multiple regression related issues, or whether one specifically about table AMs is more appropriate. I lean towards the latter. > > Similarly, if pg_dump starts to dump table access methods either > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > unimportant differences. > > For pg_dump, track and set the default_table_access_method setting > throughout the dump (similar to how default_with_oids was handled, I > believe). Yea, that's similar to that, and I think that makes sense. Greetings, Andres Freund
On Tue, Jan 08, 2019 at 09:29:49AM -0800, Andres Freund wrote: > On 2019-01-08 13:02:00 +0900, Michael Paquier wrote: >> The specific-heap tests could be included as an extra module in >> src/test/modules easily, so removing from the main tests what is not >> completely transparent may make sense. > > Why does it need to be an extra module, rather than just extra regression > files / sections of files that just explicitly specify the AM? Seems a > lot easier and faster. The point would be to keep individual Makefiles simpler to maintain, and separating things can make it simpler. I cannot say for sure without seeing how things would change though based on what you are suggesting, and that may finish by being a matter of taste. > In the core tests there's a fair number of things that can be cured by > adding an ORDER BY to the tests, which seems sensible. We've added a lot > of those over the years anyway. When working on Postgres-XC I cursed about the need to add many ORDER BY queries to ensure the ordering of tuples fetched from different nodes, and we actually had an option to enforce the default distribution used by tables, so that would be really nice to close the gap. > There's additionally a number of plans > that change, which currently is handled by alternatives output files, > but I think we should move to reduce those differences, that's probably > not too hard. Okay, that's not surprising. I guess it depends on how many alternate files are needed and if it is possible to split things so as we avoid unnecessary output in alternate files. A lot of things you are proposing on this thread make sense in my experience. -- Michael
Attachment
> On Tue, Jan 8, 2019 at 6:34 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote: > > On 08/01/2019 00:56, Andres Freund wrote: > > > A patch at [2] adds display of a table's access method to \d+ - but that > > > means that running the tests with a different default table access > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > > there'll be a significant number of test failures, even though the test > > > results did not meaningfully differ. > > > > For psql, a variable that hides the access method if it's the default. > > Yea, I think that seems the least contentious solution. Don't like it > too much, but it seems better than the alternative. I wonder if we want > one for multiple regression related issues, or whether one specifically > about table AMs is more appropriate. I lean towards the latter. Are there any similar existing regression related issues? If no, then probably the latter indeed makes more sense. > > > Similarly, if pg_dump starts to dump table access methods either > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > > unimportant differences. > > > > For pg_dump, track and set the default_table_access_method setting > > throughout the dump (similar to how default_with_oids was handled, I > > believe). > > Yea, that's similar to that, and I think that makes sense. Yes, sounds like a reasonable approach, I can proceed with it.
On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote: > > On 08/01/2019 00:56, Andres Freund wrote: > > > A patch at [2] adds display of a table's access method to \d+ - but that > > > means that running the tests with a different default table access > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > > there'll be a significant number of test failures, even though the test > > > results did not meaningfully differ. > > > > For psql, a variable that hides the access method if it's the default. > > Yea, I think that seems the least contentious solution. > +1. > Don't like it > too much, but it seems better than the alternative. I wonder if we want > one for multiple regression related issues, or whether one specifically > about table AMs is more appropriate. I lean towards the latter. > I didn't understand what is the earlier part "I wonder if we want one for multiple regression related issues". What do you mean by multiple regression related issues? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On 2019-01-09 18:26:16 +0530, Amit Kapila wrote: > On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <andres@anarazel.de> wrote: > +1. > > > Don't like it > > too much, but it seems better than the alternative. I wonder if we want > > one for multiple regression related issues, or whether one specifically > > about table AMs is more appropriate. I lean towards the latter. > > > > I didn't understand what is the earlier part "I wonder if we want one > for multiple regression related issues". What do you mean by multiple > regression related issues? One flag that covers all things that make psql output less useful for regression test output, or one flag that just controls the table access method display. Greetings, Andres Freund
On Wed, Jan 9, 2019 at 10:53 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2019-01-09 18:26:16 +0530, Amit Kapila wrote: > > On Tue, Jan 8, 2019 at 11:04 PM Andres Freund <andres@anarazel.de> wrote: > > +1. > > > > > Don't like it > > > too much, but it seems better than the alternative. I wonder if we want > > > one for multiple regression related issues, or whether one specifically > > > about table AMs is more appropriate. I lean towards the latter. > > > > > > > I didn't understand what is the earlier part "I wonder if we want one > > for multiple regression related issues". What do you mean by multiple > > regression related issues? > > One flag that covers all things that make psql output less useful for > regression test output, or one flag that just controls the table access > method display. > +1 for the later (one flag that just controls the table access method display) as that looks clean. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 9 Jan 2019 at 14:30, Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Tue, Jan 8, 2019 at 6:34 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote: > > > On 08/01/2019 00:56, Andres Freund wrote: > > > > A patch at [2] adds display of a table's access method to \d+ - but that > > > > means that running the tests with a different default table access > > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > > > there'll be a significant number of test failures, even though the test > > > > results did not meaningfully differ. > > > > > > For psql, a variable that hides the access method if it's the default. > > > > Yea, I think that seems the least contentious solution. Don't like it > > too much, but it seems better than the alternative. I wonder if we want > > one for multiple regression related issues, or whether one specifically > > about table AMs is more appropriate. I lean towards the latter. > > Are there any similar existing regression related issues? If no, then probably > the latter indeed makes more sense. > > > > > Similarly, if pg_dump starts to dump table access methods either > > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > > > unimportant differences. > > > > > > For pg_dump, track and set the default_table_access_method setting > > > throughout the dump (similar to how default_with_oids was handled, I > > > believe). > > > > Yea, that's similar to that, and I think that makes sense. > > Yes, sounds like a reasonable approach, I can proceed with it. Dmitry, I believe you have taken the pg_dump part only. If that's right, I can proceed with the psql part. Does that sound right ? > -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
On 10/01/2019 04:58, Amit Kapila wrote: >> One flag that covers all things that make psql output less useful for >> regression test output, or one flag that just controls the table access >> method display. >> > +1 for the later (one flag that just controls the table access method > display) as that looks clean. Yeah, I'd prefer a specific flag. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Fri, Jan 11, 2019 at 6:02 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > Yes, sounds like a reasonable approach, I can proceed with it. > > Dmitry, I believe you have taken the pg_dump part only. If that's > right, I can proceed with the psql part. Does that sound right ? Well, actually I've meant that I'm going to proceed with both, since I've posted both psql and pg_dump patches. But of course you're welcome to submit any new version or improvements you want.
On Fri, 11 Jan 2019 at 14:47, Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Fri, Jan 11, 2019 at 6:02 AM Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > > > > Yes, sounds like a reasonable approach, I can proceed with it. > > > > Dmitry, I believe you have taken the pg_dump part only. If that's > > right, I can proceed with the psql part. Does that sound right ? > > Well, actually I've meant that I'm going to proceed with both, since I've > posted both psql and pg_dump patches. But of course you're welcome to submit > any new version or improvements you want. Ok, I will review the patches that you send, and we can work on improvements if needed. Thanks. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company