Re: pg_dump is broken for partition tablespaces - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: pg_dump is broken for partition tablespaces |
Date | |
Msg-id | 20190423222633.GA8364@alvherre.pgsql Whole thread Raw |
In response to | Re: pg_dump is broken for partition tablespaces (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: pg_dump is broken for partition tablespaces
|
List | pgsql-hackers |
Thanks for taking the time to think through this. On 2019-Apr-22, Robert Haas wrote: > If we know what the feature is supposed to do, then it should be > possible to look at each relevant piece of code and decides whether it > implements the specification correctly or not. But if we don't have a > specification -- that is, we don't know precisely what the feature is > supposed to do -- then we'll just end up whacking the behavior around > trying to get some combination that makes sense, and the whole effort > is probably doomed. Clearly this is now the crux of the issue ... the specification was unclear, as I cobbled it together without thinking about the overall implications, taking pieces of advice from various posts and my own ideas of how it should work. I am trying to split things in a way that makes the most sense to offer the best possible combination of functionality. I think there are three steps to getting this done: 1. for pg10 and pg11, make pg_dump work correctly. I think having pg_dump use CREATE TABLE PARTITION OF is not correct when the partition has a mismatching column definition. So my proposal is to back-patch David's pg_dump patch to pg10 and pg11. Thread: https://postgr.es/m/20190423185007.GA27954@alvherre.pgsql Note that this changes pg_dump behavior in back branches. 2. for pg12, try to keep as much of the tablespace inheritance functionality as possible (more below on how this works), without running into weird cases. I think if we just forbid the case of the tablespace being defined to the database tablespace, all the weird cases disappear from the code. 3. For pg13, we can try to put back the functionality of the database tablespace as default for a partition. You suggest using the value 1, and Tom suggests adding a new predefined row in pg_tablespace that has the meaning of "whatever the default tablespace is for the current database". I have a slight preference towards having the additional magical row(s). Maybe people would like to see #3 put it pg12. I don't oppose that, but it seems to be new functionality development; RMT would have to exempt this from feature freeze. Robert wrote: > - I don't have any idea how to salvage things in v11, where the > catalog representation is irredeemably ambiguous. We'd probably have > to be satisfied with fairly goofy behavior in v11. This behavior is in use only for indexes in pg11, and does not allow setting the database tablespace anyway (it just gets set back to 0 if you do that), so I don't think there's anything *too* goofy about it. (But whatever we do, pg11 behavior for tables would differ from pg12; and it would differ for indexes too, depending on how we handle the default_tablespace thing.) > - It's not enough to have a good catalog representation. You also > have to be able to recreate those catalog states. I think you > probably need a way to set the reltablespace to whatever value you > need it to have without changing the tables under it. Like ALTER > TABLE ONLY blah {TABLESPACE some_tablespace_name | NO TABLESPACE} or > some such thing. That exact idea may be wrong, but I think we are not > going to have much luck getting to a happy place without something of > this sort. David proposed ALTER TABLE .. TABLESPACE DEFAULT as a way to put back the database tablespace behavior (and that's implemented in my previous patch), which seems good to me. I'm not sure I like "NO TABLESPACE" better than DEFAULT, but if others do, I'm not really wedded to it. Now I discuss how tablespace inheritance works. To define the tablespace of a regular table, there's a simple algorithm: a1. if there's a TABLESPACE clause, use that. b1. otherwise, if there's a default_tablespace, use that. c1. otherwise, use the database tablespace. d1. if we end up with the database tablespace, overwrite with 0. If creating a partition, there is the additional rule that parent's tablespace overrides default_tablespace: a2. if there's a TABLESPACE clause, use that. b2. otherwise, if the parent has a tablespace, use that. c2. otherwise, if there's a default_tablespace, use that. d2. otherwise, use the database tablespace. e2. if we end up with the database tablespace, overwrite with 0. Finally, there's the question of what defines the parent's tablespace. I think the best algorithm to determine the tablespace when creating a partitioned relation is the same as for partitions: a3. if there's a TABLESPACE clause, use that. b3. otherwise, if the parent has a tablespace, use that. (We have this because partitions might be partitioned.) c3. otherwise, if there's default_tablespace, use that. d3. otherwise, use the database tablespace. e3. if we end up with the database tablespace, overwrite with 0. Tracing through the sets of rules, they are all alike -- the only difference is that regular tables lack the rule about parent's tablespace. Some notes on this: 1. We have rule c3, that is, we still honor default_tablespace when creating partitioned tables. Andres said, upthread, that we shouldn't do that -- if the tablespace isn't *explicitly* mentioned, he said, the partitions have no business landing on that tablespace. https://postgr.es/m/20190306223741.lolaaimhkkp4kict@alap3.anarazel.de However I think it would be more inconsistent to not have that than to have it, for the case when you create a partitioned partition. I tried to implement things the way he suggests and ended up with more warts. Would it be surprising for users that all subsequent partitions are created in that partition? Perhaps, but in Tom's words, "there are many things [in this area] that are surprising"; we should be sure to display the situation clearly, and if they don't like it, they can simply do ALTER TABLE SET TABLESPACE and things are good again. default_tablespace does not seem to be a particularly popular feature; I suspect pg_dump is by far its biggest user, and we make it not fall afoul of this problem with patch #1, so it never creates any partitions anyway. 2. we have rule b2 ahead of c2; however Tom said users are unlikely to remember which rule comes first. I don't care much for that argument; surely they can just read the docs. This is not the first not-completely-intuitive thing for a DBA, so. 3. If you really want a partitioned rel to end up in the database tablespace overriding default_tablespace, you have two options: i) reset default_tablespace ahead of time (SET LOCAL), or ii) use ALTER TABLE ... TABLESPACE afterwards. Maybe we should have a iii) "CREATE TABLE ... TABLESPACE DEFAULT" clause that does it, but I'm not sure it's necessary. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: