Thread: Fixing pg_dump
OK, I think it might save me some time if I get some guidance on how we should modify pg_dump to fix the owner/grants issue. I intend to make new archives created with 7.5 pg_dump have the fix, and restoring pre 7.5 binary dumps will have exactlythe previous behaviour. The reason for this is that extracting the acls and owners to the end requires scanning the entire archive twice - not necessarily something we want to do (is it?) So, I have to make acls and owners completely independent DumpableObjects, right? Then I can get them sorted to the end, etc., etc. I also intend to implement a flag that makes it use the sql standard set session_authorization behaviour. Is all that the way to go? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > I intend to make new archives created with 7.5 pg_dump have the fix, and > restoring pre 7.5 binary dumps will have exactly the previous > behaviour. The reason for this is that extracting the acls and owners > to the end requires scanning the entire archive twice - not necessarily > something we want to do (is it?) I might be wrong about this, but I had the idea that the entire archive TOC gets loaded into memory, and so there'd be no noticeable penalty in scanning it twice. regards, tom lane
>>I intend to make new archives created with 7.5 pg_dump have the fix, and >> restoring pre 7.5 binary dumps will have exactly the previous >>behaviour. The reason for this is that extracting the acls and owners >>to the end requires scanning the entire archive twice - not necessarily >>something we want to do (is it?) > > I might be wrong about this, but I had the idea that the entire archive > TOC gets loaded into memory, and so there'd be no noticeable penalty in > scanning it twice. Well, it probably gets scanned twice to do -c mode? But...it seems kind of hacky to scan it again for owners and privs - are you sure you want me to go that way? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > But...it seems kind of hacky to scan it again for owners and privs - are > you sure you want me to go that way? If there's not a big performance penalty, sure. Being fully compatible with existing archive files is a sufficient win to justify sins much worse than this one. regards, tom lane
> If there's not a big performance penalty, sure. Being fully compatible > with existing archive files is a sufficient win to justify sins much > worse than this one. Out of interest - have you developed an opinion on the pg_get_serial_sequence thing? Also, what's the latest I can get pg_dump fixes in? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > Out of interest - have you developed an opinion on the > pg_get_serial_sequence thing? Will apply as soon as I get a chance --- I'd intended to clean out the patch queue today, but didn't get past Magnus' stuff. > Also, what's the latest I can get pg_dump fixes in? Roughly speaking, sizeof(patch)/time_to_freeze is a constant. regards, tom lane
>>But...it seems kind of hacky to scan it again for owners and privs - are >>you sure you want me to go that way? > > If there's not a big performance penalty, sure. Being fully compatible > with existing archive files is a sufficient win to justify sins much > worse than this one. Ah, crap. I tried adding the extra scan in and it as all well and good up until the second where I realised that the TocEntry struct has no field that allows me to know the correct way of finding the full descriptor of each object. For example, consider an operator. It's not enough for me to know the name of the operator, I also must know the type of its operands. That given, I see no way to implement this using a second scan, except perhaps rewriting the drop command...which is extremely dodgy! I'm running out of time unfortunately, and I need to know from you whether I should go back to my work on making owner and acl TOC entries fully independent? All this means is that people restoring pre-7.5 binary dumps into 7.5 will not get the owner fixes... But people using the binary format to upgrade seems like a pretty rare case to me! Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > I tried adding the extra scan in and it as all well and good up until > the second where I realised that the TocEntry struct has no field that > allows me to know the correct way of finding the full descriptor of each > object. Ugh. Definitely an oversight. Don't suppose you want to think about pulling the name out of the DROP command ;-) ? > I'm running out of time unfortunately, and I need to know from you > whether I should go back to my work on making owner and acl TOC entries > fully independent? All this means is that people restoring pre-7.5 > binary dumps into 7.5 will not get the owner fixes... But people using > the binary format to upgrade seems like a pretty rare case to me! Hardly --- for instance, people using large objects have no other choice. You can *not* change pg_restore in a way that will make it impossible for such people to restore their dumps (and no, I don't think it will fly to tell someone after the fact they should have used 7.5 pg_dump...) Maybe it's sufficient to have a backwards-compatibility mode in which the SET SESSION AUTH commands still get issued same-as-ever. In fact, you could just automatically do that if you see the archive version is too old to have ALTER OWNER support. On the whole though, I think editing the DROP commands might be the best way. Are there any cases where that would actually not work? regards, tom lane
> Ugh. Definitely an oversight. Don't suppose you want to think about > pulling the name out of the DROP command ;-) ? Yeah, I've already done it - it's ugleeey, but it works :P >>I'm running out of time unfortunately, and I need to know from you >>whether I should go back to my work on making owner and acl TOC entries >>fully independent? All this means is that people restoring pre-7.5 >>binary dumps into 7.5 will not get the owner fixes... But people using >>the binary format to upgrade seems like a pretty rare case to me! > > > Hardly --- for instance, people using large objects have no other > choice. Yes they can - they can just use the 7.5 pg_dump to upgrade... > You can *not* change pg_restore in a way that will make it impossible > for such people to restore their dumps (and no, I don't think it will > fly to tell someone after the fact they should have used 7.5 pg_dump...) I didn't say they won't work, I just said that they will not have a fix for the owner/acl issue - I'll just check the version of their binary dump and if it doesn't have independent acls and owners, I will just restore it using the old set session authorization method. > Maybe it's sufficient to have a backwards-compatibility mode in which > the SET SESSION AUTH commands still get issued same-as-ever. In fact, > you could just automatically do that if you see the archive version is > too old to have ALTER OWNER support. That's what I was suggesting :) Peter E wanted me to keep a set session auth mode around anyway for sql standard compatibilty - is that still a requirement (it doesn't seem hard to do). However...I think that if you ask 100 postgres users if they want to be able to restore their own backups, or have sql standard backups, I know which one 100 of them will choose :) Actually, this brings up another point - people occasionally complain on the list that pg_dump is not considered important enough :( ie. Is there any good reason we cannot backport the entire new pg_dump to the 7.4 branch, and change the 3 small things that prevent its output restoring to 7.4. This is because there are many 7.4 users who cannot restore their own backups to 7.4 in an emergency, and have resorted to using CVS pg_dump and running sed scripts over it... > On the whole though, I think editing the DROP commands might be the best > way. Are there any cases where that would actually not work? There's a couple of tricks. * Drop commands for TYPEs have 'CASCADE' on the end (has that always been true) * I currently assume that the last two characters in a drop command are ; and \n. I'm not sure if this has always been the case. Maybe I should make it loop until it removes the trailing semi-colon. * Do we no longer worry about the SCHEMA AUTHORIZATION clause? I might set it to keep being issued in 'sql standard mode', but otherwise we cannot use it in dumps any more. * I have to special case VIEWS and SEQUENCES to use ALTER TABLE...OWNER TO * On the second scan I'm not sure what to use as the comment above each owner and acl bit - I'll make it the same as the master object. * On the second scan I have to hard-code the list of object types that need to have owners and acls dumped. Other than that it works really well :) I just have to tie off some loose ends and make really sure that my drop command extraction is safe from buffer errors, etc. Hopefully I'll be able to submit tonight. Chris
> * Do we no longer worry about the SCHEMA AUTHORIZATION clause? I might > set it to keep being issued in 'sql standard mode', but otherwise we > cannot use it in dumps any more. Actually, that's not true - I'm being silly. We can use the AUTHORIZATION clause instead of ALTER SCHEMA ... OWNER TO :) Another question: We currently fully qualify DROP command with the namespace so that drops will not accidentally modify the system catalogs. Shouldn't this also be necessary on ALL non-CREATE commands? ie. All ALTER, GRANT and REVOKE commands? Otherwise, if the create table command associated with each of these fails (for whatever reason), the script could happily carry on and modify the system catalog tables? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > Actually, this brings up another point - people occasionally complain on > the list that pg_dump is not considered important enough :( ie. Is > there any good reason we cannot backport the entire new pg_dump to the > 7.4 branch, and change the 3 small things that prevent its output > restoring to 7.4. Lack of testing. Maybe after 7.5 has been out for a while, we'll have enough trust in current pg_dump to think of doing that, but on the other hand it would be somewhat moot at that point. The main point though is that 7.4 is supposed to be a *stable* branch, and dropping a rewritten pg_dump that hasn't even been through beta yet into a stable branch is Simply Not Done. > * Drop commands for TYPEs have 'CASCADE' on the end (has that always > been true) Yeek. That's got to be a hangover from pre-dependency-chasing days. Let's lose it in our current output, at least. > * I currently assume that the last two characters in a drop command are > ; and \n. I'm not sure if this has always been the case. Maybe I > should make it loop until it removes the trailing semi-colon. I'd go for "remove while the last char is either ; or \n"; should cover all cases. regards, tom lane
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > We currently fully qualify DROP command with the namespace so that drops > will not accidentally modify the system catalogs. Shouldn't this also > be necessary on ALL non-CREATE commands? > Otherwise, if the create table command associated with each of these > fails (for whatever reason), the script could happily carry on and > modify the system catalog tables? I don't buy it. There's a tradeoff here between certainty of doing what you want and having a script that is easy to edit. DROP is a dangerous weapon and we should be circumspect about applying it, but ALTER OWNER etc are much less so. Also, the point about qualifying the DROP is that you do not know whether the object is there initially, and so you could be dropping the wrong thing even in non-error cases. The scenario where the CREATE fails is much less probable. regards, tom lane
> I don't buy it. There's a tradeoff here between certainty of doing what > you want and having a script that is easy to edit. DROP is a dangerous > weapon and we should be circumspect about applying it, but ALTER OWNER > etc are much less so. > > Also, the point about qualifying the DROP is that you do not know > whether the object is there initially, and so you could be dropping > the wrong thing even in non-error cases. The scenario where the CREATE > fails is much less probable. OK, does it matter then if I leave the ALTER OWNER statement qualified? It would be a pain in the butt to parse off theexisting DROP command's namespace qualification. I've run into an ACLs problem now. This is the situation: ALTER TABLE public.forums_threads OWNER TO usadmin; -- -- Name: forums_threads; Type: ACL; Schema: public; Owner: usadmin -- REVOKE ALL ON TABLE forums_threads FROM PUBLIC; REVOKE ALL ON TABLE forums_threads FROM usadmin; SET SESSION AUTHORIZATION brett; GRANT ALL ON TABLE forums_threads TO brett WITH GRANT OPTION; RESET SESSION AUTHORIZATION; It fails trying to grant brett the GRANT OPTION as the user brett. I haven't yet looked into the logic behind dumping these acls in common.c, but it seems that maybe it's a hangover from historicalalter owner commands on that relation? ie. The acl is still there from when brett used to own that table? Do you still plan to fix that? I'll see if there's anything I can do in pg_dump to detect the condition and fix it if youlike. Chris
>>* Drop commands for TYPEs have 'CASCADE' on the end (has that always >>been true) > > Yeek. That's got to be a hangover from pre-dependency-chasing days. > Let's lose it in our current output, at least. I think it's necessary due to the circular dependency between types and their I/O functions. Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > ... The acl is still there from when brett used to own that table? > Do you still plan to fix that? Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was going to, or already did, submit some ACL-hacking code to help). That is, ALTER OWNER should adjust the ACL so that grants made by/to the former owner now appear to be by/to the new owner. However, there's still the problem that a dump from a 7.4 or older database might exhibit the inconsistent ACL entries. Not very sure how to cope ... regards, tom lane
> Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was > going to, or already did, submit some ACL-hacking code to help). That > is, ALTER OWNER should adjust the ACL so that grants made by/to the > former owner now appear to be by/to the new owner. > > However, there's still the problem that a dump from a 7.4 or older > database might exhibit the inconsistent ACL entries. Not very sure > how to cope ... How about "if the grant is to user blah, from user blah and blah is not the owner of the object, grant as default user, not user blah"? Chris