Thread: Fixing pg_dump

Fixing pg_dump

From
Christopher Kings-Lynne
Date:
OK,

I think it might save me some time if I get some guidance on how we 
should modify pg_dump to fix the owner/grants issue.

I intend to make new archives created with 7.5 pg_dump have the fix, and  restoring pre 7.5 binary dumps will have
exactlythe previous 
 
behaviour.  The reason for this is that extracting the acls and owners 
to the end requires scanning the entire archive twice - not necessarily 
something we want to do (is it?)

So, I have to make acls and owners completely independent 
DumpableObjects, right?  Then I can get them sorted to the end, etc., etc.

I also intend to implement a flag that makes it use the sql standard set 
session_authorization behaviour.

Is all that the way to go?

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> I intend to make new archives created with 7.5 pg_dump have the fix, and 
>   restoring pre 7.5 binary dumps will have exactly the previous 
> behaviour.  The reason for this is that extracting the acls and owners 
> to the end requires scanning the entire archive twice - not necessarily 
> something we want to do (is it?)

I might be wrong about this, but I had the idea that the entire archive
TOC gets loaded into memory, and so there'd be no noticeable penalty in
scanning it twice.
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
>>I intend to make new archives created with 7.5 pg_dump have the fix, and 
>>  restoring pre 7.5 binary dumps will have exactly the previous 
>>behaviour.  The reason for this is that extracting the acls and owners 
>>to the end requires scanning the entire archive twice - not necessarily 
>>something we want to do (is it?)
> 
> I might be wrong about this, but I had the idea that the entire archive
> TOC gets loaded into memory, and so there'd be no noticeable penalty in
> scanning it twice.

Well, it probably gets scanned twice to do -c mode?

But...it seems kind of hacky to scan it again for owners and privs - are 
you sure you want me to go that way?

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> But...it seems kind of hacky to scan it again for owners and privs - are 
> you sure you want me to go that way?

If there's not a big performance penalty, sure.  Being fully compatible
with existing archive files is a sufficient win to justify sins much
worse than this one.
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
> If there's not a big performance penalty, sure.  Being fully compatible
> with existing archive files is a sufficient win to justify sins much
> worse than this one.

Out of interest - have you developed an opinion on the 
pg_get_serial_sequence thing?  Also, what's the latest I can get pg_dump 
fixes in?

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Out of interest - have you developed an opinion on the 
> pg_get_serial_sequence thing?

Will apply as soon as I get a chance --- I'd intended to clean out the
patch queue today, but didn't get past Magnus' stuff.

> Also, what's the latest I can get pg_dump fixes in?

Roughly speaking, sizeof(patch)/time_to_freeze is a constant.
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
>>But...it seems kind of hacky to scan it again for owners and privs - are 
>>you sure you want me to go that way?
> 
> If there's not a big performance penalty, sure.  Being fully compatible
> with existing archive files is a sufficient win to justify sins much
> worse than this one.

Ah, crap.

I tried adding the extra scan in and it as all well and good up until 
the second where I realised that the TocEntry struct has no field that 
allows me to know the correct way of finding the full descriptor of each 
object.

For example, consider an operator. It's not enough for me to know the 
name of the operator, I also must know the type of its operands.

That given, I see no way to implement this using a second scan, except 
perhaps rewriting the drop command...which is extremely dodgy!

I'm running out of time unfortunately, and I need to know from you 
whether I should go back to my work on making owner and acl TOC entries 
fully independent?  All this means is that people restoring pre-7.5 
binary dumps into 7.5 will not get the owner fixes...  But people using 
the binary format to upgrade seems like a pretty rare case to me!

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> I tried adding the extra scan in and it as all well and good up until 
> the second where I realised that the TocEntry struct has no field that 
> allows me to know the correct way of finding the full descriptor of each 
> object.

Ugh.  Definitely an oversight.  Don't suppose you want to think about
pulling the name out of the DROP command ;-) ?

> I'm running out of time unfortunately, and I need to know from you 
> whether I should go back to my work on making owner and acl TOC entries 
> fully independent?  All this means is that people restoring pre-7.5 
> binary dumps into 7.5 will not get the owner fixes...  But people using 
> the binary format to upgrade seems like a pretty rare case to me!

Hardly --- for instance, people using large objects have no other
choice.

You can *not* change pg_restore in a way that will make it impossible
for such people to restore their dumps (and no, I don't think it will
fly to tell someone after the fact they should have used 7.5 pg_dump...)

Maybe it's sufficient to have a backwards-compatibility mode in which
the SET SESSION AUTH commands still get issued same-as-ever.  In fact,
you could just automatically do that if you see the archive version is
too old to have ALTER OWNER support.

On the whole though, I think editing the DROP commands might be the best
way.  Are there any cases where that would actually not work?
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
> Ugh.  Definitely an oversight.  Don't suppose you want to think about
> pulling the name out of the DROP command ;-) ?

Yeah, I've already done it - it's ugleeey, but it works :P

>>I'm running out of time unfortunately, and I need to know from you 
>>whether I should go back to my work on making owner and acl TOC entries 
>>fully independent?  All this means is that people restoring pre-7.5 
>>binary dumps into 7.5 will not get the owner fixes...  But people using 
>>the binary format to upgrade seems like a pretty rare case to me!
> 
> 
> Hardly --- for instance, people using large objects have no other
> choice.

Yes they can - they can just use the 7.5 pg_dump to upgrade...

> You can *not* change pg_restore in a way that will make it impossible
> for such people to restore their dumps (and no, I don't think it will
> fly to tell someone after the fact they should have used 7.5 pg_dump...)

I didn't say they won't work, I just said that they will not have a fix 
for the owner/acl issue - I'll just check the version of their binary 
dump and if it doesn't have independent acls and owners, I will just 
restore it using the old set session authorization method.

> Maybe it's sufficient to have a backwards-compatibility mode in which
> the SET SESSION AUTH commands still get issued same-as-ever.  In fact,
> you could just automatically do that if you see the archive version is
> too old to have ALTER OWNER support.

That's what I was suggesting :)  Peter E wanted me to keep a set session 
auth mode around anyway for sql standard compatibilty - is that still a 
requirement (it doesn't seem hard to do).  However...I think that if you 
ask 100 postgres users if they want to be able to restore their own 
backups, or have sql standard backups, I know which one 100 of them will 
choose :)

Actually, this brings up another point - people occasionally complain on 
the list that pg_dump is not considered important enough :(  ie. Is 
there any good reason we cannot backport the entire new pg_dump to the 
7.4 branch, and change the 3 small things that prevent its output 
restoring to 7.4.  This is because there are many 7.4 users who cannot 
restore their own backups to 7.4 in an emergency, and have resorted to 
using CVS pg_dump and running sed scripts over it...

> On the whole though, I think editing the DROP commands might be the best
> way.  Are there any cases where that would actually not work?

There's a couple of tricks.

* Drop commands for TYPEs have 'CASCADE' on the end (has that always 
been true)

* I currently assume that the last two characters in a drop command are 
; and \n.  I'm not sure if this has always been the case.  Maybe I 
should make it loop until it removes the trailing semi-colon.

* Do we no longer worry about the SCHEMA AUTHORIZATION clause?  I might 
set it to keep being issued in 'sql standard mode', but otherwise we 
cannot use it in dumps any more.

* I have to special case VIEWS and SEQUENCES to use ALTER TABLE...OWNER TO

* On the second scan I'm not sure what to use as the comment above each 
owner and acl bit - I'll make it the same as the master object.

* On the second scan I have to hard-code the list of object types that 
need to have owners and acls dumped.

Other than that it works really well :)  I just have to tie off some 
loose ends and make really sure that my drop command extraction is safe 
from buffer errors, etc.  Hopefully I'll be able to submit tonight.

Chris



Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
> * Do we no longer worry about the SCHEMA AUTHORIZATION clause?  I might 
> set it to keep being issued in 'sql standard mode', but otherwise we 
> cannot use it in dumps any more.

Actually, that's not true - I'm being silly.  We can use the 
AUTHORIZATION clause instead of ALTER SCHEMA ... OWNER TO :)

Another question:

We currently fully qualify DROP command with the namespace so that drops 
will not accidentally modify the system catalogs.  Shouldn't this also 
be necessary on ALL non-CREATE commands?

ie. All ALTER, GRANT and REVOKE commands?

Otherwise, if the create table command associated with each of these 
fails (for whatever reason), the script could happily carry on and 
modify the system catalog tables?

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Actually, this brings up another point - people occasionally complain on 
> the list that pg_dump is not considered important enough :(  ie. Is 
> there any good reason we cannot backport the entire new pg_dump to the 
> 7.4 branch, and change the 3 small things that prevent its output 
> restoring to 7.4.

Lack of testing.  Maybe after 7.5 has been out for a while, we'll have
enough trust in current pg_dump to think of doing that, but on the other
hand it would be somewhat moot at that point.  The main point though is
that 7.4 is supposed to be a *stable* branch, and dropping a rewritten
pg_dump that hasn't even been through beta yet into a stable branch is
Simply Not Done.

> * Drop commands for TYPEs have 'CASCADE' on the end (has that always 
> been true)

Yeek.  That's got to be a hangover from pre-dependency-chasing days.
Let's lose it in our current output, at least.

> * I currently assume that the last two characters in a drop command are 
> ; and \n.  I'm not sure if this has always been the case.  Maybe I 
> should make it loop until it removes the trailing semi-colon.

I'd go for "remove while the last char is either ; or \n"; should cover
all cases.
        regards, tom lane


Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> We currently fully qualify DROP command with the namespace so that drops 
> will not accidentally modify the system catalogs.  Shouldn't this also 
> be necessary on ALL non-CREATE commands?
> Otherwise, if the create table command associated with each of these 
> fails (for whatever reason), the script could happily carry on and 
> modify the system catalog tables?

I don't buy it.  There's a tradeoff here between certainty of doing what
you want and having a script that is easy to edit.  DROP is a dangerous
weapon and we should be circumspect about applying it, but ALTER OWNER
etc are much less so.

Also, the point about qualifying the DROP is that you do not know
whether the object is there initially, and so you could be dropping
the wrong thing even in non-error cases.  The scenario where the CREATE
fails is much less probable.
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
> I don't buy it.  There's a tradeoff here between certainty of doing what
> you want and having a script that is easy to edit.  DROP is a dangerous
> weapon and we should be circumspect about applying it, but ALTER OWNER
> etc are much less so.
> 
> Also, the point about qualifying the DROP is that you do not know
> whether the object is there initially, and so you could be dropping
> the wrong thing even in non-error cases.  The scenario where the CREATE
> fails is much less probable.

OK, does it matter then if I leave the ALTER OWNER statement qualified?  It would be a pain in the butt to parse off
theexisting DROP 
 
command's namespace qualification.

I've run into an ACLs problem now.  This is the situation:

ALTER TABLE public.forums_threads OWNER TO usadmin;

--
-- Name: forums_threads; Type: ACL; Schema: public; Owner: usadmin
--

REVOKE ALL ON TABLE forums_threads FROM PUBLIC;
REVOKE ALL ON TABLE forums_threads FROM usadmin;
SET SESSION AUTHORIZATION brett;
GRANT ALL ON TABLE forums_threads TO brett WITH GRANT OPTION;
RESET SESSION AUTHORIZATION;

It fails trying to grant brett the GRANT OPTION as the user brett.  I 
haven't yet looked into the logic behind dumping these acls in common.c,  but it seems that maybe it's a hangover from
historicalalter owner 
 
commands on that relation?  ie. The acl is still there from when brett 
used to own that table?

Do you still plan to fix that?  I'll see if there's anything I can do in  pg_dump to detect the condition and fix it if
youlike.
 

Chris



Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
>>* Drop commands for TYPEs have 'CASCADE' on the end (has that always 
>>been true)
> 
> Yeek.  That's got to be a hangover from pre-dependency-chasing days.
> Let's lose it in our current output, at least.

I think it's necessary due to the circular dependency between types and 
their I/O functions.

Chris



Re: Fixing pg_dump

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> ... The acl is still there from when brett used to own that table?
> Do you still plan to fix that?

Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was
going to, or already did, submit some ACL-hacking code to help).  That
is, ALTER OWNER should adjust the ACL so that grants made by/to the
former owner now appear to be by/to the new owner.

However, there's still the problem that a dump from a 7.4 or older
database might exhibit the inconsistent ACL entries.  Not very sure
how to cope ...
        regards, tom lane


Re: Fixing pg_dump

From
Christopher Kings-Lynne
Date:
> Yeah, that's still on my should-fix-for-7.5 list (and I think Fabien was
> going to, or already did, submit some ACL-hacking code to help).  That
> is, ALTER OWNER should adjust the ACL so that grants made by/to the
> former owner now appear to be by/to the new owner.
> 
> However, there's still the problem that a dump from a 7.4 or older
> database might exhibit the inconsistent ACL entries.  Not very sure
> how to cope ...

How about "if the grant is to user blah, from user blah and blah is not 
the owner of the object, grant as default user, not user blah"?

Chris