Re: pg_dump --split patch - Mailing list pgsql-hackers
From | Gurjeet Singh |
---|---|
Subject | Re: pg_dump --split patch |
Date | |
Msg-id | AANLkTi=M0xquuPONK3qjbnw0jtDzP3VL-OyXe6_Ohkz6@mail.gmail.com |
In response to | Re: pg_dump --split patch (Joel Jacobson <joel@gluefinance.com>) |
Responses |
Re: pg_dump --split patch
Re: pg_dump --split patch |
List | pgsql-hackers |
On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel@gluefinance.com> wrote:
> 2010/12/28 Gurjeet Singh <singh.gurjeet@gmail.com>
>> This might be more amenable to diff'ing the different dumps. Schemas are a logical grouping of other objects and hence making that apparent in your dump's hierarchy makes more sense.
>>
>> I would suggest the directory structure as:
>>
>> /crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
>> /crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
>> ...
>> /crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
>> /crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql
>
> Thanks Gurjeet and Tom for good feedback!
>
> I've made some changes and attached new patches. Looks much better now I think!
>
> This is what I've changed,
>
> *) Not using oid anymore in the filename
> *) New filename/path structure: [-f filename]-split/[schema]/[desc]/[tag].sql
> *) If two objects share the same name tag for the same [schema]/[desc], -2, -3, etc is appended to the name. Example:
>
> ~/pg.dump-split/public/FUNCTION/foobar.sql
> ~/pg.dump-split/public/FUNCTION/foobar-2.sql
> ~/pg.dump-split/public/FUNCTION/barfoo.sql
> ~/pg.dump-split/public/FUNCTION/barfoo-2.sql
> ~/pg.dump-split/public/FUNCTION/barfoo-3.sql
>
> I think you are right about functions (and aggregates) being the only desc-type where two objects can share the same name in the same schema. This means the problem of dumping objects in different order is a very limited problem, only affecting overloaded functions.
>
> I didn't include the arguments in the file name, as it would lead to very long file names unless truncated, and since the problem is very limited, I think we shouldn't include it. It's cleaner with just the name part of the tag in the file name.

I haven't seen your code yet, but we need to make sure that in case of name collision we emit the object definitions in a sorted order so that the dump is always deterministic: func1(char) should be _always_ dumped before func1(int), that is, output file names are always deterministic.
The problem I see with suffixing a sequence id to the objects with name collision is that one day the dump may name myfunc(int) as myfunc.sql and after an overloaded version is created, say myfunc(char, int), then the same myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.
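
To make both points concrete, here is a minimal Python sketch. It is not taken from the patch (which is C code inside pg_dump); the helper name `assign_file_names` and the representation of a function as a `(name, argument-types)` tuple are assumptions made purely for illustration. It sorts overloads by their argument list before handing out the -2, -3 suffixes, so a given set of functions always maps to the same file names, and then shows that creating a new overload can still move an existing function to a different file.

```python
from collections import defaultdict


def assign_file_names(functions):
    """Map (name, arg_types) pairs to file names.

    Hypothetical helper: overloads of the same name are sorted by their
    argument types, the first keeps "<name>.sql" and the rest get
    "-2", "-3", ... appended, mirroring the suffix scheme discussed above.
    """
    by_name = defaultdict(list)
    for name, arg_types in functions:
        by_name[name].append(arg_types)

    file_names = {}
    for name, overloads in by_name.items():
        # Sorting makes the result independent of the input order.
        for index, arg_types in enumerate(sorted(overloads), start=1):
            suffix = "" if index == 1 else f"-{index}"
            file_names[(name, arg_types)] = f"{name}{suffix}.sql"
    return file_names


# Only myfunc(int) exists.
before = assign_file_names([("myfunc", ("int",))])

# Later, myfunc(char, int) is created; it sorts before myfunc(int).
after = assign_file_names([
    ("myfunc", ("int",)),
    ("myfunc", ("char", "int")),
])

print(before[("myfunc", ("int",))])  # myfunc.sql
print(after[("myfunc", ("int",))])   # myfunc-2.sql
```

Under these assumptions the mapping is deterministic for a fixed set of functions, but myfunc(int) still changes files once the overload appears, which is the second concern above. Putting the argument types in the file name would avoid that, at the cost of the long names Joel mentions.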
Also, it is a project policy that we do not introduce new features in back branches, so spending time on an 8.4.6 patch may not be the best use of your time.
Regards,
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device