Re: pg_dump --split patch - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: pg_dump --split patch
Msg-id AANLkTi=M0xquuPONK3qjbnw0jtDzP3VL-OyXe6_Ohkz6@mail.gmail.com
In response to Re: pg_dump --split patch  (Joel Jacobson <joel@gluefinance.com>)
Responses Re: pg_dump --split patch
Re: pg_dump --split patch
List pgsql-hackers
On Tue, Dec 28, 2010 at 2:39 PM, Joel Jacobson <joel@gluefinance.com> wrote:
2010/12/28 Gurjeet Singh <singh.gurjeet@gmail.com>
I would suggest the directory structure as:

/crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
...
/crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
/crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql

This might be more amenable to diff'ing the different dumps. Schemas are a logical grouping of other objects, so making that apparent in your dump's hierarchy makes more sense.

Thanks Gurjeet and Tom for good feedback!

I've made some changes and attached new patches.
Looks much better now I think!

This is what I've changed:

*) Not using oid anymore in the filename
*) New filename/path structure: [-f filename]-split/[schema]/[desc]/[tag].sql
*) If two objects share the same name tag within the same [schema]/[desc], a suffix (-2, -3, etc.) is appended to the name. Example:
~/pg.dump-split/public/FUNCTION/foobar.sql
~/pg.dump-split/public/FUNCTION/foobar-2.sql
~/pg.dump-split/public/FUNCTION/barfoo.sql
~/pg.dump-split/public/FUNCTION/barfoo-2.sql
~/pg.dump-split/public/FUNCTION/barfoo-3.sql
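
To make the naming rule above concrete, here is a minimal sketch in Python. It is not the patch's C code: the helper name `split_paths` and the idea of feeding it (schema, desc, tag) triples in dump order are assumptions made purely for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_paths(base, objects):
    """Map (schema, desc, tag) triples to file paths, in the order given.

    The first object with a given (schema, desc, tag) gets tag.sql;
    later duplicates get tag-2.sql, tag-3.sql, and so on.
    """
    seen = defaultdict(int)  # how many times each triple has appeared so far
    paths = []
    for schema, desc, tag in objects:
        seen[(schema, desc, tag)] += 1
        n = seen[(schema, desc, tag)]
        name = tag if n == 1 else f"{tag}-{n}"
        root = PurePosixPath(f"{base}-split")
        paths.append(str(root / schema / desc / f"{name}.sql"))
    return paths


# Hypothetical input: two overloads of foobar, three of barfoo.
objects = [
    ("public", "FUNCTION", "foobar"),
    ("public", "FUNCTION", "foobar"),
    ("public", "FUNCTION", "barfoo"),
    ("public", "FUNCTION", "barfoo"),
    ("public", "FUNCTION", "barfoo"),
]
paths = split_paths("~/pg.dump", objects)
```

Note that the suffix depends only on the order in which the objects arrive, so the same function can end up in a different file if that order changes.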

I think you are right about functions (and aggregates) being the only desc-type where two objects can share the same name in the same schema.
This means the problem of dumping objects in different order is a very limited problem, only affecting overloaded functions.

I didn't include the arguments in the file name, as it would lead to very long file names unless truncated, and since the problem is very limited, I think we shouldn't include it. It's cleaner with just the name part of the tag in the file name.


I haven't seen your code yet, but we need to make sure that in case of a name collision we emit the object definitions in sorted order, so that the dump is always deterministic: func1(char) should _always_ be dumped before func1(int), so that each definition always lands in the same output file.

The problem I see with suffixing a sequence id to objects with a name collision is that one day the dump may name myfunc(int) as myfunc.sql, and after an overloaded version is created, say myfunc(char, int), the same myfunc(int) may be dumped as myfunc-2.sql. So the file name for a given function is still not stable across dumps.
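
The renaming in the myfunc example can be reproduced with a small Python sketch. It assumes overloads are sorted by their argument type names before being numbered; that ordering rule and the helper `assign_names` are assumptions for illustration, not something taken from the patch.

```python
def assign_names(signatures):
    """Sort overloads by (name, argument types), then number them.

    The first overload of a name gets name.sql; the rest get
    name-2.sql, name-3.sql, and so on, in sorted order.
    """
    names = {}
    count = {}
    for name, args in sorted(signatures):
        count[name] = count.get(name, 0) + 1
        n = count[name]
        names[(name, args)] = f"{name}.sql" if n == 1 else f"{name}-{n}.sql"
    return names


# Day 1: only myfunc(int) exists.
before = assign_names([("myfunc", ("int",))])

# Day 2: myfunc(char, int) has been added; "char" sorts before "int".
after = assign_names([("myfunc", ("int",)), ("myfunc", ("char", "int"))])

print(before[("myfunc", ("int",))])  # file used for myfunc(int) on day 1
print(after[("myfunc", ("int",))])   # file used for myfunc(int) on day 2
```

Sorting makes each individual dump reproducible, but under this assumed ordering myfunc(int) still moves from myfunc.sql to myfunc-2.sql once the new overload exists.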

Also, it is project policy that we do not introduce new features in back branches, so spending time on an 8.4.6 patch may not be the best use of your time.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
