Re: pg_dump --split patch - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_dump --split patch
Msg-id AANLkTinuS5H5QxhS=PQajpDc3W0Sg9tXVw6JkgDsQiCs@mail.gmail.com
In response to Re: pg_dump --split patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_dump --split patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pg_dump --split patch  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List pgsql-hackers
On Mon, Jan 3, 2011 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On the specific issue of overloaded functions, I have a feeling that
>> the only feasible option is going to be to put them all in the same
>> file.  If you put them in different files, the names will either be
>> very long (because they'll have to include the argument types) or
>> fairly incomprehensible (if you did something like hash the argument
>> types and append 8 hex digits to the function name) or not all that
>> static (if you use OIDs; or if you number them sequentially, like
>> foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
>> on a system where there are only two variants of foo, making diff not
>> work very well).
>
> If you put all the variants in the same file, diff is *still* not going
> to work very well.  At least not unless you solve the problems that keep
> pg_dump from dumping objects in a consistent order ... and once you do
> that, you don't need this patch.

That's not really true.  It's a whole lot easier to look at a diff of two
100-line files and then repeat that N times than to look at a single
diff of two N*100 line files.  I certainly spend enough of my
patch-review time doing "git diff master <some particular source file>",
and then if what's going on isn't clear you can look at just that file
in more detail without worrying about every other source file in the
system.  And I have encountered this problem when comparing database
schemas (and sometimes data) also.  Yes, I've done that using diff.
Yes, it did suck.  Yes, I got it done before my boss fired me.

>> I think the problem with this patch is that different people are
>> likely to want slightly different things, and there may not be any
>> single format that pleases everyone, and supporting too many variants
>> will become confusing for users and hard for us to maintain.
>
> Yeah, that's exactly it.  I can think of some possible uses for
> splitting up pg_dump output, but frankly "to ease diff-ing" is not
> one of them.  For that problem, it's nothing but a crude kluge that
> only sort-of helps.  If we're to get anywhere on this, we need a
> better-defined problem statement that everyone can agree is worth
> solving and is well solved with this particular approach.

I have to admit I'm a bit unsold on the approach as well.  It seems
like you could write a short Perl script which would transform a text
format dump into the proposed format pretty easily, and if you did
that and published the script, then the next poor shmuck who had the
same problem could either use the script as-is or hack it up to meet
some slightly different set of requirements.  Or maybe you'd be better
off basing such a script on the custom or tar format instead, in order
to avoid the problem of misidentifying a line beginning with -- as a
comment when it's really part of a data item.  Or maybe even writing a
whole "schema diff" tool that would take two custom-format dumps as
inputs.
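
To make the "short script" idea concrete, here is a minimal sketch in Python
rather than Perl.  It assumes the plain-text dump marks each object with a
header comment of the form "-- Name: ...; Type: ...; Schema: ...; Owner: ...",
and the function name split_dump and the schema/type/name.sql layout are
made up for illustration; they are not what the patch proposes.  It strips
the argument list so that all overloaded variants of a function land in the
same file, and it has exactly the weakness mentioned above: a data line that
happens to look like a header comment will be misread as one.

```python
import re
from collections import defaultdict

# Header comment assumed to precede each object in a plain-text dump, e.g.
#   -- Name: foo(integer); Type: FUNCTION; Schema: public; Owner: postgres
#   -- Data for Name: t; Type: TABLE DATA; Schema: public; Owner: postgres
HEADER = re.compile(
    r"^-- (?:Data for )?Name: (?P<name>.*); "
    r"Type: (?P<type>[^;]+); Schema: (?P<schema>[^;]+)"
)


def split_dump(text):
    """Group the lines of a dump by a hypothetical per-object file name.

    Returns a dict mapping file name to file contents.  Lines before the
    first header go to "_preamble.sql".  The bare "--" line that precedes
    each header stays with the previous object; a real tool would tidy that.
    """
    files = defaultdict(list)
    current = "_preamble.sql"
    for line in text.splitlines(keepends=True):
        m = HEADER.match(line)
        if m:
            # Drop the argument list so overloaded functions share one file.
            base = m.group("name").split("(", 1)[0].strip()
            kind = m.group("type").strip().lower().replace(" ", "_")
            current = f"{m.group('schema').strip()}/{kind}/{base}.sql"
        files[current].append(line)
    return {name: "".join(lines) for name, lines in files.items()}
```

Writing each entry of the returned dict to disk for two dumps would then let
you run diff one file at a time.  Names containing "/" or other characters
that are awkward in paths are not handled here.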

On the other hand, I can certainly think of times when even a pretty
dumb implementation of this would have saved me some time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

