Re: pg_dump: Sorted output, referential integrity - Mailing list pgsql-hackers

From Christof Petig
Subject Re: pg_dump: Sorted output, referential integrity
Date
Msg-id 3C147094.DB58BBF4@petig-baender.de
Whole thread Raw
In response to Re: pg_dump: Sorted output, referential integrity  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses Re: pg_dump: Sorted output, referential integrity  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
List pgsql-hackers
Christopher Kings-Lynne wrote:

> > > > but to manage the data in a version
> > > >control system you need it consistently sorted. So a flag to sort by
> > > >either primary key or left to right would be of great value. (--sorted
> > > >?)
> > >
> > > Not really very generalizable when you consider user defined types,
> > > triggers etc.
> >
> > Hmmm. But if we have a primary key on columns (A,B,C) and request the data
> > 'order by A,B,C' this should be portable, shouldn't it?
> > If we don't have a primary key simply ordering by 1,2,3,...n
> > should also work.
> > Or am I missing something?
>
> I can see how ordering a dump by the primary key would be a neat way of
> 'clustering' your data after a restore, however I have qualms about the
> scalability of such a scheme.  What if someone has a 100GB table?  They may
> have arranged things so that they never get a sort from it or something, or
> it might take ages.  However I guess if it's an optional parameter it might
> be neat.
>
> My feeling is that it won't happen unless you actually code it into a patch
> that makes it a parameter to pg_dump.  Having an actual patch is a great way
> of getting something you want done ;)
>
> Alternatively, have you tried just writing a PERL script (or some clever sed
> script) that will just sort the COPY FROM sections...?

That's beyond my perl skills. And I believe sed to be not the right tool. (hmm,
perhaps split (at 'COPY FROM' and at '\.'), then sort, then cat ... many
(perhaps big) temporary files, let the db do the hard work)

But making a patch to pg_dump is a matter of (say) up to 4 hours.
I'll do it since you seem to like it and nobody started doing it so far.

Christof




pgsql-hackers by date:

Previous
From: Hiroshi Inoue
Date:
Subject: Re: [BUGS] Bug #533: BLOB (lo type) objects could not be restored
Next
From: "Christopher Kings-Lynne"
Date:
Subject: Re: pg_dump: Sorted output, referential integrity