Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Further pg_upgrade analysis for many tables
Date
Msg-id 20121112170908.GC14488@momjian.us
Whole thread Raw
In response to Re: Further pg_upgrade analysis for many tables  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Further pg_upgrade analysis for many tables
Re: Further pg_upgrade analysis for many tables
List pgsql-hackers
On Sat, Nov 10, 2012 at 12:41:44PM -0500, Bruce Momjian wrote:
> On Sat, Nov 10, 2012 at 07:17:34PM +0200, Ants Aasma wrote:
> > On Sat, Nov 10, 2012 at 7:10 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > I am confused why you see a loop.  transfer_all_new_dbs() does a
> > > merge-join of old/new database names, then calls gen_db_file_maps(),
> > > which loops over the relations and calls create_rel_filename_map(),
> > > which adds to the map via array indexing.   I don't see any file loops
> > > in there --- can you be more specific?
> > 
> > Sorry, I was too tired when posting that. I actually meant
> > transfer_single_new_db(). More specifically the profile clearly showed
> > that most of the time was spent in the two loops starting on lines 193
> > and 228.
> 
> Wow, you are right on target.  I was so focused on making logical
> lookups linear that I did not consider file system vm/fsm and file
> extension lookups.  Let me think a little and I will report back. 
> Thanks.

OK, I have had some time to think about this.  What the current code
does is, for each database, get a directory listing to know about any
vm, fsm, and >1gig extents that exist in the directory.  It caches the
directory listing and does full array scans looking for matches.  If the
tablespace changes, it creates a new directory cache and throws away the
old one.  This code certainly needs improvement!

I can think of two solutions.  The first would be to scan the database
directory (and any tablespaces used by the database), sort the listing,
and then binary-search it for file names whose prefix matches the
current relation.

The second approach would be to simply try to copy the fsm, vm, and
extent files, and ignore any ENOENT errors.  This allows code
simplification.  The downside is that it doesn't pull all files with
matching prefixes --- it requires pg_upgrade to _know_ what suffixes
might exist in that directory.  It also assumes there can be no
gaps in the file extent numbering (is that safe?).
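A sketch of that second approach, using a stand-in copy_file() rather
than pg_upgrade's actual transfer routine (the names and layout here
are assumptions for illustration): probe each known suffix, treat
ENOENT as "fork doesn't exist" for the optional ones, and stop
numbering extents at the first gap.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for pg_upgrade's copy routine; returns -1 with errno set. */
static int
copy_file(const char *src, const char *dst)
{
    FILE   *in = fopen(src, "rb");
    FILE   *out;
    char    buf[8192];
    size_t  n;

    if (in == NULL)
        return -1;              /* errno set by fopen(), e.g. ENOENT */
    if ((out = fopen(dst, "wb")) == NULL)
    {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

/*
 * Transfer a relation's files by guessing every suffix pg_upgrade
 * knows about, ignoring ENOENT for the optional forks.  Assumes
 * extent numbers (".1", ".2", ...) have no gaps, so the first
 * missing extent ends the loop.
 */
static int
transfer_relfile_guessing(const char *old_dir, const char *new_dir,
                          const char *relfilenode)
{
    const char *suffixes[] = {"", "_fsm", "_vm"};
    char        src[1024], dst[1024];
    int         i;

    for (i = 0; i < 3; i++)
    {
        snprintf(src, sizeof(src), "%s/%s%s", old_dir, relfilenode, suffixes[i]);
        snprintf(dst, sizeof(dst), "%s/%s%s", new_dir, relfilenode, suffixes[i]);
        if (copy_file(src, dst) == 0)
            continue;
        if (errno != ENOENT || i == 0)
            return -1;          /* main fork must exist; other errors fatal */
        /* missing fsm/vm is fine --- just skip it */
    }

    /* >1GB extents: stop at the first missing segment number */
    for (i = 1;; i++)
    {
        snprintf(src, sizeof(src), "%s/%s.%d", old_dir, relfilenode, i);
        snprintf(dst, sizeof(dst), "%s/%s.%d", new_dir, relfilenode, i);
        if (copy_file(src, dst) == 0)
            continue;
        if (errno == ENOENT)
            break;
        return -1;
    }
    return 0;
}
```

This avoids reading the directory at all, but as noted above it is only
correct if no other suffixes can appear and extent numbering is gapless.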

I need recommendations on which direction to pursue; this would only be
for 9.3.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


