Thread: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Bruce Momjian
Date:
In pg_upgrade, copy fsm, vm, and extent files by checking for file
existence via open(), rather than collecting a directory listing and
looking up matching relfilenode files with sequential scans of the
array.  This speeds up pg_upgrade by 2x for a large number of tables,
e.g. 16k.

Per observation by Ants Aasma.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/29add0de4920e4f448a30bfc35798b939c211d97

Modified Files
--------------
contrib/pg_upgrade/file.c        |   55 ----------
contrib/pg_upgrade/pg_upgrade.h  |    2 -
contrib/pg_upgrade/relfilenode.c |  208 +++++++++++++++-----------------------
3 files changed, 82 insertions(+), 183 deletions(-)


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> In pg_upgrade, copy fsm, vm, and extent files by checking for file
> existence via open(), rather than collecting a directory listing and
> looking up matching relfilenode files with sequential scans of the
> array.  This speeds up pg_upgrade by 2x for a large number of tables,
> e.g. 16k.

Uh ... you replaced a strcmp() with an open()?

I'm prepared to believe that's a win for sufficiently large N, if you
assume that the filesystem is smart enough to have O(1) lookup time
regardless of the directory size ... but that doesn't seem like a very
good assumption, and in any case surely this loses badly for a smaller
number of files.

You would have been better off keeping the array and sorting it so you
could use binary search, instead of passing the problem off to the
filesystem.

            regards, tom lane


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Bruce Momjian
Date:
On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > In pg_upgrade, copy fsm, vm, and extent files by checking for file
> > existence via open(), rather than collecting a directory listing and
> > looking up matching relfilenode files with sequential scans of the
> > array.  This speeds up pg_upgrade by 2x for a large number of tables,
> > e.g. 16k.
>
> Uh ... you replaced a strcmp() with an open()?

Yes, strcmp() on all elements of an array.

> I'm prepared to believe that's a win for sufficiently large N, if you
> assume that the filesystem is smart enough to have O(1) lookup time
> regardless of the directory size ... but that doesn't seem like a very
> good assumption, and in any case surely this loses badly for a smaller
> number of files.
>
> You would have been better off keeping the array and sorting it so you
> could use binary search, instead of passing the problem off to the
> filesystem.

Well, testing showed using open() was a big win.  To do this with the
directory listing, as I mentioned, you need to pull listings from all
tablespaces, sort them, then do a binary search.  I thought the removal
of the directory array code itself was a win, and I figured the
directory code was already doing a binary search in the kernel.

Do you want me to code up a test?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
>> You would have been better off keeping the array and sorting it so you
>> could use binary search, instead of passing the problem off to the
>> filesystem.

> Well, testing showed using open() was a big win.

... on the filesystem you tested on.  I'm concerned that it might not
look so good on other platforms.

            regards, tom lane


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Bruce Momjian
Date:
On Wed, Nov 14, 2012 at 06:15:30PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, Nov 14, 2012 at 05:39:29PM -0500, Tom Lane wrote:
> >> You would have been better off keeping the array and sorting it so you
> >> could use binary search, instead of passing the problem off to the
> >> filesystem.
>
> > Well, testing showed using open() was a big win.
>
> ... on the filesystem you tested on.  I'm concerned that it might not
> look so good on other platforms.

True. I am on ext3.  So I need to generate a proof-of-concept patch and
have others test it?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Tom Lane
Date:
BTW, this patch isn't looking so good on Windows.  Buildfarm member
chough says

"C:\prog\bf\root\HEAD\pgsql.11252\pgsql.sln" (default target) (1) ->
(contrib\pg_upgrade target) ->
  .\contrib\pg_upgrade\relfilenode.c(202): warning C4003: not enough actual parameters for macro 'open'


"C:\prog\bf\root\HEAD\pgsql.11252\pgsql.sln" (default target) (1) ->
(contrib\pg_upgrade target) ->
  .\contrib\pg_upgrade\relfilenode.c(202): error C2059: syntax error : ')'
  .\contrib\pg_upgrade\relfilenode.c(207): error C2181: illegal else without matching if
  .\contrib\pg_upgrade\relfilenode.c(242): error C2059: syntax error : 'return'
  .\contrib\pg_upgrade\relfilenode.c(243): error C2059: syntax error : '}'

    1 Warning(s)
    4 Error(s)

            regards, tom lane


Re: pgsql: In pg_upgrade, copy fsm, vm, and extent files by checking for fi

From
Bruce Momjian
Date:
On Wed, Nov 14, 2012 at 06:57:13PM -0500, Tom Lane wrote:
> BTW, this patch isn't looking so good on Windows.  Buildfarm member
> chough says
>
> "C:\prog\bf\root\HEAD\pgsql.11252\pgsql.sln" (default target) (1) ->
> (contrib\pg_upgrade target) ->
>   .\contrib\pg_upgrade\relfilenode.c(202): warning C4003: not enough actual parameters for macro 'open'
>
>
> "C:\prog\bf\root\HEAD\pgsql.11252\pgsql.sln" (default target) (1) ->
> (contrib\pg_upgrade target) ->
>   .\contrib\pg_upgrade\relfilenode.c(202): error C2059: syntax error : ')'
>   .\contrib\pg_upgrade\relfilenode.c(207): error C2181: illegal else without matching if
>   .\contrib\pg_upgrade\relfilenode.c(242): error C2059: syntax error : 'return'
>   .\contrib\pg_upgrade\relfilenode.c(243): error C2059: syntax error : '}'
>
>     1 Warning(s)
>     4 Error(s)

OK, fixed by adding a third open() parameter of 0.  I see our other code
does this too.  I am not sure what those syntax errors are but I am
guessing the failed macro messed them up.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +