Thread: pgsql: Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.

pgsql: Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.

From

Tom Lane

Date:

28 May 2012, 21:39:08

Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.

This patch fixes three places (which AFAICT is all of them) where runtime
was O(N^2) in the number of TOC entries, by using an index array to replace
linear searches of the TOC list.  This performance issue is a bit less bad
than those recently fixed, because it depends on the number of items dumped
not the number in the source database, so the problem can be dodged by
doing partial dumps.

The previous coding already had an instance of one of the two index arrays
needed, but it was only calculated in parallel-restore cases; now we need
it all the time.  I also chose to move the arrays into the ArchiveHandle
data structure, to make this code a bit more ready for the day that we
try to sling multiple ArchiveHandles around in pg_dump or pg_restore.

Since we still need some server-side work before pg_dump can really cope
nicely with tens of thousands of tables, there's probably little point in
back-patching.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/c89bdf769065981e6948c94da8c0b96737bb8462

Modified Files
--------------
src/bin/pg_dump/pg_backup_archiver.c |  189 +++++++++++++++++-----------------
src/bin/pg_dump/pg_backup_archiver.h |    6 +-
2 files changed, 101 insertions(+), 94 deletions(-)

Re: pgsql: Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.

From

Bruce Momjian

Date:

29 May 2012, 21:55:40

On Tue, May 29, 2012 at 12:38:51AM +0000, Tom Lane wrote:
> Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.
>
> This patch fixes three places (which AFAICT is all of them) where runtime
> was O(N^2) in the number of TOC entries, by using an index array to replace
> linear searches of the TOC list.  This performance issue is a bit less bad
> than those recently fixed, because it depends on the number of items dumped
> not the number in the source database, so the problem can be dodged by
> doing partial dumps.
>
> The previous coding already had an instance of one of the two index arrays
> needed, but it was only calculated in parallel-restore cases; now we need
> it all the time.  I also chose to move the arrays into the ArchiveHandle
> data structure, to make this code a bit more ready for the day that we
> try to sling multiple ArchiveHandles around in pg_dump or pg_restore.
>
> Since we still need some server-side work before pg_dump can really cope
> nicely with tens of thousands of tables, there's probably little point in
> back-patching.

Great, thanks.  This will help speed up pg_upgrade.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +