Thread: pgsql: Eliminate some more O(N^2) behaviors in pg_dump/pg_restore.
Eliminate some more O(N^2) behaviors in pg_dump/pg_restore. This patch fixes three places (which AFAICT is all of them) where runtime was O(N^2) in the number of TOC entries, by using an index array to replace linear searches of the TOC list. This performance issue is a bit less bad than those recently fixed, because it depends on the number of items dumped not the number in the source database, so the problem can be dodged by doing partial dumps. The previous coding already had an instance of one of the two index arrays needed, but it was only calculated in parallel-restore cases; now we need it all the time. I also chose to move the arrays into the ArchiveHandle data structure, to make this code a bit more ready for the day that we try to sling multiple ArchiveHandles around in pg_dump or pg_restore. Since we still need some server-side work before pg_dump can really cope nicely with tens of thousands of tables, there's probably little point in back-patching. Branch ------ master Details ------- http://git.postgresql.org/pg/commitdiff/c89bdf769065981e6948c94da8c0b96737bb8462 Modified Files -------------- src/bin/pg_dump/pg_backup_archiver.c | 189 +++++++++++++++++----------------- src/bin/pg_dump/pg_backup_archiver.h | 6 +- 2 files changed, 101 insertions(+), 94 deletions(-)
On Tue, May 29, 2012 at 12:38:51AM +0000, Tom Lane wrote: > Eliminate some more O(N^2) behaviors in pg_dump/pg_restore. > > This patch fixes three places (which AFAICT is all of them) where runtime > was O(N^2) in the number of TOC entries, by using an index array to replace > linear searches of the TOC list. This performance issue is a bit less bad > than those recently fixed, because it depends on the number of items dumped > not the number in the source database, so the problem can be dodged by > doing partial dumps. > > The previous coding already had an instance of one of the two index arrays > needed, but it was only calculated in parallel-restore cases; now we need > it all the time. I also chose to move the arrays into the ArchiveHandle > data structure, to make this code a bit more ready for the day that we > try to sling multiple ArchiveHandles around in pg_dump or pg_restore. > > Since we still need some server-side work before pg_dump can really cope > nicely with tens of thousands of tables, there's probably little point in > back-patching. Great, thanks. This will help speed up pg_upgrade. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +