Thread: 5 minutes to pg_dump nothing

5 minutes to pg_dump nothing

From

"Nikita The Spider The Spider"

Date:

21 September 2007, 12:51:33

Hi all,
I'm seeing a problem where pg_dump takes at least 5 minutes to execute
no matter what I ask it to dump -- even a non-existent or empty table.
One possible red flag is that pg_type contains 56508 rows. This
doesn't seem excessive to me, but perhaps it should. I've looked
through the source code and a system trace and I can see where the
execution time disappears, but I can't pinpoint the source of the
problem.

Some background: the database contains about 1200 schemas with 30+
tables and 20+ custom types in each. The whole database is only about
12G. An empty table created in another database on the same machine
pg_dumps in < .1sec, as expected. PG version is 8.1.4, OS is FreeBSD.

When I run pg_dump in verbose mode, it stalls after printing "reading
user-defined types". In the systrace, I can see the pg_types query at
the top of  getTypes() in pgdump.c followed by 628 reads of 4k blocks
(= 2512k) which only takes a second or so in total. I guess this is
the loop through the tuples in getTypes().

There's then four calls to getDomainConstraints() followed by a
129-second delay during which nothing appears in the system trace.
After that, there's 124 pairs of these:
 72263 pg_dump  135.956209 CALL  break(0x9b37000)
 72263 pg_dump  135.956222 RET   break 0

They consume another 118 seconds, and then pg_dump moves on to
"reading procedural languages".

I've VACUUM FULL ANALYZEd everything I can think of. I'd really
appreciate any further hints or help.

PS - Thanks for Postgres; it works like a dream for me except for
infrequent bumps. =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

Re: 5 minutes to pg_dump nothing

From

Tom Lane

Date:

21 September 2007, 13:37:01

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> I'm seeing a problem where pg_dump takes at least 5 minutes to execute
> no matter what I ask it to dump -- even a non-existent or empty table.
> One possible red flag is that pg_type contains 56508 rows. This
> doesn't seem excessive to me, but perhaps it should.

That does seem like a lot --- what sort of types are they?  Scalar,
composite, what?  It's fairly likely that no one has tried to optimize
pg_dump for such a case.

It'd be helpful if you could recompile pg_dump with profiling enabled
(-pg compiler switch) and get a gprof profile to show where the time
is going.

            regards, tom lane

Re: 5 minutes to pg_dump nothing

From

"Nikita The Spider The Spider"

Date:

21 September 2007, 13:59:09

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> > I'm seeing a problem where pg_dump takes at least 5 minutes to execute
> > no matter what I ask it to dump -- even a non-existent or empty table.
> > One possible red flag is that pg_type contains 56508 rows. This
> > doesn't seem excessive to me, but perhaps it should.
>
> That does seem like a lot --- what sort of types are they?  Scalar,
> composite, what?  It's fairly likely that no one has tried to optimize
> pg_dump for such a case.

Aha, thanks. Didn't realize I was pushing the bounds of what was
reasonable. Here's the type counts:
 typrelkind | the_count
------------+-----------
            |       114
 sequence   |     11496
 composite  |     12290
 ordinary   |     13844
 TOAST      |      9215
 view       |      9699
(6 rows)


> It'd be helpful if you could recompile pg_dump with profiling enabled
> (-pg compiler switch) and get a gprof profile to show where the time
> is going.

Will do. I'm going to try to recreate the problem in my development
environment where I have a bit more freedom to tinker.

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

Re: 5 minutes to pg_dump nothing

From

"Nikita The Spider The Spider"

Date:

23 September 2007, 17:37:56

On 9/21/07, Nikita The Spider The Spider <nikitathespider@gmail.com> wrote:
> On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> > > I'm seeing a problem where pg_dump takes at least 5 minutes to execute
> > > no matter what I ask it to dump -- even a non-existent or empty table.
> > > One possible red flag is that pg_type contains 56508 rows. This
> > > doesn't seem excessive to me, but perhaps it should.
> >
> > That does seem like a lot --- what sort of types are they?  Scalar,
> > composite, what?  It's fairly likely that no one has tried to optimize
> > pg_dump for such a case.
>
> Aha, thanks. Didn't realize I was pushing the bounds of what was
> reasonable. Here's the type counts:
>  typrelkind | the_count
> ------------+-----------
>             |       114
>  sequence   |     11496
>  composite  |     12290
>  ordinary   |     13844
>  TOAST      |      9215
>  view       |      9699
> (6 rows)

I think I found the problem. getTypes() calls findFuncByOid() twice
for each item it sees. A comment at the top of findFuncByOid() says,
"should hash this, but just do linear search for now". So that funtion
is O(n)/2 where n = the # of functions found by the query in
getFuncs(), and since it's called twice for each item in getTypes, the
overall performance is O(m * n) where n = # of functions and m = # of
types. As I said before, I have ~56000 items in pg_type and the query
at the top of getFuncs returns ~98000 rows so for me performance is
O(5.4 billion)-ish.

Some experimentation backs up my assertion that this is the
performance bottleneck. I created another database on the same machine
with 10k each of tables, views and custom types. The tables contained
a SERIAL PRIMARY KEY and a text column so that PG created indexes and
TOAST tables to go along with them. The result was ~50000 items in
pg_type. When I run pg_dump against this database, it completes in 4
seconds so simply having a lot of items in pg_type is not enough to
recreate my problem.

I added 100k functions to that database and pg_dump took several
minutes to run (again, trying to dump a non-existent table), which is
what I'm seeing on my production database.

My database design aimed to make each of my 1600 or so schemas its own
little self-sufficent world, hence the proliferation of types and
functions. I didn't realize that at some level they're all effectively
in the same namespace. Now I know.

Sorry I can't offer a patch. It'd be fun to write C code again but my
skills are pretty rusty and I'm afraid I'd butcher it. :-P

Thanks for your help! Given that this problem seems to be triggered by
a sort of edge case and the fix is non-trivial, I guess I should not
expect a new version of pg_dump soon?

Cheers

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more

Re: 5 minutes to pg_dump nothing

From

Tom Lane

Date:

23 September 2007, 18:07:21

"Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> I think I found the problem. getTypes() calls findFuncByOid() twice
> for each item it sees. A comment at the top of findFuncByOid() says,
> "should hash this, but just do linear search for now". So that funtion
> is O(n)/2 where n = the # of functions found by the query in
> getFuncs(), and since it's called twice for each item in getTypes, the
> overall performance is O(m * n) where n = # of functions and m = # of
> types. As I said before, I have ~56000 items in pg_type and the query
> at the top of getFuncs returns ~98000 rows so for me performance is
> O(5.4 billion)-ish.

Hah, so you need *both* a lot of types and a lot of functions to have
a problem.

> Thanks for your help! Given that this problem seems to be triggered by
> a sort of edge case and the fix is non-trivial, I guess I should not
> expect a new version of pg_dump soon?

We might look into fixing it for 8.3, but I doubt we'd risk back-patching
such a change to older branches.

            regards, tom lane

Re: 5 minutes to pg_dump nothing

From

Tom Lane

Date:

23 September 2007, 20:42:59

I wrote:
> "Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
>> Thanks for your help! Given that this problem seems to be triggered by
>> a sort of edge case and the fix is non-trivial, I guess I should not
>> expect a new version of pg_dump soon?

> We might look into fixing it for 8.3, but I doubt we'd risk back-patching
> such a change to older branches.

Actually, it doesn't look that hard --- want to try the attached patch?
I couldn't measure any speed difference at all on the standard PG
regression-test database, but there are not huge numbers of objects
in that.

            regards, tom lane


Index: common.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/common.c,v
retrieving revision 1.97
diff -c -r1.97 common.c
*** common.c    21 Aug 2007 01:11:20 -0000    1.97
--- common.c    23 Sep 2007 23:37:32 -0000
***************
*** 41,47 ****

  /*
   * These variables are static to avoid the notational cruft of having to pass
!  * them into findTableByOid() and friends.
   */
  static TableInfo *tblinfo;
  static TypeInfo *typinfo;
--- 41,51 ----

  /*
   * These variables are static to avoid the notational cruft of having to pass
!  * them into findTableByOid() and friends.  For each of these arrays, we
!  * build a sorted-by-OID index array immediately after it's built, and then
!  * we use binary search in findTableByOid() and friends.  (qsort'ing the base
!  * arrays themselves would be simpler, but it doesn't work because pg_dump.c
!  * may have already established pointers between items.)
   */
  static TableInfo *tblinfo;
  static TypeInfo *typinfo;
***************
*** 51,62 ****
--- 55,72 ----
  static int    numTypes;
  static int    numFuncs;
  static int    numOperators;
+ static DumpableObject **tblinfoindex;
+ static DumpableObject **typinfoindex;
+ static DumpableObject **funinfoindex;
+ static DumpableObject **oprinfoindex;


  static void flagInhTables(TableInfo *tbinfo, int numTables,
                InhInfo *inhinfo, int numInherits);
  static void flagInhAttrs(TableInfo *tbinfo, int numTables,
               InhInfo *inhinfo, int numInherits);
+ static DumpableObject **buildIndexArray(void *objArray, int numObjs,
+                                         Size objSize);
  static int    DOCatalogIdCompare(const void *p1, const void *p2);
  static void findParentsByOid(TableInfo *self,
                   InhInfo *inhinfo, int numInherits);
***************
*** 104,114 ****
--- 114,126 ----
      if (g_verbose)
          write_msg(NULL, "reading user-defined functions\n");
      funinfo = getFuncs(&numFuncs);
+     funinfoindex = buildIndexArray(funinfo, numFuncs, sizeof(FuncInfo));

      /* this must be after getFuncs */
      if (g_verbose)
          write_msg(NULL, "reading user-defined types\n");
      typinfo = getTypes(&numTypes);
+     typinfoindex = buildIndexArray(typinfo, numTypes, sizeof(TypeInfo));

      /* this must be after getFuncs, too */
      if (g_verbose)
***************
*** 122,127 ****
--- 134,140 ----
      if (g_verbose)
          write_msg(NULL, "reading user-defined operators\n");
      oprinfo = getOperators(&numOperators);
+     oprinfoindex = buildIndexArray(oprinfo, numOperators, sizeof(OprInfo));

      if (g_verbose)
          write_msg(NULL, "reading user-defined operator classes\n");
***************
*** 154,159 ****
--- 167,173 ----
      if (g_verbose)
          write_msg(NULL, "reading user-defined tables\n");
      tblinfo = getTables(&numTables);
+     tblinfoindex = buildIndexArray(tblinfo, numTables, sizeof(TableInfo));

      if (g_verbose)
          write_msg(NULL, "reading table inheritance information\n");
***************
*** 540,545 ****
--- 554,623 ----
      return NULL;
  }

+ /*
+  * Find a DumpableObject by OID, in a pre-sorted array of one type of object
+  *
+  * Returns NULL for unknown OID
+  */
+ static DumpableObject *
+ findObjectByOid(Oid oid, DumpableObject **indexArray, int numObjs)
+ {
+     DumpableObject **low;
+     DumpableObject **high;
+
+     /*
+      * This is the same as findObjectByCatalogId except we assume we need
+      * not look at table OID because the objects are all the same type.
+      *
+      * We could use bsearch() here, but the notational cruft of calling
+      * bsearch is nearly as bad as doing it ourselves; and the generalized
+      * bsearch function is noticeably slower as well.
+      */
+     if (numObjs <= 0)
+         return NULL;
+     low = indexArray;
+     high = indexArray + (numObjs - 1);
+     while (low <= high)
+     {
+         DumpableObject **middle;
+         int            difference;
+
+         middle = low + (high - low) / 2;
+         difference = oidcmp((*middle)->catId.oid, oid);
+         if (difference == 0)
+             return *middle;
+         else if (difference < 0)
+             low = middle + 1;
+         else
+             high = middle - 1;
+     }
+     return NULL;
+ }
+
+ /*
+  * Build an index array of DumpableObject pointers, sorted by OID
+  */
+ static DumpableObject **
+ buildIndexArray(void *objArray, int numObjs, Size objSize)
+ {
+     DumpableObject **ptrs;
+     int        i;
+
+     ptrs = (DumpableObject **) malloc(numObjs * sizeof(DumpableObject *));
+     for (i = 0; i < numObjs; i++)
+         ptrs[i] = (DumpableObject *) ((char *) objArray + i * objSize);
+
+     /* We can use DOCatalogIdCompare to sort since its first key is OID */
+     if (numObjs > 1)
+         qsort((void *) ptrs, numObjs, sizeof(DumpableObject *),
+               DOCatalogIdCompare);
+
+     return ptrs;
+ }
+
+ /*
+  * qsort comparator for pointers to DumpableObjects
+  */
  static int
  DOCatalogIdCompare(const void *p1, const void *p2)
  {
***************
*** 630,709 ****
   * findTableByOid
   *      finds the entry (in tblinfo) of the table with the given oid
   *      returns NULL if not found
-  *
-  * NOTE:  should hash this, but just do linear search for now
   */
  TableInfo *
  findTableByOid(Oid oid)
  {
!     int            i;
!
!     for (i = 0; i < numTables; i++)
!     {
!         if (tblinfo[i].dobj.catId.oid == oid)
!             return &tblinfo[i];
!     }
!     return NULL;
  }

  /*
   * findTypeByOid
   *      finds the entry (in typinfo) of the type with the given oid
   *      returns NULL if not found
-  *
-  * NOTE:  should hash this, but just do linear search for now
   */
  TypeInfo *
  findTypeByOid(Oid oid)
  {
!     int            i;
!
!     for (i = 0; i < numTypes; i++)
!     {
!         if (typinfo[i].dobj.catId.oid == oid)
!             return &typinfo[i];
!     }
!     return NULL;
  }

  /*
   * findFuncByOid
   *      finds the entry (in funinfo) of the function with the given oid
   *      returns NULL if not found
-  *
-  * NOTE:  should hash this, but just do linear search for now
   */
  FuncInfo *
  findFuncByOid(Oid oid)
  {
!     int            i;
!
!     for (i = 0; i < numFuncs; i++)
!     {
!         if (funinfo[i].dobj.catId.oid == oid)
!             return &funinfo[i];
!     }
!     return NULL;
  }

  /*
   * findOprByOid
   *      finds the entry (in oprinfo) of the operator with the given oid
   *      returns NULL if not found
-  *
-  * NOTE:  should hash this, but just do linear search for now
   */
  OprInfo *
  findOprByOid(Oid oid)
  {
!     int            i;
!
!     for (i = 0; i < numOperators; i++)
!     {
!         if (oprinfo[i].dobj.catId.oid == oid)
!             return &oprinfo[i];
!     }
!     return NULL;
  }


--- 708,751 ----
   * findTableByOid
   *      finds the entry (in tblinfo) of the table with the given oid
   *      returns NULL if not found
   */
  TableInfo *
  findTableByOid(Oid oid)
  {
!     return (TableInfo *) findObjectByOid(oid, tblinfoindex, numTables);
  }

  /*
   * findTypeByOid
   *      finds the entry (in typinfo) of the type with the given oid
   *      returns NULL if not found
   */
  TypeInfo *
  findTypeByOid(Oid oid)
  {
!     return (TypeInfo *) findObjectByOid(oid, typinfoindex, numTypes);
  }

  /*
   * findFuncByOid
   *      finds the entry (in funinfo) of the function with the given oid
   *      returns NULL if not found
   */
  FuncInfo *
  findFuncByOid(Oid oid)
  {
!     return (FuncInfo *) findObjectByOid(oid, funinfoindex, numFuncs);
  }

  /*
   * findOprByOid
   *      finds the entry (in oprinfo) of the operator with the given oid
   *      returns NULL if not found
   */
  OprInfo *
  findOprByOid(Oid oid)
  {
!     return (OprInfo *) findObjectByOid(oid, oprinfoindex, numOperators);
  }

Re: 5 minutes to pg_dump nothing

From

"Nikita The Spider The Spider"

Date:

24 September 2007, 20:28:50

On 9/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > "Nikita The Spider The Spider" <nikitathespider@gmail.com> writes:
> >> Thanks for your help! Given that this problem seems to be triggered by
> >> a sort of edge case and the fix is non-trivial, I guess I should not
> >> expect a new version of pg_dump soon?
>
> > We might look into fixing it for 8.3, but I doubt we'd risk back-patching
> > such a change to older branches.
>
> Actually, it doesn't look that hard --- want to try the attached patch?
> I couldn't measure any speed difference at all on the standard PG
> regression-test database, but there are not huge numbers of objects
> in that.

Tom,
Your patch decreases the runtime of pg_dump by an order of magnitude
for me which means I don't have to roll my own table dumper. Thanks
very much for the swift turnaround and for a great database.

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more