Re: Possible performance regression with pg_dump of a large number of relations - Mailing list pgsql-hackers

From Luke Cowell
Subject Re: Possible performance regression with pg_dump of a large number of relations
Date
Msg-id 3BF53696-04AC-4917-B057-FF056E82515E@gmail.com
In response to Re: Possible performance regression with pg_dump of a large number of relations  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Jan 12, 2018, at 8:01 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Jan 11, 2018 at 5:26 PM, Luke Cowell <lcowell@gmail.com> wrote:
I've been troubleshooting an issue with slow pg_dump times on postgres 9.6.6. I believe something changed between 9.5.10 and 9.6.6 that has made dumps significantly slower for databases with a large number of relations. I posted this in IRC and someone suggested I post it here. I'm sorry if this isn't the right place.

To simulate the issue I generated 150,000 relations spread across 1000 schemas (this roughly reflects my production setup).

```ruby
File.write "many_relations.sql", (150000 / 150).times.flat_map {|n|
  [
   "create schema s_#{n};",
   150.times.map do |t|
     "create table s_#{n}.test_#{t} (id int);"
   end
   ]
}.join("\n")
```
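
The loading step isn't spelled out above; a minimal sketch, assuming a fresh database named `testing` (the same name used in the pg_dump commands below), would be:

```sh
# Assumed setup step (not part of the generator above): create a fresh
# database called "testing" and load the generated DDL into it.
createdb testing
psql -q -X -f many_relations.sql testing
```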

I have 2 identical pieces of hardware. I've installed 9.5 on one and 9.6 on the other. I've run the same generated SQL in a fresh database on both systems.

On my 9.5.10 system:
> time pg_dump -n s_10 testing > /dev/null
real    0m5.492s
user    0m1.424s
sys     0m0.184s

On my 9.6.6 system:
> time pg_dump -n s_10 testing > /dev/null
real    0m27.342s
user    0m1.748s
sys     0m0.248s

I don't get quite as large a regression as you do: from 6s to 19s. It looks like there is more than one regression, but the biggest is caused by:

commit 5d589993cad212f7d556d52cc1e42fe18f65b057
Author: Stephen Frost <sfrost@snowman.net>
Date:   Fri May 6 14:06:50 2016 -0400

    pg_dump performance and other fixes

That commit covered a few different things, and I don't know which of the improvements it mentions motivated this, but the key change was to add this query:

EXISTS (
    SELECT 1
    FROM pg_attribute at
    LEFT JOIN pg_init_privs pip
        ON (c.oid = pip.objoid
            AND pip.classoid = (SELECT oid FROM pg_class WHERE relname = 'pg_class')
            AND pip.objsubid = at.attnum)
    WHERE at.attrelid = c.oid
      AND at.attnum > 0
      AND ((SELECT count(acl)
            FROM (SELECT unnest(coalesce(at.attacl, acldefault('c', c.relowner))) AS acl
                  EXCEPT
                  SELECT unnest(coalesce(pip.initprivs, acldefault('c', c.relowner)))) AS foo) > 1
           OR
           (SELECT count(acl)
            FROM (SELECT unnest(coalesce(pip.initprivs, acldefault('c', c.relowner))) AS acl
                  EXCEPT
                  SELECT unnest(coalesce(at.attacl, acldefault('c', c.relowner)))) AS foo) > 0)
) AS changed_acl

Considering it runs two subqueries for every column (including the 6 hidden system columns) of every table, even tables that don't end up getting dumped, it is no wonder it is slow.
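
As a rough gauge of the scale (a sketch, assuming the 150,000-table `testing` database generated above), counting the pg_attribute rows of ordinary tables shows how many rows that EXISTS clause has to walk:

```sh
# Rough gauge only (assumes the "testing" database built above): count the
# pg_attribute rows of ordinary tables -- system columns included -- each of
# which is another pass through those correlated sub-selects.
psql -d testing -t -c "
  SELECT count(*)
  FROM pg_attribute at
  JOIN pg_class c ON c.oid = at.attrelid
  WHERE c.relkind = 'r';"
```

With 150,000 single-column tables, that is on the order of a million rows for a single pg_dump run.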

If you were just dumping the database with 150,000 objects, I wouldn't worry about a 20 second regression.  But I assume you intend to loop over every schema and dump each individually?
 
Cheers,

Jeff

Hi Jeff, thanks for your attention on this. Yes, that is exactly our use case. We dump each schema individually, so we would be paying that 20-second penalty each time. As a workaround I've been dumping the schemas in batches of 20, but this isn't ideal, as we lose access to a number of our existing workflows.
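
The batching looks roughly like the sketch below (schema names, batch size, and database name are illustrative, not our exact script): several schemas go into a single pg_dump run via repeated -n switches, so the expensive catalog query is paid once per batch of 20 rather than once per schema.

```sh
# Illustrative sketch of the batching workaround: dump schemas s_0 .. s_19
# with one pg_dump invocation by repeating -n, so the slow catalog query
# runs once for the whole batch.
args=""
for i in $(seq 0 19); do
  args="$args -n s_$i"
done
pg_dump $args testing > batch_0.sql
```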

Luke
