Re: Possible performance regression with pg_dump of a large numberof relations - Mailing list pgsql-hackers

From Felipe Sateler
Subject Re: Possible performance regression with pg_dump of a large numberof relations
Date
Msg-id 20200118224611.xgwxdlvs7egflxqm@felipedell
Whole thread Raw
In response to Re: Possible performance regression with pg_dump of a large numberof relations  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hello Stephen, everyone,

On Fri, Jan 12, 2018 at 10:39:09PM -0500, Stephen Frost wrote:
> Greetings Jeff & Luke,
>
> * Jeff Janes (jeff.janes@gmail.com) wrote:
> > Sorry, that query reflects some munging I did to it.  The real part added
> > to the query is:
> >
> > EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid
> > = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname =
> > 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND
> > ((SELECT array_agg(acl) FROM (SELECT
> > unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl EXCEPT SELECT
> > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner)))) as foo) IS NOT
> > NULL OR (SELECT array_agg(acl) FROM (SELECT
> > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl EXCEPT
> > SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner)))) as foo) IS
> > NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_ac
>
> Yes, this is to check if any of the rights on the table or any of its'
> columns has been changed from what it's initial rights are as recorded
> in pg_init_privs.
>
> I've been playing around with this a bit tonight trying to think of a
> way to avoid doing this work and it occurs to me that we really only
> need to worry about initprivs on objects in schemas that are either
> there at init time, or from extensions.  Not all of the objects in the
> system can have init-privs because the only way to get init-privs is
> at initdb time or from an extension creating a new object.
>
> As such, I've reworked the query (but not yet put it into pg_dump to
> run it through the regression tests) to look like this (for anyone else
> who wants to take a look at it and play with it):
> <snip sql>
>
> This ends up dropping the query time from around 8.6s on my system to
> about 1s, with a test rig of 1000 schemas and 100,000 tables.

The effect seems quite milder here. It reduces from 8s to 5.7s. Still an
improvement though.

I'm using pg_dump 11.6. Has something been merged since this time and
now?

My use case is similar to the OP: I dump each schema separately, which
causes long delays between dumps.

>
> Unfortunately, the way pg_dump is structured today, it really wouldn't
> be easy to have it only run this query for the tables in the schema(s)
> requested because it expects to be able to look for dependencies in the
> data structure which results from this query (as I recall, it's been a
> while since I looked, but I remember trying to figure out a way to do
> that and it certainly didn't look easy to do).

This would be nice. Such a filter (when applied manually) reduces the query
time significantly.

-- 
Saludos,
Felipe Sateler



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: should crash recovery ignore checkpoint_flush_after ?
Next
From: Tom Lane
Date:
Subject: Re: Patch to document base64 encoding