Re: Make deparsing of column defaults faster - Mailing list pgsql-hackers

From: Justin Pryzby
Subject: Re: Make deparsing of column defaults faster
Date:
Msg-id: 20180705165827.GK7025@telsasoft.com
In response to: Re: Make deparsing of column defaults faster (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses: Re: Make deparsing of column defaults faster (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers
On Mon, Jun 04, 2018 at 10:00:53PM -0400, Peter Eisentraut wrote:
> On 6/4/18 20:55, Jeff Janes wrote:
> > Since defaults can't contain Vars, this patch converts the second
> > parameter to zero in places where pg_get_expr is invoked on column
> > defaults.
> 
> My in-progress generated columns patch removes that assumption (since a
> generated column generally refers to another column of the same table).
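
For context, pg_get_expr()'s second argument is only used to resolve Vars (column
references) in the expression, so for plain column defaults it can be zero and the
server can skip building the per-table alias context entirely.  A rough,
self-contained libpq sketch of the kind of call Jeff is describing -- not the
actual pg_dump query or his patch, and the function and variable names here are
made up:

/* Illustrative sketch only -- not pg_dump's actual query or Jeff's patch.
 * Because column defaults cannot contain Vars, the relation OID that
 * pg_get_expr() would use to resolve column references can be passed as 0,
 * letting the server skip deparse_context_for() for every call. */
#include <stdio.h>
#include <libpq-fe.h>

static void
show_column_defaults(PGconn *conn, const char *table_oid)
{
    const char *params[1] = {table_oid};
    PGresult   *res = PQexecParams(conn,
                                   "SELECT adnum, "
                                   /* previously: pg_get_expr(adbin, adrelid) */
                                   "pg_catalog.pg_get_expr(adbin, 0) AS adsrc"
                                   "  FROM pg_catalog.pg_attrdef"
                                   " WHERE adrelid = $1::pg_catalog.oid",
                                   1, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
            printf("attnum %s DEFAULT %s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
    }

    PQclear(res);
}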

On Thu, Jul 05, 2018 at 04:45:07PM +0200, Peter Eisentraut wrote:
> On 29.06.18 05:15, Jeff Janes wrote:
> > Since pg_dump calls pg_get_expr once over and over again on the same
> > table consecutively, perhaps we could cache the column alias assignments
> > in a single-entry cache, so if it is called on the same table as last
> > time it just re-uses the aliases from last time.  I am not planning on
> 
> I looked into that.  deparse_context_for() is actually not that
> expensive on its own, well below one second, but it gets somewhat
> expensive when you call it 1600 times for one table.  So to address that
> case, we can cache the deparse context between calls in the fn_extra
> field of pg_get_expr.  The attached patch does that.  This makes the
> pg_dump -s times pretty much constant even with 1600 columns with
> defaults.

I checked on one customer database running PG10.4, for which pg_dump -s takes 8
minutes.

I imported the existing schema into PG12dev (which itself took 25 minutes) and compared:
patched: 2m33.616s
unpatched: 7m19.578s

Note that I've recently reduced the number of child tables in this DB (by
repartitioning tables from daily to monthly granularity), thereby reducing the
total number of columns (and column defaults) across the largest partitioned
tables by a factor of 30, and shrinking the pg_dump -s output from 120MB (six
months ago) to 51MB.  I expect this patch would've saved even more before that
cleanup.

> How realistic is this use case?  Is it worth it?

Note that this affects pg_upgrade, which is how the issue originally came up [0].
(But I believe the deparsing is done by the old server during pg_upgrade's pg_dump
run, so a pg_upgrade to v11 couldn't benefit unless this were also included in
PG10.5.)

[pryzbyj@database postgresql]$ grep -c 'SET DEFAULT' /srv/cdrperfbackup/ts/2018-07-04/pg_dump-section=pre-data
183915

Justin

[0] https://www.postgresql.org/message-id/CAMkU%3D1x-e%2BmaqefhM1yMeSiJ8J9Z%2BSJHgW7c9bqo3E3JMG4iJA%40mail.gmail.com

