Re: Can we get rid of repeated queries from pg_dump? - Mailing list pgsql-general

From hubert depesz lubaczewski
Subject Re: Can we get rid of repeated queries from pg_dump?
Date
Msg-id 20210826160644.GA3120@depesz.com
Whole thread Raw
In response to Re: Can we get rid of repeated queries from pg_dump?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Can we get rid of repeated queries from pg_dump?
List pgsql-general
On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging.  Is it really sane to do a global "select
> format_type() from pg_type" query and save all the results on the client
> side?  I wonder whether there are cases where that'd be a net loss.
> You could do the experimentation to figure that out without necessarily
> having the C skills to make pg_dump actually do it.

So, I got some info.

First, some stats. The DB contains:

- 14 extensions
- 1 aggregate
- 107 functions
- 5 schemas
- 5 sequences
- 188 logged tables
- 1 unlogged table
- 206 "normal" indexes
- 30 unique indexes
- 15 materialized views
- 16 triggers
- 87 types
- 26 views

pg_dump -s of it is ~ 670kB.

Interestingly, while dumping (pg_dump -s -v), we can see progress going on, and then, after:

====
...
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading subscriptions
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path = 
====

It stops (progress visible in console). And then, in pg logs I see queries like:

#v+
SELECT
    proretset,
    prosrc,
    probin,
    provolatile,
    proisstrict,
    prosecdef,
    lanname,
    proconfig,
    procost,
    prorows,
    pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
    pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
    pg_catalog.pg_get_function_result(p.oid) AS funcresult,
    proleakproof,
    array_to_string(protrftypes, ' ') AS protrftypes,
    proparallel,
    prokind,
    prosupport,
    NULL AS prosqlbody
    FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
    WHERE p.oid = '43875'::pg_catalog.oid AND l.oid = p.prolang
#v-

Now for query stats.

To dump it all, pg_dump needed 9173 queries (logged by
log_min_duration_statement = 0 for this user).

I extracted all queries to separate files, and made stats. In total there were
only 4257 unique queries.

Then I checked for repeated queries. Top 10 most repeated offenders were:

615 times : SELECT pg_catalog.format_type('25'::pg_catalog.oid, NULL)
599 times : SELECT pg_catalog.format_type('23'::pg_catalog.oid, NULL)
579 times : SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
578 times : SELECT pg_catalog.format_type('41946'::pg_catalog.oid, NULL)
523 times : SELECT pg_catalog.format_type('701'::pg_catalog.oid, NULL)
459 times : SELECT pg_catalog.format_type('42923'::pg_catalog.oid, NULL)
258 times : SELECT pg_catalog.format_type('16'::pg_catalog.oid, NULL)
176 times : SELECT pg_catalog.format_type('19'::pg_catalog.oid, NULL)
110 times : SELECT pg_catalog.format_type('21'::pg_catalog.oid, NULL)
106 times : SELECT pg_catalog.format_type('42604'::pg_catalog.oid, NULL)

In total, there were 5000 queries:
SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)

But there were only 83 separate oids that were scanned.

The only other repeated command was:
SELECT pg_catalog.set_config('search_path', '', false);
and it was called only twice.

Based on my reading of queries in order it seems to follow the pattern of:

One call for:

SELECT  proretset,  prosrc,  probin,  provolatile,  proisstrict,  prosecdef,  lanname,  proconfig,  procost,  prorows,
pg_catalog.pg_get_function_arguments(p.oid)AS funcargs,  pg_catalog.pg_get_function_identity_arguments(p.oid) AS
funciargs, pg_catalog.pg_get_function_re
 
sult(p.oid) AS funcresult,  proleakproof,  array_to_string(protrftypes, ' ') AS protrftypes,  proparallel,  prokind,
prosupport, NULL AS prosqlbody  FROM pg_catalog.pg_proc p, pg_catalog.pg_language l  WHERE p.oid =
'SOME_NUMBER'::pg_catalog.oidAND l.oid = p.prolang 
 

and then one or more:

SELECT pg_catalog.format_type('SOME_NUMBER'::pg_catalog.oid, NULL)

In one case, after proc query, there were 94 concecutive
pg_catalog.format_type queries.

I hope it helps.

Best regards,

depesz




pgsql-general by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: lower() and unaccent() not leakproof
Next
From: Tom Lane
Date:
Subject: Re: lower() and unaccent() not leakproof