Re: Statistics Import and Export - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Statistics Import and Export
Date
Msg-id rp6int5sqs5bn7g35brqz7w3t7il6tn2dgpm7tr3pev626wnc4@fi7a3enofhct
Whole thread Raw
In response to Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Statistics Import and Export
Re: Statistics Import and Export
List pgsql-hackers
Hi,

On 2025-02-25 21:29:56 -0500, Corey Huinker wrote:
> On Tue, Feb 25, 2025 at 9:00 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> > On Mon, 2025-02-24 at 09:54 -0500, Andres Freund wrote:
> > > Have you compared performance of with/without stats after these
> > > optimizations?
> >
> > On unoptimized build with asserts enabled, dumping the regression
> > database:
> >
> >   --no-statistics: 1.0s
> >   master:          3.6s
> >   v3j-0001:        3.0s
> >   v3j-0002:        1.7s
> >
> > I plan to commit the patches soon.

I think these have all been committed, but I still see a larger performance
difference than what you observed. I just checked because I was noticing that
the tests are still considerably slower than they used to be.


An optimized pg_dump against an unoptimized assert-enabled server:

time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync regression
>/dev/null
 
real    0m2.778s
user    0m0.167s
sys    0m0.057s

$ time ./src/bin/pg_dump/pg_dump --no-data --quote-all-identifiers --binary-upgrade --format=custom --no-sync
--no-statisticsregression > /dev/null
 

real    0m1.290s
user    0m0.097s
sys    0m0.026s


I thought it might be interesting to look at the set of queries arriving on
the server side, so I enabled pg-stat_statements and ran a dump:

regression[4041753][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM
pg_stat_statementsORDER BY calls DESC LIMIT 15;
 
┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐
│   total_exec_time   │   total_plan_time   │ calls │ plans │           substring            │
├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤
│   239.9672189999998 │             12.5725 │   981 │     6 │ PREPARE getAttributeStats(pg_c │
│  15.330405000000004 │            1.836712 │   282 │     6 │ PREPARE dumpFunc(pg_catalog.oi │
│  10.129114000000003 │ 0.39834800000000004 │   199 │     6 │ PREPARE dumpTableAttach(pg_cat │
│   9.887489000000002 │  0.9332620000000001 │    84 │    84 │ SELECT pg_get_partkeydef($1)   │
│  14.350725000000006 │            0.691071 │    60 │    60 │ SELECT pg_catalog.pg_get_viewd │
│  5.1174219999999995 │  1.4604219999999999 │    47 │     6 │ PREPARE dumpAgg(pg_catalog.oid │
│ 0.24036199999999996 │            0.545125 │    41 │    41 │ SELECT pg_catalog.format_type( │
│   7.099635000000002 │ 0.47031800000000007 │    39 │    39 │ SELECT pg_catalog.pg_get_ruled │
│            0.672752 │  1.9036320000000002 │    21 │     6 │ PREPARE dumpDomain(pg_catalog. │
│  1.6519299999999997 │  3.1480380000000006 │    21 │    22 │ PREPARE getDomainConstraints(p │
│            1.085548 │  3.9647630000000005 │    16 │     6 │ PREPARE dumpCompositeType(pg_c │
│            0.196259 │            0.602291 │    11 │     6 │ PREPARE dumpOpr(pg_catalog.oid │
│            0.265461 │            4.428914 │    10 │    10 │ SELECT amprocnum, amproc::pg_c │
│ 0.39591399999999993 │            9.345471 │    10 │    10 │ SELECT amopstrategy, amopopr:: │
│ 0.35752100000000003 │            2.128437 │     9 │     9 │ SELECT nspname, tmplname FROM  │
└─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘


It looks a lot less bad with an optimized build:
regression[4042057][1]=# SELECT total_exec_time, total_plan_time, calls, plans, substring(query, 1, 30) FROM
pg_stat_statementsORDER BY calls DESC LIMIT 15;
 
┌─────────────────────┬─────────────────────┬───────┬───────┬────────────────────────────────┐
│   total_exec_time   │   total_plan_time   │ calls │ plans │           substring            │
├─────────────────────┼─────────────────────┼───────┼───────┼────────────────────────────────┤
│   50.63764299999999 │            2.503585 │   981 │     6 │ PREPARE getAttributeStats(pg_c │
│  3.5241990000000007 │            0.478541 │   282 │     6 │ PREPARE dumpFunc(pg_catalog.oi │
│  2.3170359999999985 │            0.126379 │   199 │     6 │ PREPARE dumpTableAttach(pg_cat │
│            2.291331 │ 0.25360400000000005 │    84 │    84 │ SELECT pg_get_partkeydef($1)   │
│   4.678433000000003 │            0.202578 │    60 │    60 │ SELECT pg_catalog.pg_get_viewd │
│  1.1288440000000004 │ 0.30976200000000004 │    47 │     6 │ PREPARE dumpAgg(pg_catalog.oid │
│             0.06619 │ 0.16813600000000004 │    41 │    41 │ SELECT pg_catalog.format_type( │
│            2.102865 │            0.115169 │    39 │    39 │ SELECT pg_catalog.pg_get_ruled │
│             0.16163 │            0.439991 │    21 │     6 │ PREPARE dumpDomain(pg_catalog. │
│  0.5335120000000001 │            0.727573 │    21 │    22 │ PREPARE getDomainConstraints(p │
│             0.28177 │            0.894156 │    16 │     6 │ PREPARE dumpCompositeType(pg_c │
│            0.038558 │            0.140807 │    11 │     6 │ PREPARE dumpOpr(pg_catalog.oid │
│            0.082078 │  0.9654280000000001 │    10 │    10 │ SELECT amprocnum, amproc::pg_c │
│            0.136964 │  2.1140120000000002 │    10 │    10 │ SELECT amopstrategy, amopopr:: │
│ 0.11634699999999999 │ 0.48550499999999996 │     9 │     9 │ SELECT nspname, tmplname FROM  │
└─────────────────────┴─────────────────────┴───────┴───────┴────────────────────────────────┘
(15 rows)


This isn't even *remotely* an adversarial case, there are lots of workloads
with folks have a handful of indexes on each table and many many tables.


Right now --statistics more than doubles the number of queries that pg_dump
issues. That's oviously noticeable locally, but it's going to be really
noticeable when dumping across the network.


I think we need to do more to lessen the impact. Even leaving regression test
performance aside, the time increase for the default pg_dump invocation will
be painful for folks, particularly due to this being enabled by default.


One fairly easy win would be to stop issuing getAttributeStats() for
non-expression indexes. In most cases that'll already drastically cut down on
the extra queries.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: track generic and custom plans in pg_stat_statements
Next
From: Corey Huinker
Date:
Subject: Re: Statistics Import and Export