Re: Very slow "bloat query" - Mailing list pgsql-performance
From | Gilles Darold
---|---
Subject | Re: Very slow "bloat query"
Date |
Msg-id | 804ea9ef-e9d4-e104-a9f1-d9a94e36013c@darold.net
In response to | Very slow "bloat query" (Marcin Gozdalik <gozdal@gmail.com>)
List | pgsql-performance
On 14/05/2021 at 13:06, Marcin Gozdalik wrote:
> Hi
>
> I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW)
> database. It takes around 5 minutes for pgmetrics to run. I traced the
> problem to the "bloat query" (a version of
> https://wiki.postgresql.org/wiki/Show_database_bloat) spinning on CPU,
> doing no I/O.
>
> I have traced the problem to the bloated `pg_class` (the irony:
> `pgmetrics` does not collect bloat on `pg_catalog`):
> `vacuum (full, analyze, verbose) pg_class;`
> ```
> INFO:  vacuuming "pg_catalog.pg_class"
> INFO:  "pg_class": found 1 removable, 7430805 nonremovable row versions in 158870 pages
> DETAIL:  7429943 dead row versions cannot be removed yet.
> CPU 1.36s/6.40u sec elapsed 9.85 sec.
> INFO:  analyzing "pg_catalog.pg_class"
> INFO:  "pg_class": scanned 60000 of 158869 pages, containing 295 live rows and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
> VACUUM
> ```
>
> `pg_class` has so many dead rows because the workload is temp-table
> heavy (creating/destroying 1M+ temporary tables per day) and has
> long-running analytics queries running for 24h+.
>
> The PG query planner assumes that an index scan on `pg_class` will be
> very quick and plans a nested loop with an index scan. However, the
> index scan has 7M dead tuples to filter out and the query takes more
> than 200 seconds (https://explain.depesz.com/s/bw2G).
>
> If I create a temp table from `pg_class` to contain only the live
> tuples:
> ```
> CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid, * FROM pg_class;
> CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
> CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON
>   pg_class_alive(relname, relnamespace);
> CREATE INDEX pg_class_tblspc_relfilenode_index ON
>   pg_class_alive(reltablespace, relfilenode);
> ANALYZE pg_class_alive;
> ```
>
> and run the bloat query on `pg_class_alive` instead of `pg_class`:
> ```
> SELECT
>   nn.nspname AS schemaname,
>   cc.relname AS tablename,
>   COALESCE(cc.reltuples, 0) AS reltuples,
>   COALESCE(cc.relpages, 0) AS relpages,
>   COALESCE(CEIL((cc.reltuples*((datahdr+8-
>     (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8 END))
>     +nullhdr2+4))/(8192-20::float)), 0) AS otta
> FROM
>   pg_class_alive cc
> JOIN pg_namespace nn
>   ON cc.relnamespace = nn.oid AND nn.nspname <> 'information_schema'
> LEFT JOIN
> (
>   SELECT
>     foo.nspname, foo.relname,
>     (datawidth+32)::numeric AS datahdr,
>     (maxfracsum*(nullhdr+8-(CASE WHEN nullhdr%8=0 THEN 8
>       ELSE nullhdr%8 END))) AS nullhdr2
>   FROM (
>     SELECT
>       ns.nspname, tbl.relname,
>       SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
>       MAX(coalesce(null_frac,0)) AS maxfracsum,
>       23+(
>         SELECT 1+count(*)/8
>         FROM pg_stats s2
>         WHERE null_frac<>0 AND s2.schemaname = ns.nspname
>           AND s2.tablename = tbl.relname
>       ) AS nullhdr
>     FROM pg_attribute att
>     JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
>     JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
>     LEFT JOIN pg_stats s ON s.schemaname = ns.nspname
>       AND s.tablename = tbl.relname
>       AND s.inherited = false
>       AND s.attname = att.attname
>     WHERE att.attnum > 0 AND tbl.relkind = 'r'
>     GROUP BY 1, 2
>   ) AS foo
> ) AS rs
>   ON cc.relname = rs.relname AND nn.nspname = rs.nspname
> LEFT JOIN pg_index i ON indrelid = cc.oid
> LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid
> ```
>
> it runs in 10 s, 20x faster (https://explain.depesz.com/s/K4SH).
>
> The rabbit hole probably goes deeper (e.g. I should do the same for
> pg_statistic and pg_attribute and create a new pg_stats view).
>
> I am not able (at least not quickly) to change the number of temporary
> tables created or make the analytics queries finish quicker. Apart from
> the above hack of filtering out live tuples to a separate table, is
> there anything I could do?
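The DETAIL line in the quoted VACUUM output ("7429943 dead row versions
cannot be removed yet") points at the real constraint: vacuum can only
reclaim rows older than the oldest snapshot still in use, so the 24h+
analytics transactions pin millions of dead `pg_class` rows. A minimal
sketch for spotting the backends holding the xmin horizon back, using
only standard `pg_stat_activity` columns (the LIMIT is arbitrary):

```sql
-- Transactions holding back the oldest-visible xid, i.e. the ones
-- that keep vacuum from reclaiming dead catalog rows.
SELECT pid, datname, usename, state,
       backend_xmin,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 10;
```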
Hi,

To avoid bloating your catalog with temporary tables you can try using
https://github.com/darold/pgtt-rsl

I don't know whether it will match your performance requirements, but at
least you will not bloat the catalog anymore.

About your hack, I don't see any other solution except running vacuum on
the catalog tables more often, but I guess that this is either already
done or not possible. Still, not bloating the catalog to such a level is
the right solution.

--
Gilles Darold
http://www.darold.net/
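On the "vacuum the catalog more often" suggestion, a minimal sketch of a
scheduled maintenance pass (the table list is an assumption, mirroring
the catalogs that churn under heavy temporary-table use; scheduling is
left to cron or similar). As the VACUUM VERBOSE output above shows, such
passes cannot reclaim rows still visible to old snapshots, so they
mainly keep the bloat from compounding between the long-running queries:

```sql
-- Sketch: periodic vacuum/analyze of temp-table-sensitive catalogs.
-- Run as superuser. VACUUM skips dead rows still visible to open
-- snapshots, so 24h+ transactions still cap how much it can reclaim.
VACUUM (ANALYZE) pg_catalog.pg_class;
VACUUM (ANALYZE) pg_catalog.pg_attribute;
VACUUM (ANALYZE) pg_catalog.pg_type;
VACUUM (ANALYZE) pg_catalog.pg_statistic;
VACUUM (ANALYZE) pg_catalog.pg_depend;
```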