Re: Very slow "bloat query" - Mailing list pgsql-performance

From: Gilles Darold
Subject: Re: Very slow "bloat query"
Date:
Msg-id: 804ea9ef-e9d4-e104-a9f1-d9a94e36013c@darold.net
In response to: Very slow "bloat query" (Marcin Gozdalik <gozdal@gmail.com>)
List: pgsql-performance
On 14/05/2021 at 13:06, Marcin Gozdalik wrote:
> Hi
>
> I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW)
> database. It takes around 5 minutes for pgmetrics to run. I traced the
> problem to the "bloat query" (a version of
> https://wiki.postgresql.org/wiki/Show_database_bloat) spinning on
> CPU, doing no I/O.
>
> I have traced the problem to the bloated `pg_class` (the irony:
> `pgmetrics` does not collect bloat on `pg_catalog`):
> `vacuum (full, analyze, verbose) pg_class;`
> ```
> INFO:  vacuuming "pg_catalog.pg_class"
> INFO:  "pg_class": found 1 removable, 7430805 nonremovable row
> versions in 158870 pages
> DETAIL:  7429943 dead row versions cannot be removed yet.
> CPU 1.36s/6.40u sec elapsed 9.85 sec.
> INFO:  analyzing "pg_catalog.pg_class"
> INFO:  "pg_class": scanned 60000 of 158869 pages, containing 295 live
> rows and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
> VACUUM
> ```
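>
> The "cannot be removed yet" part means something is holding back the
> xmin horizon. A quick way to find the usual suspects (a sketch using
> the standard activity view) is to list the oldest open transactions:
> ```
> -- Long-running transactions pin the xmin horizon, so VACUUM must keep
> -- every row version they might still need.
> SELECT pid, state, xact_start, query
>   FROM pg_stat_activity
>  WHERE xact_start IS NOT NULL
>  ORDER BY xact_start
>  LIMIT 5;
> ```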
>
> `pg_class` has so many dead rows because the workload is temp-table
> heavy (creating/destroying 1M+ temporary tables per day) and has
> analytics queries that run for 24h+.
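>
> One way to watch this churn (a sketch using the standard statistics
> view for system tables):
> ```
> -- Live vs. dead tuple counts for pg_class, plus the last (auto)vacuum.
> SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
>   FROM pg_stat_sys_tables
>  WHERE relname = 'pg_class';
> ```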
>
> The PG query planner assumes that an index scan on `pg_class` will be
> very quick and plans a nested loop with an index scan. However, the
> index scan has 7M dead tuples to filter out and the query takes more
> than 200 seconds (https://explain.depesz.com/s/bw2G).
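>
> (A sketch of a possible session-level workaround, not tried here:
> steer the planner away from the nested-loop plan before running the
> bloat query.)
> ```
> -- Disable nested loops for this session only, so the planner avoids
> -- repeatedly index-scanning the bloated pg_class.
> SET enable_nestloop = off;
> -- ... run the bloat query here ...
> RESET enable_nestloop;
> ```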
>
> If I create a temp table from `pg_class` to contain only the live tuples:
> ```
> CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid,* FROM pg_class;
> CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
> CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index
>   ON pg_class_alive(relname, relnamespace);
> CREATE INDEX pg_class_tblspc_relfilenode_index
>   ON pg_class_alive(reltablespace, relfilenode);
> ANALYZE pg_class_alive;
> ```
>
> and run the bloat query on `pg_class_alive` instead of `pg_class`:
> ```
> SELECT
>   nn.nspname AS schemaname,
>   cc.relname AS tablename,
>   COALESCE(cc.reltuples,0) AS reltuples,
>   COALESCE(cc.relpages,0) AS relpages,
>   COALESCE(CEIL((cc.reltuples*((datahdr+8-
>     (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8 END))
>     +nullhdr2+4))/(8192-20::float)),0) AS otta
> FROM
>   pg_class_alive cc
> JOIN pg_namespace nn ON cc.relnamespace = nn.oid
>   AND nn.nspname <> 'information_schema'
> LEFT JOIN
> (
>   SELECT
>     foo.nspname, foo.relname,
>     (datawidth+32)::numeric AS datahdr,
>     (maxfracsum*(nullhdr+8-(CASE WHEN nullhdr%8=0 THEN 8
>       ELSE nullhdr%8 END))) AS nullhdr2
>   FROM (
>     SELECT
>       ns.nspname, tbl.relname,
>       SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
>       MAX(coalesce(null_frac,0)) AS maxfracsum,
>       23+(
>         SELECT 1+count(*)/8
>         FROM pg_stats s2
>         WHERE null_frac<>0 AND s2.schemaname = ns.nspname
>           AND s2.tablename = tbl.relname
>       ) AS nullhdr
>     FROM pg_attribute att
>     JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
>     JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
>     LEFT JOIN pg_stats s ON s.schemaname = ns.nspname
>       AND s.tablename = tbl.relname
>       AND s.inherited = false
>       AND s.attname = att.attname
>     WHERE att.attnum > 0 AND tbl.relkind = 'r'
>     GROUP BY 1,2
>   ) AS foo
> ) AS rs
> ON cc.relname = rs.relname AND nn.nspname = rs.nspname
> LEFT JOIN pg_index i ON indrelid = cc.oid
> LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid
> ```
>
> it runs in 10s, 20x faster (https://explain.depesz.com/s/K4SH).
>
> The rabbit hole probably goes deeper (e.g. the same should be done for
> `pg_statistic` and `pg_attribute`, with a new pg_stats view created on
> top).
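>
> For illustration, the same trick applied to `pg_attribute` might look
> like this (the names are made up and this is untested):
> ```
> CREATE TEMPORARY TABLE pg_attribute_alive AS SELECT * FROM pg_attribute;
> CREATE UNIQUE INDEX pg_attribute_alive_relid_attnum_index
>   ON pg_attribute_alive(attrelid, attnum);
> ANALYZE pg_attribute_alive;
> -- A pg_stats-like view would then have to be recreated on top of the
> -- *_alive copies of pg_class, pg_attribute and pg_statistic.
> ```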
>
> I am not able (at least not quickly) to change the number of temporary
> tables created or to make the analytics queries finish quicker. Apart
> from the above hack of filtering the live tuples into a separate
> table, is there anything I could do?


Hi,


To avoid bloating your catalog with temporary tables you can try using
https://github.com/darold/pgtt-rsl. I don't know if it will meet your
performance requirements, but at least you will not bloat the catalog anymore.
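
For illustration only (this is not the pgtt-rsl API, just the principle
it builds on): a table that exists once in the catalog and is reused by
every session avoids the per-query CREATE/DROP churn. Even a native
temporary table can be reused within a session:

```
-- Created once per session instead of once per query; the rows vanish
-- at every commit but the table (and its catalog entries) stay.
CREATE TEMPORARY TABLE work_buf (id bigint, payload text)
    ON COMMIT DELETE ROWS;
-- Later transactions INSERT into work_buf instead of re-creating it,
-- so catalog churn drops from once per query to once per session.
```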


About your hack, I don't see any other solution except running vacuum on
the catalog tables more often, but I guess this is already being done or
is not possible. Still, not bloating the catalog to such a level is the
right solution.
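
A minimal sketch of that kind of job (assuming PostgreSQL 11+ for the
multi-table form), run from cron or similar:

```
-- Vacuum the catalog tables that temp-table churn hits hardest.
-- Note: row versions still needed by a running 24h+ query remain
-- "nonremovable", as the VACUUM VERBOSE output above shows.
VACUUM (ANALYZE) pg_catalog.pg_class, pg_catalog.pg_attribute,
                 pg_catalog.pg_statistic;
```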


--
Gilles Darold
http://www.darold.net/




