Thread: pg_stats.avg_width
Hello, The pg_stats.avg_width field is documented [1] as "Average width in bytes of column's entries" but it's not defined exactly what "entries" means here with respect to STORAGE (the underlying pg_statistic documentation doesn't clarify this either). I thought initially this was the "logical" size of the values, but I ran an experiment that suggests this interpretation is not right: maciek=# create table foo(a text); CREATE TABLE maciek=# insert into foo(a) select string_agg(floor((random() * 10)::numeric)::text, '') from generate_series(1,1000000) g; INSERT 0 1 maciek=# analyze foo; ANALYZE maciek=# select avg_width from pg_stats where tablename = 'foo' and attname = 'a'; avg_width ----------- 18 (1 row) maciek=# select length(a) from foo; length --------- 1000000 (1 row) maciek=# select reltoastrelid::regclass from pg_class where relname = 'foo'; reltoastrelid --------------------------- pg_toast.pg_toast_6454708 (1 row) maciek=# select sum(length(chunk_data)) from pg_toast.pg_toast_6454708; sum -------- 724257 (1 row) So the avg_width here appears to correspond to neither the logical size nor the compressed toasted size. Am I missing something? Postgres 14.7 in case that matters. Thanks, Maciek [1]: https://www.postgresql.org/docs/current/view-pg-stats.html
Maciek Sakrejda <m.sakrejda@gmail.com> writes: > The pg_stats.avg_width field is documented [1] as "Average width in > bytes of column's entries" but it's not defined exactly what "entries" > means here with respect to STORAGE (the underlying pg_statistic > documentation doesn't clarify this either). I thought initially this > was the "logical" size of the values, but I ran an experiment that > suggests this interpretation is not right: Nope. What that's meant to be is the size of value that would be passed around inside the executor. In this case: > maciek=# select avg_width from pg_stats where tablename = 'foo' and > attname = 'a'; > avg_width > ----------- > 18 what you are getting is the size of the TOAST pointer. That's correct for the planner's purposes, because it'd be the TOAST pointer not the detoasted value that would be passed through joins, sorts, hashes, etc. And we really only care about how much space would be needed for things like sort temp files. regards, tom lane
Thanks, that makes sense. It was going to be my third guess, but it seemed pretty wide for a TOAST pointer. Reviewing what goes in there, though, it's reasonable. I assume that this means for unTOASTed but compressed data, this counts the compressed size. Would a doc patch clarifying this (and possibly linking to the relevant TOAST docs [1]) be welcome? The current wording is pretty vague. Something like diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 7c09ab3000..2814ac8007 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -7466,7 +7466,9 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l <structfield>stawidth</structfield> <type>int4</type> </para> <para> - The average stored width, in bytes, of nonnull entries + The average stored width, in bytes, of nonnull entries. For compressed + entries, counts the compressed size; for TOASTed data, the size of the + TOAST pointer (see <link linkend="storage-toast">TOAST</link>). </para></entry> </row> diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index bb1a418450..62184fe32b 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -3680,7 +3680,9 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx <structfield>avg_width</structfield> <type>int4</type> </para> <para> - Average width in bytes of column's entries + Average width in bytes of column's entries. For compressed entries, + counts the compressed size; for TOASTed data, the size of the TOAST + pointer (see <link linkend="storage-toast">TOAST</link>). </para></entry> </row> (not sure if this should be <link /> or <xref />). Thanks, Maciek [1]: https://www.postgresql.org/docs/current/storage-toast.html