Thread: dead tuple difference between pgstattuple and pg_stat_user_tables

dead tuple difference between pgstattuple and pg_stat_user_tables

From

Matthew Tice

Date:

23 August 2024, 19:14:34

Hi All,

I'm trying to understand why there's a difference between what pgstattuple reports and pg_stat_user_tables reports (for the number of dead tuples).

As I understand, pgstattuple and pgstattuple_approx return the exact number of dead tuples (as noted in the documentation) and based on an older Stack Overflow answer the value returned from pg_stat_user_tables "uses the most recent data collected by ANALYZE".

Why would it be that even after analyzing a table the n_dead_tup value is still vastly different than dead_tuple_count?

> SELECT * FROM (SELECT dead_tuple_count from pgstattuple_approx('oban.oban_jobs'))a, (SELECT n_dead_tup,last_autovacuum,last_analyze,now(),autovacuum_c
ount FROM pg_stat_user_tables WHERE relname = 'oban_jobs' and schemaname = 'oban')b;
-[ RECORD 1 ]-------------------------
dead_tuple_count | 3736
n_dead_tup | 1127044
last_autovacuum | 2024-08-23 16:00:30.983141+00
last_analyze | 2024-08-23 15:33:50.628422+00
now | 2024-08-23 16:01:19.915893+00
autovacuum_count | 446478
SELECT 1

> vacuum (verbose,analyze) oban.oban_jobs;

vacuuming "oban.oban_jobs"
table "oban_jobs": index scan bypassed: 29341 pages from table (0.79% of total) have 1111747 dead item identifiers
launched 2 parallel vacuum workers for index cleanup (planned: 2)
index "oban_jobs_args_index" now contains 18281 row versions in 10232 pages
0 index row versions were removed.
0 index pages were newly deleted.
56 index pages are currently deleted, of which 833 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
index "oban_jobs_meta_index" now contains 18281 row versions in 9698 pages
0 index row versions were removed.
0 index pages were newly deleted.
35 index pages are currently deleted, of which 621 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
table "oban_jobs": found 855 removable, 9661 nonremovable row versions in 29341 out of 3727204 pages
1330 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 3696951 frozen pages.
912 skipped pages using mintxid fork.
CPU: user: 0.12 s, system: 0.08 s, elapsed: 0.22 s.
vacuuming "pg_toast.pg_toast_72454950"
table "pg_toast_72454950": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
0 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 skipped pages using mintxid fork.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
analyzing "oban.oban_jobs"
"oban_jobs": scanned 30000 of 3727204 pages, containing 75 live rows and 10501 dead rows; 75 rows in sample, 9318 estimated total rows
VACUUM

> SELECT * FROM (SELECT dead_tuple_count from pgstattuple_approx('oban.oban_jobs'))a, (SELECT n_dead_tup,last_autovacuum,last_analyze,now(),autovacuum_c
ount FROM pg_stat_user_tables WHERE relname = 'oban_jobs' and schemaname = 'oban')b;
-[ RECORD 1 ]-------------------------
dead_tuple_count | 1701
n_dead_tup | 1306009
last_autovacuum | 2024-08-23 16:01:31.034229+00
last_analyze | 2024-08-23 16:01:47.85574+00
now | 2024-08-23 16:01:55.734589+00
autovacuum_count | 446479

This is a Google Alloy DB instance running:
> select version();
-[ RECORD 1 ]-------------------------
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
SELECT 1

Re: dead tuple difference between pgstattuple and pg_stat_user_tables

From

Adrian Klaver

Date:

23 August 2024, 19:26:51

On 8/23/24 09:14, Matthew Tice wrote:
> Hi All,
> 
> I'm trying to understand why there's a difference between what 
> pgstattuple reports and pg_stat_user_tables reports (for the number of 
> dead tuples).
> 
> As I understand, pgstattuple and pgstattuple_approx return the exact 
> number of dead tuples (as noted in the documentation) and based on an 

https://www.postgresql.org/docs/current/pgstattuple.html

pgstattuple_approx(regclass) returns record

     pgstattuple_approx is a faster alternative to pgstattuple that 
returns approximate results.

Not sure how you get exact count out of that?

> This is a Google Alloy DB instance running:

https://cloud.google.com/alloydb/docs/overview

"AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible 
database service that's designed for your most demanding workloads, 
including hybrid transactional and analytical processing. AlloyDB pairs 
a Google-built database engine with a cloud-based, multi-node 
architecture to deliver enterprise-grade performance, reliability, and 
availability."

Where the important parts are 'PostgreSQL-compatible' and 'Google-built 
database engine'. You probably need to reach out to Google to see what 
that means for this situation.

>  > select version();
> -[ RECORD 1 ]-------------------------
> version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian 
> clang version 12.0.1, 64-bit
> SELECT 1

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: dead tuple difference between pgstattuple and pg_stat_user_tables

From

Matthew Tice

Date:

23 August 2024, 19:33:38

On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 8/23/24 09:14, Matthew Tice wrote:
> Hi All,
>
> I'm trying to understand why there's a difference between what
> pgstattuple reports and pg_stat_user_tables reports (for the number of
> dead tuples).
>
> As I understand, pgstattuple and pgstattuple_approx return the exact
> number of dead tuples (as noted in the documentation) and based on an

https://www.postgresql.org/docs/current/pgstattuple.html

pgstattuple_approx(regclass) returns record

pgstattuple_approx is a faster alternative to pgstattuple that
returns approximate results.

Not sure how you get exact count out of that?

Maybe the wording is a little confusing to me. Under the section for pgstattuple_approx:

"pgstattuple_approx tries to avoid the full-table scan and returns exact dead tuple statistics along with an approximation of the number and size of live tuples and free space."

> This is a Google Alloy DB instance running:

https://cloud.google.com/alloydb/docs/overview

"AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible
database service that's designed for your most demanding workloads,
including hybrid transactional and analytical processing. AlloyDB pairs
a Google-built database engine with a cloud-based, multi-node
architecture to deliver enterprise-grade performance, reliability, and
availability."

Where the important parts are 'PostgreSQL-compatible' and 'Google-built
database engine'. You probably need to reach out to Google to see what
that means for this situation.

Got it, thanks Adrian.

> > select version();
> -[ RECORD 1 ]-------------------------
> version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian
> clang version 12.0.1, 64-bit
> SELECT 1

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: dead tuple difference between pgstattuple and pg_stat_user_tables

From

Adrian Klaver

Date:

23 August 2024, 19:51:17

On 8/23/24 09:33, Matthew Tice wrote:
> 
> 
> On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver 
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

>     https://www.postgresql.org/docs/current/pgstattuple.html
>     <https://www.postgresql.org/docs/current/pgstattuple.html>
> 
>     pgstattuple_approx(regclass) returns record
> 
>           pgstattuple_approx is a faster alternative to pgstattuple that
>     returns approximate results.
> 
>     Not sure how you get exact count out of that?
> 
> 
> Maybe the wording is a little confusing to me. Under the section 
> for pgstattuple_approx:
> "pgstattuple_approx tries to avoid the full-table scan and returns exact 
> dead tuple statistics along with an approximation of the number and size 
> of live tuples and free space."

Yeah, see what you mean.

The part that bears more investigating for this case is:

"It does this by skipping pages that have only visible tuples according 
to the visibility map (if a page has the corresponding VM bit set, then 
it is assumed to contain no dead tuples).

Wondering if PostgreSQl-compatible covers this?




-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: dead tuple difference between pgstattuple and pg_stat_user_tables

From

Adrian Klaver

Date:

23 August 2024, 19:53:15

On 8/23/24 09:51, Adrian Klaver wrote:
> On 8/23/24 09:33, Matthew Tice wrote:
>>
>>
>> On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver 
>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>>     https://www.postgresql.org/docs/current/pgstattuple.html
>>     <https://www.postgresql.org/docs/current/pgstattuple.html>
>>
>>     pgstattuple_approx(regclass) returns record
>>
>>           pgstattuple_approx is a faster alternative to pgstattuple that
>>     returns approximate results.
>>
>>     Not sure how you get exact count out of that?
>>
>>
>> Maybe the wording is a little confusing to me. Under the section 
>> for pgstattuple_approx:
>> "pgstattuple_approx tries to avoid the full-table scan and returns 
>> exact dead tuple statistics along with an approximation of the number 
>> and size of live tuples and free space."
> 
> Yeah, see what you mean.
> 
> The part that bears more investigating for this case is:
> 
> "It does this by skipping pages that have only visible tuples according 
> to the visibility map (if a page has the corresponding VM bit set, then 
> it is assumed to contain no dead tuples).
> 
> Wondering if PostgreSQl-compatible covers this?

Meant to add:

What happens if you use pgstattuple instead?

-- 
Adrian Klaver
adrian.klaver@aklaver.com