Re: Flush pgstats file during checkpoints - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Flush pgstats file during checkpoints
Date
Msg-id 54fdfc3e-74f6-4ea4-b844-05aa66b39490@iki.fi
Whole thread Raw
In response to Re: Flush pgstats file during checkpoints  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
>> It'd not be such an issue if we updated stats during recovery, but I
>> think think we're doing that. Perhaps we should, which might also help
>> on replicas - no idea if it's feasible, though.
> 
> Stats on replicas are considered an independent thing AFAIU (scans are
> counted for example in read-only queries).  If we were to do that we
> may want to split stats handling between nodes in standby state and
> crash recovery.  Not sure if that's worth the complication.  First,
> the stats exist at node-level.

Hmm, I'm a bit disappointed this doesn't address replication. It makes 
sense that scans are counted separately on a standby, but it would be 
nice if stats like last_vacuum were propagated from primary to standbys. 
  I guess that can be handled separately later.


Reviewing v7-0001-Flush-pgstats-file-during-checkpoints.patch:

There are various race conditions where a stats entry can be leaked in 
the pgstats file. I.e. relation is dropped, but its stats entry is 
retained in the stats file after crash. In the worst case, suck leaked 
entries can accumulate until the stats file is manually removed, which 
resets all stats again. Perhaps that's acceptable - it's still possible 
leak the actual relation file for a new relation on crash, after all, 
which is much worse (I'm glad Horiguchi-san is working on that [1]).

For example:
1. BEGIN; CREATE TABLE foo (); ANALYZE foo;
2. CHECKPOINT;
3. pg_ctl restart -m immediate

This is the same scenario where we leak the relfile, but now you can 
have it with e.g. function statistics too.

Until 5891c7a8ed, there was a mechanism to garbage collect such orphaned 
entries (pgstat_vacuum()). How bad would it be to re-introduce that? Or 
can we make it more watertight so that there are no leaks?


If you do this:

pg_ctl start -D data
pg_ctl stop -D data -m immediate
pg_ctl start -D data
pg_ctl stop -D data -m immediate

You get this in the log:

2024-09-02 16:28:37.874 EEST [1397281] WARNING:  found incorrect redo 
LSN 0/160A3C8 (expected 0/160A440)

I think it's failing to flush the stats file at the end of recovery 
checkpoint.


[1] 
https://www.postgresql.org/message-id/20240901.010925.656452225144636594.horikyota.ntt%40gmail.com

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Laurenz Albe
Date:
Subject: Re: proposal: schema variables
Next
From: Bertrand Drouvot
Date:
Subject: Re: Track IO times in pg_stat_io