On Fri, Oct 06, 2023 at 09:20:05AM +0900, Michael Paquier wrote:
> On Thu, Oct 05, 2023 at 11:45:18AM -0500, Justin Pryzby wrote:
> > This table is what it sounds like: a partition into which CSV logs are
> > COPY'ed. It would've been created around 8am. There's no special
> > params set for the table nor for autovacuum.
>
> This may be an important bit of information. 31966b151e6a is new as
> of Postgres 16, has changed the way relations are extended and COPY
> was one area touched. I am adding Andres in CC.
Also, I realized that someone kicked off a process just after 9am which
would've done a lot of INSERT ON CONFLICT DO UPDATE, VACUUM FULL, and
VACUUM. Which consumed and dirtied buffers about 100x faster than normal.
log_time | 2023-10-05 10:00:55.794-05
pid | 31754
left | duration: 51281.001 ms statement: VACUUM (FULL,FREEZE) othertable...
log_time | 2023-10-05 10:01:01.784-05
backend_type | checkpointer
left | checkpoint starting: time
log_time | 2023-10-05 10:01:02.935-05
pid | 10023
left | page verification failed, calculated checksum 5074 but expected 5050
context | COPY postgres_log, line 947
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv
log_time | 2023-10-05 10:01:02.935-05
pid | 10023
left | invalid page in block 119 of relation base/16409/801594131
context | COPY postgres_log, line 947
left | COPY postgres_log FROM '/var/log/postgresql/postgresql-2023-10-05_095600.csv' WITH csv
log_time | 2023-10-05 10:01:11.636-05
pid | 31754
left | duration: 15838.374 ms statement: VACUUM (FREEZE) othertable...
I meant to point out that the issue is on the last block.
postgres=# SELECT pg_relation_size('"BROKEN_postgres_log_2023_10_05_0900"')/8192;
?column? | 120
It sounds like there may be an issue locking (pinning?) a page, or
rather not locking it, or releasing the lock too early.
--
Justin