Re: "PANIC: could not open critical system index 2662" - twice - Mailing list pgsql-general

From Evgeny Morozov
Subject Re: "PANIC: could not open critical system index 2662" - twice
Msg-id 01020187e773e031-eecea370-714f-4858-82e5-ed1c9091eb6f-000000@eu-west-1.amazonses.com
In response to Re: "PANIC: could not open critical system index 2662" - twice  (Alban Hertroys <haramrae@gmail.com>)
Responses Re: "PANIC: could not open critical system index 2662" - twice
List pgsql-general
On 14/04/2023 10:42 am, Alban Hertroys wrote:
> Your problem coincides with a thread at freebsd-current with very
> similar data corruption after a recent OpenZFS import: blocks of all
> zeroes, but also missing files. So, perhaps these problems are related?
> Apparently, there was a recent fix for a data corruption issue with the
> block_cloning feature enabled, but people are still seeing corruption
> even when they never enabled that feature.
>
> I couldn’t really find the start of the thread in the archives, so this
> one kind of jumps into the middle of the thread at a relevant-looking
> point:
>
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003446.html

That thread was a bit over my head, I'm afraid, so I can't say if it's
related. I haven't detected any missing files, anyway.


Well, the problem happened again! Kind of... This time PG has not
crashed with the PANIC error in the subject, but pg_dumping certain DBs
again fails with


pg_dump: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5434" failed: FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0

PG server log contains:

2023-05-03 04:31:49.903 UTC [14724] postgres@test_behavior_638186279733138190 FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
2023-05-03 04:31:49.903 UTC [14724] postgres@test_behavior_638186279733138190 HINT:  Please REINDEX it.

The server PID does not change on such a pg_dump attempt, so it appears
that only the backend process for the pg_dump connection crashes this
time. I don't see any disk errors and there haven't been any OS crashes.
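To find every database that has hit this error so far, rather than discovering them one pg_dump at a time, one could scan the server log. The sketch below is a minimal, hypothetical helper, not a PostgreSQL tool. It assumes the server's log_line_prefix produces entries shaped like the ones quoted in this message (timestamp, [pid], user@database), with each entry on a single line in the real log file (the excerpt is presumably wrapped by the mail client). Other prefixes would need a different pattern.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: log_line_prefix yields "<timestamp> UTC [<pid>] <user>@<database> ",
# as in the excerpt quoted in this message. Adjust the pattern for other prefixes.
ZERO_PAGE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \w+) '
    r'\[(?P<pid>\d+)\] '
    r'(?P<user>[^@\s]+)@(?P<db>\S+) '
    r'FATAL:\s+index "(?P<index>[^"]+)" '
    r'contains unexpected zero page at block (?P<block>\d+)'
)


@dataclass(frozen=True)
class ZeroPageError:
    timestamp: str
    pid: int
    user: str
    database: str
    index: str
    block: int


def find_zero_page_errors(lines: Iterable[str]) -> List[ZeroPageError]:
    """Return one record per FATAL "unexpected zero page" log entry."""
    hits = []
    for line in lines:
        m = ZERO_PAGE_RE.match(line.strip())
        if m:
            hits.append(ZeroPageError(
                timestamp=m["ts"],
                pid=int(m["pid"]),
                user=m["user"],
                database=m["db"],
                index=m["index"],
                block=int(m["block"]),
            ))
    return hits


def affected_databases(lines: Iterable[str]) -> List[str]:
    """Sorted, de-duplicated names of databases whose sessions hit the error."""
    return sorted({hit.database for hit in find_zero_page_errors(lines)})
```

Because both functions accept any iterable of lines, an open log file can be passed directly (the log path is site-specific). HINT lines such as "Please REINDEX it." are ignored, so each failed connection is counted once.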

This currently happens for two DBs. Both of them are very small DBs
created by automated .NET tests using Npgsql as client. The code creates
such a test DB from a template DB and tries to drop it at the end of the
test. The drop sometimes times out, and on timeout our test code tries to
drop the DB again (while the first DROP command is likely still pending
on the server). This second attempt to drop the DB also timed out:
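The retry timing described above can be stated precisely with a small model. This is a hypothetical sketch, not a description of Npgsql or PostgreSQL internals. It assumes that the first DROP keeps running on the server after the client gives up, and that the retry is sent a fixed delay after the client-side timeout. All durations are illustrative parameters, not values measured in these tests.

```python
from datetime import datetime, timedelta


def retry_overlaps_first_drop(first_sent: datetime,
                              client_timeout: timedelta,
                              retry_delay: timedelta,
                              server_duration: timedelta) -> bool:
    """Return True if the retry DROP is sent while the first DROP is still running.

    Hypothetical model: the client abandons the first command after
    client_timeout, waits retry_delay, then sends the retry; meanwhile the
    server keeps executing the first command for server_duration.
    """
    retry_sent = first_sent + client_timeout + retry_delay
    first_done = first_sent + server_duration
    return retry_sent < first_done
```

For example, with a hypothetical 30-second client timeout, no retry delay and a first DROP that takes 120 seconds on the server, the function returns True: the retry is sent 30 seconds in, while the first command still has 90 seconds to run.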

[12:40:39] Npgsql.NpgsqlException : Exception while reading from stream
 ----> System.TimeoutException : Timeout during reading attempt
   at Npgsql.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|194_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()

...
[12:41:41] (same error again for the same DB)

From looking at old logs, it seems the same thing happened last time
(in April) as well. That's quite an unlikely coincidence if a bad disk
or bad filesystem were to blame, isn't it?

I've tried to reproduce this by re-running those tests over and over,
but without success so far. So what can I do about this? Do I just try
to drop those databases again manually? But what about the next time it
happens? How do I figure out the cause and prevent this problem?


