Re: "PANIC: could not open critical system index 2662" - twice - Mailing list pgsql-general

From Evgeny Morozov
Subject Re: "PANIC: could not open critical system index 2662" - twice
Msg-id 01020187f6fa8f05-d1bd9975-48ec-4d8d-9ab7-75478400100d-000000@eu-west-1.amazonses.com
In response to Re: "PANIC: could not open critical system index 2662" - twice  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-general
On 6/05/2023 11:13 pm, Thomas Munro wrote:
> Did you previously run this same workload on versions < 15 and never
> see any problem?
Yes, kind of. We have a test suite that creates one test DB and runs a
bunch of tests on it. Two of these tests, however, create another DB
each (also by cloning the same template DB) in order to test copying
data between DBs. It's only these "extra" DBs that were corrupted, at
least on this occasion. (Hard to say about the last time, because that
time it all went south and the whole server crashed, and we may have had
some residual corruption from bad disks then - who knows.) I'm not sure
whether the tests that created the extra DBs existed before we upgraded
to PG 15, but we definitely have not seen such problems on PG 13 or 14.

> It seems like you have some kind of high frequency testing workload
> that creates and tests databases all day long, and just occasionally
> detects this corruption.

Maybe 10-30 times per day normally, depending on the day. However, I
have tried to repro this by running those two specific tests thousands
of times in one day, without success.
> Would you like to try requesting FILE_COPY for a while and see if it eventually happens like that too?
Sure, we can try that.
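
For reference, requesting FILE_COPY in PG 15 means adding the STRATEGY option to CREATE DATABASE. Below is a minimal sketch of how a test harness might build that statement. The helper names and the database names in the usage note are hypothetical; they are not taken from our test suite.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_database_sql(new_db: str, template_db: str,
                        strategy: str = "file_copy") -> str:
    """Build a CREATE DATABASE statement that clones template_db.

    PG 15 accepts two strategies: wal_log (the default since 15) and
    file_copy (the behaviour of older releases).
    """
    strategy = strategy.lower()
    if strategy not in ("wal_log", "file_copy"):
        raise ValueError(f"unknown strategy: {strategy}")
    return (
        f"CREATE DATABASE {quote_ident(new_db)} "
        f"TEMPLATE {quote_ident(template_db)} "
        f"STRATEGY = {strategy};"
    )
```

Calling `create_database_sql("copy_test", "template_db")` yields a statement that can be sent over any client connection. CREATE DATABASE cannot run inside a transaction block, so the connection would need to be in autocommit mode.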

On 7/05/2023 12:30 pm, Thomas Munro wrote:
> your "zfs get all /path/to/pgdata"

PROPERTY              VALUE                  SOURCE
type                  filesystem             -
creation              Mon Mar  6 17:07 2023  -
used                  166G                   -
available             2.34T                  -
referenced            166G                   -
compressratio         2.40x                  -
mounted               yes                    -
quota                 none                   default
reservation           none                   default
recordsize            16K                    local
mountpoint            /default
sharenfs              off                    default
checksum              on                     default
compression           lz4                    received
atime                 off                    inherited from pgdata
devices               on                     default
exec                  off                    inherited from pgdata
setuid                off                    inherited from pgdata
readonly              off                    default
zoned                 off                    default
snapdir               hidden                 default
aclinherit            restricted             default
createtxg             90                     -
canmount              on                     received
xattr                 on                     default
copies                1                      default
version               5                      -
utf8only              off                    -
normalization         none                   -
casesensitivity       sensitive              -
vscan                 off                    default
nbmand                off                    default
sharesmb              off                    default
refquota              none                   default
refreservation        none                   default
primarycache          all                    default
secondarycache        all                    default
usedbysnapshots       199M                   -
usedbydataset         166G                   -
usedbychildren        0B                     -
usedbyrefreservation  0B                     -
logbias               latency                default
dedup                 off                    default
mlslabel              none                   default
sync                  standard               default
dnodesize             legacy                 default
refcompressratio      2.40x                  -
written               64.9M                  -
logicalused           397G                   -
logicalreferenced     397G                   -
volmode               default                default
filesystem_limit      none                   default
snapshot_limit        none                   default
filesystem_count      none                   default
snapshot_count        none                   default
snapdev               hidden                 default
acltype               off                    default
context               none                   default
fscontext             none                   default
defcontext            none                   default
rootcontext           none                   default
relatime              off                    default
redundant_metadata    all                    default
overlay               off                    default

> your postgresql.conf?

We have a bunch of config files, so I tried to get the resulting config
using "select name, setting from pg_settings where source =
'configuration file'" - hopefully that gives what you wanted.

            name            |                        setting
----------------------------+-------------------------------------------------------
 archive_command            | pgbackrest --stanza="behavior-pg15" archive-push "%p"
 archive_mode               | on
 archive_timeout            | 900
 cluster_name               | 15/behavior
 DateStyle                  | ISO, MDY
 default_text_search_config | pg_catalog.english
 dynamic_shared_memory_type | posix
 external_pid_file          | /var/run/postgresql/15-behavior.pid
 full_page_writes           | off
 lc_messages                | C
 lc_monetary                | C
 lc_numeric                 | C
 lc_time                    | C
 listen_addresses           | *
 log_checkpoints            | on
 log_connections            | on
 log_disconnections         | on
 log_file_mode              | 0640
 log_line_prefix            | %m [%p] %q%u@%d
 log_lock_waits             | on
 log_min_duration_statement | 1000
 log_temp_files             | 0
 log_timezone               | Etc/UTC
 maintenance_work_mem       | 1048576
 max_connections            | 100
 max_slot_wal_keep_size     | 30000
 max_wal_size               | 1024
 min_wal_size               | 80
 port                       | 5434
 shared_buffers             | 4194304
 ssl                        | on
 ssl_cert_file              | (redacted)
 ssl_ciphers                | TLSv1.2:TLSv1.3:!aNULL
 ssl_dh_params_file         | (redacted)
 ssl_key_file               | (redacted)
 ssl_min_protocol_version   | TLSv1.2
 temp_buffers               | 10240
 TimeZone                   | Etc/UTC
 unix_socket_directories    | /var/run/postgresql
 wal_compression            | pglz
 wal_init_zero              | off
 wal_level                  | replica
 wal_recycle                | off
 work_mem                   | 262144
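
A note on reading the numbers above: pg_settings reports each value in the parameter's base unit (the "unit" column, which the query did not select), so the size settings are not in bytes. The sketch below converts a few of them, assuming the default 8kB block size and the units PostgreSQL normally reports for these parameters. The unit table and function names are written for this example only.

```python
# Bytes per pg_settings unit. "8kB" assumes the default block size.
UNIT_BYTES = {"8kB": 8 * 1024, "kB": 1024, "MB": 1024 * 1024}

# (name, raw setting from the listing above, assumed pg_settings unit)
SETTINGS = [
    ("shared_buffers", 4194304, "8kB"),
    ("temp_buffers", 10240, "8kB"),
    ("work_mem", 262144, "kB"),
    ("maintenance_work_mem", 1048576, "kB"),
    ("max_wal_size", 1024, "MB"),
    ("min_wal_size", 80, "MB"),
    ("max_slot_wal_keep_size", 30000, "MB"),
]


def to_bytes(setting: int, unit: str) -> int:
    """Convert a raw pg_settings value to bytes."""
    return setting * UNIT_BYTES[unit]


def pretty(num_bytes: int) -> str:
    """Render a byte count with the largest binary unit that fits."""
    value = float(num_bytes)
    for suffix in ("B", "kB", "MB", "GB"):
        if value < 1024:
            return f"{value:g} {suffix}"
        value /= 1024
    return f"{value:g} TB"


if __name__ == "__main__":
    for name, setting, unit in SETTINGS:
        print(f"{name:<24} {pretty(to_bytes(setting, unit))}")
```

Under those assumptions, shared_buffers works out to 32 GB, work_mem to 256 MB and maintenance_work_mem to 1 GB.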

> And your exact Ubuntu kernel version and ZFS package versions? 

Ubuntu 18.04.6
Kernel 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023
x86_64 x86_64 x86_64 GNU/Linux
zfsutils-linux package version 0.7.5-1ubuntu16.12 amd64




