strange behavior (corruption?) of large production database - Mailing list pgsql-hackers
From | Joe Conway |
---|---|
Subject | strange behavior (corruption?) of large production database |
Date | |
Msg-id | 4390DC54.2050400@joeconway.com Whole thread Raw |
Responses |
Re: strange behavior (corruption?) of large production database
|
List | pgsql-hackers |
We have very strange behavior from an internal production database. There are multiple symptoms, all pointing to a problem with clusterwide tables. For example: postgres@csdfds1:~> psql -U postgres -p 5433 cyspec Welcome to psql 7.4.8, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit cyspec=# select version(); version ------------------------------------------------------------------------------------- PostgreSQL 7.4.8 on i686-pc-linux-gnu,compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux) (1 row) postgres@csdfds1:~> psql -l List of databases Name | Owner | Encoding ----------------+----------+----------- cyspec | postgres | SQL_ASCII temp_mike | postgres | SQL_ASCII temp_mike_new | postgres | SQL_ASCII temp_mike_orig | postgres | SQL_ASCII template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII (6 rows) cyspec=# select * from pg_database; datname | datdba | encoding | datistemplate | datallowconn | datlastsysoid | datvacuumxid | datfrozenxid | datpath | datconfig | datacl ---------+--------+----------+---------------+--------------+---------------+--------------+--------------+---------+-----------+-------- (0 rows) cyspec=# \l List of databases Name | Owner | Encoding ------+-------+---------- (0 rows) No databases found. Additionally: cyspec=# select usename, usesysid from pg_shadow; usename | usesysid ----------+---------- postgres | 1 colxl | 102 colro | 101 l400509 | 105 (4 rows) cyspec=# \c - colprod You are now connected as new user "colprod". cyspec=> \c - colxl You are now connected as new user "colxl". cyspec=> \c - colprod You are now connected as new user "colprod". cyspec=> \c - zxcvvb FATAL: user "zxcvvb" does not exist Previous connection kept The "colprod" user has disappeared from pg_shadow (there was one previously, and it was never intentionally dropped), but I can still connect with that user. The current problem was actually initially found because pg_dump complained that the owner of the colprod schema didn't exist. One more thing: cyspec=# show wal_sync_method; wal_sync_method ----------------- fdatasync (1 row) That works, but SHOW ALL and "select * from pg_settings;" return lines and lines of nothing in psql. (I mean literally blank lines, not even "(0 rows)") After issuing \o /tmp/filename the output is all there, and looks normal. The oddness was first noticed about 3 days after a maintenance shutdown. As far as I have been told, during the maintenance window, there may have been OS level package upgrades, and there was a firmware upgrade done on the storage subsystem (NetApp). Any advice at what to look at/do would be appreciated. This database is somewhere around 1.1 TB in size, so dump and reload is not something we're anxious to do. Joe
pgsql-hackers by date: