Re: Unable to Connect to DB Instance - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Unable to Connect to DB Instance |
Date | |
Msg-id | 9e42f409-2e90-636a-772d-9ea6ca34d25a@aklaver.com Whole thread Raw |
In response to | Unable to Connect to DB Instance (Boblitz John <john.boblitz@bertschi.com>) |
List | pgsql-general |
On 07/04/2018 04:08 AM, Boblitz John wrote: > > Good Morning, > > Beginning yesterday morning, users have been unable to fully connect > to our DB Instance. > > 1.At the time of the initial report – I was connected to the DB via > pgAdmin and could perform queries without problem. > > 2.Users reported messages similar to “could not open file > "global/11801": No such file or directory” > > 3.At that time, connection logging was turned off and there were no > messages in the log files. > > 4.As this is a development environment, I turned logging on in the > config and restarted the DB > > 5.After restart, neither I, nor the Users could fully reconnect. > > 6.I have performed a SYS Level backup (tar of the whole postgres > directory tree) > > 7.I cannot perform a DB level Backup (same errors occur) > > System Details > > Linux Debian 7.11 > > Postgres 9.1 (9.1.24lts-0+d) > > It appears that we can connect to the DB Server itself as I get > “connection received” and “connection authorized” – but when trying > to access the DB itself, several errors are raised (see below). > > I am assuming that some internals are no longer consistent – the > file “global/11801” for instance really does not exist on the system. > > Questions: > > 1.Is there any way to recover from this (backup is unfortunately > rather old) > > 2.What are possible causes? I’d like to prevent this from happening > on my production servers. Looks like something/someone deleted files from portions of the $DATA directory. In particular from ~/global and ~/pg_tblspc. Without those files you can't really proceed. Can you recover by creating a new instance and restoring from a dump of the production server? As to exact cause the only thing I can think of is to to look at the system logs at the time of the initial failure and see if there is anything there that would shed light. > > ** I am aware that we are on older releases, and yes, we plan to > migrate to more current releases “soon” ™ … > > Thanks in advance. > > John Boblitz > > Exceprt from Log: > > 2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: > connection received: host=192.168.250.50 port=28559 > > 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: > connection authorized: user=dbadmin database=postgres > > 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR: could > not open file "global/11801": No such file or directory > > 2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: > SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN > pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN > usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince, > CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as > inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location() > ELSE NULL END as receiveloc, CASE WHEN usesuper THEN > pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN > usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as > replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN > pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused > > FROM pg_user WHERE usename=current_user > > 2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR: could > not open file "global/11801": No such file or directory > > 2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: > SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname = > current_user; > > 2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: > connection received: host=192.168.250.50 port=28561 > > 2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: > connection authorized: user=dbadmin database=g11Base > > 2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR: could > not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No > such file or directory > > 2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: > SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1 > > WHEN (nspname LIKE E'pg\\_%') THEN 0 > > ELSE 3 END AS nsptyp, > > nsp.nspname, nsp.oid, > pg_get_userbyid(nspowner) AS namespaceowner, nspacl, > description, has_schema_privilege(nsp.oid, 'CREATE') as cancreate, > > (SELECT array_agg(label) FROM pg_seclabels sl1 > WHERE sl1.objoid=nsp.oid) AS labels, > > (SELECT array_agg(provider) FROM pg_seclabels sl2 > WHERE sl2.objoid=nsp.oid) AS providers > > FROM pg_namespace nsp > > LEFT OUTER JOIN pg_description des ON > (des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass) > > WHERE NOT ((nspname = 'pg_catalog' AND EXISTS > (SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace > = nsp.oid LIMIT 1)) OR > > (nspname = 'pgagent' AND EXISTS (SELECT 1 FROM > pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT > 1)) OR > > (nspname = 'information_schema' AND EXISTS (SELECT > 1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid > LIMIT 1)) OR > > (nspname LIKE '_%' AND EXISTS (SELECT 1 FROM > pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid > LIMIT 1)) > > ) AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname > NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: