Re: Unable to Connect to DB Instance - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Unable to Connect to DB Instance
Date
Msg-id 9e42f409-2e90-636a-772d-9ea6ca34d25a@aklaver.com
Whole thread Raw
In response to Unable to Connect to DB Instance  (Boblitz John <john.boblitz@bertschi.com>)
List pgsql-general
On 07/04/2018 04:08 AM, Boblitz John wrote:

> 
>     Good Morning,
> 
>     Beginning yesterday morning, users have been unable to fully connect
>     to our DB Instance.
> 
>     1.At the time of the initial report – I was connected to the DB via
>     pgAdmin and could perform queries without problem.
> 
>     2.Users reported messages similar to “could not open file
>     "global/11801": No such file or directory”
> 
>     3.At that time, connection logging was turned off and there were no
>     messages in the log files.
> 
>     4.As this is a development environment, I turned logging on in the
>     config and restarted the DB
> 
>     5.After restart, neither I, nor the Users could fully reconnect.
> 
>     6.I have performed a SYS Level backup (tar of the whole postgres
>     directory tree)
> 
>     7.I cannot perform a DB level Backup (same errors occur)
> 
>     System Details
> 
>                      Linux Debian      7.11
> 
>                      Postgres              9.1 (9.1.24lts-0+d)
> 

>     It appears that we can connect to the DB Server itself as I get
>     “connection received” and “connection authorized” – but when trying
>     to access the DB itself, several errors are raised (see below).
> 
>     I am assuming that some internals are no longer consistent – the
>     file “global/11801” for instance really does not exist on the system.
> 
>     Questions:
> 
>     1.Is there any way to recover from this (backup is unfortunately
>     rather old) >
>     2.What are possible causes?  I’d like to prevent this from happening
>     on my production servers.

Looks like something/someone deleted files from portions of  the $DATA 
directory. In particular from ~/global and ~/pg_tblspc. Without those 
files you can't really proceed.

Can you recover by creating a new instance and restoring from a dump of 
the production server?

As to exact cause the only thing I can think of is to to look at the 
system logs at the time of the initial failure and see if there is 
anything there that would shed light.

> 
>     ** I am aware that we are on older releases, and yes, we plan to
>     migrate to more current releases “soon” ™ …

> 
>     Thanks in advance.
> 
>     John Boblitz
> 
>     Exceprt from Log:
> 
>     2018-07-04 09:15:13 CEST 192.168.250.50(28559) [unknown]LOG: 
>     connection received: host=192.168.250.50 port=28559
> 
>     2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresLOG: 
>     connection authorized: user=dbadmin database=postgres
> 
>     2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresERROR:  could
>     not open file "global/11801": No such file or directory
> 
>     2018-07-04 09:15:14 CEST 192.168.250.50(28559) postgresSTATEMENT: 
>     SELECT usecreatedb, usesuper, CASE WHEN usesuper THEN
>     pg_postmaster_start_time() ELSE NULL END as upsince, CASE WHEN
>     usesuper THEN pg_conf_load_time() ELSE NULL END as confloadedsince,
>     CASE WHEN usesuper THEN pg_is_in_recovery() ELSE NULL END as
>     inrecovery, CASE WHEN usesuper THEN pg_last_xlog_receive_location()
>     ELSE NULL END as receiveloc, CASE WHEN usesuper THEN
>     pg_last_xlog_replay_location() ELSE NULL END as replayloc, CASE WHEN
>     usesuper THEN pg_last_xact_replay_timestamp() ELSE NULL END as
>     replay_timestamp, CASE WHEN usesuper AND pg_is_in_recovery() THEN
>     pg_is_xlog_replay_paused() ELSE NULL END as isreplaypaused
> 
>                        FROM pg_user WHERE usename=current_user
> 
>     2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresERROR:  could
>     not open file "global/11801": No such file or directory
> 
>     2018-07-04 09:15:19 CEST 192.168.250.50(28559) postgresSTATEMENT: 
>     SELECT rolcreaterole, rolcreatedb FROM pg_roles WHERE rolname =
>     current_user;
> 
>     2018-07-04 09:15:22 CEST 192.168.250.50(28561) [unknown]LOG: 
>     connection received: host=192.168.250.50 port=28561
> 
>     2018-07-04 09:15:22 CEST 192.168.250.50(28561) g11BaseLOG: 
>     connection authorized: user=dbadmin database=g11Base
> 
>     2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseERROR:  could
>     not open file "pg_tblspc/24579/PG_9.1_201105231/24580/11866": No
>     such file or directory
> 
>     2018-07-04 09:15:23 CEST 192.168.250.50(28561) g11BaseSTATEMENT: 
>     SELECT CASE WHEN nspname LIKE E'pg\\_temp\\_%' THEN 1
> 
>                                  WHEN (nspname LIKE E'pg\\_%') THEN 0
> 
>                                  ELSE 3 END AS nsptyp,
> 
>                             nsp.nspname, nsp.oid,
>     pg_get_userbyid(nspowner) AS namespaceowner, nspacl,
>     description,       has_schema_privilege(nsp.oid, 'CREATE') as cancreate,
> 
>                      (SELECT array_agg(label) FROM pg_seclabels sl1
>     WHERE sl1.objoid=nsp.oid) AS labels,
> 
>                      (SELECT array_agg(provider) FROM pg_seclabels sl2
>     WHERE sl2.objoid=nsp.oid) AS providers
> 
>                        FROM pg_namespace nsp
> 
>                        LEFT OUTER JOIN pg_description des ON
>     (des.objoid=nsp.oid AND des.classoid='pg_namespace'::regclass)
> 
>                      WHERE NOT ((nspname = 'pg_catalog' AND EXISTS
>     (SELECT 1 FROM pg_class WHERE relname = 'pg_class' AND relnamespace
>     = nsp.oid LIMIT 1)) OR
> 
>                      (nspname = 'pgagent' AND EXISTS (SELECT 1 FROM
>     pg_class WHERE relname = 'pga_job' AND relnamespace = nsp.oid LIMIT
>     1)) OR
> 
>                      (nspname = 'information_schema' AND EXISTS (SELECT
>     1 FROM pg_class WHERE relname = 'tables' AND relnamespace = nsp.oid
>     LIMIT 1)) OR
> 
>                      (nspname LIKE '_%' AND EXISTS (SELECT 1 FROM
>     pg_proc WHERE proname='slonyversion' AND pronamespace = nsp.oid
>     LIMIT 1))
> 
>                      )  AND nspname NOT LIKE E'pg\\_temp\\_%'AND nspname
>     NOT LIKE E'pg\\_toast_temp\\_%' ORDER BY 1, nspname
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Unable to Connect to DB Instance
Next
From: Tom Lane
Date:
Subject: Re: Postgres sometimes stalling on 'percentile_cont'