Re: could not open file "global/pg_filenode.map": Operation not permitted - Mailing list pgsql-general
From: Adrian Klaver
Subject: Re: could not open file "global/pg_filenode.map": Operation not permitted
Msg-id: 3250f7cb-0ced-469b-ad15-8088a9185e87@aklaver.com
In response to: Re: could not open file "global/pg_filenode.map": Operation not permitted (Nick Renders <postgres@arcict.com>)
List: pgsql-general
On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
>
>> On 3/11/24 03:11, Nick Renders wrote:
>>> Thank you for your reply, Laurenz.
>>> I don't think it is related to any third-party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>>
>>> The one thing different about this machine, however, is that it runs 2 instances of Postgres:
>>> - cluster A on port 165
>>> - cluster B on port 164
>>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
>>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>>
>>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>>
>> Since it seems to be the trigger, what are the contents of the restore script?
>>
>>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>>
>>> Thanks,
>>>
>>> Nick Renders
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>
>> ...how are A and B connected?
>
> The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Besides that, they run independently on 2 different ports, specified in postgresql.conf.
>
>> ...run them under different users on the system.
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
>
>> ...what are the contents of the restore script?
>
> ## stop cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
>
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
>
> ## clear data directory
> ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
>
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data
>
> ## restore config files
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
>
> ## start cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
>
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

I'm not seeing anything obvious, caveat I'm on my first cup of coffee.
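One thing a script like the above could add, as a sketch: after `barman recover` finishes and before `pg_ctl start`, confirm that everything under the restored data directory is owned by the expected service user, since a file restored with the wrong owner would produce exactly this kind of "Operation not permitted" error. The `check_ownership` helper below is an illustrative assumption, not part of the original script; the path and the `postgres` user mirror the script above.

```shell
#!/bin/sh
# Hypothetical post-recover sanity check, run on the database host
# between 'barman recover' and 'pg_ctl start'.

check_ownership() {
    # $1: data directory, $2: expected owner.
    # Prints offending paths; empty output means everything matches.
    find "$1" ! -user "$2" -print
}

PGDATA=${PGDATA:-/Volumes/Postgres_Data/PostgreSQL/16/data}
if [ -d "$PGDATA" ]; then
    bad=$(check_ownership "$PGDATA" postgres | head -5)
    if [ -n "$bad" ]; then
        printf 'files not owned by postgres:\n%s\n' "$bad" >&2
        exit 1
    fi
fi
```

Wiring this in as a final ssh step before the `pg_ctl start` line would make the script fail loudly instead of starting a cluster with mis-owned files.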
From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted

2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

For now the only suggestion I have is to note the presence, ownership, and privileges of the above files in the present working setup. Then, when it fails, do the same and see if there is a difference.

My hunch is it is in this step:

barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If not the step itself, then in the process that creates 20240312T040106.

> Best regards,
>
> Nick Renders

--
Adrian Klaver
adrian.klaver@aklaver.com
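The check suggested above (snapshot ownership and permissions of the two files named in the errors while cluster B is healthy, then take the same snapshot after a failure and diff them) could be sketched as follows. The PGDATA default is an assumed path for cluster B's data directory; adjust to taste.

```shell
#!/bin/sh
# Sketch of the suggested diagnostic: record ls -l output for the two
# files from the error messages, so a healthy snapshot can later be
# diffed against a snapshot taken after the error reappears.

snapshot_perms() {
    # $1: data directory; prints one ls -l line per file of interest
    for f in "$1/global/pg_filenode.map" "$1/postmaster.pid"; do
        ls -l "$f" 2>&1
        # On macOS, 'ls -lO' would additionally show BSD file flags
        # (e.g. uchg), which can also cause "Operation not permitted".
    done
}

PGDATA=${PGDATA:-/Volumes/Postgres_Data/PostgreSQL/16-DML/data}
snapshot_perms "$PGDATA" > /tmp/perms_before.txt   # while healthy
# ...later, after the error appears:
# snapshot_perms "$PGDATA" > /tmp/perms_after.txt
# diff /tmp/perms_before.txt /tmp/perms_after.txt
```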