Re: could not open file "global/pg_filenode.map": Operation not permitted - Mailing list pgsql-general

From Adrian Klaver
Subject Re: could not open file "global/pg_filenode.map": Operation not permitted
Date
Msg-id 3250f7cb-0ced-469b-ad15-8088a9185e87@aklaver.com
In response to Re: could not open file "global/pg_filenode.map": Operation not permitted  (Nick Renders <postgres@arcict.com>)
List pgsql-general
On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
> 
>> On 3/11/24 03:11, Nick Renders wrote:
>>> Thank you for your reply Laurenz.
>>> I don't think it is related to any third-party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>>
>>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>>> - cluster A on port 165
>>> - cluster B on port 164
>>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
>>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>>
>>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>>
>> Since it seems to be the trigger, what are the contents of the restore script?
>>
>>>
>>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>>
>>> Thanks,
>>>
>>> Nick Renders
>>>
>>
>>
>> -- 
>> Adrian Klaver
>> adrian.klaver@aklaver.com
> 
> 
> 
>> ...how are A and B connected?
> 
> The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation (/Library/PostgreSQL/16/) and their respective Data folders are located on the same volume (/Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data). Besides that, they run independently on 2 different ports, specified in postgresql.conf.
> 
> 
>> ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
> 
> 
>> ...what are the contents of the restore script?
> 
> ## stop cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
> 
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
> 
> ## clear data directory
> ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
> 
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data
> 
> ## restore config files
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
> 
> ## start cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
> 
> 
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

I'm not seeing anything obvious; caveat: I'm on my first cup of coffee.

From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

For now, the only suggestion I have is to note the presence, ownership and privileges of the above files in the present working setup. Then, when it fails, do the same and see if there is a difference. My hunch is that it is in this step:

barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If not the step itself then in the process that creates 20240312T040106.
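One way to capture that state is a small snapshot script: a sketch only, assuming cluster B's data directory path from earlier in the thread (adjust PGDATA and the file list to taste):

```shell
#!/bin/sh
# Snapshot ownership and permissions of the files that intermittently fail,
# so a "good" run can later be diffed against a "bad" one.
# PGDATA and the file list are assumptions based on paths in this thread.
PGDATA=${PGDATA:-/Volumes/Postgres_Data/PostgreSQL/16-DML/data}
LOG=${LOG:-/tmp/pg_perm_snapshot.log}
date >>"$LOG"
for f in global/pg_filenode.map postmaster.pid; do
    # -l shows owner/group/mode; on macOS, adding -lO also shows BSD file
    # flags (e.g. uchg), which can cause "Operation not permitted".
    ls -l "$PGDATA/$f" >>"$LOG" 2>&1
done
echo "snapshot appended to $LOG"
```

Running it from cron just before and after the 4:30 AM restore, then diffing the snapshots, would show whether the restore changes anything about cluster B's files.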

> 
> 
> Best regards,
> 
> Nick Renders
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



