Thread: could not open file "global/pg_filenode.map": Operation not permitted

could not open file "global/pg_filenode.map": Operation not permitted

From
Nick Renders
Date:
Hello,

We have a Postgres server that intermittently logs the following:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
2024-02-26 10:30:11.149 CET [63975] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:35.941 CET [63986] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:41.546 CET [63991] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:44.398 CET [63994] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:11.149 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
2024-02-26 10:31:11.151 CET [64008] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:41.546 CET [64023] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:11.150 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
2024-02-26 10:32:11.153 CET [64035] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:41.547 CET [64050] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:11.151 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
2024-02-26 10:33:11.153 CET [64062] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:41.548 CET [64087] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted


This has happened 3 times over the last 2 weeks now, without any indication what caused it.
The privileges of those 2 files are all in order.
When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
Once restarted, Popstgres runs fine again for a couple of days.

We are running PostgreSQL 16.2 on macOS 14.3.1.

Any idea what might be causing this issue, or how to resolve it?


Best regards,

Nick Renders



On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
> We have a Postgres server that intermittently logs the following:
>
> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
>
> This has happened 3 times over the last 2 weeks now, without any indication what caused it.
> The privileges of those 2 files are all in order.
> When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
> Once restarted, Popstgres runs fine again for a couple of days.
>
> We are running PostgreSQL 16.2 on macOS 14.3.1.

Perhaps that is some kind of virus checker or something else that locks files.

Yours,
Laurenz Albe



Thank you for your reply Laurenz.
I don't think it is related to any third party security software. We have several other machines with a similar setup,
butthis is the only server that has this issue. 

The one thing different about this machine however, is that it runs 2 instances of Postgres:
- cluster A on port 165
- cluster B on port 164
Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means
thatwe login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder,
restorethe latest back into the Data folder, and start up the service again. 
Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the
intermittent"operation not permitted" issue. 

Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Thanks,

Nick Renders


On 26 Feb 2024, at 16:29, Laurenz Albe wrote:

> On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
>> We have a Postgres server that intermittently logs the following:
>>
>> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
>> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
>>
>> This has happened 3 times over the last 2 weeks now, without any indication what caused it.
>> The privileges of those 2 files are all in order.
>> When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
>> Once restarted, Popstgres runs fine again for a couple of days.
>>
>> We are running PostgreSQL 16.2 on macOS 14.3.1.
>
> Perhaps that is some kind of virus checker or something else that locks files.
>
> Yours,
> Laurenz Albe



On Mon, 2024-03-11 at 11:11 +0100, Nick Renders wrote:
> We have several other machines with a similar setup, but this is the only server that has this issue.
>
> [...] Cluster A is actually a backup from another Postgres server that is restored on a
> daily basis via Barman. This means that we login remotely from the Barman server over SSH,
> stop cluster A's service (port 165), clear the Data folder, restore the latest back into
> the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is
> the cluster that experiences the intermittent "operation not permitted" issue.
>
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

I don't know Barman, but with that incomplete description anybody will have
problems determining the cause.  For example, how are A and B connected?

Yours,
Laurenz Albe



Greetings,

* Nick Renders (postgres@arcict.com) wrote:
> The one thing different about this machine however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means
thatwe login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder,
restorethe latest back into the Data folder, and start up the service again. 
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences
theintermittent "operation not permitted" issue. 
>
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Not sure why it's happening but they certainly sound related based on
the correlation.

One thing that I suggest doing when you're running multiple independent
PG clusters on the same host is to run them under different users on the
system.  Perhaps if you move cluster B to a distinct user account,
you'll have better luck figuring out what's going on since something
will presumably start throwing permission denied errors.

Not sure if it's an option or not, but you might also consider using
Linux instead of MacOS..

Thanks,

Stephen

Attachment
On 3/11/24 03:11, Nick Renders wrote:
> Thank you for your reply Laurenz.
> I don't think it is related to any third party security software. We have several other machines with a similar
setup,but this is the only server that has this issue.
 
> 
> The one thing different about this machine however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means
thatwe login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder,
restorethe latest back into the Data folder, and start up the service again.
 
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences
theintermittent "operation not permitted" issue.
 
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Since it seems to be the trigger, what are the contents of the restore 
script?

> 
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
> 
> Thanks,
> 
> Nick Renders
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 11 Mar 2024, at 16:04, Adrian Klaver wrote:

> On 3/11/24 03:11, Nick Renders wrote:
>> Thank you for your reply Laurenz.
>> I don't think it is related to any third party security software. We have several other machines with a similar
setup,but this is the only server that has this issue. 
>>
>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>> - cluster A on port 165
>> - cluster B on port 164
>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means
thatwe login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder,
restorethe latest back into the Data folder, and start up the service again. 
>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences
theintermittent "operation not permitted" issue. 
>>
>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Since it seems to be the trigger, what are the contents of the restore script?
>
>>
>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>
>> Thanks,
>>
>> Nick Renders
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com



> ...how are A and B connected?

The 2 cluster are not connected. They run on the same macOS 14 machine with a single Postgres installation (
/Library/PostgreSQL/16/) and their respective Data folders are located on the same volume (
/Volumes/Postgres_Data/PostgreSQL/16/dataand /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run
independentlyon 2 different ports, specified in the postgresql.conf. 


> ...run them under different users on the system.

Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?


> ...what are the contents of the restore script?

## stop cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information
likethe port number) 
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf
../ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106
/Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf
/Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf
/Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start >
/dev/null'


This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even
thoughthe issue is most likely related to the script, it does not cause it every time. 


Best regards,

Nick Renders




On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
> 
>> On 3/11/24 03:11, Nick Renders wrote:
>>> Thank you for your reply Laurenz.
>>> I don't think it is related to any third party security software. We have several other machines with a similar
setup,but this is the only server that has this issue.
 
>>>
>>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>>> - cluster A on port 165
>>> - cluster B on port 164
>>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This
meansthat we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data
folder,restore the latest back into the Data folder, and start up the service again.
 
>>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences
theintermittent "operation not permitted" issue.
 
>>>
>>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>>
>> Since it seems to be the trigger, what are the contents of the restore script?
>>
>>>
>>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>>
>>> Thanks,
>>>
>>> Nick Renders
>>>
>>
>>
>> -- 
>> Adrian Klaver
>> adrian.klaver@aklaver.com
> 
> 
> 
>> ...how are A and B connected?
> 
> The 2 cluster are not connected. They run on the same macOS 14 machine with a single Postgres installation (
/Library/PostgreSQL/16/) and their respective Data folders are located on the same volume (
/Volumes/Postgres_Data/PostgreSQL/16/dataand /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run
independentlyon 2 different ports, specified in the postgresql.conf.
 
> 
> 
>> ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
> 
> 
>> ...what are the contents of the restore script?
> 
> ## stop cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
> 
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information
likethe port number)
 
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf
../ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
> 
> ## clear data directory
> ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
> 
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106
/Volumes/Postgres_Data/PostgreSQL/16/data
> 
> ## restore config files
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf
/Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf
/Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
> 
> ## start cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start >
/dev/null'
> 
> 
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even
thoughthe issue is most likely related to the script, it does not cause it every time.
 

I'm not seeing anything obvious, caveat I'm on my first cup of coffee.

 From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file 
"postmaster.pid": Operation not permitted; continuing anyway

For now the only suggestion I have is note the presence, ownership and 
privileges of the above files in the present working setup. Then when it 
fails do the same and see if there is a difference. My hunch it is in 
this step:

barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 
20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If not the step itself then in the process that creates 20240312T040106.

> 
> 
> Best regards,
> 
> Nick Renders
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Greetings,

* Nick Renders (postgres@arcict.com) wrote:
> > ...run them under different users on the system.
>
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?

Yes, two separate MacOS user accounts is what I was suggesting.  You
could use the same postgres binaries though, no need to have two
installation of them.  You'd need seperate data directories, of course,
as you have currently.

> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even
thoughthe issue is most likely related to the script, it does not cause it every time. 

Seems likely that it's some sort of race condition.

Thanks,

Stephen

Attachment
On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> Greetings,
>
> * Nick Renders (postgres@arcict.com) wrote:
>>> ...run them under different users on the system.
>>
>> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
>
> Yes, two separate MacOS user accounts is what I was suggesting.  You
> could use the same postgres binaries though, no need to have two
> installation of them.  You'd need seperate data directories, of course,
> as you have currently.
>
>> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even
thoughthe issue is most likely related to the script, it does not cause it every time. 
>
> Seems likely that it's some sort of race condition.
>
> Thanks,
>
> Stephen

We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances running.

I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process
startedup without problems, but after 40 minutes it started throwing the same errors in the log: 

    2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
    2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
    2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway

I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I
wasn'tdoing anything on the machine at that time, so I have no idea what might have triggered it. 

Is there perhaps some feature that I can enable that logs which processes use these 2 files?

Thanks,

Nick Renders



On 3/22/24 07:01, Nick Renders wrote:
> On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL
16.2.
> This one only has a single Data directory, so there are no multiple instances running.
> 
> I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process
startedup without problems, but after 40 minutes it started throwing the same errors in the log:
 
> 
>     2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
>     2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
>     2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing
anyway
> 
> I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again.
Iwasn't doing anything on the machine at that time, so I have no idea what might have triggered it.
 

Have you looked at the OS system logs?

> 
> Is there perhaps some feature that I can enable that logs which processes use these 2 files?
> 
> Thanks,
> 
> Nick Renders
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com





On Fri, 22 Mar 2024 at 15:01, Nick Renders <postgres@arcict.com> wrote:

We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances running.

I don't think that having a single Data directory prevents multiple instances from running. That's more of a matter of how often pg_ctl was called with the start command for that particular data directory.
 
I installed Postgres yesterday and restored a copy from our live database in the Data directory.

How did you restore that copy? Was that a file-based copy perhaps? Your files may have incorrect owners or permissions in that case.
 
The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:

        2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
        2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
        2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

It's possible that some other process put a lock on these files. Spotlight perhaps? Or TimeMachine?
 
I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Is there perhaps some feature that I can enable that logs which processes use these 2 files?

IIRC, MacOS comes shipped with the lsof command, which will tell you which processes have a given file open. See man lsof.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

On 22 Mar 2024, at 17:00, Alban Hertroys wrote:


On Fri, 22 Mar 2024 at 15:01, Nick Renders <postgres@arcict.com> wrote:

We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances running.

I don't think that having a single Data directory prevents multiple instances from running. That's more of a matter of how often pg_ctl was called with the start command for that particular data directory.
 
I installed Postgres yesterday and restored a copy from our live database in the Data directory.

How did you restore that copy? Was that a file-based copy perhaps? Your files may have incorrect owners or permissions in that case.
 
The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:

        2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
        2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
        2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

It's possible that some other process put a lock on these files. Spotlight perhaps? Or TimeMachine?
 
I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Is there perhaps some feature that I can enable that logs which processes use these 2 files?

IIRC, MacOS comes shipped with the lsof command, which will tell you which processes have a given file open. See man lsof.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

I have tried the lsof command, but it returns no info about the postmaster.pid and global/pg_filenode.map files, so I take they are not open at that moment.

Spotlight indexing has been disabled, and TimeMachine takes no snapshots of the volume where the data resides.

Looking at the 2 machines that are having this issue (and the others that don't), I think it is somehow related to the following setup:
- macOS Sonoma (14.4 and 14.4.1)
- data directory on an external drive

That external drive (a Promise RAID system in one case, a simple SSD in the other) has the option "ignore ownership" on by default. I have tried disabling that, and updating the data directory to have owner + read/write access for the postgres user. It seemed to work at first, but just now the issue re-appeared again.

Any other suggestions?

Thanks,

Nick Renders

On Fri, Mar 29, 2024 at 4:47 AM Nick Renders <postgres@arcict.com> wrote:
> Looking at the 2 machines that are having this issue (and the others that don't), I think it is somehow related to
thefollowing setup: 
> - macOS Sonoma (14.4 and 14.4.1)
> - data directory on an external drive
>
> That external drive (a Promise RAID system in one case, a simple SSD in the other) has the option "ignore ownership"
onby default. I have tried disabling that, and updating the data directory to have owner + read/write access for the
postgresuser. It seemed to work at first, but just now the issue re-appeared again. 
>
> Any other suggestions?

I don't have any specific ideas and I have no idea what "ignore
ownership" means ... what kind of filesystem is running on it?  For
the simple SSD, is it directly connected, running a normal Apple APFS
filesystem, or something more complicated?

I wonder if this could be related to the change in 16 which started to
rename that file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a

Did you ever run 15 or earlier on that system?



On 29 Mar 2024, at 4:25, Thomas Munro wrote:
>
> I don't have any specific ideas and I have no idea what "ignore
> ownership" means ... what kind of filesystem is running on it?  For
> the simple SSD, is it directly connected, running a normal Apple APFS
> filesystem, or something more complicated?
>
> I wonder if this could be related to the change in 16 which started to
> rename that file:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a
>
> Did you ever run 15 or earlier on that system?


In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot
drive),there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on"
forexternal drives. 

The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a
singleAFPS partition. 

We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems
with16 until recently, after upgrading to Sonoma. 


Nick



On Thu, Apr 4, 2024 at 3:11 AM Nick Renders <postgres@arcict.com> wrote:
> In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot
drive),there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on"
forexternal drives. 

Hmm.  Sounds suspicious, but why would only this file be affected?

> The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a
singleAFPS partition. 
>
> We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems
with16 until recently, after upgrading to Sonoma. 

Interesting.  So the rename might have something to do with it, though
I don't have a theory for how,.

Can you show what the permissions and ownership looks like for pg_*
under there, normally, and once the system reaches this state?
Something like:

tmunro@phonebox postgresql % ls -slap pgdata/global/pg_*
16 -rw-------  1 tmunro  staff  8192  4 Apr 09:50 pgdata/global/pg_control
 8 -rw-------  1 tmunro  staff   524  4 Apr 09:50 pgdata/global/pg_filenode.map

I'm asking for "pg_*" because I want to see pg_control as well, to
understand the permissions for the other files in the cluster, and
because I want to see if there are any stray remnants of a temporary
file, which would be called pg_filenode.map.tmp.



On Sat, Mar 23, 2024 at 3:01 AM Nick Renders <postgres@arcict.com> wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL
16.2.
> This one only has a single Data directory, so there are no multiple instances running.

BTW if you're running databases on mains-powered Macs, I have a patch
that you might be interested in, which so far hasn't attracted any
reviews.  The short version is that I bet you can at least lose many
seconds of commits (because WAL doesn't really hit durable part of
disk), and possibly also fail to recover (pg_control hits disk before
WAL, not sure if this is really possible), if you yank the power and
you're using the default settings for wal_sync_method.  I'd like to
rationalise the settings for that stuff and make it safe by default.

I don't know anything about the USB storage pathway but I'd be
surprised if it's different.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BF0EL4Up6yVYbbcWse4xKaqW4wc2xpw67Pq9FjmByWVg%40mail.gmail.com



Hi Nick

On Thu, Apr 25, 2024 at 12:56 PM Nick Renders <postgres@arcict.com> wrote:
Hello,

We have a Postgres server that intermittently logs the following:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:30:11.149 CET [63975] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:35.941 CET [63986] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:41.546 CET [63991] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:44.398 CET [63994] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:11.149 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:31:11.151 CET [64008] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:41.546 CET [64023] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:11.150 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:32:11.153 CET [64035] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:41.547 CET [64050] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:11.151 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:33:11.153 CET [64062] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:41.548 CET [64087] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted


This has happened 3 times over the last 2 weeks now, without any indication what caused it.
The privileges of those 2 files are all in order.
When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
Once restarted, Popstgres runs fine again for a couple of days.
Hi
 
This issue seems to be related to the permission issue, please check your script if it's changing the permissions.
 
Regards
Kashif Zeeshan
Bitnine Global

We are running PostgreSQL 16.2 on macOS 14.3.1.

Any idea what might be causing this issue, or how to resolve it?


Best regards,

Nick Renders