Thread: could not open file "global/pg_filenode.map": Operation not permitted
Hello,

We have a Postgres server that intermittently logs the following:

2024-02-26 10:29:41.580 CET [63962] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:30:11.149 CET [63975] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:35.941 CET [63986] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:41.546 CET [63991] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:44.398 CET [63994] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:11.149 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:31:11.151 CET [64008] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:41.546 CET [64023] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:11.150 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:32:11.153 CET [64035] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:41.547 CET [64050] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:11.151 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
2024-02-26 10:33:11.153 CET [64062] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:41.548 CET [64087] FATAL: could not open file "global/pg_filenode.map": Operation not permitted

This has happened 3 times over the last 2 weeks now, without any indication of what caused it. The privileges of those 2 files are all in order.

When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart). Once restarted, Postgres runs fine again for a couple of days.

We are running PostgreSQL 16.2 on macOS 14.3.1.

Any idea what might be causing this issue, or how to resolve it?

Best regards,

Nick Renders
On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
> We have a Postgres server that intermittently logs the following:
>
> 2024-02-26 10:29:41.580 CET [63962] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:11.147 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
>
> This has happened 3 times over the last 2 weeks now, without any indication of what caused it.
> The privileges of those 2 files are all in order.
> When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
> Once restarted, Postgres runs fine again for a couple of days.
>
> We are running PostgreSQL 16.2 on macOS 14.3.1.

Perhaps that is some kind of virus checker or something else that locks files.

Yours,
Laurenz Albe
Thank you for your reply Laurenz.

I don't think it is related to any third-party security software. We have several other machines with a similar setup, but this is the only server that has this issue.

The one thing different about this machine, however, is that it runs 2 instances of Postgres:
- cluster A on port 165
- cluster B on port 164

Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.

Over the past 2 weeks, I have suspended our restore script and the issue did not occur. I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Thanks,

Nick Renders

On 26 Feb 2024, at 16:29, Laurenz Albe wrote:

> On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
>> We have a Postgres server that intermittently logs the following:
>>
>> 2024-02-26 10:29:41.580 CET [63962] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
>> 2024-02-26 10:30:11.147 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
>>
>> This has happened 3 times over the last 2 weeks now, without any indication of what caused it.
>> The privileges of those 2 files are all in order.
>> When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
>> Once restarted, Postgres runs fine again for a couple of days.
>>
>> We are running PostgreSQL 16.2 on macOS 14.3.1.
>
> Perhaps that is some kind of virus checker or something else that locks files.
>
> Yours,
> Laurenz Albe
On Mon, 2024-03-11 at 11:11 +0100, Nick Renders wrote:
> We have several other machines with a similar setup, but this is the only server that has this issue.
>
> [...] Cluster A is actually a backup from another Postgres server that is restored on a
> daily basis via Barman. This means that we log in remotely from the Barman server over SSH,
> stop cluster A's service (port 165), clear the Data folder, restore the latest backup into
> the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is
> the cluster that experiences the intermittent "operation not permitted" issue.
>
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

I don't know Barman, but with that incomplete a description anybody will have problems determining the cause. For example, how are A and B connected?

Yours,
Laurenz Albe
Greetings,

* Nick Renders (postgres@arcict.com) wrote:
> The one thing different about this machine, however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Not sure why it's happening, but they certainly sound related based on the correlation.

One thing that I suggest doing when you're running multiple independent PG clusters on the same host is to run them under different users on the system. Perhaps if you move cluster B to a distinct user account, you'll have better luck figuring out what's going on, since something will presumably start throwing permission-denied errors.

Not sure if it's an option or not, but you might also consider using Linux instead of macOS.

Thanks,

Stephen
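A minimal sketch of what that separate-accounts setup could look like on macOS. The account name "postgres_b" is made up for illustration, the paths are taken from later in this thread, and sysadminctl syntax may vary between macOS versions:

```shell
# Create a dedicated macOS account for cluster B (hypothetical name).
sudo sysadminctl -addUser postgres_b -fullName "PostgreSQL cluster B" -shell /bin/bash

# PostgreSQL refuses to start unless the data directory is owned by the
# running user and is not group/world writable (mode 0700, or 0750).
sudo chown -R postgres_b /Volumes/Postgres_Data/PostgreSQL/16-DML/data
sudo chmod 700 /Volumes/Postgres_Data/PostgreSQL/16-DML/data

# Same binaries as cluster A, different OS user:
sudo -u postgres_b /Library/PostgreSQL/16/bin/pg_ctl \
    -D /Volumes/Postgres_Data/PostgreSQL/16-DML/data start
```

With this split, any process running as the cluster A account that tried to touch cluster B's files would get a plain permission-denied error, which is much easier to trace.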
On 3/11/24 03:11, Nick Renders wrote:
> Thank you for your reply Laurenz.
> I don't think it is related to any third-party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>
> The one thing different about this machine, however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Since it seems to be the trigger, what are the contents of the restore script?

> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>
> Thanks,
>
> Nick Renders

--
Adrian Klaver
adrian.klaver@aklaver.com
On 11 Mar 2024, at 16:04, Adrian Klaver wrote:

> On 3/11/24 03:11, Nick Renders wrote:
>> The one thing different about this machine, however, is that it runs 2 instances of Postgres:
>> - cluster A on port 165
>> - cluster B on port 164
>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>
>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Since it seems to be the trigger, what are the contents of the restore script?
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

> ...how are A and B connected?

The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Besides that, they run independently on 2 different ports, specified in the postgresql.conf.

> ...run them under different users on the system.

Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?

> ...what are the contents of the restore script?

## stop cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'

This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

Best regards,

Nick Renders
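One hardening of such a script worth considering (a sketch only, not tested against this environment): the data directory is cleared unconditionally, so if the stop step ever failed, the rm -r would run against a live cluster. pg_ctl stop exits non-zero when the shutdown fails, so aborting on error would at least fail loudly:

```shell
#!/bin/sh
# Sketch: abort the whole restore if any step fails.
set -e

REMOTE='postgres@10.0.0.1'
PGDATA='/Volumes/Postgres_Data/PostgreSQL/16/data'
PGCTL='/Library/PostgreSQL/16/bin/pg_ctl'

# -m fast disconnects clients instead of waiting for them to finish;
# pg_ctl exits non-zero if the server did not shut down.
ssh "$REMOTE" "$PGCTL -D $PGDATA stop -m fast"

# Only reached if the stop above succeeded.
ssh "$REMOTE" "rm -r $PGDATA/*"
```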
On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
>> Since it seems to be the trigger, what are the contents of the restore script?
>
>> ...how are A and B connected?
>
> The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Besides that, they run independently on 2 different ports, specified in the postgresql.conf.
>
>> ...run them under different users on the system.
>
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
>
>> ...what are the contents of the restore script?
>
> ## stop cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
>
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
>
> ## clear data directory
> ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
>
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data
>
> ## restore config files
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
>
> ## start cluster A
> ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
>
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

I'm not seeing anything obvious, caveat I'm on my first cup of coffee.

From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway

For now the only suggestion I have is to note the presence, ownership, and privileges of the above files in the present working setup. Then when it fails, do the same and see if there is a difference.

My hunch is it is in this step:

barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If not the step itself, then in the process that creates 20240312T040106.

--
Adrian Klaver
adrian.klaver@aklaver.com
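On macOS, the mode bits are not the whole story: a file can also carry ACLs, BSD file flags (uchg/schg make a file unmodifiable regardless of permissions), and extended attributes such as com.apple.quarantine. A fuller before/after snapshot of the two files could be taken with something like this (the data directory path is assumed from earlier in the thread):

```shell
PGDATA=/Volumes/Postgres_Data/PostgreSQL/16-DML/data

# -l long listing, -e prints ACL entries, -O prints BSD file flags,
# -@ prints extended attribute names.
ls -leO@ "$PGDATA/postmaster.pid" "$PGDATA/global/pg_filenode.map"
```

Comparing the output from a healthy period against the output taken while the errors occur would show whether anything beyond plain ownership changed.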
Greetings,

* Nick Renders (postgres@arcict.com) wrote:
> > ...run them under different users on the system.
>
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?

Yes, two separate macOS user accounts is what I was suggesting. You could use the same postgres binaries though, no need to have two installations of them. You'd need separate data directories, of course, as you have currently.

> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

Seems likely that it's some sort of race condition.

Thanks,

Stephen
On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> Greetings,
>
> * Nick Renders (postgres@arcict.com) wrote:
>>> ...run them under different users on the system.
>>
>> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
>
> Yes, two separate macOS user accounts is what I was suggesting. You could use the same postgres binaries though, no need to have two installations of them. You'd need separate data directories, of course, as you have currently.
>
>> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.
>
> Seems likely that it's some sort of race condition.
>
> Thanks,
>
> Stephen

We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2. This one only has a single Data directory, so there are no multiple instances running.

I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:

2024-03-21 11:49:27.410 CET [1655] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-03-21 11:49:46.955 CET [1760] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
2024-03-21 11:50:07.398 CET [965] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway

I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Is there perhaps some feature that I can enable that logs which processes use these 2 files?

Thanks,

Nick Renders
On 3/22/24 07:01, Nick Renders wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances running.
>
> I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:
>
> 2024-03-21 11:49:27.410 CET [1655] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:49:46.955 CET [1760] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:50:07.398 CET [965] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
>
> I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Have you looked at the OS system logs?

> Is there perhaps some feature that I can enable that logs which processes use these 2 files?
>
> Thanks,
>
> Nick Renders

--
Adrian Klaver
adrian.klaver@aklaver.com
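For reference, on Sonoma the unified log can be queried after the fact with the log tool; something along these lines (the predicates are just examples) would show whether any system service logged activity around the failure:

```shell
# Events from the last hour that mention the failing file:
log show --last 1h --predicate 'eventMessage CONTAINS "pg_filenode"'

# Or watch live while trying to reproduce the problem:
log stream --predicate 'process == "postgres"'
```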
On 22 Mar 2024, at 17:00, Alban Hertroys wrote:

> On Fri, 22 Mar 2024 at 15:01, Nick Renders <postgres@arcict.com> wrote:
>> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
>> This one only has a single Data directory, so there are no multiple instances running.
>
> I don't think that having a single Data directory prevents multiple instances from running. That's more of a matter of how often pg_ctl was called with the start command for that particular data directory.
>
>> I installed Postgres yesterday and restored a copy from our live database in the Data directory.
>
> How did you restore that copy? Was that a file-based copy perhaps? Your files may have incorrect owners or permissions in that case.
>
>> The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:
>>
>> 2024-03-21 11:49:27.410 CET [1655] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
>> 2024-03-21 11:49:46.955 CET [1760] FATAL: could not open file "global/pg_filenode.map": Operation not permitted
>> 2024-03-21 11:50:07.398 CET [965] LOG: could not open file "postmaster.pid": Operation not permitted; continuing anyway
>
> It's possible that some other process put a lock on these files. Spotlight perhaps? Or TimeMachine?
>
>> I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.
>>
>> Is there perhaps some feature that I can enable that logs which processes use these 2 files?
>
> IIRC, macOS comes shipped with the lsof command, which will tell you which processes have a given file open. See man lsof.
>
> --
> If you can't see the forest for the trees,
> Cut the trees and you'll see there is no forest.
I have tried the lsof command, but it returns no info about the postmaster.pid and global/pg_filenode.map files, so I take it they are not open at that moment.
Spotlight indexing has been disabled, and TimeMachine takes no snapshots of the volume where the data resides.
Looking at the 2 machines that are having this issue (and the others that don't), I think it is somehow related to the following setup:
- macOS Sonoma (14.4 and 14.4.1)
- data directory on an external drive
That external drive (a Promise RAID system in one case, a simple SSD in the other) has the option "ignore ownership" turned on by default. I have tried disabling that and updating the data directory to give the postgres user ownership and read/write access. It seemed to work at first, but just now the issue reappeared.
Any other suggestions?
Thanks,
Nick Renders
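Two command-line checks that could help narrow this down (both tools ship with macOS; the volume path is assumed from earlier messages). diskutil can confirm whether the "ignore ownership" change actually stuck, and fs_usage traces which process touches the files at the moment of failure, something lsof cannot show after the fact:

```shell
# "Owners: Disabled" in the output means "ignore ownership" is still in effect.
diskutil info /Volumes/Postgres_Data | grep -i owners

# Re-enable ownership enforcement on the volume if needed:
sudo diskutil enableOwnership /Volumes/Postgres_Data

# Trace filesystem syscalls system-wide, keeping only lines that touch
# the two problem files; leave running (e.g. under tmux) until it recurs.
sudo fs_usage -w -f filesys | grep -E 'pg_filenode\.map|postmaster\.pid'
```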
On Fri, Mar 29, 2024 at 4:47 AM Nick Renders <postgres@arcict.com> wrote:
> Looking at the 2 machines that are having this issue (and the others that don't), I think it is somehow related to the following setup:
> - macOS Sonoma (14.4 and 14.4.1)
> - data directory on an external drive
>
> That external drive (a Promise RAID system in one case, a simple SSD in the other) has the option "ignore ownership" on by default. I have tried disabling that, and updating the data directory to have owner + read/write access for the postgres user. It seemed to work at first, but just now the issue reappeared.
>
> Any other suggestions?

I don't have any specific ideas, and I have no idea what "ignore ownership" means ... what kind of filesystem is running on it? For the simple SSD, is it directly connected, running a normal Apple APFS filesystem, or something more complicated?

I wonder if this could be related to the change in 16 which started to rename that file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a

Did you ever run 15 or earlier on that system?
On 29 Mar 2024, at 4:25, Thomas Munro wrote:

> I don't have any specific ideas and I have no idea what "ignore
> ownership" means ... what kind of filesystem is running on it? For
> the simple SSD, is it directly connected, running a normal Apple APFS
> filesystem, or something more complicated?
>
> I wonder if this could be related to the change in 16 which started to
> rename that file:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a
>
> Did you ever run 15 or earlier on that system?

In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot drive), there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on" for external drives.

The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a single APFS partition.

We ran PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems with 16 until recently, after upgrading to Sonoma.

Nick
On Thu, Apr 4, 2024 at 3:11 AM Nick Renders <postgres@arcict.com> wrote:
> In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot drive), there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on" for external drives.

Hmm. Sounds suspicious, but why would only this file be affected?

> The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a single APFS partition.
>
> We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems with 16 until recently, after upgrading to Sonoma.

Interesting. So the rename might have something to do with it, though I don't have a theory for how. Can you show what the permissions and ownership look like for pg_* under there, normally, and once the system reaches this state? Something like:

tmunro@phonebox postgresql % ls -slap pgdata/global/pg_*
16 -rw-------  1 tmunro  staff  8192  4 Apr 09:50 pgdata/global/pg_control
 8 -rw-------  1 tmunro  staff   524  4 Apr 09:50 pgdata/global/pg_filenode.map

I'm asking for "pg_*" because I want to see pg_control as well, to understand the permissions for the other files in the cluster, and because I want to see if there are any stray remnants of a temporary file, which would be called pg_filenode.map.tmp.
On Sat, Mar 23, 2024 at 3:01 AM Nick Renders <postgres@arcict.com> wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances running.

BTW if you're running databases on mains-powered Macs, I have a patch that you might be interested in, which so far hasn't attracted any reviews. The short version is that I bet you can at least lose many seconds of commits (because WAL doesn't really hit the durable part of the disk), and possibly also fail to recover (pg_control hits disk before WAL; not sure if this is really possible), if you yank the power and you're using the default settings for wal_sync_method. I'd like to rationalise the settings for that stuff and make it safe by default. I don't know anything about the USB storage pathway, but I'd be surprised if it's different.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BF0EL4Up6yVYbbcWse4xKaqW4wc2xpw67Pq9FjmByWVg%40mail.gmail.com
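A quick way to check where a given setup stands on this (paths and port assumed from earlier in the thread): the current setting can be read with SHOW, and pg_test_fsync, which ships with PostgreSQL, benchmarks the available sync methods against a file on the drive in question. Implausibly high operations-per-second figures usually mean the drive is not actually flushing to durable storage:

```shell
# Current WAL sync method of the running server:
psql -p 164 -c 'SHOW wal_sync_method;'

# Benchmark fsync methods using a scratch file on the external volume:
/Library/PostgreSQL/16/bin/pg_test_fsync -f /Volumes/Postgres_Data/pg_test_fsync.tmp
```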