Thread: PostgreSQL occasionally unable to rename WAL files (NTFS)

PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Guy Burgess
Date:

Hello,

Running 13.1 on Windows Server 2019, I am getting the following log entries occasionally:

    2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
    2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
    2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied

What appears to be happening is the affected WAL files (which is usually only 2 or 3 WAL files at a time) are somehow "losing" their NTFS permissions, so the PG process can't rename them - though of course the PG process created them. Even running icacls as admin gives "Access is denied" on those files. A further oddity is the affected files do end up disappearing after a while.

The NTFS permissions on the pg_wal directory are correct, and most WAL files are unaffected. Chkdsk reports no problems, and the database is working fine otherwise. Have tried disabling antivirus software in case that was doing something but no difference.

I found another recent report of similar behaviour here: https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0000000100000001000000c6

WAL config as follows:

wal_level = replica
fsync = on
synchronous_commit = on
wal_sync_method = fsync
full_page_writes = on
wal_compression = off
wal_log_hints = off
wal_init_zero = on
wal_recycle = on
wal_buffers = -1
wal_writer_delay = 200ms
wal_writer_flush_after = 1MB
wal_skip_threshold = 2MB
commit_delay = 0
commit_siblings = 5
checkpoint_timeout = 5min
max_wal_size = 2GB
min_wal_size = 256MB
checkpoint_completion_target = 0.7
checkpoint_flush_after = 0
checkpoint_warning = 30s
archive_mode = off

I'm thinking of disabling wal_recycle as a first step to see if that makes any difference, but thought I'd seek some comments first.

Not sure how much of a problem this is - the database is running fine otherwise - but any thoughts would be appreciated. 

Thanks & regards,

Guy

Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Lionel Bouton
Date:
Hi,

Le 11/02/2021 à 01:21, Guy Burgess a écrit :
>
> Hello,
>
> Running 13.1 on Windows Server 2019, I am getting the following log
> entries occasionally:
>
>     2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000D3": Permission denied
>     2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000D3": Permission denied
>     2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000D3": Permission denied
>     2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000DA": Permission denied
>     2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000D3": Permission denied
>     2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file
> "pg_wal/0000000100000099000000DA": Permission denied
>
> What appears to be happening is the affected WAL files (which is
> usually only 2 or 3 WAL files at a time) are somehow "losing" their
> NTFS permissions, so the PG process can't rename them - though of
> course the PG process created them. Even running icacls as admin gives
> "Access is denied" on those files. A further oddity is the affected
> files do end up disappearing after a while.
>
> The NTFS permissions on the pg_wal directory are correct, and most WAL
> files are unaffected. Chkdsk reports no problems, and the database is
> working fine otherwise. Have tried disabling antivirus software in
> case that was doing something but no difference.
>

I haven't dealt with a Windows environment for quite some time, but from
what I remember an antivirus installs a driver intercepting file
accesses and these drivers are still active even if you disable the
antivirus (I suppose they just call a noop instead of content analysis
code) and can still interfere with your system. For example some years
ago I've seen what looked like a race condition involving rename for
MySQL on Windows that could not be fixed by disabling the antivirus but
could by uninstalling it completely.

You might want to uninstall the antivirus temporarily to check this.

Best regards,

-- 
Lionel Bouton
gérant de JTEK SARL
https://www.linkedin.com/in/lionelbouton/




Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Thorsten Schöning
Date:
Guten Tag Guy Burgess,
am Donnerstag, 11. Februar 2021 um 01:21 schrieben Sie:

> What appears to be happening is the affected WAL files (which is
> usually only 2 or 3 WAL files at a time) are somehow "losing" their
> NTFS permissions, so the PG process can't rename them - though of
> course the PG process created them. Even running icacls as admin
> gives "Access is denied" on those files. A further oddity is the
> affected files do end up disappearing after a while.

If you see that somewhat frequently, use Process Monitor and Process
Explorer to see who accesses those files how. ProcExp easily allows
you to find all open handles per file. If it's not AV, it might be
something like Windows Search Indexer as well, if that is enabled by
default in Server 2019 at all.

Though, even with my Windows 10 and Search Indexer enabled I didn't
run into such problems yet. And especially when rewriting large parts
of my databases with lots of created WAL files, I see the Indexer
working on those files, but NOT conflicting with Postgres yet.

The behaviour you describe happens exactly when two processes e.g.
concurrently hold HANDLEs on the same file and one of those deletes
the file then. Windows keeps file names until all open HANDLEs are
closed and depending on how those HANDLEs have been opened by the
first app, concurrent deletion is perferctly fine for Windows.

Though, a such deleted file can't be opened easily anymore and looks
like it has lost permissions only. But that's not the case, it's
deleted already. It might be that this happens for Postgres to itself
somehow when some other app has an open HANDLE. I don't think that
some other app is deleting that file by purpose instead, reading it
for some reason seems more likely to me.

> dwShareMode
> FILE_SHARE_DELETE

> Enables subsequent open operations on a file or device to request
> delete access. Otherwise, other processes cannot open the file or
> device if they request delete access.

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Guy Burgess
Date:
On 12/02/2021 12:31 am, Lionel Bouton wrote:
> I haven't dealt with a Windows environment for quite some time, but from
> what I remember an antivirus installs a driver intercepting file
> accesses and these drivers are still active even if you disable the
> antivirus (I suppose they just call a noop instead of content analysis
> code) and can still interfere with your system. For example some years
> ago I've seen what looked like a race condition involving rename for
> MySQL on Windows that could not be fixed by disabling the antivirus but
> could by uninstalling it completely.
>
> You might want to uninstall the antivirus temporarily to check this.

Thanks Lionel for this suggestion. Its the built-in Windows Defender AV 
which I believe can't be uninstalled, so am limited to 'disabling' it 
(plus it has exclusions for the PG directories & processes already). 
Using Procmon I can't see any AV file activity when it is disabled. Will 
keep monitoring it though.

Kind regards

Guy




Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Guy Burgess
Date:
On 12/02/2021 4:33 am, Thorsten Schöning wrote:
> If you see that somewhat frequently, use Process Monitor and Process
> Explorer to see who accesses those files how. ProcExp easily allows
> you to find all open handles per file. If it's not AV, it might be
> something like Windows Search Indexer as well, if that is enabled by
> default in Server 2019 at all.
>
Thanks Thorsten, that's a great idea - I'm using Procmon and Process 
Explorer and able to monitor activity on the WAL files, so hopefully 
that sheds some light. The Indexer service is disabled, so can rule that 
out.

Kind regards

Guy




Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Guy Burgess
Date:
On 12/02/2021 4:33 am, Thorsten Schöning wrote:
The behaviour you describe happens exactly when two processes e.g.
concurrently hold HANDLEs on the same file and one of those deletes
the file then. Windows keeps file names until all open HANDLEs are
closed and depending on how those HANDLEs have been opened by the
first app, concurrent deletion is perferctly fine for Windows.

Though, a such deleted file can't be opened easily anymore and looks
like it has lost permissions only. But that's not the case, it's
deleted already. It might be that this happens for Postgres to itself
somehow when some other app has an open HANDLE. I don't think that
some other app is deleting that file by purpose instead, reading it
for some reason seems more likely to me.

Using Process Monitor, Thorsten's explanation above appears to correctly diagnose what is happening. ProcMon data shows postgres.exe performing "CreateFile" operations on the affected WAL files, with the result status "DELETE PENDING". Which according to https://stackoverflow.com/a/29892104 means:

"Windows allows a process to delete a file, even though it is still opened by another process (e.g. Windows indexing service or Antivirus). It gets internally marked as "delete pending". The file does not actually get removed from the file system, it is still there after the File.Delete call. Anybody that tries to open the file after that gets an access denied error. The file doesn't actually get removed until the last handle to the file object gets closed"

which is the same behaviour Thorsten describes above (great info, thanks Thorsten).

The mystery now is that the only process logged as touching the affected WAL files is postgres.exe (of which there are many separate processes). Could it be that one of the postgres.exe instances is holding the affected WAL files in use after another postgres.exe instance has flagged the file as deleted? (or to put it the other way, a postgres.exe instance is flagging the file as deleted while another instance still has an open handle to the file)? If it is some other process such as the indexer (disabled) or AV (excluded from pgdata) is obtaining a handle on the WAL files, it isn't being logged by ProcMon.

Kind regards,

Guy

Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Thorsten Schöning
Date:
Guten Tag Guy Burgess,
am Montag, 15. Februar 2021 um 11:52 schrieben Sie:

> The mystery now is that the only process logged as touching the
> affected WAL files is postgres.exe (of which there are many separate
> processes). Could it be that one of the postgres.exe instances is
> holding the affected WAL files in use after another postgres.exe
> instance has flagged the file as deleted?[...]

I suggest checking your WAL-related and archive/backup settings for
Postgres again. There's e.g. "archive_command" optionally copying WALs
to some other place and postgres.exe would wait until that process has
finished, maybe locking the file to copy itself as well. Or
"archive_timeout" interfering with some other operations or alike.

Mit freundlichen Grüßen

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

E-Mail: Thorsten.Schoening@AM-SoFT.de
Web:    http://www.AM-SoFT.de/

Telefon: 05151-  9468- 0
Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

AM-SoFT IT-Service - Bitstore Hameln GmbH i.G., Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB neu - Geschäftsführer: Janine Galonska


Für Rückfragen stehe ich Ihnen sehr gerne zur Verfügung.

Mit freundlichen Grüßen

Thorsten Schöning


Tel: 05151 9468 0
Fax: 05151 9468 88
Mobil:
Webseite: https://www.am-soft.de

AM-Soft IT-Service - Bitstore Hameln GmbH i.G. ist ein Mitglied der Bitstore Gruppe - Ihr Full-Service-Dienstleister
fürIT und TK 

AM-Soft IT-Service - Bitstore Hameln GmbH i.G.
Brandenburger Str. 7c
31789 Hameln
Tel: 05151 9468 0

Bitstore IT-Consulting GmbH
Zentrale - Berlin Lichtenberg
Frankfurter Allee 285
10317 Berlin
Tel: 030 453 087 80

CBS IT-Service - Bitstore Kaulsdorf UG
Tel: 030 453 087 880 1

Büro Dallgow-Döberitz
Tel: 03322 507 020

Büro Kloster Lehnin
Tel: 033207 566 530

PCE IT-Service - Bitstore Darmstadt UG
Darmstadt
Tel: 06151 392 973 0

Büro Neuruppin
Tel: 033932 606 090

ACI EDV Systemhaus Dresden GmbH
Dresden
Tel: 0351 254 410

Das Systemhaus - Bitstore Magdeburg GmbH
Magdeburg
Tel: 0391 636 651 0

Allerdata.IT - Bitstore Wittenberg GmbH
Wittenberg
Tel: 03491 876 735 7

Büro Liebenwalde
Tel: 033054 810 00

HSA - das Büro - Bitstore Altenburg UG
Altenburg
Tel: 0344 784 390 97

Bitstore IT – Consulting GmbH
NL Piesteritz
Piesteritz
Tel: 03491 644 868 6

Solltec IT-Services - Bitstore Braunschweig UG
Braunschweig
Tel: 0531 206 068 0

MF Computer Service - Bitstore Gütersloh GmbH
Gütersloh
Tel: 05245 920 809 3

Firmensitz: MF Computer Service - Bitstore Gütersloh GmbH, Gildestraße 25, 33442 Herzebrock-Clarholz
Geschäftsführer Janine Galonska









Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Guy Burgess
Date:
On 16/02/2021 12:23 am, Thorsten Schöning wrote:
>> The mystery now is that the only process logged as touching the
>> affected WAL files is postgres.exe (of which there are many separate
>> processes). Could it be that one of the postgres.exe instances is
>> holding the affected WAL files in use after another postgres.exe
>> instance has flagged the file as deleted?[...]
> I suggest checking your WAL-related and archive/backup settings for
> Postgres again. There's e.g. "archive_command" optionally copying WALs
> to some other place and postgres.exe would wait until that process has
> finished, maybe locking the file to copy itself as well. Or
> "archive_timeout" interfering with some other operations or alike.

Thanks Thorsten.  The WAL archive settings are out-of-the-box defaults, 
i.e. disabled: archive_mode = off; archive_command = ''; archive_timeout 
= 0.

I'm not sure there is anything else I can check at this time. The good 
thing is it doesn't seem to cause any problem other than logging "could 
not rename file" warnings, so I might have to park this for now. If I 
find anything else that might offer a new lead I will report back.

Kind regards,

Guy




Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Thomas Kellerer
Date:
Guy Burgess schrieb am 15.02.2021 um 11:52:
> The mystery now is that the only process logged as touching the
> affected WAL files is postgres.exe (of which there are many separate
> processes). Could it be that one of the postgres.exe instances is
> holding the affected WAL files in use after another postgres.exe
> instance has flagged the file as deleted? (or to put it the other
> way, a postgres.exe instance is flagging the file as deleted while
> another instance still has an open handle to the file)? If it is some
> other process such as the indexer (disabled) or AV (excluded from
> pgdata) is obtaining a handle on the WAL files, it isn't being logged
> by ProcMon.

My first suspect is always the anti-virus on Windows when things like
that happen with Postgres.

Maybe the AV configuration was changed and pgdata is no longer excluded.

Thomas




Re: PostgreSQL occasionally unable to rename WAL files (NTFS)

From
Michael Paquier
Date:
On Tue, Feb 16, 2021 at 12:22:36PM +0100, Thomas Kellerer wrote:
> My first suspect is always the anti-virus on Windows when things like
> that happen with Postgres.

Or maybe not.  13 has introduced a regression in this area AFAIK, and
909b449 should have taken care of it (available in 13.3~).
--
Michael

Attachment