Thread: PANIC: could not write to log file {} at offset {}, length {}: Invalid argument

PANIC: could not write to log file {} at offset {}, length {}: Invalid argument

From: Shani Israeli
Hi all, 

We are running PostgreSQL v9.5.19 on Windows Server 2012 R2 with 16GB RAM.
Lately, Postgres has started to crash (3 times so far, about once a month), and before each crash I found this message in the Event Log:
PANIC:  could not write to log file {} at offset {}, length {}: Invalid argument
(so I assume it is related).

Attached is our configuration.

Any ideas about what the problem is, or anything else I should check?

Thanks in advance,
Shani Israeli - Software Developer
 
+972 54 6689920
sisraeli@illusivenetworks.com
www.illusivenetworks.com

Re: PANIC: could not write to log file {} at offset {}, length {}: Invalid argument

From: Andreas Kretschmer
On 4 November 2020 11:24:03 CET, Shani Israeli <sisraeli@illusivenetworks.com> wrote:
>Hi all,
>
>We are running PostgreSQL v9.5.19 over Windows Server 2012 R2, 16GB
>RAM.
>Lately, postgres started to crash (happened already 3 times ~once a
>month)
>and before its crashes I found this message in Event Log:
>
>PANIC:  could not write to log file {} at offset {}, length {}: Invalid
>argument
>
>(so I assumed it is related).
>
>attached is our configuration.
>
>Any ideas about what is the problem? or anything else I need to check?


wild guess: Antivirus Software?



>
>Thanks in advance,
>Shani Israeli - Software Developer
>
>+972 54 6689920
>sisraeli@illusivenetworks.com
>www.illusivenetworks.com


--
2ndQuadrant - The PostgreSQL Support Company



I could not read your config file, but:
What is the size of the Postgres log file?
Do you have a log rotation policy on it?
Perhaps your Postgres log level is too high, or your connections are generating a lot of errors that need investigating.

Dave



On 11/4/20 2:24 AM, Shani Israeli wrote:
> Hi all,
> 
> We are running PostgreSQL v9.5.19 over Windows Server 2012 R2, 16GB RAM.
> Lately, postgres started to crash (happened already 3 times ~once a 
> month) and before its crashes I found this message in Event Log:
> 
>     PANIC:  could not write to log file {} at offset {}, length {}:
>     Invalid argument
> 
> (so I assumed it is related).
> 
> attached is our configuration.
> 
> Any ideas about what is the problem? or anything else I need to check?

Any time I see seemingly random crashes involving file corruption on 
Windows I think anti-virus software. Has someone turned an AV program 
loose on this machine?

> 
> Thanks in advance,
> Shani Israeli - Software Developer
>     +972 54 6689920
> 
>     sisraeli@illusivenetworks.com
> 
>     www.illusivenetworks.com <https://www.illusivenetworks.com/>
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Nov 04, 2020 at 01:24:46PM +0100, Andreas Kretschmer wrote:
>> wild guess: Antivirus Software?

> Perhaps not.  To bring more context in here, PostgreSQL opens any
> files on WIN32 with shared writes and reads allowed to have an
> equivalent of what we do on all *nix platforms.  Note here that the
> problem comes from a WAL segment write, which is done after the file
> handle is opened in shared mode.  As long as the fd is correctly
> opened, any attempt for an antivirus software to open a file with an 
> exclusive write would be blocked, no?

The only hard data point we've got here is the "Invalid argument"
string, which should mean EINVAL, although I'm not entirely sure
where that string is determined in a Windows build.  So it seems
like there are two possibilities:

* The actual underlying Windows error code is one of the ones
that win32error.c maps to EINVAL:

                ERROR_INVALID_FUNCTION, EINVAL
                ERROR_INVALID_ACCESS, EINVAL
                ERROR_INVALID_DATA, EINVAL
                ERROR_INVALID_PARAMETER, EINVAL
                ERROR_INVALID_HANDLE, EINVAL
                ERROR_NEGATIVE_SEEK, EINVAL

* The actual underlying Windows error code is something that
win32error.c doesn't know, which would cause _dosmaperr() to
return EINVAL.

The latter case would result in a LOG message "unrecognized win32 error
code", so it would be good to know if any of those are showing up in
the postmaster log.

Seems like maybe it wasn't a great idea for _dosmaperr's fallback
errno to be something that is also a real error code.

            regards, tom lane
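
To make the ambiguity above concrete, here is a minimal sketch of a table-driven mapping with an EINVAL fallback. It is not the actual win32error.c code: the DEMO_ names, the demo_map table, and demo_map_win32_error() are hypothetical, and the "recognized" flag stands in for the LOG message that the real code would emit. The numeric values are the standard Windows codes for the six errors listed above, spelled out so the sketch also compiles off Windows.

```c
#include <errno.h>
#include <stddef.h>

/* Windows error code values, given DEMO_ names (hypothetical) so this
 * compiles without the Windows headers. */
enum {
    DEMO_ERROR_INVALID_FUNCTION  = 1,
    DEMO_ERROR_INVALID_HANDLE    = 6,
    DEMO_ERROR_INVALID_ACCESS    = 12,
    DEMO_ERROR_INVALID_DATA      = 13,
    DEMO_ERROR_INVALID_PARAMETER = 87,
    DEMO_ERROR_NEGATIVE_SEEK     = 131
};

struct demo_map_entry {
    unsigned long winerr;   /* raw Windows error code */
    int           doserr;   /* errno value it is mapped to */
};

/* Only the six entries quoted in the thread; the real table is longer. */
static const struct demo_map_entry demo_map[] = {
    {DEMO_ERROR_INVALID_FUNCTION,  EINVAL},
    {DEMO_ERROR_INVALID_ACCESS,    EINVAL},
    {DEMO_ERROR_INVALID_DATA,      EINVAL},
    {DEMO_ERROR_INVALID_PARAMETER, EINVAL},
    {DEMO_ERROR_INVALID_HANDLE,    EINVAL},
    {DEMO_ERROR_NEGATIVE_SEEK,     EINVAL}
};

/*
 * Map a raw code to an errno value. *recognized is set to 1 when the code
 * was found in the table and to 0 when the EINVAL fallback was used.
 */
int demo_map_win32_error(unsigned long code, int *recognized)
{
    size_t i;

    for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++) {
        if (demo_map[i].winerr == code) {
            *recognized = 1;
            return demo_map[i].doserr;
        }
    }

    /* Unknown code: the fallback is indistinguishable from a real EINVAL. */
    *recognized = 0;
    return EINVAL;
}
```

Calling demo_map_win32_error(87, &rec) and demo_map_win32_error(999999, &rec) both return EINVAL; only the flag (in the real server, the "unrecognized win32 error code" LOG line) tells the two cases apart, which is why the postmaster log is worth checking.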



Re: PANIC: could not write to log file {} at offset {}, length {}: Invalid argument

From: Magnus Hagander
On Thu, Nov 5, 2020 at 3:12 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Nov 04, 2020 at 01:24:46PM +0100, Andreas Kretschmer wrote:
> >> Any ideas about what is the problem? or anything else I need to check?
> >
> > wild guess: Antivirus Software?
>
> Perhaps not.  To bring more context in here, PostgreSQL opens any
> files on WIN32 with shared writes and reads allowed to have an
> equivalent of what we do on all *nix platforms.  Note here that the
> problem comes from a WAL segment write, which is done after the file
> handle is opened in shared mode.  As long as the fd is correctly
> opened, any attempt for an antivirus software to open a file with an
> exclusive write would be blocked, no?

The problem with AVs generally doesn't come from them opening files in
non-share mode (I've, surprisingly enough, seen backup software that
causes that problem for example). It might happen on scheduled scans
for example, but the bigger problem with AV software has always been
their filter driver software which intercepts both the open/close and
the read/write calls an application makes and "does its magic" on
them before handing the actual call up to the operating system. It's
completely independent of how the file is opened.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: PANIC: could not write to log file {} at offset {}, length {}: Invalid argument

From: Michael Paquier
On Thu, Nov 05, 2020 at 10:21:40AM +0100, Magnus Hagander wrote:
> The problem with AVs generally doesn't come from them opening files in
> non-share mode (I've, surprisingly enough, seen backup software that
> causes that problem for example). It might happen on scheduled scans
> for example, but the bigger problem with AV software has always been
> their filter driver software which intercepts both the open/close and
> the read/write calls an application makes and "does its magic" on
> them before handing the actual call up to the operating system. It's
> completely independent of how the file is opened.

This one is a bit new to me.  I certainly saw my share of stat() or
open() calls failing on ENOPERM because of file handles taken
exclusively by external scanners around here or even with
customer-related issues, and I did not expect that such dark magic
could be involved in a write.  It would indeed not be surprising to
see a PANIC depending on what gets done.
--
Michael

Michael Paquier <michael@paquier.xyz> writes:
> (I got to wonder whether it would be worth the complexity to show more
> information when using _dosmaperr() for WIN32 on stuff like 
> elog(ERROR, "%m"), just a wild thought).

Maybe.  It's been in the back of my mind for a long time that the
_dosmaperr() mapping may be confusing us in some of these hard-to-explain
trouble reports.  It'd be great if we could see the original Windows error
code too.  Not quite sure how to mechanize that, though.  Places where we
do stuff like save-and-restore errno across some other operation would break
any easy solution.

            regards, tom lane
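
A rough sketch of why save-and-restore of errno gets in the way, under the assumption that the original Windows code were kept in a simple side variable. Everything here (demo_last_raw_code, demo_fail(), and the other demo_ functions) is hypothetical and only illustrates the problem; it is not a proposal from the thread or existing PostgreSQL code.

```c
#include <errno.h>

/* Hypothetical side channel: raw code behind the most recent mapped errno. */
unsigned long demo_last_raw_code = 0;

/* Stand-in for a failing system call wrapper: records the raw code and
 * sets errno to the value it maps to. */
void demo_fail(unsigned long raw_code, int mapped_errno)
{
    demo_last_raw_code = raw_code;
    errno = mapped_errno;
}

/* A cleanup step that preserves errno in the usual way, but whose own
 * internal failure (ignored on purpose) overwrites the side channel. */
void demo_cleanup_preserving_errno(void)
{
    int save_errno = errno;

    demo_fail(2, ENOENT);   /* e.g. removing a file that is already gone */
    errno = save_errno;     /* errno is restored; the side channel is not */
}

/* Returns the raw code a naive error reporter would print after a failed
 * write followed by cleanup. */
unsigned long demo_write_then_cleanup(void)
{
    demo_fail(87, EINVAL);              /* the write itself fails */
    demo_cleanup_preserving_errno();
    return demo_last_raw_code;          /* stale: belongs to the cleanup */
}
```

After demo_write_then_cleanup(), errno is still EINVAL from the write, but the side variable holds the cleanup's code 2 rather than 87, so a report built from both would pair the right errno with the wrong Windows code. Any real mechanism would have to save and restore the two together at every such site.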