Thread: EINTR while resizing dsm segment.

EINTR while resizing dsm segment.

From
Kyotaro Horiguchi
Date:
I provided the subject, and added -hackers.

> Hello,
> I am running postgres 11.5 and we were having issues with shared segments.
> So I increased the max_connection as suggested by you guys and reduced my
> work_mem to 600M.
> 
> Right now instead, it is the second time I see this error :
> 
> ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> 33624064 bytes: Interrupted system call

The function posix_fallocate is protected against EINTR.

| do
| {
|     rc = posix_fallocate(fd, 0, size);
| } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

But not for ftruncate and write. Don't we need to protect them from
ENTRI as the attached?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 590b783f93995bfd1ec05dbcb2805a577372604d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 2 Apr 2020 17:09:35 +0900
Subject: [PATCH] Protect dsm_impl from EINTR

dsm_impl functions should not error-out by EINTR.
---
 src/backend/storage/ipc/dsm_impl.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 1972aecbed..f4e7350a5e 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -360,7 +360,11 @@ dsm_impl_posix_resize(int fd, off_t size)
     int            rc;
 
     /* Truncate (or extend) the file to the requested size. */
-    rc = ftruncate(fd, size);
+    do
+    {
+        rc = ftruncate(fd, size);
+    } while (rc < 0 && errno == EINTR &&
+             !(ProcDiePending || QueryCancelPending));
 
     /*
      * On Linux, a shm_open fd is backed by a tmpfs file.  After resizing with
@@ -874,11 +878,19 @@ dsm_impl_mmap(dsm_op op, dsm_handle handle, Size request_size,
         while (success && remaining > 0)
         {
             Size        goal = remaining;
+            Size        rc;
 
             if (goal > ZBUFFER_SIZE)
                 goal = ZBUFFER_SIZE;
             pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE);
-            if (write(fd, zbuffer, goal) == goal)
+
+            do
+            {
+                rc = write(fd, zbuffer, goal);
+            } while (rc < 0 && errno == EINTR &&
+                     !(ProcDiePending || QueryCancelPending));
+
+            if (rc == goal)
                 remaining -= goal;
             else
                 success = false;
-- 
2.18.2


Re: EINTR while resizing dsm segment.

From
Thomas Munro
Date:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?



Re: EINTR while resizing dsm segment.

From
Thomas Munro
Date:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?



Re: EINTR while resizing dsm segment.

From
Nicola Contu
Date:
So that seems to be a bug, correct?
Just to confirm, I am not using NFS, it is directly on disk.

Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Thanks a lot for your help.


Il giorno sab 4 apr 2020 alle ore 02:49 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?

Re: EINTR while resizing dsm segment.

From
Nicola Contu
Date:
So that seems to be a bug, correct?
Just to confirm, I am not using NFS, it is directly on disk.

Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Thanks a lot for your help.


Il giorno sab 4 apr 2020 alle ore 02:49 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?

Re: EINTR while resizing dsm segment.

From
Nicola Contu
Date:
The only change we made on the disk, is the encryption at OS level.
Not sure this can be something related.

Il giorno mar 7 apr 2020 alle ore 10:58 Nicola Contu <nicola.contu@gmail.com> ha scritto:
So that seems to be a bug, correct?
Just to confirm, I am not using NFS, it is directly on disk.

Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Thanks a lot for your help.


Il giorno sab 4 apr 2020 alle ore 02:49 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?

Re: EINTR while resizing dsm segment.

From
Nicola Contu
Date:
The only change we made on the disk, is the encryption at OS level.
Not sure this can be something related.

Il giorno mar 7 apr 2020 alle ore 10:58 Nicola Contu <nicola.contu@gmail.com> ha scritto:
So that seems to be a bug, correct?
Just to confirm, I am not using NFS, it is directly on disk.

Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Thanks a lot for your help.


Il giorno sab 4 apr 2020 alle ore 02:49 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Thu, Apr 2, 2020 at 9:25 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> I provided the subject, and added -hackers.
>
> > Hello,
> > I am running postgres 11.5 and we were having issues with shared segments.
> > So I increased the max_connection as suggested by you guys and reduced my
> > work_mem to 600M.
> >
> > Right now instead, it is the second time I see this error :
> >
> > ERROR:  could not resize shared memory segment "/PostgreSQL.2137675995" to
> > 33624064 bytes: Interrupted system call
>
> The function posix_fallocate is protected against EINTR.
>
> | do
> | {
> |       rc = posix_fallocate(fd, 0, size);
> | } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));
>
> But not for ftruncate and write. Don't we need to protect them from
> ENTRI as the attached?

We don't handle EINTR for write() generally because that's not
supposed to be necessary on local files (local disks are not "slow
devices", and we document that if you're using something like NFS you
should use its "hard" mount option so that it behaves that way too).
As for ftruncate(), you'd think it'd be similar, and I can't think of
a more local filesystem than tmpfs (where POSIX shmem lives on Linux),
but I can't seem to figure that out from reading man pages; maybe I'm
reading the wrong ones.  Perhaps in low memory situations, an I/O wait
path reached by ftruncate() can return EINTR here rather than entering
D state (non-interruptable sleep) or restarting due to our SA_RESTART
flag... anyone know?

Another thought: is there some way for the posix_fallocate() retry
loop to exit because (ProcDiePending || QueryCancelPending), but then
for CHECK_FOR_INTERRUPTS() to do nothing, so that we fall through to
reporting the EINTR?

Re: EINTR while resizing dsm segment.

From
Thomas Munro
Date:
On Tue, Apr 7, 2020 at 8:58 PM Nicola Contu <nicola.contu@gmail.com> wrote:
> So that seems to be a bug, correct?
> Just to confirm, I am not using NFS, it is directly on disk.
>
> Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Hi Nicola,

Yeah, I think it's a bug.  We're not sure exactly where yet.



Re: EINTR while resizing dsm segment.

From
Thomas Munro
Date:
On Tue, Apr 7, 2020 at 8:58 PM Nicola Contu <nicola.contu@gmail.com> wrote:
> So that seems to be a bug, correct?
> Just to confirm, I am not using NFS, it is directly on disk.
>
> Other than that, is there a particular option we can set in the postgres.conf to mitigate the issue?

Hi Nicola,

Yeah, I think it's a bug.  We're not sure exactly where yet.