Thread: "make check" fails over NFS or tmpfs
Hi,

We've encountered failures of "make check" when we put the PostgreSQL data
directory on an NFS filesystem or a tmpfs filesystem. It doesn't always fail,
but it fails occasionally. Is this expected behavior of PostgreSQL? If it's
expected, what is the reason for this symptom?

I grep'ed the source code of PostgreSQL, but it seems it doesn't use
operations that are problematic for NFS, like flock(2) or F_SETLK/F_SETLKW of
fcntl(2)... So, I guess (theoretically) it should work fine over NFS or
tmpfs. The only idea that strikes me is that there is some nasty bug in
Linux. ;-)

Of course, we are using a single instance of PostgreSQL on a single machine,
i.e. we are NOT accessing the data directory from either multiple machines or
multiple PostgreSQL instances.

To give an actual example: when we invoked the following shell script,

$ cat ~/regress-loop.sh
#!/bin/sh
loop=1
make clean
while true; do
	echo "############### loop = $loop ##################"
	make check
	ret=$?
	if [ $ret -ne 0 ]; then
		echo "error @ loop = $loop (return value = $ret)"
		exit $ret
	fi
	loop=`expr $loop + 1`
done

errors like the following happened sometimes:

$ sh ~/regress-loop.sh
 :
 :
make: *** [check] Error 2
error @ loop = 26 (return value = 2)

We observed this symptom under the following conditions:

1. putting PGDATA on an NFS (async) filesystem:
	NFS client:
		PostgreSQL version: 8.1.3
		OS version: Fedora Core 3 Linux
	NFS server:
		OS version: Fedora Core 3 Linux
		"async" is specified in /etc/exports, thus the server violates
		the NFS protocol and replies to requests before it stores
		changes to its disk.
	How many loops until it fails: 3000 loops or more

2. putting PGDATA on an NFS filesystem:
	NFS client:
		PostgreSQL version: 8.1.3
		OS version: Fedora Core 4 Linux
	NFS server:
		OS version: Fedora Core 5 Linux
	How many loops until it fails: approximately 300 loops

3. putting PGDATA on a tmpfs filesystem:
	PostgreSQL version: 8.1.3
	OS version: Fedora Core 5 Linux
	How many loops until it fails: approximately 100 loops

This symptom never happens over ext3fs, as far as we can see.

I attached the diff between expected results and actual results to this mail.
Any ideas appreciated, except using a local filesystem. ;-)
-- 
SODA Noriyuki

*** ./expected/tablespace.out	Tue May 16 13:03:24 2006
--- ./results/tablespace.out	Fri May 19 21:04:30 2006
***************
*** 35,37 ****
--- 35,38 ----
  NOTICE:  drop cascades to table testschema.foo
  -- Should succeed
  DROP TABLESPACE testspace;
+ ERROR:  tablespace "testspace" is not empty

======================================================================

*** ./expected/tablespace.out	Fri May 19 15:28:32 2006
--- ./results/tablespace.out	Sat May 20 06:13:18 2006
***************
*** 35,37 ****
--- 35,38 ----
  NOTICE:  drop cascades to table testschema.foo
  -- Should succeed
  DROP TABLESPACE testspace;
+ ERROR:  tablespace "testspace" is not empty

======================================================================

*** ./expected/sanity_check.out	Fri Sep  9 05:07:42 2005
--- ./results/sanity_check.out	Fri May 19 16:31:37 2006
***************
*** 17,22 ****
--- 17,24 ----
  circle_tbl | t
  fast_emp4000 | t
  func_index_heap | t
+ gcircle_tbl | t
+ gpolygon_tbl | t
  hash_f8_heap | t
  hash_i4_heap | t
  hash_name_heap | t
***************
*** 68,74 ****
  shighway | t
  tenk1 | t
  tenk2 | t
! (58 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have
--- 70,76 ----
  shighway | t
  tenk1 | t
  tenk2 | t
! (60 rows)
  
  --
  -- another sanity check: every system catalog that has OIDs should have

======================================================================
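For reference, the "async" behaviour described in case 1 above is controlled
per export in the server's /etc/exports. The entries below are only
hypothetical illustrations; the host name and export path are placeholders:

    # NFS-compliant behaviour: the server commits changes before replying
    /export/pgdata   client.example.com(rw,sync)

    # the shortcut used in case 1, which violates the protocol
    /export/pgdata   client.example.com(rw,async)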
SODA Noriyuki <soda@sra.co.jp> writes:
> We've encountered failures of "make check" when we put the PostgreSQL
> data directory on an NFS filesystem or a tmpfs filesystem.
> It doesn't always fail, but fails occasionally.

Is the NFS filesystem mounted fail-soft?

As a rule, database people will tell you to just go away if you admit to
running a database over NFS. Its idea of reliability is at least an order of
magnitude worse than what we consider acceptable. But hard mounting helps.

			regards, tom lane
Thanks for the reply.

>>>>> On Mon, 22 May 2006 01:05:38 -0400, Tom Lane <tgl@sss.pgh.pa.us> said:

> SODA Noriyuki <soda@sra.co.jp> writes:
>> We've encountered failures of "make check" when we put the PostgreSQL
>> data directory on an NFS filesystem or a tmpfs filesystem.
>> It doesn't always fail, but fails occasionally.

> Is the NFS filesystem mounted fail-soft?

It's mounted with the "hard" option.

> As a rule, database people will tell you to just go away if you
> admit to running a database over NFS. Its idea of reliability
> is at least an order of magnitude worse than what we consider
> acceptable.

Yeah ;) But weren't you surprised by the fact that tmpfs doesn't work either,
and fails more quickly than NFS?

Anyway, thanks for the reply.
-- 
SODA Noriyuki
SODA Noriyuki <soda@sra.co.jp> writes:

> NFS server:
>     OS version: Fedora Core 3 Linux
>     "async" is specified in /etc/exports, thus the server violates
>     the NFS protocol and replies to requests before it stores
>     changes to its disk.

The reason the protocol is specified the way it is is that it's the only way
to guarantee the semantics match the traditional Unix semantics of a local
filesystem.

That said, I would have expected a good NFS server to still live up to
everything important as long as the server doesn't actually crash or get shut
down at any point.

I certainly would have expected tmpfs to live up to the traditional Unix
filesystem semantics.

> *** 35,37 ****
> --- 35,38 ----
>   NOTICE:  drop cascades to table testschema.foo
>   -- Should succeed
>   DROP TABLESPACE testspace;
> + ERROR:  tablespace "testspace" is not empty

This one looks like the unlink is returning before it completes, and then
subsequent operations (perhaps only if they come from other processes?) are
allowed to see the old filesystem state. That really ought not ever happen,
even with async, and certainly not on tmpfs.

This might bear some further testing. Can you send the exact commands you
used to set up the tmpfs filesystem? Also, it might be worth checking whether
Fedora Core 3 has any relevant known bugs.

> ======================================================================
>
> *** ./expected/sanity_check.out	Fri Sep  9 05:07:42 2005
> --- ./results/sanity_check.out	Fri May 19 16:31:37 2006
> ***************
> *** 17,22 ****
> --- 17,24 ----
>   circle_tbl | t
>   fast_emp4000 | t
>   func_index_heap | t
> + gcircle_tbl | t
> + gpolygon_tbl | t
>   hash_f8_heap | t
>   hash_i4_heap | t
>   hash_name_heap | t

This seems pretty mystifying. Perhaps it's leftover stuff from the tablespace
that failed to get dropped?

-- 
greg
On Mon, 2006-05-22 at 06:15, SODA Noriyuki wrote:

Hei

> We've encountered failures of "make check" when we put the PostgreSQL
> data directory on an NFS filesystem or a tmpfs filesystem.
> It doesn't always fail, but fails occasionally.

Having a database on an NFS filesystem is a disaster waiting to happen,
especially if the NFS server is running on Linux, not to mention the
performance penalty of running the database via NFS.

I would say that anything is better than NFS for running a database. But if
you absolutely have to use NFS, run NFS via TCP, not UDP, use hard mounts and
turn off all caching. On the server side we are talking about the 'sync' and
'no_wdelay' parameters, and on the client about 'bg', 'hard', 'intr', 'noac'
and 'tcp'; throughput will probably improve by increasing rsize and wsize to
16384 or even 32768.

regards,
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
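To make the client-side suggestions above concrete, a mount invocation could
look like the following. This is only an illustration: the server name,
export path and mount point are placeholders, and the option values should be
tuned to the actual environment:

    mount -t nfs -o bg,hard,intr,noac,tcp,rsize=32768,wsize=32768 \
        nfsserver:/export/pgdata /var/lib/pgsql/data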
>>>>> On 22 May 2006 03:00:55 -0400, Greg Stark <gsstark@mit.edu> said:

> That said, I would have expected a good NFS server to still live up to
> everything important as long as the server doesn't actually crash or get
> shut down at any point.
> I certainly would have expected tmpfs to live up to the traditional Unix
> filesystem semantics.

Yeah. That was what I hoped, although it was too optimistic.

>> *** 35,37 ****
>> --- 35,38 ----
>>   NOTICE:  drop cascades to table testschema.foo
>>   -- Should succeed
>>   DROP TABLESPACE testspace;
>> + ERROR:  tablespace "testspace" is not empty

> This one looks like the unlink is returning before it completes, and then
> subsequent operations (perhaps only if they come from other processes?) are
> allowed to see the old filesystem state. That really ought not ever happen,
> even with async, and certainly not on tmpfs.

> This might bear some further testing. Can you send the exact
> commands you used to set up the tmpfs filesystem?

These two "tablespace not empty" results both came from NFS mounts.
(I should have said that explicitly, sorry.)
So, does this mean the REMOVE RPC may sometimes overtake other RPCs? Hmm...

FWIW, no options were specified when mounting the tmpfs.

>> *** ./expected/sanity_check.out	Fri Sep  9 05:07:42 2005
>> --- ./results/sanity_check.out	Fri May 19 16:31:37 2006
>> ***************
>> *** 17,22 ****
>> --- 17,24 ----
>>   circle_tbl | t
>>   fast_emp4000 | t
>>   func_index_heap | t
>> + gcircle_tbl | t
>> + gpolygon_tbl | t
>>   hash_f8_heap | t
>>   hash_i4_heap | t
>>   hash_name_heap | t

> This seems pretty mystifying. Perhaps it's leftover stuff from the
> tablespace that failed to get dropped?

No. This is a result from tmpfs, and before this failure "make check" had
passed almost 100 times on this tmpfs.

It seems your explanation can describe what's happening in the NFS case,
though. Thanks!
-- 
soda
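For reference, a tmpfs mount with no special options, as described above, is
typically created along these lines; the mount point and the optional size
limit are placeholders:

    mount -t tmpfs tmpfs /mnt/pgtest
    # or, with an explicit size cap:
    mount -t tmpfs -o size=1g tmpfs /mnt/pgtest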
"Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes: > I would say that anything is better than NFS for running a database. But > if you absolutely have to use NFS, run NFS via TCP not UDP, use hard and > turn off all cache ..... In the server side we are talking about 'sync' > and 'no_wdelay' parameters and in the client about > 'bg','hard','intr','noac' and 'tcp', probably the throughput will > improve by increasing rsize and wsize to 16384 or even 32768. Using TCP with NFS is only really helpful when you have a high latency high bandwidth link which isn't going to be a terribly positive environment for postgres. I'm not sure about all your other recommendations either, they strike me as a bit cargo-cultish. Certainly not mounting your filesystem soft will protect against unknowingly losing data if your server crashes, and boosting wsize and rsize will help though the optimal values will depend on your specific environment. But the others shouldn't be terribly relevant -- hell, bg only affects the actual mount operation. While I'm leery about recommending any network filesystem for anything that depends on the filesystem as heavily as a database, of all the network filesystems NFS takes the most care to maintain solid semantics. The main problem is that people are always looking for new and interesting ways to defeat those semantics with options like soft mounts. Certainly I can't agree with "anything is better than NFS", what would you recommend, samba? Now that I've read up on what "async" does it seems like the errors are a pretty predictable consequence. Making directory operations asynchronous is going to break a LOT of things. Most Unix mail servers, for example, also depend on directory operations being synchronous. I would expect "async" to cause Postgres errors on any filesystem that supports it. "async" "intr" and "soft" seem like the real foot-guns here. -- greg
On Monday, 22 May 2006 at 09:17, Rafael Martinez, Guerrero wrote:

> Having a database on an NFS filesystem is a disaster waiting to happen,
> especially if the NFS server is running on Linux, not to mention the
> performance penalty of running the database via NFS.

With all due respect, this is a bunch of FUD. When used properly, NFS is
perfectly safe and well-performing for a PostgreSQL database or any other
application.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Mon, 2006-05-22 at 10:25, Greg Stark wrote:

> Using TCP with NFS is only really helpful when you have a high-latency,
> high-bandwidth link, which isn't going to be a terribly positive environment
> for Postgres anyway.

Well, having a protocol that by definition says datagrams may arrive out of
order or go missing without notice does not sound like a good thing to have a
database running on.

[.......]

> environment. But the others shouldn't be terribly relevant -- hell, bg only
> affects the actual mount operation.

The result of using cut & paste ;) Not directly relevant to Postgres, but
nice to have when mounting the NFS directory.

> While I'm leery about recommending any network filesystem for anything that
> depends on the filesystem as heavily as a database, of all the network
> filesystems NFS takes the most care to maintain solid semantics. The main
> problem is that people are always looking for new and interesting ways to
> defeat those semantics with options like soft mounts. Certainly I can't
> agree with "anything is better than NFS" -- what would you recommend, Samba?

Samba? :-) Not at all. It was a way of saying how bad an idea it is to run a
database via NFS if you want reliability and performance. Not everybody
agrees with this, but well, they can do what they want with their data.

> "async", "intr" and "soft" seem like the real foot-guns here.

Why do you think 'intr' is a bad thing? From the man pages:

    "........ If an NFS file operation has a major timeout and it is
    hard mounted, then allow signals to interrupt the file operation and
    cause it to return EINTR to the calling program. The default is to not
    allow file operations to be interrupted ....."

This will be like an error reported by the filesystem; the program will get
the information and can take care of the problem, instead of waiting
indefinitely for a response that is not coming and probably leaving the
database in an inconsistent state.

With 'noac' I was thinking about two processes trying to access the same file
at the same time; better not to have a cache in our way that hides the real
state of the file from other processes.

-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
On Mon, 2006-05-22 at 11:00, Peter Eisentraut wrote:

> On Monday, 22 May 2006 at 09:17, Rafael Martinez, Guerrero wrote:
> > Having a database on an NFS filesystem is a disaster waiting to happen,
> > especially if the NFS server is running on Linux, not to mention the
> > performance penalty of running the database via NFS.
>
> With all due respect, this is a bunch of FUD. When used properly, NFS is
> perfectly safe and well-performing for a PostgreSQL database or any other
> application.

Well, I do not agree. NFS, especially if the server is running Linux, is not
perfectly safe and well-performing for a database system, particularly a busy
one.

Our experience with NFS on Linux is that it does not always work as it is
supposed to. Yes, it works OK in many cases, and yes, it works OK for some
types of applications, but it does not deliver the level of reliability and
integrity we want in our systems/databases.

regards,
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
On Mon, May 22, 2006 at 04:34:34PM +0900, SODA Noriyuki wrote:

> These two "tablespace not empty" results both came from NFS mounts.
> (I should have said that explicitly, sorry.)
> So, does this mean the REMOVE RPC may sometimes overtake other RPCs?
> Hmm...

Maybe an explicit fsync() is needed on the directory before trying to remove
it. Still, this seems like something the protocol should deal with.

There is one possibility, considering it's NFS. On normal Unix filesystems,
if you delete a file which is still open, the directory entry goes away and
you can rmdir the directory. However, due to the way NFS works, the file
can't really be deleted on the server: because NFS is stateless, the
connection and all open files should in principle survive a server restart.
Perhaps another PostgreSQL process still has a file open, and that is
preventing the rmdir from succeeding.

In that case, the issue should also manifest itself on Windows, since it too
does not permit the deletion of an open file (or maybe it does now).

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
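Spelling out the first suggestion: syncing a directory means opening the
directory itself and fsync()ing that file descriptor before the rmdir. The
sketch below is generic C, not code from the PostgreSQL tree; the path and
the error handling are illustrative only:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Flush pending directory operations to stable storage, then remove
     * the (now empty) directory.  Sketch only: the path is a placeholder. */
    static int
    remove_dir_synced(const char *path)
    {
        int fd = open(path, O_RDONLY);  /* directories can be opened read-only */

        if (fd < 0)
        {
            perror("open");
            return -1;
        }
        if (fsync(fd) != 0)             /* push the unlink()s of former contents */
            perror("fsync");            /* some filesystems reject fsync on a dir */
        close(fd);

        if (rmdir(path) != 0)
        {
            perror("rmdir");
            return -1;
        }
        return 0;
    }

    int
    main(void)
    {
        return remove_dir_synced("/tmp/testspace") == 0 ? 0 : 1;
    }

Whether the NFS client would actually wait for the server's reply to the
queued REMOVE RPCs at that point is exactly the open question in this thread.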
On Monday, 22 May 2006 at 11:30, Rafael Martinez, Guerrero wrote:

> Our experience with NFS on Linux is that it does not always work as it is
> supposed to. Yes, it works OK in many cases, and yes, it works OK for some
> types of applications, but it does not deliver the level of reliability and
> integrity we want in our systems/databases.

I have no experience with running NFS on Linux in a database environment, so
I grant you that this may not be a good choice. But that is a problem of the
server implementation, not of the protocol or the file system.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
SODA Noriyuki <soda@sra.co.jp> writes:
> On 22 May 2006 03:00:55 -0400, Greg Stark <gsstark@mit.edu> said:
> + gcircle_tbl | t
> + gpolygon_tbl | t
>> This seems pretty mystifying. Perhaps it's leftover stuff from the
>> tablespace that failed to get dropped?
> No. This is a result from tmpfs, and before this failure "make check" had
> passed almost 100 times on this tmpfs.

It looks to me like this is just a possible result of sufficiently weird
timing. gcircle_tbl and gpolygon_tbl are temp tables made in the create_index
test, which runs just before sanity_check. If the backend running
create_index hadn't finished deleting its temp tables yet, they could still
be present when sanity_check looks in pg_class.

Curious that we've never seen this on any other platform, though.

			regards, tom lane
On Mon, 2006-05-22 at 04:12, Rafael Martinez, Guerrero wrote:
> On Mon, 2006-05-22 at 10:25, Greg Stark wrote:
> > Using TCP with NFS is only really helpful when you have a high-latency,
> > high-bandwidth link, which isn't going to be a terribly positive
> > environment for Postgres anyway.
>
> Well, having a protocol that by definition says datagrams may arrive out of
> order or go missing without notice does not sound like a good thing to have
> a database running on.
>
> [.......]
> > environment. But the others shouldn't be terribly relevant -- hell, bg
> > only affects the actual mount operation.
>
> The result of using cut & paste ;) Not directly relevant to Postgres, but
> nice to have when mounting the NFS directory.
>
> > While I'm leery about recommending any network filesystem for anything
> > that depends on the filesystem as heavily as a database, of all the
> > network filesystems NFS takes the most care to maintain solid semantics.
> > The main problem is that people are always looking for new and
> > interesting ways to defeat those semantics with options like soft mounts.
> > Certainly I can't agree with "anything is better than NFS" -- what would
> > you recommend, Samba?
>
> Samba? :-) Not at all. It was a way of saying how bad an idea it is to run
> a database via NFS if you want reliability and performance. Not everybody
> agrees with this, but well, they can do what they want with their data.

Given my experiences with Linux, NFS, and Samba in the past, I would say
Samba is a MUCH better choice for network file systems under Linux than NFS,
especially if you're using different kernel versions on the systems and
whatnot.

It seems that if you find the right kernel on both sides of a Linux-to-Linux
NFS setup, then it can be very stable. But that's only a small percentage of
the time. Most of the time I've had serious issues with Linux and NFS, and
I'm a big proponent of Linux in general. But the NFS implementation has
serious issues.
"Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes: > Why do you think 'intr' is a bad thing, from man pages: > " ........ If an NFS file operation has a major timeout and it is > hard mounted, then allow signals to interupt the file operation and > cause it to return EINTR to the calling program. The default is to not > allow file operations to be interrupted ....." > > This will be like an error reported by the filesystem, the program will > get the information and will take care of the problem instead of waiting > indefinitely for a respons not comming and having the database probably > in a nonconsistent state. Traditional file systems guaranteed it never happened, so older applications do not expect to have filesystem operations interrupted. Many do not check for it or do not handle it properly. I recall a conversation a while back about Postgres in particular not checking for it. > With 'noac' I was thinking about two processes trying to access the same > file at the same time, better not to have some cache in our way that > alter the real state of the file to other processes. The description of the option gave me the impression that this would only be an issue if your processes were on two different clients. -- greg
On Mon, May 22, 2006 at 12:52:33PM -0400, Greg Stark wrote:

> "Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes:
>
> > Why do you think 'intr' is a bad thing? From the man pages:
> >     "........ If an NFS file operation has a major timeout and it is
> >     hard mounted, then allow signals to interrupt the file operation and
> >     cause it to return EINTR to the calling program. The default is to
> >     not allow file operations to be interrupted ....."
>
> Traditional file systems guaranteed that never happened, so older
> applications do not expect to have filesystem operations interrupted. Many
> do not check for it, or do not handle it properly. I recall a conversation
> a while back about Postgres in particular not checking for it.

I've occasionally wondered if this is a SysV vs. BSD thing. Under SysV signal
semantics, any signal would cause the current system call to return EINTR.
The list of system calls that could be interrupted is long, and includes just
about anything filesystem-related. So programs with any kind of signal
handling would handle the broken-NFS case automatically.

BSD signal semantics (what Postgres uses) make all system calls restart
across signals. Thus, a system call can never return EINTR unless you have
non-blocking I/O enabled. These programs would be confused by unexpected
EINTRs.

Postgres doesn't check for EINTR on every filesystem system call and thus
would be susceptible to the above problem.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
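The SysV-versus-BSD behaviour described above maps onto the SA_RESTART flag
of sigaction(): with the flag, a blocking call is transparently restarted
after the handler returns; without it, the call fails with EINTR. A small
generic demonstration, not taken from the PostgreSQL sources (the choice of
signal and handler is illustrative only):

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void
    handler(int signo)
    {
        (void) signo;           /* nothing to do; we only want the interruption */
    }

    int
    main(void)
    {
        struct sigaction sa;
        char buf[64];
        ssize_t n;

        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;        /* SysV-style: blocking calls fail with EINTR  */
        /* sa.sa_flags = SA_RESTART;  BSD-style: the read() below would simply */
        /*                            resume waiting after the handler returns */
        sigaction(SIGALRM, &sa, NULL);

        alarm(2);               /* deliver SIGALRM while read() is blocked     */
        n = read(STDIN_FILENO, buf, sizeof(buf));
        if (n < 0 && errno == EINTR)
            fprintf(stderr, "read() was interrupted: the caller must handle EINTR\n");
        else
            fprintf(stderr, "read() returned %zd\n", n);
        return 0;
    }

With sa_flags set to 0, the read() returns about two seconds in with errno
set to EINTR, which is exactly the case an unprepared program mishandles.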
Martijn van Oosterhout wrote:
> On Mon, May 22, 2006 at 12:52:33PM -0400, Greg Stark wrote:
>> "Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes:
>>
>>> Why do you think 'intr' is a bad thing? From the man pages:
>>>     "........ If an NFS file operation has a major timeout and it is
>>>     hard mounted, then allow signals to interrupt the file operation and
>>>     cause it to return EINTR to the calling program. The default is to
>>>     not allow file operations to be interrupted ....."
>>
>> Traditional file systems guaranteed that never happened, so older
>> applications do not expect to have filesystem operations interrupted. Many
>> do not check for it, or do not handle it properly. I recall a conversation
>> a while back about Postgres in particular not checking for it.
>
> I've occasionally wondered if this is a SysV vs. BSD thing. Under SysV
> signal semantics, any signal would cause the current system call to
> return EINTR. The list of system calls that could be interrupted is
> long, and includes just about anything filesystem-related. So programs
> with any kind of signal handling would handle the broken-NFS case
> automatically.
>
> BSD signal semantics (what Postgres uses) make all system calls
> restart across signals. Thus, a system call can never return EINTR
> unless you have non-blocking I/O enabled. These programs would be
> confused by unexpected EINTRs.

AFAIK, Linux actually aborts syscalls when a signal arrives, and it's just
the libc that restarts them automatically. So, actually, a retry loop like

    do {
        ret = syscall(args);
    } while (ret == -1 && errno == EINTR);

in your code should be equivalent to telling the libc to provide BSD
semantics and just doing

    ret = syscall(args);

> Postgres doesn't check for EINTR on every filesystem system call and thus
> would be susceptible to the above problem.

Even if Postgres checked for EINTR, what could it possibly do in that case?
Just retrying won't have any advantage over simply mounting with "nointr" --
it would still just hang when the NFS server dies.

greetings, Florian Pflug
On Wed, May 24, 2006 at 12:16:13AM +0200, Florian G. Pflug wrote:

>> BSD signal semantics (what Postgres uses) make all system calls
>> restart across signals. Thus, a system call can never return EINTR
>> unless you have non-blocking I/O enabled. These programs would be
>> confused by unexpected EINTRs.
>
> AFAIK, Linux actually aborts syscalls when a signal arrives, and it's
> just the libc that restarts them automatically.

All Unix OSes do something similar. After all, if you define a signal
handler, the kernel has to return to user space to execute your handler. All
BSD did was always restart the syscall (your loop, though probably just by
fiddling with the instruction pointer), whereas SysV never did. Nowadays you
can choose which way you want it using sigaction().

I think the real lesson is that you can emulate BSD semantics if you have
SysV semantics, but not vice versa.

>> Postgres doesn't check for EINTR on every filesystem system call and thus
>> would be susceptible to the above problem.
>
> Even if Postgres checked for EINTR, what could it possibly do in that case?
> Just retrying won't have any advantage over simply mounting with "nointr" --
> it would still just hang when the NFS server dies.

Well, it could check whether statement_timeout has passed and return an error
rather than hanging...

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
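A rough sketch of that last idea in generic C, not actual PostgreSQL code:
retry the interrupted call, but give up once a caller-supplied deadline has
passed. The deadline argument stands in for something like statement_timeout,
and the function name is made up for illustration:

    #include <errno.h>
    #include <time.h>
    #include <unistd.h>

    /* Retry a read() across EINTR, but stop once 'deadline' has passed.
     * Sketch only: in a real server the deadline would come from a setting
     * such as statement_timeout, and the error path would report properly. */
    static ssize_t
    read_with_deadline(int fd, void *buf, size_t len, time_t deadline)
    {
        for (;;)
        {
            ssize_t n = read(fd, buf, len);

            if (n >= 0)
                return n;               /* success (or EOF)                    */
            if (errno != EINTR)
                return -1;              /* a real error: let the caller report */
            if (time(NULL) >= deadline)
            {
                errno = ETIMEDOUT;      /* interrupted and out of time         */
                return -1;
            }
            /* interrupted but still within the deadline: retry */
        }
    }

Whether such a loop buys anything over mounting with 'nointr' is exactly the
trade-off debated above: without 'intr', the call never returns EINTR and
simply keeps waiting.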