Thread: [BUGS] Fails to work on live images due to fsync() on pg_commit_ts before doing any write there

[ 2nd try after subscription to pgsql-bugs ]

Hello,

PostgreSQL 10 no longer works on a (Kali) live system where the
root filesystem is an overlayfs with an underlying squashfs
filesystem (where postgresql and its initial file structure
is present) and a writable tmpfs overlay.

When you try to create a new database you get this failure:
createdb: database creation failed: ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.

And in the server log you have this:
ERROR: could not fsync file "pg_commit_ts": Invalid argument

When you strace the postgresql checkpointer process you see
this:
# strace -f -p 31599
select(0, NULL, NULL, NULL, {tv_sec=1, tv_usec=0}) = 0 (Timeout)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("pg_xact", O_RDONLY)               = 3
fsync(3)                                = 0
close(3)                                = 0
open("pg_commit_ts", O_RDONLY)          = 3
fsync(3)                                = -1 EINVAL (Invalid argument)
close(3)                                = 0
rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE KILL SEGV CONT STOP SYS RTMIN RT_1], NULL, 8) = 0
write(2, "2017-11-07 09:47:38.580 UTC [315"..., 98) = 98

The second fsync() fails because the pg_commit_ts directory has
not changed since its creation in the initial image. It is thus
still stored in the read-only squashfs filesystem and has not yet
been copied up into the writable tmpfs (which does support fsync).
In this case, overlayfs delegates the fsync() call to the read-only
squashfs filesystem, which returns EINVAL as it does not support
such an operation.
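
For anyone who wants to check a given directory outside of PostgreSQL, the open()/fsync()/close() sequence from the strace output can be replayed with a few lines of C. This is only a minimal sketch: try_fsync_dir is a hypothetical helper name, not a PostgreSQL function, and the EINVAL result is what the report above observed on a directory still backed by squashfs, not something the code itself guarantees.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a directory read-only and fsync() it, mirroring the syscalls seen
 * in the strace output. Returns 0 on success, or the errno of the call
 * that failed. try_fsync_dir is a hypothetical name used for illustration.
 */
int
try_fsync_dir(const char *path)
{
    int fd;
    int err = 0;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;           /* e.g. ENOENT if the path does not exist */

    if (fsync(fd) != 0)
        err = errno;            /* the report above saw EINVAL here */

    close(fd);
    return err;
}
```

Calling try_fsync_dir("pg_commit_ts") from the data directory and comparing the result with try_fsync_dir("pg_xact") should show whether the two directories behave differently on a given system.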

This has been explained by the overlayfs upstream developer
(to which I reported this bug initially, thinking it was an
overlayfs regression):
https://marc.info/?l=linux-unionfs&m=151005246512873&w=2
https://marc.info/?l=linux-unionfs&m=151005699414227&w=2

My request is thus that PostgreSQL should fsync that directory only after
it has made changes to the directory or its content. PostgreSQL 9.6 was
working fine in the same setup and I would like PostgreSQL 10 to do the
same. :)
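
To illustrate the kind of behaviour requested here, the following is a rough sketch of "fsync only after a change" using a dirty flag. All names (DirSyncState, dir_mark_dirty, dir_sync_if_dirty) are hypothetical and chosen for illustration; this is not how PostgreSQL tracks its directories, only one possible shape of the idea.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical per-directory state: the path and whether it was modified. */
typedef struct DirSyncState
{
    const char *path;
    bool        dirty;
} DirSyncState;

/* Record that something was written to the directory or its content. */
void
dir_mark_dirty(DirSyncState *state)
{
    assert(state != NULL);
    state->dirty = true;
}

/*
 * fsync() the directory only if it was marked dirty. Returns 0 when the
 * sync was skipped or succeeded, otherwise the errno of the failing call.
 */
int
dir_sync_if_dirty(DirSyncState *state)
{
    int fd;
    int err = 0;

    assert(state != NULL && state->path != NULL);

    if (!state->dirty)
        return 0;               /* nothing changed, so no syscall at all */

    fd = open(state->path, O_RDONLY);
    if (fd < 0)
        return errno;

    if (fsync(fd) != 0)
        err = errno;
    close(fd);

    if (err == 0)
        state->dirty = false;   /* clean again until the next change */
    return err;
}
```

With this shape, a directory that was never written to is never opened or synced, so an untouched lower-layer directory would not be hit by fsync() at all.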

I'm ccing Teodor Sigaev <teodor@sigaev.ru> because I believe that
the problematic fsync() has been added by him in this commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1b02be21f271db6bd3cd43abb23fa596fcb6bac3

Cheers,
-- 
Raphaël Hertzog ◈ Writer/Consultant ◈ Debian Developer

Discover the Debian Administrator's Handbook:
→ https://debian-handbook.info/get/



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Raphael Hertzog wrote:

> PostgreSQL 10 no longer works on a (Kali) live system where the
> root filesystem is an overlayfs with an underlying squashfs
> filesystem (where postgresql and its initial file structure
> is present) and a writable tmpfs overlay.

Please create a machine that works this way and get it added to the
buildfarm, so that this sort of thing doesn't surprise us in the future
months after the fact. https://wiki.postgresql.org/wiki/Buildfarm

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Tue, Nov 7, 2017 at 10:54 PM, Raphael Hertzog <hertzog@debian.org> wrote:
> This has been explained by the overlayfs upstream developer
> (to which I reported this bug initially, thinking it was an
> overlayfs regression):
> https://marc.info/?l=linux-unionfs&m=151005246512873&w=2
> https://marc.info/?l=linux-unionfs&m=151005699414227&w=2
>
> My request is thus that PostgreSQL should fsync that directory only after
> it has made changes to the directory or its content. PostgreSQL 9.6 was
> working fine in the same setup and I would like PostgreSQL 10 to do the
> same. :)
>
> I'm ccing Teodor Sigaev <teodor@sigaev.ru> because I believe that
> the problematic fsync() has been added by him in this commit:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1b02be21f271db6bd3cd43abb23fa596fcb6bac3

Hm. I am wondering if we should change fsync_fname_ext() so that EINVAL
is treated as a no-op. EIO and EINTR should really be caught with a
proper error, but I am not sure about this one. Thoughts?
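
As a rough sketch of that idea, assuming only EINVAL is tolerated and everything else (EIO, EINTR included) stays an error, the logic could look like the code below. The names fsync_errno_is_noop and fsync_fname_tolerant are hypothetical; this is not the actual fsync_fname_ext() code and it leaves out PostgreSQL's error reporting entirely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Decide whether an fsync() errno may be ignored. Only EINVAL is treated
 * as "fsync not supported here"; EIO, EINTR and the rest remain errors.
 */
bool
fsync_errno_is_noop(int err)
{
    return err == EINVAL;
}

/*
 * Open fname read-only and fsync() it. Returns 0 on success or on a
 * tolerated failure, otherwise the errno of the failing call.
 */
int
fsync_fname_tolerant(const char *fname)
{
    int fd;
    int err = 0;

    assert(fname != NULL);

    fd = open(fname, O_RDONLY);
    if (fd < 0)
        return errno;

    if (fsync(fd) != 0)
    {
        int saved_errno = errno;

        if (!fsync_errno_is_noop(saved_errno))
            err = saved_errno;  /* real failure, report it to the caller */
    }

    close(fd);
    return err;
}
```

Keeping the decision in a separate predicate makes it easy to narrow it later, for example to directories only, if ignoring EINVAL for regular files turns out to be too permissive.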
-- 
Michael



Alvaro, Raphael,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Raphael Hertzog wrote:
>
> > PostgreSQL 10 no longer works on a (Kali) live system where the
> > root filesystem is an overlayfs with an underlying squashfs
> > filesystem (where postgresql and its initial file structure
> > is present) and a writable tmpfs overlay.
>
> Please create a machine that works this way and get it added to the
> buildfarm, so that this sort of thing doesn't surprise us in the future
> months after the fact.

While I agree with this, I'm not entirely convinced that this isn't an
issue with the implementation of the underlying filesystem after all.  I
haven't had a chance to go read those other bug reports, but my fsync()
manpage pretty clearly seems to say that fsync should only be returning
EINVAL if it's called on a special file (FIFO, pipe, et al).  There's
certainly no indication that it's ok for the same file to sometimes
support fsync() and other times *not* support fsync().  That's pretty
bizarre.

Why wouldn't it make sense for the filesystem to realize it's a no-op if
there have been no changes?

Thanks!

Stephen