Thread: pg_basebackup failed to read a file

pg_basebackup failed to read a file

From
Mike Cardwell
Date:
Hi,

I was just setting up streaming replication for the first time. I ran
pg_basebackup on the slave. It copied 1.5TB of data. Then it errored
out with:

```
1498215035/1498215035 kB (100%), 1/1 tablespace
pg_basebackup: could not get write-ahead log end position from server:
ERROR:  could not open file "./postgresql.conf~": Permission denied
pg_basebackup: removing data directory "/var/lib/pgsql/10/data"
bash-4.2$
```

Now, I know what this error means. There was a root owned file at
"/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
version of our postgres config and was not readable by the postgres
user. I'll delete this file and try again. However, in the meantime: I
feel like it would be useful for pg_basebackup to check that it has
read access to all of the existing files in the source directory at the
start, before it begins its copy. I'd like to submit this as a feature
request, but I'm struggling to find out how to do that. So here I am...
Can anyone point me in the right direction?
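
In case it helps anyone else who hits this, the kind of up-front check
I have in mind can be approximated from the shell today. This is only a
sketch, run as the postgres user, with my data directory hard-coded:

```
# Run as the postgres user before starting pg_basebackup. Lists
# anything under the data directory that this user cannot read;
# any hit here would make the copy fail part-way through.
# (-readable is a GNU find extension.)
find /var/lib/pgsql/10/data ! -readable
```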

Regards,

Mike

Re: pg_basebackup failed to read a file

From
Tom Lane
Date:
Mike Cardwell <mike.cardwell@hardenize.com> writes:
> pg_basebackup: could not get write-ahead log end position from server:
> ERROR:  could not open file "./postgresql.conf~": Permission denied

> Now, I know what this error means. There was a root owned file at
> "/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
> version of our postgres config and was not readable by the postgres
> user. I'll delete this file and try again. However, in the meantime: I
> feel like it would be useful for pg_basebackup to check that it has
> read access to all of the existing files in the source directory at the
> start, before it begins its copy.

That seems like a pretty expensive thing to do, if there are lots of
files ... and you'd still end up failing, so it's not moving the ball
very far.

More generally, this seems closely related to bug #14999 [1]
which concerned pg_rewind's behavior in the face of unexpected file
permissions within the data directory.  We ended up not doing anything
about that except documenting it, which I wasn't very satisfied with,
but the costs of doing better seemed to exceed the benefits.

It'd be nice to have a more coherent theory about what needs to be copied
or not, and not fail on files that could simply be ignored.  Up to now
we've resisted having any centrally defined knowledge of what can be
inside a PG data directory, but maybe that bullet needs to be bitten.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20180104200633.17004.16377%40wrigleys.postgresql.org


Re: pg_basebackup failed to read a file

From
Ron
Date:

On 08/14/2018 11:14 AM, Tom Lane wrote:
> Mike Cardwell <mike.cardwell@hardenize.com> writes:
>> pg_basebackup: could not get write-ahead log end position from server:
>> ERROR:  could not open file "./postgresql.conf~": Permission denied
>> Now, I know what this error means. There was a root owned file at
>> "/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
>> version of our postgres config and was not readable by the postgres
>> user. I'll delete this file and try again. However, in the meantime: I
>> feel like it would be useful for pg_basebackup to check that it has
>> read access to all of the existing files in the source directory at the
>> start, before it begins its copy.
> That seems like a pretty expensive thing to do, if there are lots of
> files ... and you'd still end up failing, so it's not moving the ball
> very far.

Why is checking a bunch of file permissions anywhere close to being as 
expensive as transferring 1.5TB over a WAN link?

-- 
Angular momentum makes the world go 'round.


Re: pg_basebackup failed to read a file

From
Dimitri Maziuk
Date:
On 08/14/2018 12:14 PM, Ron wrote:

> Why is checking a bunch of file permissions anywhere close to being as
> expensive as transferring 1.5TB over a WAN link?

Normally it shouldn't be, but I recently had postgres create ~13M .snap
files, and just reading the directory with opendir()/readdir() took
longer than anyone would care to wait... so it can be just as expensive.

One could just as easily ask why one would create mode 600 files in
places where they don't belong.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Re: pg_basebackup failed to read a file

From
Stephen Frost
Date:
Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> On 08/14/2018 11:14 AM, Tom Lane wrote:
> >Mike Cardwell <mike.cardwell@hardenize.com> writes:
> >>pg_basebackup: could not get write-ahead log end position from server:
> >>ERROR:  could not open file "./postgresql.conf~": Permission denied
> >>Now, I know what this error means. There was a root owned file at
> >>"/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
> >>version of our postgres config and was not readable by the postgres
> >>user. I'll delete this file and try again. However, in the meantime: I
> >>feel like it would be useful for pg_basebackup to check that it has
> >>read access to all of the existing files in the source directory at the
> >>start, before it begins its copy.
> >That seems like a pretty expensive thing to do, if there are lots of
> >files ... and you'd still end up failing, so it's not moving the ball
> >very far.
>
> Why is checking a bunch of file permissions anywhere close to being as
> expensive as transferring 1.5TB over a WAN link?

One could argue that the cost would be borne by everyone who is using
pg_basebackup and not just those users who are transferring 1.5TB over a
WAN link.

That said, pgbackrest always builds a full manifest by scanning all of
the directories, tablespaces, files, etc., and I can't recall anyone ever
complaining about it.  Certainly, failing fast would be better than
failing after a lot of time has been spent.

Thanks!

Stephen


Re: pg_basebackup failed to read a file

From
"Joshua D. Drake"
Date:
On 08/14/2018 09:14 AM, Tom Lane wrote:
> Mike Cardwell <mike.cardwell@hardenize.com> writes:
>
> It'd be nice to have a more coherent theory about what needs to be copied
> or not, and not fail on files that could simply be ignored.  Up to now
> we've resisted having any centrally defined knowledge of what can be
> inside a PG data directory, but maybe that bullet needs to be bitten.

This is not the first time, nor even the second, that this issue has 
arisen. I would think that a coherent theory, or at least a 
semi-coherent one, would be pretty easy to come up with. Granted, we 
can't reasonably know what is going on under base/, but at the top 
level of PGDATA we know *exactly* what files should and should not be 
in there.
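
As a rough illustration only (the whitelist below is deliberately
incomplete, and the authoritative list would have to come from the
server sources), a top-level sanity check could look something like:

```
# Sketch: flag top-level PGDATA entries that are not among the
# names the server itself creates (incomplete list, PG 10 era).
cd /var/lib/pgsql/10/data || exit 1
for f in *; do
    case "$f" in
        base|global|pg_wal|pg_xact|pg_tblspc|pg_logical|\
        pg_multixact|pg_subtrans|pg_twophase|pg_snapshots|\
        pg_commit_ts|pg_dynshmem|pg_notify|pg_replslot|pg_serial|\
        pg_stat|pg_stat_tmp|PG_VERSION|postgresql.conf|\
        postgresql.auto.conf|pg_hba.conf|pg_ident.conf|\
        postmaster.opts|postmaster.pid|current_logfiles)
            ;;                      # known to the server
        *)
            echo "unexpected: $f"   # e.g. postgresql.conf~
            ;;
    esac
done
```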

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: pg_basebackup failed to read a file

From
Michael Paquier
Date:
On Tue, Aug 14, 2018 at 12:14:59PM -0400, Tom Lane wrote:
> That seems like a pretty expensive thing to do, if there are lots of
> files ... and you'd still end up failing, so it's not moving the ball
> very far.

Yeah, I would think that with many small relations it is going to have a
measurable performance impact if we scan the whole data directory a
second time.

> More generally, this seems closely related to bug #14999 [1]
> which concerned pg_rewind's behavior in the face of unexpected file
> permissions within the data directory.  We ended up not doing anything
> about that except documenting it, which I wasn't very satisfied with,
> but the costs of doing better seemed to exceed the benefits.

Please feel free to read the end of that thread for the details on the
matter.  There are many things you could do; all have drawbacks.

> It'd be nice to have a more coherent theory about what needs to be copied
> or not, and not fail on files that could simply be ignored.  Up to now
> we've resisted having any centrally defined knowledge of what can be
> inside a PG data directory, but maybe that bullet needs to be bitten.

Yeah, I have not really come up with a nice idea yet, especially as
some users deploy custom files of their own into the data directory,
so I am not completely sure that we need to do anything here, nor that
it would be worth the trouble.  One saner strategy may be to keep your
custom files in a directory outside the main data folder...
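
For instance (paths purely illustrative), the stray file from this
thread could have lived somewhere the server never scans:

```
# Keep ad-hoc config backups outside the data directory so that
# base backups never trip over them (paths are illustrative):
mkdir -p /etc/pgsql-config-backups
mv /var/lib/pgsql/10/data/postgresql.conf~ /etc/pgsql-config-backups/
```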
--
Michael
