Re: Allow replication roles to use file access functions - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Allow replication roles to use file access functions
Date
Msg-id 20150903022036.GA3685@tamriel.snowman.net
Whole thread Raw
In response to Re: Allow replication roles to use file access functions  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Allow replication roles to use file access functions  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Thu, Sep 3, 2015 at 10:05 AM, Stephen Frost wrote:
> > * Michael Paquier (michael.paquier@gmail.com) wrote:
> >> On Thu, Sep 3, 2015 at 4:52 AM, Alvaro Herrera wrote:
> > The replication role already has read-only access to everything
> > (nearly?) in the PGDATA directory.  The specific issue in this case, I
> > believe, is if the functions mentioned provide access beyond what the
> > replication role already has access to.
>
> It seems to me that this difference of files is symbolized by what is
> filtered out in a base backup, like postmaster.opts or postmaster.pid.

Neither of those strike me as particularly sensitive information..

> The matter here is: are we going to have in the future files that we
> do not want to include in a base backup that are superuser-only? Today
> I cannot recall we have any of that, we may have in the future though.

This also seems unliekly.

> >> I don't entirely understand why pg_rewind should be invoking any of these
> >> functions to begin with.  Isn't it working directly with the disk data,
> >> with both postmasters shut down?
> >
> > Certain information is needed from the new master to rewind the old
> > master.  What I would hope is that we'd provide a way for *just* that
> > information to be made available to the replication role rather than
> > completely general access through the functions mentioned.
>
> The rewound node does not only need the diffs of blocks from a source,
> but also some more information like clog. There are actually three
> different approaches to solve that:

Ok, so CLOG information is also required, but that's not a particularly
large issue and the replication role has access to those files already,
no?  It's just inconvenient to access them using the current protocol,
if that's all you're interested in.

> 1) Use a differential backup to me, or the possibility to fetch a set
> of data block diffs from a source node using an LSN and then re-apply
> them on the target node. The major disadvantage of this approach is
> actually performance: we would need to scan the source node
> completely, so that's slow. And pg_rewind is fast because it only
> scans the WAL records of the node to be rewound from the last
> checkpoint before WAL forked.

I don't follow this performance concern at all.  Why can't pg_rewind
look through the WAL and find what it needs, and then request exactly
that information?  Clearly, we would need to add to the existing
replication protocol, but I don't see any reason to not consider that a
perfectly reasonable approach for this.

> 2) Request data blocks from the source node using the replication
> protocol, that's what pg_read_binary_file actually does, we just don't
> have the logic on replication protocol side, though we could.

Right, we would need to modify the replication protocol to allow such
requests, but that's not particularly difficult.

> 3) Create a new set of functions similar to the existing file access
> functions that are usable by the replication user, except that they
> refuse to return file entries that match the existing filters in
> basebackup.c. That is doable, with a routine in basebackup.c that
> decides if a given file string can be read or not. Base backups could
> use this routine as well.

I don't particularly like this approach as it implies SQL access for the
replication role.

> >> > Another problem with it is that access to the filesystem is about halfway
> >> > to being superuser anyway, even if it's only read-only access.  It would
> >> > certainly let you read things that one would not expect a replication
> >> > user to be able to read (like files unrelated to Postgres).
> >>
> >> Doesn't pg_read_file et al insist that the path is below the data
> >> directorY?
> >
> > I don't believe that's the case, actually..  I might be wrong though,
> > I'm not looking at the code right now.
>
> The use of PGDATA is enforced

We know that there are a lot of potential risks around enforcing this,
or similar, requirements.  Right now, we don't have to worry about any
of those corner cases, but we would if we used this approach.  If,
instead, the replication protocol is modified to accept requests for the
specific information (CLOG xxxx), we could be sure to only ever return
exactly that information from the current cluster, or throw an error.

> > This is definitely a big part of
> > the question, but I'd like to ask- what, exactly, does pg_rewind
> > actually need access to? Is it only the WAL, or are heap and WAL files
> > needed?
>
> Not only, +clog, configuration files, etc.

Configuration files?  Perhaps you could elaborate?

> > Consider the discussion about delta backups, et al, using LSNs.  Perhaps
> > the replication protocol should be extended to allow access to arbitrary
> > WAL, querying what WAL is available, and access to the heap files,
> > perhaps even specific pages in the heap files and relation forks,
> > instead of giving pg_rewind access to these extremely general
> > nearly-OS-user-level functions.
>
> The problem when using differential backups in this case is
> performance as mentioned above. We would need to scan the whole target
> cluster, which may take time, the current approach of pg_rewind only
> needs to scan WAL records to find the list of blocks modified, and
> directly requests them from the source. I would expect pg_rewind to be
> as quick as possible.

I don't follow why the current approach of pg_rewind would have to
change.  All I'm suggesting is that we have a different way, one which
is much more restricted, for pg_rewind to request exactly the
information it needs for performant operation.

> > Further, for my two cents, I view any access to the system by a
> > replication role to a non-replication-protocol communication with the
> > server as being inappropriate.  To that point, I'd actually remove the
> > ability for the replication roles to call pg_start/stop_backup as normal
> > SQL calls since that implies the replication user has connected to a
> > normal backend instead of talking the replication protocol.  I'm not
> > sure if we can really put that cat back in the bag, but I'd be a lot
> > happier if we could.  Allowing access to these functions outside of the
> > replication protocol doubles-down on that bad idea, in my view.
>
> That would break many existing applications, but I definitely see the
> rightful concern here: basically replication roles should just use the
> replication protocol and have no access to directly to SQL.

Indeed.
Thanks!
    Stephen

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Allow replication roles to use file access functions
Next
From: Michael Paquier
Date:
Subject: Re: Horizontal scalability/sharding