Re: Directory/File Access Permissions for COPY and Generic File Access Functions - Mailing list pgsql-hackers

From: Stephen Frost
Subject: Re: Directory/File Access Permissions for COPY and Generic File Access Functions
Msg-id: 20141029195849.GZ28859@tamriel.snowman.net
In response to: Re: Directory/File Access Permissions for COPY and Generic File Access Functions (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > This "ad-hoc data load for Joe" use-case isn't where I had been going
> > with this feature, and I do trust the ETL processes that are behind the
> > use-case that I've proposed for the most part, but there's also no
> > reason for those files to be symlinks or have hard-links or have
> > subdirectories beyond those that I've specifically set up, and having
> > those protections seems, to me at least, like they'd be a good idea to
> > have, just in case.
>
> If your ETL process can be restricted that much, can't it use file_fdw or
> some such to access a fixed filename set by somebody with more privilege?

We currently have the ETL process figure out what the filename is on a
daily basis: by contrasting what "should" be there against what has
been loaded thus far (which is tracked in tables in the DB), we can
figure out what needs to be loaded.  To do what you're suggesting, we'd
have to write a pl/pgsql function which does the same thing and runs as
a superuser; not ideal, but it would be possible.
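
To be concrete, that superuser function would end up looking something
like this (untested sketch; the table and path names are made up for
illustration):

    -- Runs with the owner's (superuser) privileges via SECURITY DEFINER;
    -- loads the next expected daily extract and records it as loaded.
    CREATE FUNCTION etl.load_next_extract() RETURNS void AS $$
    DECLARE
        next_day date;
        path     text;
    BEGIN
        -- The next day to load is the day after the latest one recorded.
        SELECT coalesce(max(file_date), current_date - 1) + 1
          INTO next_day
          FROM etl.loaded_files;

        path := '/srv/etl/hadoop_to_pg/extract_'
                || to_char(next_day, 'YYYYMMDD') || '.csv';

        EXECUTE format('COPY etl.staging FROM %L WITH (FORMAT csv)', path);

        INSERT INTO etl.loaded_files (file_date) VALUES (next_day);
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;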

> Why exactly does it need freedom to specify a filename but not a directory
> path?

Because the file names change every day for daily processes, and there
can be cases (such as the system being backlogged or down for a day or
two) where it'd need to go back a few days in time.  This isn't
abnormal; I've run into exactly these cases a few times.  The Hadoop
system dumps the files out on the NFS server and the PG side sucks them
in.  The directories are part of the API defined between the Hadoop
team and the PG team, along with the file names, file formats, etc.
These can go in either direction too, of course, Hadoop -> PG or
PG -> Hadoop, though in my experience each direction is always in a
different directory (as it's just sane to set things up that way), even
though I suppose it wouldn't absolutely have to be.
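
The tracking table also makes the catch-up case straightforward: when
the system has been down or backlogged for a day or two, finding the
expected files which haven't been loaded yet is a simple query
(untested, same made-up names as above):

    -- Expected daily files from the past week which haven't been loaded.
    SELECT d::date AS missing_day
      FROM generate_series(current_date - 7, current_date, '1 day') AS d
     WHERE NOT EXISTS (SELECT 1 FROM etl.loaded_files f
                        WHERE f.file_date = d::date);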

> As for the DBA-access set of use cases, ISTM that most real-world needs
> for this sort of functionality are inherently a bit ad-hoc, and therefore
> once you've locked it down tightly enough that it's credibly not
> exploitable, it's not really going to be as useful as all that.  The
> nature of an admin job is dealing with unforeseen cases.

I agree that for the DBA-access set of use-cases (ad-hoc data loads,
etc), having a role attribute would be sufficient.  Note that this
doesn't cover the auditor-role / log file access use-case that we've
been discussing, though, as auditors shouldn't have write access to the
system.
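
Just as a strawman for that read/write split (purely illustrative
syntax, not necessarily what I'm proposing):

    -- Strawman syntax: the auditor can read the log directory but,
    -- having no WRITE grant, can't use COPY ... TO or anything else
    -- that writes to the filesystem.
    CREATE DIRECTORY log_dir AS '/var/log/postgresql';
    GRANT READ ON DIRECTORY log_dir TO auditor;
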
Thanks,
    Stephen
