Thread: How to fork pg_dump or psql w/o leaking secrets?
Hi. To administer our PostgreSQL-based system, we have custom tools
to manage the schemas, load data, etc... Including a versatile CLI tool.
But that tool is special purpose, while sometimes we want/need the general
purpose PSQL. But because our ROLE and SCHEMA names are rather long
and unwieldy, it's cumbersome to SET ROLE and SET SEARCH_PATH manually
to set up the session for easy access to the data. Thus I'd like to fork PSQL from
our tool, which requires reconnecting to the server (thus possibly "secrets"),
and I also need to run some PSQL \commands and SQL to config PSQL correctly
for the context our tool was run with (i.e. our tool's own CLI options).
So how would I:
1) "pass secrets" somehow, so they don't leak to the PS command for example. stdin?
2) configure the forked PSQL in terms of SET ROLE, SET SEARCH_PATH, and possible other \commands?
In a very similar vein, I do my own backups using COPY, similar to [pg_dumpbinary][1],
for the same reasons, but using a different "container" (not several files), for several reasons,
but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with the same issue #1 above,
to take care of the DDLs for me, while I handle the DMLs myself.
Does anybody have experience doing something similar?
Any public example anywhere?
Otherwise, any advice on how best to do it?
Thanks, --DD
On Thu, Sep 21, 2023 at 7:46 PM Dominique Devienne <ddevienne@gmail.com> wrote:
> Hi. To administer our PostgreSQL-based system, we have custom tools
> to manage the schemas, load data, etc... Including a versatile CLI tool.
>
> But that tool is special purpose, while sometimes we want/need the general
> purpose PSQL. But because our ROLE and SCHEMA names are rather long
> and unwieldy, it's cumbersome to SET ROLE and SET SEARCH_PATH manually
> to set up the session for easy access to the data. Thus I'd like to fork PSQL from
> our tool, which requires reconnecting to the server (thus possibly "secrets"),
> and I also need to run some PSQL \commands and SQL to config PSQL correctly
> for the context our tool was run with (i.e. our tool's own CLI options).

Isn't .psqlrc a possible approach?

% cat .psqlrc
\echo "Loading configuration"
set search_path to 'luca';

> So how would I:
> 1) "pass secrets" somehow, so they don't leak to the PS command for example. stdin?

I don't get what you mean by secrets, but I guess you have to play
with a wrapper and pass or tools like that.

> 2) configure the forked PSQL in terms of SET ROLE, SET SEARCH_PATH, and possible other \commands?

I would use .psqlrc for every user.

> In a very similar vein, I do my own backups using COPY, similar to [pg_dumpbinary][1],
> for the same reasons, but using a different "container" (not several files), for several reasons,
> but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with the same issue #1 above,
> to take care of the DDLs for me, while I handle the DMLs myself.

Again, I don't understand the aim, seems you would obfuscate what you
are doing to others for, ehm, commercial reasons? However, pg_dump can
dump only the schema (--schema-only and friends), as well as data only.
I'm not sure pg_dumpbinary can (is it a purpose?). I would not mix and
match the two tools however.

Luca
On Fri, Sep 22, 2023 at 10:56 AM Luca Ferrari <fluca1978@gmail.com> wrote:
> On Thu, Sep 21, 2023 at 7:46 PM Dominique Devienne <ddevienne@gmail.com> wrote:
>> and I also need to run some PSQL \commands and SQL to config PSQL correctly
>> for the context our tool was run with (i.e. our tool's own CLI options).
>
> Isn't .psqlrc a possible approach?
>
> % cat .psqlrc
> \echo "Loading configuration"
> set search_path to 'luca';
I see. I don't want to mess with the user's .psqlrc, but
I can use the PSQLRC env-var to override the default location.
Or I can use `-f filename` or `-f -` to provide them independently of the .psqlrc,
with or without `--no-psqlrc`.
So that takes care of configuring PSQL, thanks.
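Something like this sketch (in Python, with hypothetical names; PSQLRC is the documented override) is what I have in mind: write a throw-away rc file with the per-context setup, and point the forked psql at it, so the session is configured and psql then stays interactive:

```python
import os
import tempfile

def psql_invocation(role, schema, conninfo):
    """Write a throw-away psqlrc with our per-context setup and return
    (argv, env) for the forked psql. psql reads the file named by the
    PSQLRC env-var instead of ~/.psqlrc, runs it, then stays interactive."""
    fd, rc_path = tempfile.mkstemp(prefix="toolrc-", suffix=".psqlrc")
    with os.fdopen(fd, "w") as rc:
        rc.write(f'SET ROLE "{role}";\n')
        rc.write(f'SET search_path TO "{schema}";\n')
    env = dict(os.environ, PSQLRC=rc_path)  # child-only override
    return ["psql", conninfo], env, rc_path

argv, env, rc_path = psql_invocation(
    "a_very_long_role_name", "a_very_long_schema_name", "dbname=proj")
```

The rc file can be unlinked once the child has started.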
>> So how would I:
>> 1) "pass secrets" somehow, so they don't leak to the PS command for example. stdin?
>
> I don't get what you mean by secrets, but I guess you have to play
> with a wrapper and pass or tools like that.
Well, simply the password for example. The user connects to PostgreSQL using our tool,
possibly prompting for the password, if PQconnectionNeedsPassword() returns true.
We translate project names and "persona" into actual ROLE and SCHEMA names, check
permissions, check schema versions, that sort of things. Then "fork" PSQL, which must
reconnect to the same DB for the same USER, but ideally w/o having PSQL prompting
for the password again. It should be seamless. Especially since we have alternate ways
to get the password from, besides prompting and PGSERVICE and PGPASSWORD.
So my question is how I pass the password my tool already owns, to the forked PSQL,
w/o that password leaking. I could pass it on the command-line, but that would be leaking
it to the `ps` command (and in various other places).
I guess I could create a temp file with the connection info, and set a PGSERVICE env-var
for PSQL, but then the password is in clear-text in a file. Again, that's leaking the password, in a way.
Another alternative would be to read what PSQL outputs, look for it asking for a password,
and provide it, but that seems brittle, and requires pipes (not easily cross-platform).
That's why I'm asking the community how best the forked PSQL can connect w/o password prompting.
>> In a very similar vein, I do my own backups using COPY, similar to [pg_dumpbinary][1],
>> for the same reasons, but using a different "container" (not several files), for several reasons,
>> but I'd like to reuse `pg_dump --pre/post-data` by also forking it, with the same issue #1 above,
>> to take care of the DDLs for me, while I handle the DMLs myself.
>
> Again, I don't understand the aim, seems you would obfuscate what you
> are doing to others for, ehm, commercial reasons?

Not really, no. It's actually to make the backup more introspectable.
But that's beside the point, no?

> However, pg_dump can dump only the schema (--schema-only and friends),
> as well as data only. I'm not sure pg_dumpbinary can (is it a
> purpose?). I would not mix and match the two tools however.
pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.
On Fri, Sep 22, 2023 at 12:13 PM Dominique Devienne <ddevienne@gmail.com> wrote:
> So my question is how I pass the password my tool already owns, to the forked PSQL,
> w/o that password leaking. I could pass it on the command-line, but that would be leaking
> it to the `ps` command (and in various other places).

I think that forking a beast like psql will make you incur a lot of
security problems that are worse than your "password leak".

One solution I could see, could be something like "a poor man captive
psql": write a per-user .pgpass file with the password for the
database (from your tool), write a .psqlrc configuration for the user
and let him connect to the database. Once done, remove both files
and/or restore previous ones.

Besides, as far as I understand, you are approaching the problem with
a possible wrong design: schemas in PostgreSQL do not provide the
amount of isolation it seems you are trying to achieve with your
persona-to-role-to-schema wrapper.

> That's why I'm asking the community how best the forked PSQL can connect w/o password prompting.

psql and libpq can exploit .pgpass for exactly that aim: not messing
around with passwords.
Again, I would discourage you to fork psql. Would you be able to
maintain the new upcoming versions in the future?

>> However, pg_dump can dump only the schema (--schema-only and friends),
>> as well as data only. I'm not sure pg_dumpbinary can (is it a
>> purpose?). I would not mix and match the two tools however.
>
> pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.

Yes, and in fact it does the pre-data and post-data sections, e.g.,
<https://github.com/lzlabs/pg_dumpbinary/blob/master/pg_dumpbinary#L305>.
However, I'm not sure you will be able to mix and match a pg_dump
manually obtained schema (in plain text) with pg_restorebinary.
And it is not clear to me, still, the aim of this approach.

Luca
On Fri, Sep 22, 2023 at 12:45 PM Luca Ferrari <fluca1978@gmail.com> wrote:
> On Fri, Sep 22, 2023 at 12:13 PM Dominique Devienne <ddevienne@gmail.com> wrote:
>> So my question is how I pass the password my tool already owns, to the forked PSQL,
>> w/o that password leaking. I could pass it on the command-line, but that would be leaking
>> it to the `ps` command (and in various other places).
>
> I think that forking a beast like psql will make you incur a lot of
> security problems that are worse than your "password leak".
I'm sorry, but this doesn't make sense to me.
I'm talking of replacing using my tool then psql, with using my tool that forks psql.
In both cases these are client apps that connect to a particular DB, for a given user.
The point is about the best way to not expose the password, if a password is necessary.
> One solution I could see, could be something like "a poor man captive
> psql": write a per-user .pgpass file with the password for the
> database (from your tool), write a .psqlrc configuration for the user
> and let him connect to the database. Once done, remove both files
> and/or restore previous ones.
I already told you my app is LIBPQ aware, including about PGPASSWORD and PGSERVICE.
It's users who decide to use these mechanisms (which have plain-text passwords BTW...), not my tool.
The same way PSQL prompts for a password when not using these mechanisms, so does my tool.
> Besides, as far as I understand, you are approaching the problem with
> a possible wrong design: schemas in PostgreSQL do not provide the
> amount of isolation it seems you are trying to achieve with your
> persona-to-role-to-schema wrapper.
I don't know where you are inferring that...
And it seems beside the point anyway, even if it were true (and it isn't).
>> That's why I'm asking the community how best the forked PSQL can connect w/o password prompting.
>
> psql and libpq can exploit .pgpass for exactly that aim: not messing
> around with passwords.
You are confusing things. I use PGSERVICE and PGPASSWORD myself.
But other users don't. The tool must work either way. And besides, I already
mentioned we have other custom (encrypted store) ways to get passwords too.
So in that case, the password must be communicated to PSQL anyways, somehow.
And the somehow is exactly the point of my questions here.
> Again, I would discourage you to fork psql. Would you be able to
> maintain the new upcoming versions in the future?
PSQL is not willy-nilly changing its CLI options or the way it operates.
Plus I bundled PSQL in my package, which is standalone and cross-platforms.
And again, I really don't see what the harm is in forking PSQL versus running it directly,
albeit having to provide credentials again, exactly what I want to avoid.
>> However, pg_dump can dump only the schema (--schema-only and friends),
>> as well as data only. I'm not sure pg_dumpbinary can (is it a
>> purpose?). I would not mix and match the two tools however.
>
>
> pg_dumpbinary is a Perl script that uses pg_dump and pg_restore.
> Yes, and in fact it does the pre-data and post-data sections, e.g.,
> <https://github.com/lzlabs/pg_dumpbinary/blob/master/pg_dumpbinary#L305>.
> However, I'm not sure you will be able to mix and match a pg_dump
> manually obtained schema (in plain text) with pg_restorebinary.
> And it is not clear to me, still, the aim of this approach.
Who says I'd use pg_restorebinary? My tool does custom backup / restore because:
1) it's a mix of a schema (a "project"), and pieces (subset) of another schema (meta-data about the project)
2) it's full of bytea values, some quite large.
3) the backup is not opaque, it's an SQLite DB with a 1-to-1 match in terms of tables and rows with the original schema.
(but not in terms of columns; the COPY BINARY bytes of the row are dumped as-is; I also use the same integer PK if any)
(and in some cases, I even [expose virtual columns][1] in SQLite for "interesting" (i.e. NKs) columns of the COPY BINARY bytes)
The DML parts using COPY BINARY are already working just fine, thank you.
And it yields a 1-file backup per "project" schema (+ meta-data) that you can easily introspect in any SQLite tool.
Most of the rows are opaque, but tables, rows, integer PKs, and even some text NKs and FKs (parent-child)
are visible from SQLite, making it possible to partially restore some subset, or get row stats about the "project", etc...
I even have views which know about the schema structure, to project even more insights about the backup content.
It's so convenient, I wonder why it doesn't exist already. I'm not the only SQLite + PostgreSQL dev, after all, right?
OTOH, I do tend to think out-of-the-box...
Now I want to add capturing the DDLs for the "project" schema at backup time,
so I don't have to pre-create the "project" schema using my own tool, before restoring it.
I want to capture the schema exactly as it was, recreate the schema using the captured DDLs,
and then use my tool as usual to check and possibly upgrade that schema, as necessary.
And instead of re-inventing the wheel (in my tool) to capture DDLs, reusing pg_dump
makes complete sense to me.
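Concretely, what I have in mind is just two extra pg_dump invocations around my own COPY-based data dump, a sketch (the --section and --schema flags are from the pg_dump docs; the file names are made up):

```python
def ddl_dump_commands(schema, dbname, pre_file, post_file):
    """Build the two pg_dump invocations that capture only DDL:
    --section=pre-data  -> CREATE TABLE etc., run before restoring rows
    --section=post-data -> indexes, constraints, triggers, run after."""
    pre = ["pg_dump", "--section=pre-data", "--schema", schema,
           "--file", pre_file, dbname]
    post = ["pg_dump", "--section=post-data", "--schema", schema,
            "--file", post_file, dbname]
    return pre, post

pre, post = ddl_dump_commands("my_project_schema", "mydb",
                              "pre.sql", "post.sql")
```

The two plain-text scripts would then be stored in the SQLite "container" alongside the COPY BINARY rows.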
My users don't know SQL or PostgreSQL. They ask to backup a project "by name" to my tool,
they won't run pg_dump manually. They wouldn't even know what the schema name is. Or even
how to properly quote it (requires triple dquotes in CMD on Windows, how many people know that!),
much less what pg_dump options to use.
Anyways, I'm getting carried away I'm afraid...
That's not the kind of answers (or questions / comments) I was expecting, from this ML.
Hopefully all is not lost, and someone might answer the as-secure-as-possible password passing question.
Thanks, --DD
On Fri, 22 Sept 2023 at 15:25, Dominique Devienne <ddevienne@gmail.com> wrote:
> On Fri, Sep 22, 2023 at 12:45 PM Luca Ferrari <fluca1978@gmail.com> wrote:
...
>> I think that forking a beast like psql will make you incur a lot of
>> security problems that are worse than your "password leak".
> I'm sorry, but this doesn't make sense to me.
> I'm talking of replacing using my tool then psql, with using my tool that forks psql.
> In both cases these are client apps that connect to a particular DB, for a given user.

It appears you use fork to mean "fork & exec", the classic way to
start a slave ( controlled? helper ? I do not know the modern PC word
for that ) process on *ix, while Luca is thinking of a source fork,
the thing which is normally done in git and friends to have a similar
but derived software built.

Francisco Olarte.
On Friday, September 22, 2023, Luca Ferrari <fluca1978@gmail.com> wrote:
>> That's why I'm asking the community how best the forked PSQL can connect w/o password prompting.
>
> psql and libpq can exploit .pgpass for exactly that aim: not messing
> around with passwords.
>
> Again, I would discourage you to fork psql. Would you be able to
> maintain the new upcoming versions in the future?
The OP seems to be using the term fork in a process sense, not forking the source code. Process execution from within another program. “Launch” or “execute” psql would be a better choice of wording here.
David J.
On Fri, Sep 22, 2023 at 3:24 PM Dominique Devienne <ddevienne@gmail.com> wrote:
> I'm talking of replacing using my tool then psql, with using my tool that forks psql.

I thought you were talking about forking the source code to write your
own psql, sorry about that. It is clear to me that you mean Unix
fork(2) and exec(2).

> The point is about the best way to not expose the password, if a password is necessary.

As I understand the point, which is apparently really hard for me, one
way is using .pgpass (for example adding/removing lines from your
tools, that is effectively a psql wrapper in this sense).

> I already told you my app is LIBPQ aware, including about PGPASSWORD and PGSERVICE.
> It's users who decide to use these mechanisms (which have plain-text passwords BTW...), not my tool.

It cannot be your tool, it is the database that asks for a password if
needed (i.e., pg_hba.conf). Therefore, if your tool is PGPASSWORD
aware, it can handle dynamic lines in .pgpass, which apparently is the
simplest and most secure way to avoid other users snooping on
passwords using ps(1) and friends. Or pass the environment variable as
it is: you are already risking at the entry point of your tool.

> So in that case, the password must be communicated to PSQL anyways, somehow.
> And the somehow is exactly the point of my questions here.

The options are what you already have mentioned: environment
variables, stdin, and .pgpass. I'm not aware of any other method, so I
think the decision is up to you.

> PSQL is not willy-nilly changing its CLI options or the way it operates.
> Plus I bundled PSQL in my package, which is standalone and cross-platforms.
> And again, I really don't see what the harm is in forking PSQL versus running it directly,

I was referring to source code forking, not fork(2).

> And instead of re-inventing the wheel (in my tool) to capture DDLs, reusing pg_dump
> makes complete sense to me.

You added information I was not aware before: you are dumping
PostgreSQL to restore it into SQLite, while I was thinking you wanted
to do some stuff with a PostgreSQL-to-PostgreSQL backup and restore.
That is why I was suggesting to stick with a tool and not to mix and
match. Apparently you have already what you need, or quite frankly I'm
not getting your actual need.

> That's not the kind of answers (or questions / comments) I was expecting, from this ML.

Sorry about that, hope you will find the correct answer.

Luca
On Fri, Sep 22, 2023 at 5:19 PM Luca Ferrari <fluca1978@gmail.com> wrote:
> On Fri, Sep 22, 2023 at 3:24 PM Dominique Devienne <ddevienne@gmail.com> wrote:
> You added information I was not aware before: you are dumping
> PostgreSQL to restore it into SQLite, while I was thinking you wanted
> to do some stuff with a PostgreSQL-to-PostgreSQL backup and restore.
I must be really bad at explaining things today, sorry...
Because of course I'm doing PostgreSQL-to-PostgreSQL backup/restore.
The fact I use a custom SQLite DB file as the backup's "format" instead of
a "pile-of-files" (or a TAR of that pile), is the only difference.
That, and the fact it's a backup that mixes one full schema with part of another.
And that my SQLite backup format is much more convenient and "expressive" IMHO,
since fully introspectable ("semi-opaque", since rows are still COPY BINARY encoded).
Writing the backup as an SQLite DB incurs a little overhead, of course,
compared to just appending to per-table files, but not that much when
in non-transactional mode. Acceptable tradeoff compared to the fact one
can open the backup easily and see the tables and rows almost as-is,
which also opens the door to partial restores (there are separate "roots",
independent entity trees, in these schemas).
Seems to me that your tool could set the env var that you want.
If you don't export it, I think it shouldn't "leak", but your child process should get it as part of its environment.
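A sketch of that idea (using a Python child in place of psql, so it runs anywhere): the secret is added only to the environment dict handed to the child, and the parent's own environment is never modified. Note this doesn't address platforms where other processes can inspect a child's environment.

```python
import os
import subprocess
import sys

secret = "s3cr3t"  # hypothetical password the tool already holds

# Add PGPASSWORD only to the environment passed to the child process;
# os.environ itself is never modified by building this copy.
child_env = dict(os.environ, PGPASSWORD=secret)

# Stand-in for psql: a Python child that prints what it sees.
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('PGPASSWORD', 'unset'))"],
    env=child_env, capture_output=True, text=True)
print(out.stdout.strip())
```

With psql as the child, libpq picks PGPASSWORD up exactly as if the user had exported it.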
On Fri, Sep 22, 2023 at 12:43 PM Dominique Devienne <ddevienne@gmail.com> wrote:
> ...
On Friday, September 22, 2023, Dominique Devienne <ddevienne@gmail.com> wrote:
> I already told you my app is LIBPQ aware, including about PGPASSWORD and PGSERVICE.
> It's users who decide to use these mechanisms (which have plain-text passwords BTW...), not my tool.
> The same way PSQL prompts for a password when not using these mechanisms, so does my tool.
Once you have the password you should utilize the PGPASSWORD environment variable to get it passed to psql. It doesn’t matter in the least how you obtained that password in the first place.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Once you have the password you should utilize the PGPASSWORD environment
> variable to get it passed to psql. It doesn’t matter in the least how you
> obtained that password in the first place.

Keep in mind that on many flavors of Unix, a process's environment
variables can readily be inspected by other processes. You should check
your platform carefully before assuming that PGPASSWORD is a safe way
to pass down a secret.

regards, tom lane
On Fri, Sep 22, 2023 at 8:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
>> Once you have the password you should utilize the PGPASSWORD environment
>> variable to get it passed to psql. It doesn’t matter in the least how you
>> obtained that password in the first place.
>
> Keep in mind that on many flavors of Unix, a process's environment
> variables can readily be inspected by other processes. You should
> check your platform carefully before assuming that PGPASSWORD is
> a safe way to pass down a secret.
Indeed, the libpq docs say as much:

> Use of this environment variable is not recommended for security reasons,
> as some operating systems allow non-root users to see process environment
> variables via ps; instead consider using a password file (see Section 34.16).
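For reference, the password-file variant could look like this sketch (hypothetical helper; the pgpass line format and the 0600 requirement are from the libpq docs):

```python
import os
import stat
import tempfile

def write_pgpass(host, port, dbname, user, password):
    """Write a throw-away libpq password file (0600, else libpq ignores
    it on Unix) and return its path, to hand to the child via PGPASSFILE."""
    fd, path = tempfile.mkstemp(prefix="pgpass-")
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as f:
        # Format: hostname:port:database:username:password
        # (':' and '\' inside fields would need escaping with '\')
        f.write(f"{host}:{port}:{dbname}:{user}:{password}\n")
    return path

path = write_pgpass("db.example.com", 5432, "mydb", "a_long_role", "s3cr3t")
env = dict(os.environ, PGPASSFILE=path)
# fork/exec psql with env here, then os.unlink(path) once it has connected.
mode = stat.S_IMODE(os.stat(path).st_mode)
```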
but I'm not a fan of creating a temporary file either, with the password in plain text...
Remember that I'm already connected in the "parent" process, to the DB.
There ought to be a way to obtain a token from the DB via a connection,
with a short duration, to supply to the exec'd PostgreSQL tools like psql or pg_dump,
to completely bypass passwords. The server would maintain per-DB secrets,
and sign a JWT token for example, valid for a few seconds, for that user/DB pair,
that the parent "process" could then utilize / pass to the "fork/exec"d tool.
Much safer than plain-text passwords floating around env-vars or temp-files. --DD
On Friday, September 22, 2023, Dominique Devienne <ddevienne@gmail.com> wrote:
> Remember that I'm already connected in the "parent" process, to the DB.
> There ought to be a way to obtain a token from the DB via a connection,
> with a short duration, to supply to the exec'd PostgreSQL tools like psql or pg_dump,
> to completely bypass passwords. The server would maintain per-DB secrets,
> and sign a JWT token for example, valid for a few seconds, for that user/DB pair,
> that the parent "process" could then utilize / pass to the "fork/exec"d tool.
> Much safer than plain-text passwords floating around env-vars or temp-files. --DD
Sure, though maybe just some kind of "--password-on-stdin" option, where the next input read from stdin is interpreted as the password, would be more readily accomplished. Scripts should be sent via "--file" in that usage, but that seems desirable anyway.
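No such option exists in psql today, but a small wrapper gets the same shape: read the password as the first line of the wrapper's own stdin, then exec psql with it in the environment. A hypothetical sketch (it returns the argv/env pair instead of exec'ing, so nothing here is real psql behavior):

```python
import io
import os
import sys

def psql_with_stdin_password(psql_args, stream=None):
    """Hypothetical wrapper: the first line of our stdin is the password;
    a real tool would then call os.execvpe("psql", argv, env)."""
    stream = stream or sys.stdin
    password = stream.readline().rstrip("\n")
    env = dict(os.environ, PGPASSWORD=password)
    return ["psql"] + list(psql_args), env

argv, env = psql_with_stdin_password(["-X", "mydb"], io.StringIO("s3cr3t\n"))
```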
David J.