Re: pgsql: Add pg_alterckey utility to change the cluster key - Mailing list pgsql-committers

From Fabien COELHO
Subject Re: pgsql: Add pg_alterckey utility to change the cluster key
Date
Msg-id alpine.DEB.2.22.394.2012280907150.2094581@pseudo
In response to pgsql: Add pg_alterckey utility to change the cluster key  (Bruce Momjian <bruce@momjian.us>)
List pgsql-committers
Hello Bruce,

I put the thread back on hackers.

>> The first two keys are stored in pg_cryptokeys/ in the data directory,
>> while the third one is retrieved using a GUC for validation at server
>> startup for the other two.

>> Do we necessarily have to store the first level keys within the data 
>> directory?  I guess that this choice has been made for performance, but 
>> is that really something that a user would want all the time?  AES256 
>> is the only option available for the data keys.  What if somebody wants 
>> to roll in their own encryption?
>
> To clarify, we encrypt the data keys using AES256, but the data keys
> themselves can be 128, 192, or 256 bits.
>
>> Companies can have many requirements in terms of accepting the use of
>> one option or another.
>
> I think ultimately we will need three commands to control the keys.
> First, there is the cluster_key_command, which we have now.  Second, I
> think we will need an optional command which returns random bytes ---
> this would allow users to get random bytes from a different source than
> that used by the server code.
>
> Third, we will probably need a command that returns the data encryption
> keys directly, either heap/index or WAL keys, probably based on key
> number --- you pass the key number you want, and the command returns the
> data key.  There would not be a cluster key in this case, but the
> command could still prompt the user for perhaps a password to the KMS
> server. It could not be used if any of the previous two commands are
> used. I assume an HMAC would still be stored in the pg_cryptokeys
> directory to check that the right key has been returned.

Yep, my point is that it should be possible to have the whole key 
management outside of postgres.

This said, postgres should provide a reasonable default implementation, 
obviously, preferably by using the provided mechanism (*NOT* a direct 
internal implementation and a possible switch to something else, IMHO, 
because then it would not be tested for whether it provides the right 
level of usability).

I agree that keys need to be identified. I somehow disagree with the 
naming of the script and the implied usage.

ISTM that there could be an external interface:

  - to initialize something. It may start a suid process, it may connect to 
a remote host, it may ask for a master password, who knows?

    /path/to/init --options arguments…

the init process would return something which would be reused later on, eg 
an authentication token, or maybe a path to a socket for communication, or 
a file which contains something, or even a master/cluster key, but not 
necessarily. It may be anything. How it is passed to the next 
process/connection is an open question. Maybe on its stdin?

  - to start a process (?) which provides keys, either created (new) or 
existing (get), and possibly destroy them (or not?). The init process 
result should/could be passed somehow to this process, which may be suid 
something else. Another option would be to rely on some IPC mechanism.
I'm not sure what the best choice is.

ISTM that this process/connection could/should be persistent, with a 
simplistic text or binary based client/server interface. What this 
process/connection does it beyond postgres. In my mind, it could implement 
getting random data as well. I'd suggest that under no circumstances 
should the postgres process create cryptographic keys, although it should 
probably name them with some predefined length limit.

    /path/to/run --options arguments…
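
To make the idea of a persistent process with a simplistic text interface 
more concrete, here is a minimal sketch. Everything in it is an assumption 
for illustration only: the verbs (new, get, destroy, random), the 
"ok"/"err" reply format, the MAX_NAME_LEN limit and the in-memory storage 
are hypothetical and are not part of any patch. A real provider would 
likely talk to a KMS or an HSM instead of keeping keys in a dict.

```python
import secrets

MAX_NAME_LEN = 64  # hypothetical limit on the key names chosen by postgres


class KeyProvider:
    """Toy in-memory key service answering one-line text requests."""

    def __init__(self):
        self._keys = {}

    def handle_line(self, line):
        parts = line.split()
        if not parts:
            return "err empty request"
        cmd, args = parts[0], parts[1:]
        # "random <nbytes>": random data comes from the provider, not the server.
        if cmd == "random" and len(args) == 1 and args[0].isdecimal():
            return "ok " + secrets.token_bytes(int(args[0])).hex()
        if not args or len(args[0]) > MAX_NAME_LEN:
            return "err bad key name"
        name = args[0]
        # "new <name> <bits>": the provider creates the key, the caller names it.
        if cmd == "new" and len(args) == 2 and args[1] in ("128", "192", "256"):
            if name in self._keys:
                return "err key exists"
            self._keys[name] = secrets.token_bytes(int(args[1]) // 8)
            return "ok " + self._keys[name].hex()
        if cmd == "get" and len(args) == 1:
            key = self._keys.get(name)
            return "ok " + key.hex() if key is not None else "err unknown key"
        if cmd == "destroy" and len(args) == 1:
            found = self._keys.pop(name, None) is not None
            return "ok" if found else "err unknown key"
        return "err bad request"


def serve(inp, out):
    """Answer requests line by line until the input stream is closed."""
    provider = KeyProvider()
    for line in inp:
        out.write(provider.handle_line(line) + "\n")
        out.flush()
```

Calling serve(sys.stdin, sys.stdout) would turn this into a long-running 
child process; a socket-based variant would only change how the two 
streams are obtained.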

Then there should be a postgres internal interface to store the results 
for local processing, retrieve them when needed, and so on, ok.

ISTM that there should also be an internal interface to load the 
cryptographic primitives. Possibly a so/dll would do, or maybe just an 
extension mechanism which would provide the necessary functions, but this 
raises the issue of bootstrapping, so maybe not so great an idea. The 
functions should probably be able to implement a counter mode, so that 
actual keys depend on the page position in the file, but what it 
really does is not postgres' concern.
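
As a rough illustration of a counter-style mode in which the keystream 
depends on the page position, consider the sketch below. It is not AES-CTR: 
the Python standard library has no AES, so HMAC-SHA256 is used here as a 
stand-in pseudo-random function. The names keystream and xor_page and the 
(file_id, block_no) addressing are assumptions made for the example. The 
sketch also ignores the question of keystream reuse when the same page is 
rewritten, which a real design would have to address.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 8192  # default postgres block size


def keystream(key, file_id, block_no, length):
    """CTR-style keystream: PRF(key, file_id || block_no || counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = struct.pack(">QQI", file_id, block_no, counter)
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_page(key, file_id, block_no, page):
    """Encrypt or decrypt a page; XOR with the keystream is its own inverse."""
    ks = keystream(key, file_id, block_no, len(page))
    return bytes(a ^ b for a, b in zip(page, ks))
```

Because the position is mixed into every keystream block, identical 
plaintext pages stored at different block numbers produce different 
ciphertexts.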

A cryptographic concern for me is whether it would be possible to have 
authentication/integrity checks associated with each page. This means having 
the ability to reserve some space somewhere, possibly 8-16 bytes, in a 
page. Different algorithms could have different space requirements.
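
A minimal sketch of what reserving space for a per-page integrity check 
could look like follows. The layout (tag in the last TAG_LEN bytes of the 
page), the truncated HMAC-SHA256 tag and the function names are assumptions 
for illustration, not a description of the actual page format.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 8192  # default postgres block size
TAG_LEN = 16      # bytes reserved at the end of the page (assumption)


def _tag(key, block_no, body):
    # Binding the block number prevents a valid page from being moved elsewhere.
    msg = struct.pack(">Q", block_no) + body
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]


def seal_page(key, block_no, body):
    """Append the tag to a body that leaves exactly TAG_LEN bytes free."""
    if len(body) != PAGE_SIZE - TAG_LEN:
        raise ValueError("body must leave room for the tag")
    return body + _tag(key, block_no, body)


def open_page(key, block_no, page):
    """Return the body if the tag verifies, otherwise raise ValueError."""
    body, tag = page[:-TAG_LEN], page[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, block_no, body)):
        raise ValueError("page failed integrity check")
    return body
```

An algorithm with a different tag size would only change TAG_LEN, which is 
why the amount of reserved space would need to be configurable.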

The same interface should be used by other back-end commands (pg_upgrade, 
whatever).

Somehow, the design should be abstract, without implying much, so that 
very different interfaces could be provided in terms of whether there 
exists a master key, how keys are derived, what key sizes are, what 
algorithms are used, and so on. Postgres itself should not store keys, 
only key identifiers.

I'm wondering whether replication should be able to work without some/all 
keys, so that a streaming replication could be implemented without the 
remote host being fully aware of the cryptographic keys.

Another functional point is to allow changing the underlying key for a 
file, and discuss how this could work with the interface, as I noted that 
it was a desired feature. I'd suggest that maybe this should be based on 
changing the name of the "key", so that the external key management would 
not need to know about it. How to achieve that as a transaction is an open 
question. Maybe it should be a change outside of postgres, which modifies 
files at the cluster level with the database stopped.
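
The name-based approach could be sketched as follows. The "base.generation" 
naming scheme, the catalog dict mapping files to key names, the 
ToyKeyService class and the reencrypt callback are all hypothetical. The 
sketch makes no attempt to be transactional, which remains the open 
question.

```python
import secrets


class ToyKeyService:
    """Stand-in for the external key manager: it only maps names to keys."""

    def __init__(self):
        self._keys = {}

    def new(self, name):
        if name in self._keys:
            raise ValueError("key exists: " + name)
        self._keys[name] = secrets.token_bytes(32)

    def get(self, name):
        return self._keys[name]


def rotate(catalog, service, file_name, reencrypt):
    """Point file_name at a freshly named key; the service never sees files."""
    old_name = catalog[file_name]
    base, _, gen = old_name.rpartition(".")
    new_name = f"{base}.{int(gen) + 1}"
    service.new(new_name)
    # The caller supplies the actual re-encryption of the file contents.
    reencrypt(file_name, service.get(old_name), service.get(new_name))
    catalog[file_name] = new_name  # only the identifier is stored locally
    return new_name
```

From the key manager's point of view, a rotation is just one more "new" 
request followed by "get" requests, so it needs no notion of rotation at 
all.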

> I thought we should implement the first command, because it will
> probably be the most common and easiest to use, and then see what people
> want added.

I somehow disagree: I think that pg should provide from the start the full 
generic interface, *and* a reasonable implementation which is what the 
current proposal does, fine with me. A simplistic test-oriented interface 
could be implemented in a scripting language. I think that great care must 
be put upfront in the overall design, so that it can be reused later on by 
people with pretty different requirements (in terms of auditors, legal 
constraints, functions, whatever). I would like to avoid providing a 
half-baked design which suits some use-cases but cannot be used for 
others, because of key design choices. From a lines-of-code point of 
view, it may not change much, really; this is more about design and 
putting functionalities in the right places.

Now I intend to give some time to review patches with this in mind. Maybe 
I'll have some time at the end of the next CF, or the next.

-- 
Fabien.
