Thread: Complete data erasure

Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello hackers,

I want to add the feature to erase data so that it cannot be restored
because it prevents attackers from stealing data from released data area.

- Background
International security policies require that above threat is taken measures.
It is "Base Protection Profile for Database Management Systems Version 2.12 (DBMS PP)" [1] based on iso 15408.
If the security is improved, it will be more likely to be adopted by security-conscious procurers such as public
agencies.

- Feature
This feature erases data area just before it is returned to the OS (“erase” means that overwrite data area to hide its
contentshere)  
because there is a risk that the data will be restored by attackers if it is returned to the OS without being
overwritten.
The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
I want users to be able to customize the erasure method for their security policies.

- Implementation
My idea is adding a new parameter erase_command to postgresql.conf.
The command that users set in this parameter is executed just before unlink(path) or ftruncate(fd, 0) is called.
For example, the command is shred on Linux and SDelete on Windows.

When erase_command is set, VACUUM does not truncate a file size to non-zero
because it's safer for users to return the entire file to the OS than to return part of it.
Also, there is no standard tool that overwrites part of a file.
With the above specifications, users can easily and safely use this feature using standard tool that overwrites entire
filelike shred. 

Hope to hear your feedback and comments.

[1] https://www.commoncriteriaportal.org/files/ppfiles/pp0088V2b_pdf.pdf
P44 8.1.2

- Threat/Policy
A threat agent may use or manage TSF, bypassing the protection mechanisms of the TSF.

- TOE Security Objectives Addressing the Threat/Policy
The TOE will ensure that any information contained in a protected resource within its Scope of Control
is not inappropriately disclosed when the resource is reallocated.

- Rationale
diminishes this threat by ensuring that TSF data and user data is not persistent
when resources are released by one user/process and allocated to another user/process.

TOE: Target of Evaluation
TSF: TOE Security Functionality


Regards

--
Takanori Asaba




Re: Complete data erasure

From
Kyotaro Horiguchi
Date:
Hello, Asaba-san.

At Wed, 15 Jan 2020 01:31:44 +0000, "asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> wrote in
> Hello hackers,
>
> I want to add the feature to erase data so that it cannot be restored
> because it prevents attackers from stealing data from released data area.
>
> - Background
> International security policies require that above threat is taken measures.
> It is "Base Protection Profile for Database Management Systems Version 2.12 (DBMS PP)" [1] based on iso 15408.
> If the security is improved, it will be more likely to be adopted by security-conscious procurers such as public
agencies.
>
> - Feature
> This feature erases data area just before it is returned to the OS (-Y´erase¡ means that overwrite data area to hide
itscontents here) -A 
> because there is a risk that the data will be restored by attackers if it is returned to the OS without being
overwritten.
> The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> I want users to be able to customize the erasure method for their security policies.

shred(1) or wipe(1) doesn't seem to contribute to the objective on
journaled or copy-on-write file systems. I'm not sure, but maybe the
same can be true for read-modify-write devices like SSD. I'm not sure
about SDelete, but anyway replacing unlink() with something like
'system("shred")' leads to siginificant performance degradation.

man 1 wipe says (https://linux.die.net/man/1/wipe) : (shred has a
similar note.)

> NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS (JUNE 2004)
> Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> by default by most Linux distributions. No secure deletion program
> that does filesystem-level calls can sanitize files on such
> filesystems, because sensitive data and metadata can be written to the
> journal, which cannot be readily accessed. Per-file secure deletion is
> better implemented in the operating system.


WAL files contain copies of such sensitive information, which is not
covered by the proposal. Also temporary files are not.  If the system
doesn't want not to be recoverable after corruption, it must copy such
WAL files to archive.

Currently there's a discussion on transparent data encyryption
covering the all of the above cases on and off of this mailing list.
It is different from device-level encryption mentioned in the man
page.  Doesn't that fit the requirement?

https://www.postgresql.org/message-id/CALS%2BJ3-57cL%3Djz_eT9uxiLa8CAh5BE3-HcQvXQBz0ScMjag4Zg%40mail.gmail.com


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Complete data erasure

From
Tom Lane
Date:
"asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> writes:
> I want to add the feature to erase data so that it cannot be restored 
> because it prevents attackers from stealing data from released data area.

I think this is fairly pointless, unfortunately.

Dropping or truncating tables is as much as we can do without making
unwarranted assumptions about the filesystem's behavior.  You say
you want to zero out the files first, but what will that accomplish
on copy-on-write filesystems?

Even granting that zeroing our storage files is worth something,
it's not worth much if there are more copies of the data elsewhere.
And any well-run database is going to have backups, or standby servers,
or both.  There's no way for the primary node to force standbys to erase
themselves (and it'd be a different sort of security hazard if there
were).

Also to the point: if your assumption is that an attacker has access
to the storage medium at a sufficiently low level that they can examine
previously-deleted files, what's stopping them from reading the data
*before* it's deleted?

So I think doing anything useful in this area is a bit outside
Postgres' remit.  You'd need to be thinking at an operating-system
or hardware level.

            regards, tom lane



Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> writes:
> > I want to add the feature to erase data so that it cannot be restored
> > because it prevents attackers from stealing data from released data area.
>
> I think this is fairly pointless, unfortunately.

I disagree- it's a feature that's been asked for multiple times and does
have value in some situations.

> Dropping or truncating tables is as much as we can do without making
> unwarranted assumptions about the filesystem's behavior.  You say
> you want to zero out the files first, but what will that accomplish
> on copy-on-write filesystems?

What about filesystems which are not copy-on-write though?

> Even granting that zeroing our storage files is worth something,
> it's not worth much if there are more copies of the data elsewhere.

Backups are not our responsibility to worry about, or, at least, it'd be
an independent feature if we wanted to add something like this to
pg_basebackup, and not something the initial feature would need to worry
about.

> And any well-run database is going to have backups, or standby servers,
> or both.  There's no way for the primary node to force standbys to erase
> themselves (and it'd be a different sort of security hazard if there
> were).

A user can't "force" PG to do anything more than we can "force" a
replica to do something, but a user can issue a request to a primary and
that primary can then pass that request along to the replica as part of
the WAL stream.

> Also to the point: if your assumption is that an attacker has access
> to the storage medium at a sufficiently low level that they can examine
> previously-deleted files, what's stopping them from reading the data
> *before* it's deleted?

This argument certainly doesn't make any sense- who said they had access
to the storage medium at a time before the files were deleted?  What if
they only had access after the files were zero'd?  When you consider the
lifetime of a storage medium, it's certainly a great deal longer than
the length of time that a given piece of sensitive data might reside on
it.

> So I think doing anything useful in this area is a bit outside
> Postgres' remit.  You'd need to be thinking at an operating-system
> or hardware level.

I disagree entirely.  If the operating system and hardware level provide
a way for this to work, which is actually rather common when you
consider that ext4 is an awful popular filesystem where this would work
just fine with nearly all traditional hardware underneath it, then we're
just blocking enabling this capability for no justifiably reason.

Thanks,

Stephen

Attachment

Re: Complete data erasure

From
Tomas Vondra
Date:
On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote:
>Greetings,
>
>* Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> "asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> writes:
>> > I want to add the feature to erase data so that it cannot be restored
>> > because it prevents attackers from stealing data from released data area.
>>
>> I think this is fairly pointless, unfortunately.
>
>I disagree- it's a feature that's been asked for multiple times and does
>have value in some situations.
>
>> Dropping or truncating tables is as much as we can do without making
>> unwarranted assumptions about the filesystem's behavior.  You say
>> you want to zero out the files first, but what will that accomplish
>> on copy-on-write filesystems?
>
>What about filesystems which are not copy-on-write though?
>
>> Even granting that zeroing our storage files is worth something,
>> it's not worth much if there are more copies of the data elsewhere.
>
>Backups are not our responsibility to worry about, or, at least, it'd be
>an independent feature if we wanted to add something like this to
>pg_basebackup, and not something the initial feature would need to worry
>about.
>
>> And any well-run database is going to have backups, or standby servers,
>> or both.  There's no way for the primary node to force standbys to erase
>> themselves (and it'd be a different sort of security hazard if there
>> were).
>
>A user can't "force" PG to do anything more than we can "force" a
>replica to do something, but a user can issue a request to a primary and
>that primary can then pass that request along to the replica as part of
>the WAL stream.
>
>> Also to the point: if your assumption is that an attacker has access
>> to the storage medium at a sufficiently low level that they can examine
>> previously-deleted files, what's stopping them from reading the data
>> *before* it's deleted?
>
>This argument certainly doesn't make any sense- who said they had access
>to the storage medium at a time before the files were deleted?  What if
>they only had access after the files were zero'd?  When you consider the
>lifetime of a storage medium, it's certainly a great deal longer than
>the length of time that a given piece of sensitive data might reside on
>it.
>
>> So I think doing anything useful in this area is a bit outside
>> Postgres' remit.  You'd need to be thinking at an operating-system
>> or hardware level.
>
>I disagree entirely.  If the operating system and hardware level provide
>a way for this to work, which is actually rather common when you
>consider that ext4 is an awful popular filesystem where this would work
>just fine with nearly all traditional hardware underneath it, then we're
>just blocking enabling this capability for no justifiably reason.
>

Not sure. I agree the goal (securely discarding data) is certainly
worthwile, although I suspect it's just of many things you'd need to
care about. That is, there's probably a million other things you'd need
to worry about (logs, WAL, CSV files, temp files, ...), so with actually
sensitive data I'd expect people to just dump/load the data into a clean
system and rebuild the old one (zero drives, ...).

But let's assume it makes sense - is this really the right solution? I
think what I'd prefer is encryption + rotation of the keys. Which should
work properly even on COW filesystems, the performance impact is kinda
low and amortized etc. Of course, we're discussing built-in encryption
for quite a bit of time.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote:
> >I disagree entirely.  If the operating system and hardware level provide
> >a way for this to work, which is actually rather common when you
> >consider that ext4 is an awful popular filesystem where this would work
> >just fine with nearly all traditional hardware underneath it, then we're
> >just blocking enabling this capability for no justifiably reason.
>
> Not sure. I agree the goal (securely discarding data) is certainly
> worthwile, although I suspect it's just of many things you'd need to
> care about. That is, there's probably a million other things you'd need
> to worry about (logs, WAL, CSV files, temp files, ...), so with actually
> sensitive data I'd expect people to just dump/load the data into a clean
> system and rebuild the old one (zero drives, ...).

Of course there's other things that one would need to worry about, but
saying this isn't useful because PG isn't also scanning your entire
infrastructure for CSV files that have this sensitive information isn't
a sensible argument.

I agree that there are different levels of sensitive data- and for many,
many cases what is being proposed here will work just fine, even if that
level of sensitive data isn't considered "actually sensitive data" by
other people.

> But let's assume it makes sense - is this really the right solution? I
> think what I'd prefer is encryption + rotation of the keys. Which should
> work properly even on COW filesystems, the performance impact is kinda
> low and amortized etc. Of course, we're discussing built-in encryption
> for quite a bit of time.

In some cases that may make sense as an alternative, in other situations
it doesn't.  In other words, I disagree that what you're proposing is
just a different implementation of the same thing (which I'd be happy to
discuss), it's an entirely different feature with different trade-offs.

I'm certainly on-board with the idea of having table level and column
level encryption to facilitate an approach like this, but I disagree
strongly that we shouldn't have this simple "just overwrite the data a
few times" because we might, one day, have useful in-core encryption or
even that, even if we had it, that we should tell everyone to use that
instead of providing this capability.  I don't uninstall 'shread' when I
install 'gpg' on my system.

Thanks,

Stephen

Attachment

Re: Complete data erasure

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> But let's assume it makes sense - is this really the right solution? I
> think what I'd prefer is encryption + rotation of the keys. Which should
> work properly even on COW filesystems, the performance impact is kinda
> low and amortized etc. Of course, we're discussing built-in encryption
> for quite a bit of time.

Yeah, it seems to me that encrypted storage would solve strictly more
cases than this proposal does, and it has fewer foot-guns.

One foot-gun that had been vaguely bothering me yesterday, but the point
didn't quite crystallize till just now, is that the only place that we
could implement such file zeroing is post-commit in a transaction that
has done DROP TABLE or TRUNCATE.  Of course post-commit is an absolutely
horrid place to be doing anything that could fail, since there's no very
good way to recover from an error.  It's an even worse place to be doing
anything that could take a long time --- if the user loses patience and
kills the session, now your problems are multiplied.

Right now our risks in that area are confined to leaking files if
unlink() fails, which isn't great but it isn't catastrophic either.
With this proposal, erroring out post-commit becomes a security
failure, if it happens anywhere before we've finished a possibly
large amount of zero-writing.  I don't want to go there.

            regards, tom lane



Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > But let's assume it makes sense - is this really the right solution? I
> > think what I'd prefer is encryption + rotation of the keys. Which should
> > work properly even on COW filesystems, the performance impact is kinda
> > low and amortized etc. Of course, we're discussing built-in encryption
> > for quite a bit of time.
>
> Yeah, it seems to me that encrypted storage would solve strictly more
> cases than this proposal does, and it has fewer foot-guns.

I still view that as strictly a different solution and one that
certainly won't fit in all cases, not to mention that it's a great deal
more complicated and we're certainly no where near close to having it.

> One foot-gun that had been vaguely bothering me yesterday, but the point
> didn't quite crystallize till just now, is that the only place that we
> could implement such file zeroing is post-commit in a transaction that
> has done DROP TABLE or TRUNCATE.  Of course post-commit is an absolutely
> horrid place to be doing anything that could fail, since there's no very
> good way to recover from an error.  It's an even worse place to be doing
> anything that could take a long time --- if the user loses patience and
> kills the session, now your problems are multiplied.

This is presuming that we make this feature something that can be run in
a transaction and rolled back.  I don't think there's been any specific
expression that there is such a requirement, and you bring up good
points that show why providing that functionality would be particularly
challenging, but that isn't really an argument against this feature in
general.

Thanks,

Stephen

Attachment

RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello, Horiguchi-san

Thank you for comment.

At Wed, 15 Jan 2020 03:46 +0000, "Kyotaro Horiguchi "<horikyota.ntt@gmail.com> wrote in
> shred(1) or wipe(1) doesn't seem to contribute to the objective on
> journaled or copy-on-write file systems. I'm not sure, but maybe the
> same can be true for read-modify-write devices like SSD. I'm not sure
> about SDelete, but anyway replacing unlink() with something like
> 'system("shred")' leads to siginificant performance degradation.
> 
> man 1 wipe says (https://linux.die.net/man/1/wipe) : (shred has a
> similar note.)
> 
> > NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS
> (JUNE 2004)
> > Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> > by default by most Linux distributions. No secure deletion program
> > that does filesystem-level calls can sanitize files on such
> > filesystems, because sensitive data and metadata can be written to the
> > journal, which cannot be readily accessed. Per-file secure deletion is
> > better implemented in the operating system.

shred can be used in certain modes of journaled file systems.
How about telling users that they must set the certain mode
if they set shred for erase_command in journaled file systems?
man 1 shred goes on like this:

> In  the  case of ext3 file systems, the above disclaimer applies (and shred is thus
> of limited effectiveness) only in data=journal mode, which journals  file  data  in
> addition  to  just metadata.  In both the data=ordered (default) and data=writeback
> modes, shred works as usual.  Ext3 journaling modes can be changed  by  adding  the
> data=something  option  to  the  mount  options for a particular file system in the
> /etc/fstab file, as documented in the mount man page (man mount).

As shown above, shred works as usual in both the data=ordered (default) and data=writeback modes.
I think data=journal mode is not used in many cases because it degrades performance.
Therefore, I think it is enough to indicate that shred cannot be used in data=journal mode.

Regards,

--
Takanori Asaba


Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> This feature erases data area just before it is returned to the OS (“erase” means that overwrite data area to hide
itscontents here)  
> because there is a risk that the data will be restored by attackers if it is returned to the OS without being
overwritten.
> The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.

Looking at this fresh, I wanted to point out that I think Tom's right-
we aren't going to be able to reasonbly support this kind of data
erasure on a simple DROP TABLE or TRUNCATE.

> I want users to be able to customize the erasure method for their security policies.

There's also this- but I think what it means is that we'd probably have
a top-level command that basically is "ERASE TABLE blah;" or similar
which doesn't operate during transaction commit but instead marks the
table as "to be erased" and then perhaps "erasure in progress" and then
"fully erased" (or maybe just back to 'normal' at that point).  Making
those updates will require the command to perform its own transaction
management which is why it can't be in a transaction itself but also
means that the data erasure process doesn't need to be done during
commit.

> My idea is adding a new parameter erase_command to postgresql.conf.

Yeah, I don't think that's really a sensible option or even approach.

> When erase_command is set, VACUUM does not truncate a file size to non-zero
> because it's safer for users to return the entire file to the OS than to return part of it.

There was discussion elsewhere about preventing VACUUM from doing a
truncate on a file because of the lock it requires and problems with
replicas..  I'm not sure where that ended up, but, in general, I don't
think this feature and VACUUM should really have anything to do with
each other except for the possible case that a user might be told to
configure their system to not allow VACUUM to truncate tables if they
care about this case.

As mentioned elsewhere, you do also have to consider that the sensitive
data will end up in the WAL and on replicas.  I don't believe that means
this feature is without use, but it means that users of this feature
will also need to understand and be able to address WAL and replicas
(along with backups and such too, of course).

Thanks,

Stephen

Attachment

RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello Stephen,

Thank you for comment.

From: Stephen Frost <sfrost@snowman.net>
> Greetings,
> 
> * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> > This feature erases data area just before it is returned to the OS (“erase”
> means that overwrite data area to hide its contents here)
> > because there is a risk that the data will be restored by attackers if it is returned
> to the OS without being overwritten.
> > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> 
> Looking at this fresh, I wanted to point out that I think Tom's right-
> we aren't going to be able to reasonbly support this kind of data
> erasure on a simple DROP TABLE or TRUNCATE.
> 
> > I want users to be able to customize the erasure method for their security
> policies.
> 
> There's also this- but I think what it means is that we'd probably have
> a top-level command that basically is "ERASE TABLE blah;" or similar
> which doesn't operate during transaction commit but instead marks the
> table as "to be erased" and then perhaps "erasure in progress" and then
> "fully erased" (or maybe just back to 'normal' at that point).  Making
> those updates will require the command to perform its own transaction
> management which is why it can't be in a transaction itself but also
> means that the data erasure process doesn't need to be done during
> commit.
> 
> > My idea is adding a new parameter erase_command to postgresql.conf.
> 
> Yeah, I don't think that's really a sensible option or even approach.

I think erase_command can also manage the state of a table.
The exit status of a configured command shows it.( 0 is "fully erased" or "normal", 1 is "erasure in progress") 
erase_command is executed not during a transaction but when unlink() is executed. 
(for example, after a transaction that has done DROP TABLE)
I think that this shows " to be erased ".

> > When erase_command is set, VACUUM does not truncate a file size to non-zero
> > because it's safer for users to return the entire file to the OS than to return part
> of it.
> 
> There was discussion elsewhere about preventing VACUUM from doing a
> truncate on a file because of the lock it requires and problems with
> replicas..  I'm not sure where that ended up, but, in general, I don't
> think this feature and VACUUM should really have anything to do with
> each other except for the possible case that a user might be told to
> configure their system to not allow VACUUM to truncate tables if they
> care about this case.

I think that if ftruncate(fd, 0) is executed in VACUUM, 
data area allocated to a file is returned to the OS, so that area must be overwritten.

> As mentioned elsewhere, you do also have to consider that the sensitive
> data will end up in the WAL and on replicas.  I don't believe that means
> this feature is without use, but it means that users of this feature
> will also need to understand and be able to address WAL and replicas
> (along with backups and such too, of course).

I see.
I can't think of it right away, but I will deal with it.

Sorry for my late reply.
It takes time to understand email from you because I'm a beginner.
Please point out any mistakes.

Regards,

--
Takanori Asaba



Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> From: Stephen Frost <sfrost@snowman.net>
> > * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> > > This feature erases data area just before it is returned to the OS (“erase”
> > means that overwrite data area to hide its contents here)
> > > because there is a risk that the data will be restored by attackers if it is returned
> > to the OS without being overwritten.
> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> >
> > Looking at this fresh, I wanted to point out that I think Tom's right-
> > we aren't going to be able to reasonbly support this kind of data
> > erasure on a simple DROP TABLE or TRUNCATE.
> >
> > > I want users to be able to customize the erasure method for their security
> > policies.
> >
> > There's also this- but I think what it means is that we'd probably have
> > a top-level command that basically is "ERASE TABLE blah;" or similar
> > which doesn't operate during transaction commit but instead marks the
> > table as "to be erased" and then perhaps "erasure in progress" and then
> > "fully erased" (or maybe just back to 'normal' at that point).  Making
> > those updates will require the command to perform its own transaction
> > management which is why it can't be in a transaction itself but also
> > means that the data erasure process doesn't need to be done during
> > commit.
> >
> > > My idea is adding a new parameter erase_command to postgresql.conf.
> >
> > Yeah, I don't think that's really a sensible option or even approach.
>
> I think erase_command can also manage the state of a table.
> The exit status of a configured command shows it.( 0 is "fully erased" or "normal", 1 is "erasure in progress")
> erase_command is executed not during a transaction but when unlink() is executed.

I really don't see what the advantage of having this be configurable is.
In addition, an external command's actions wouldn't be put through the
WAL meaning that replicas would have to be dealt with in some other way
beyind regular WAL and that seems like it'd just be ugly.

> (for example, after a transaction that has done DROP TABLE)

We certainly can't run external commands during transaction COMMIT, so
this can't be part of a regular DROP TABLE.

> > > When erase_command is set, VACUUM does not truncate a file size to non-zero
> > > because it's safer for users to return the entire file to the OS than to return part
> > of it.
> >
> > There was discussion elsewhere about preventing VACUUM from doing a
> > truncate on a file because of the lock it requires and problems with
> > replicas..  I'm not sure where that ended up, but, in general, I don't
> > think this feature and VACUUM should really have anything to do with
> > each other except for the possible case that a user might be told to
> > configure their system to not allow VACUUM to truncate tables if they
> > care about this case.
>
> I think that if ftruncate(fd, 0) is executed in VACUUM,
> data area allocated to a file is returned to the OS, so that area must be overwritten.

As I mentioned, there was already talk of making that disallowed in
VACUUM, so that would just need to be configured (and be able to be
configured) when running in an environment which requires this.

> > As mentioned elsewhere, you do also have to consider that the sensitive
> > data will end up in the WAL and on replicas.  I don't believe that means
> > this feature is without use, but it means that users of this feature
> > will also need to understand and be able to address WAL and replicas
> > (along with backups and such too, of course).
>
> I see.
> I can't think of it right away, but I will deal with it.

It's something that will need to be understood and dealt with.  A simple
answer might be "the user must also destroy all WAL and all backups in
an approved manner, and ensure all replicas have replayed all WAL or
been destroyed".

Thanks,

Stephen

Attachment

Re: Complete data erasure

From
Tomas Vondra
Date:
On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>Greetings,
>
>* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
>> From: Stephen Frost <sfrost@snowman.net>
>> > * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
>> > > This feature erases data area just before it is returned to the OS (“erase”
>> > means that overwrite data area to hide its contents here)
>> > > because there is a risk that the data will be restored by attackers if it is returned
>> > to the OS without being overwritten.
>> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
>> >
>> > Looking at this fresh, I wanted to point out that I think Tom's right-
>> > we aren't going to be able to reasonbly support this kind of data
>> > erasure on a simple DROP TABLE or TRUNCATE.
>> >
>> > > I want users to be able to customize the erasure method for their security
>> > policies.
>> >
>> > There's also this- but I think what it means is that we'd probably have
>> > a top-level command that basically is "ERASE TABLE blah;" or similar
>> > which doesn't operate during transaction commit but instead marks the
>> > table as "to be erased" and then perhaps "erasure in progress" and then
>> > "fully erased" (or maybe just back to 'normal' at that point).  Making
>> > those updates will require the command to perform its own transaction
>> > management which is why it can't be in a transaction itself but also
>> > means that the data erasure process doesn't need to be done during
>> > commit.
>> >
>> > > My idea is adding a new parameter erase_command to postgresql.conf.
>> >
>> > Yeah, I don't think that's really a sensible option or even approach.
>>
>> I think erase_command can also manage the state of a table.
>> The exit status of a configured command shows it.( 0 is "fully erased" or "normal", 1 is "erasure in progress")
>> erase_command is executed not during a transaction but when unlink() is executed.
>
>I really don't see what the advantage of having this be configurable is.
>In addition, an external command's actions wouldn't be put through the
>WAL meaning that replicas would have to be dealt with in some other way
>beyind regular WAL and that seems like it'd just be ugly.
>
>> (for example, after a transaction that has done DROP TABLE)
>
>We certainly can't run external commands during transaction COMMIT, so
>this can't be part of a regular DROP TABLE.
>

IMO the best solution would be that the DROP TABLE does everything as
usual, but instead of deleting the relfilenode it moves it to some sort
of queue. And then a background worker would "erase" these relfilenodes
outside the COMMIT.

And yes, we need to do this in a way that works with replicas, i.e. we
need to WAL-log it somehow. And it should to be done in a way that works
when the replica is on a different type of filesystem.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Complete data erasure

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
> >We certainly can't run external commands during transaction COMMIT, so
> >this can't be part of a regular DROP TABLE.
>
> IMO the best solution would be that the DROP TABLE does everything as
> usual, but instead of deleting the relfilenode it moves it to some sort
> of queue. And then a background worker would "erase" these relfilenodes
> outside the COMMIT.

That sounds interesting, though I'm a bit worried that it's going to
lead to the same kind of complications and difficulty that we have with
deleted columns- anything that's working with the system tables will
need to see this new "dropped but pending delete" flag.  Would we also
rename the table when this happens?  Or change the schema it's in?
Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking.

> And yes, we need to do this in a way that works with replicas, i.e. we
> need to WAL-log it somehow. And it should to be done in a way that works
> when the replica is on a different type of filesystem.

I agree it should go through WAL somehow (ideally without needing an
actual zero'd or whatever page for every page in the relation), but why
do we care about the filesystem on the replica?  We don't have anything
that's really filesystem specific in WAL replay today and I don't see
this as needing to change that..

Thanks,

Stephen

Attachment

Re: Complete data erasure

From
Tomas Vondra
Date:
On Mon, Feb 03, 2020 at 09:07:09AM -0500, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>> >We certainly can't run external commands during transaction COMMIT, so
>> >this can't be part of a regular DROP TABLE.
>>
>> IMO the best solution would be that the DROP TABLE does everything as
>> usual, but instead of deleting the relfilenode it moves it to some sort
>> of queue. And then a background worker would "erase" these relfilenodes
>> outside the COMMIT.
>
>That sounds interesting, though I'm a bit worried that it's going to
>lead to the same kind of complications and difficulty that we have with
>deleted columns- anything that's working with the system tables will
>need to see this new "dropped but pending delete" flag.  Would we also
>rename the table when this happens?  Or change the schema it's in?
>Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking.
>

That's not really what I meant - let me explain. When I said DROP TABLE
should do everything as usual, that includes catalog changes. I.e. after
the commit there would not be any remaining entries in system catalogs
or anything like that.

The only thing we'd do differently is that instead of unlinking the
relfilenode segments, we'd move the relfilenode to a persistent queue
(essentially a regular table used as a queue relfilenodes). The
background worker would watch the queue, and when it gets a new
relfilenode it'd "delete" the data and then remove the relfilenode from
the queue.

So essentially others would not be able to even see the (now dropped)
object, they could create new object with the same name etc.

I imagine we might provide a way to wait for the deletion to actually
complete (can't do that as part of the DROP TABLE, though), so that
people can be sure when the data is actually gone (for scripts etc.).
A simple function waiting for the queue to get empty might be enough, I
guess, but maybe not.

>> And yes, we need to do this in a way that works with replicas, i.e. we
>> need to WAL-log it somehow. And it should to be done in a way that works
>> when the replica is on a different type of filesystem.
>
>I agree it should go through WAL somehow (ideally without needing an
>actual zero'd or whatever page for every page in the relation), but why
>do we care about the filesystem on the replica?  We don't have anything
>that's really filesystem specific in WAL replay today and I don't see
>this as needing to change that..
>

I think this depends on what our requirements are.

My assumption is that when you perform this "secure data erasure" on the
primary, you probably also want to erase the data on the replica. But if
the instances use different file systems (COW vs. non-COW, ...) the
exact thing that needs to happen may be different. Or maybe the replica
does not need to do anything, making it noop?

In which case we don't need to WAL-log the exact change for each page,
it might even be fine to not even WAL-log anything except for the final
removal from the queue. I mean, the data is worthless and not used by
anyone at this point, there's no point in replicating it ...

I haven't thought about this very hard. It's not clear what should
happen if we complete the erasure on primary, remove the relfilenode
from the queue, and then restart the replica before it finishes the
local erasure. The queue (if represented by a simple table) will be
replicated, so the replica will forget it still has work to do.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Complete data erasure

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> That's not really what I meant - let me explain. When I said DROP TABLE
> should do everything as usual, that includes catalog changes. I.e. after
> the commit there would not be any remaining entries in system catalogs
> or anything like that.
>
> The only thing we'd do differently is that instead of unlinking the
> relfilenode segments, we'd move the relfilenode to a persistent queue
> (essentially a regular table used as a queue relfilenodes). The
> background worker would watch the queue, and when it gets a new
> relfilenode it'd "delete" the data and then remove the relfilenode from
> the queue.
>
> So essentially others would not be able to even see the (now dropped)
> object, they could create new object with the same name etc.

That sounds good.  I think we can also follow the way the WAL archiver does its job, instead of using a regular table.
Thatis, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is
actuallya filesystem directory.  Or, it just renames the data files in the original directory by appending some suffix
suchas ".del".  Then, the background worker scans the trash bin or the data directory to erase the file content and
deletethe file. 

The trash bin mechanism may open up the application for restoring mistakenly dropped tables, a feature like Oracle's
FlashDrop.  The dropping transaction puts the table metadata (system catalog data or DDL) in the trash bin as well as
thedata file. 


> I imagine we might provide a way to wait for the deletion to actually
> complete (can't do that as part of the DROP TABLE, though), so that
> people can be sure when the data is actually gone (for scripts etc.).
> A simple function waiting for the queue to get empty might be enough, I
> guess, but maybe not.

Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed.  Can't we
reallymake the COMMIT to wait for the erasure to complete?  Do we have to use an asynchronous erasure method with a
backgroundworker?  For example, COMMIT performs: 

1. Writes a commit WAL record, finalizing the system catalog change.
2. Puts the data files in the trash bin or renames them.
3. Erase the file content and delete the file.  This could take a long time.
4. COMMIT replies success to the client.

What is concerned about is that the need to erase and delete the data file would be forgotten if the server crashes
duringstep 3.  If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays
thestartup.) 


> I think this depends on what our requirements are.
>
> My assumption is that when you perform this "secure data erasure" on the
> primary, you probably also want to erase the data on the replica. But if
> the instances use different file systems (COW vs. non-COW, ...) the
> exact thing that needs to happen may be different. Or maybe the replica
> does not need to do anything, making it noop?

We can guide the use of non-COW file systems on both the primary and standby in the manual.


Regards
Takayuki Tsunakawa




RE: Complete data erasure

From
"imai.yoshikazu@fujitsu.com"
Date:
From tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
> What is concerned about is that the need to erase and delete the data file would be forgotten if the server crashes
duringstep 
> 3.  If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays the
startup.)

I suspect erasing and deleting the data file at startup is rather not
acceptable.
We can query to the table foo during erasing the table bar in
normal conditions. However, if a crash happens and postmaster erase the table
bar at startup, we can't execute any queries until the erasure is finished
(and it would take long time). I'm afraid there will be someone complain about
that.
Can we erase the table after startup when a crash happens and also implement
the function that returns whether the erasure of a specified table is completed
or not? The users who want to know whether the erasure is completed can use
that function and wait their tasks until the erasure is done, and the other
users can execute query while erasing is processed.


I have an another point to want to discuss. In current specification, every
table will be completely erased if erase_command is set.
The security conscious systems might want to delete all data completely, but
isn't there a case that handles data that doesn't care about security? In
that case, someone would want to choose which tables to be erased completely
and which tables to be only dropped, which is achieved adding an option to
"DROP TABLE" or implementing "ERASE TABLE" command.

--
Yoshikazu Imai



Re: Complete data erasure

From
Tomas Vondra
Date:
On Tue, Feb 04, 2020 at 12:53:44AM +0000, tsunakawa.takay@fujitsu.com
wrote:
>From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> That's not really what I meant - let me explain. When I said DROP
>> TABLE should do everything as usual, that includes catalog changes.
>> I.e. after the commit there would not be any remaining entries in
>> system catalogs or anything like that.
>>
>> The only thing we'd do differently is that instead of unlinking the
>> relfilenode segments, we'd move the relfilenode to a persistent queue
>> (essentially a regular table used as a queue relfilenodes). The
>> background worker would watch the queue, and when it gets a new
>> relfilenode it'd "delete" the data and then remove the relfilenode
>> from the queue.
>>
>> So essentially others would not be able to even see the (now dropped)
>> object, they could create new object with the same name etc.
>
>That sounds good.  I think we can also follow the way the WAL archiver
>does its job, instead of using a regular table.  That is, when the
>transaction that performed DROP TABLE commits, it puts the data files
>in the "trash bin," which is actually a filesystem directory.  Or, it
>just renames the data files in the original directory by appending some
>suffix such as ".del".  Then, the background worker scans the trash bin
>or the data directory to erase the file content and delete the file.
>

Yeah, that could work, I guess.

>The trash bin mechanism may open up the application for restoring
>mistakenly dropped tables, a feature like Oracle's Flash Drop.  The
>dropping transaction puts the table metadata (system catalog data or
>DDL) in the trash bin as well as the data file.
>

That seems like a very different feature, and I doubt this is the right
way to implement that. That would require much more infrastructure than
just moving the file to a separate dir.

>
>> I imagine we might provide a way to wait for the deletion to actually
>> complete (can't do that as part of the DROP TABLE, though), so that
>> people can be sure when the data is actually gone (for scripts etc.).
>> A simple function waiting for the queue to get empty might be enough,
>> I guess, but maybe not.
>
>Agreed, because the user should expect the disk space to be available
>after DROP TABLE has been committed.  Can't we really make the COMMIT
>to wait for the erasure to complete?  Do we have to use an asynchronous
>erasure method with a background worker?  For example, COMMIT performs:
>

I think it depends how exactly it's implemented. As Tom pointed out in
his message [1], we can't do the erasure itself in the post-commit is
not being able to handle errors. But if the files are renamed durably,
and the erasure happens in a separate process, that could be OK. The
COMMIT may wayt for it or not, that's mostly irrelevant I think.

[1] https://www.postgresql.org/message-id/9104.1579107235%40sss.pgh.pa.us

>1. Writes a commit WAL record, finalizing the system catalog change.
>2. Puts the data files in the trash bin or renames them.
>3. Erase the file content and delete the file. This could take a long time.
>4. COMMIT replies success to the client.
>

I don't think the COMMIT has to wait for (3) - it might, of course, but
for some use cases it may be better to just commit and leave the
bgworker do the work. And then allow checking if it completed.

>What is concerned about is that the need to erase and delete the data
>file would be forgotten if the server crashes during step 3.  If so,
>postmaster can do the job at startup, just like it deletes temporary
>files (although it delays the startup.)
>

Startup seems like a pretty bad place to do this stuff. There may be a
lot of data to erase, making recovery very long.

>
>> I think this depends on what our requirements are.
>>
>> My assumption is that when you perform this "secure data erasure" on
>> the primary, you probably also want to erase the data on the replica.
>> But if the instances use different file systems (COW vs. non-COW,
>> ...) the exact thing that needs to happen may be different. Or maybe
>> the replica does not need to do anything, making it noop?
>
>We can guide the use of non-COW file systems on both the primary and
>standby in the manual.
>

I don't see how that solves the issue. I think it's quite useful to be
able to use different filesystems for primary/replica. And we may even
know how to securely erase data on both, in which case I don't see a
point not to allow such configurations.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Complete data erasure

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I think it depends how exactly it's implemented. As Tom pointed out in
> his message [1], we can't do the erasure itself in the post-commit is
> not being able to handle errors. But if the files are renamed durably,
> and the erasure happens in a separate process, that could be OK. The
> COMMIT may wayt for it or not, that's mostly irrelevant I think.

How is requiring a file rename to be completed post-commit any less
problematic than the other way?  You still have a non-negligible
chance of failure.

>> 1. Writes a commit WAL record, finalizing the system catalog change.
>> 2. Puts the data files in the trash bin or renames them.
>> 3. Erase the file content and delete the file. This could take a long time.
>> 4. COMMIT replies success to the client.

> I don't think the COMMIT has to wait for (3) - it might, of course, but
> for some use cases it may be better to just commit and leave the
> bgworker do the work. And then allow checking if it completed.

This doesn't seem like a path that will lead to success.  The fundamental
point here is that COMMIT has to be an atomic action --- or if it isn't,
failure partway through has to lead to a database crash & restart, which
isn't very pleasant, especially if WAL replay of the commit after the
restart re-encounters the same error.

Up to now, we've sort of looked the other way with respect to failures
of file unlinks post-commit, reasoning that the worst that will happen
is disk space leakage from no-longer-referenced files that we failed to
unlink.  (Which is bad, certainly, but not catastrophic; it's immaterial
to database semantics.)  This patch basically needs to raise the level of
guarantee that exists in this area, or it won't do what it says on the
tin.  But I've not seen any indication that we know how to do that in a
workable way.

            regards, tom lane



RE: Complete data erasure

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tom Lane <tgl@sss.pgh.pa.us>
> Up to now, we've sort of looked the other way with respect to failures
> of file unlinks post-commit, reasoning that the worst that will happen
> is disk space leakage from no-longer-referenced files that we failed to
> unlink.  (Which is bad, certainly, but not catastrophic; it's immaterial
> to database semantics.)  This patch basically needs to raise the level of
> guarantee that exists in this area, or it won't do what it says on the
> tin.  But I've not seen any indication that we know how to do that in a
> workable way.

Hmm, the error case is a headache.Even if the bgworker does the erasure, it could hit the same error repeatedly when
thefile system or disk is broken, causing repeated I/O that may hamper performance. 

Do we have no good  choice but to leave it up to the user to erase the file content like the following?


https://docs.oracle.com/en/database/oracle/oracle-database/19/asoag/general-considerations-of-using-transparent-data-encryption.html#GUID-F02C9CBF-0374-408B-8655-F7531B681D41
--------------------------------------------------
Oracle Database
Advanced Security Guide
7 General Considerations of Using Transparent Data Encryption

Managing Security for Plaintext Fragments


You should remove old plaintext fragments that can appear over time.


Old plaintext fragments may be present for some time until the database overwrites the blocks containing such values.
Ifprivileged operating system users bypass the access controls of the database, then they might be able to directly
accessthese values in the data file holding the tablespace.  

To minimize this risk:


1.Create a new tablespace in a new data file.

You can use the CREATE TABLESPACE statement to create this tablespace.


2.Move the table containing encrypted columns to the new tablespace. You can use the ALTER TABLE.....MOVE statement.

Repeat this step for all of the objects in the original tablespace.


3.Drop the original tablespace.

You can use the DROP TABLESPACE tablespace INCLUDING CONTENTS KEEP DATAFILES statement. Oracle recommends that you
securelydelete data files using platform-specific utilities.  


4.Use platform-specific and file system-specific utilities to securely delete the old data file. Examples of such
utilitiesinclude shred (on Linux) and sdelete (on Windows).  
--------------------------------------------------


Regards
Takayuki Tsunakawa





Re: Complete data erasure

From
Masahiko Sawada
Date:
On Tue, 4 Feb 2020 at 09:53, tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> > That's not really what I meant - let me explain. When I said DROP TABLE
> > should do everything as usual, that includes catalog changes. I.e. after
> > the commit there would not be any remaining entries in system catalogs
> > or anything like that.
> >
> > The only thing we'd do differently is that instead of unlinking the
> > relfilenode segments, we'd move the relfilenode to a persistent queue
> > (essentially a regular table used as a queue relfilenodes). The
> > background worker would watch the queue, and when it gets a new
> > relfilenode it'd "delete" the data and then remove the relfilenode from
> > the queue.
> >
> > So essentially others would not be able to even see the (now dropped)
> > object, they could create new object with the same name etc.
>
> That sounds good.  I think we can also follow the way the WAL archiver does its job, instead of using a regular
table. That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin,"
whichis actually a filesystem directory.  Or, it just renames the data files in the original directory by appending
somesuffix such as ".del".  Then, the background worker scans the trash bin or the data directory to erase the file
contentand delete the file. 
>
> The trash bin mechanism may open up the application for restoring mistakenly dropped tables, a feature like Oracle's
FlashDrop.  The dropping transaction puts the table metadata (system catalog data or DDL) in the trash bin as well as
thedata file. 
>
>
> > I imagine we might provide a way to wait for the deletion to actually
> > complete (can't do that as part of the DROP TABLE, though), so that
> > people can be sure when the data is actually gone (for scripts etc.).
> > A simple function waiting for the queue to get empty might be enough, I
> > guess, but maybe not.
>
> Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed.  Can't we
reallymake the COMMIT to wait for the erasure to complete?  Do we have to use an asynchronous erasure method with a
backgroundworker?  For example, COMMIT performs: 
>
> 1. Writes a commit WAL record, finalizing the system catalog change.
> 2. Puts the data files in the trash bin or renames them.
> 3. Erase the file content and delete the file.  This could take a long time.
> 4. COMMIT replies success to the client.
>
> What is concerned about is that the need to erase and delete the data file would be forgotten if the server crashes
duringstep 3.  If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays
thestartup.) 

Please note that we need to erase files not only when dropping or
truncating tables but also when aborting the transaction that created
a new table. If user wants to sure the data is actually erased they
needs to wait for rollback as well that could be ROLLBACK command by
user or an error during transaction etc.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello Stephen,

From: Stephen Frost <sfrost@snowman.net>
> I disagree- it's a feature that's been asked for multiple times and does
> have value in some situations.

I'm rethinking the need for this feature although I think that it improves the security.
You said that this feature has value in some situations.
Could you tell me about that situations?

Regards,

--
Takanori Asaba



RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Greetings,

From: asaba.takanori@fujitsu.com <asaba.takanori@fujitsu.com>
> Hello Stephen,
>
> From: Stephen Frost <sfrost@snowman.net>
> > I disagree- it's a feature that's been asked for multiple times and does
> > have value in some situations.
>
> I'm rethinking the need for this feature, although I think that it improves
> security.
> You said that this feature has value in some situations.
> Could you tell me about those situations?
>
> Regards,
>
> --
> Takanori Asaba
>

I think that the use case is to ensure that no data remains.
This feature will give users peace of mind.

There is a risk of leakage as long as data remains.
I think that there are several things users may be worried about.
For example, attackers might eventually decrypt encrypted data, even if it takes years.
Or some users may be concerned about disk management in cloud environments.
These concerns can be resolved if users can erase the data themselves.

I think that this feature is valuable, so I would appreciate your continued cooperation.

Regards,

--
Takanori Asaba





RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello Tom,

From: Tom Lane <tgl@sss.pgh.pa.us>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > I think it depends how exactly it's implemented. As Tom pointed out in
> > his message [1], the problem with doing the erasure itself post-commit is
> > not being able to handle errors. But if the files are renamed durably,
> > and the erasure happens in a separate process, that could be OK. The
> > COMMIT may wait for it or not; that's mostly irrelevant, I think.
>
> How is requiring a file rename to be completed post-commit any less
> problematic than the other way?  You still have a non-negligible
> chance of failure.

I think that the errors of rename(2) listed in [1] either cannot occur here or can be handled.
What do you think?

[1] http://man7.org/linux/man-pages/man2/rename.2.html


Regards,

--
Takanori Asaba





RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello Tom,

From: asaba.takanori@fujitsu.com <asaba.takanori@fujitsu.com>
> Hello Tom,
>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > > I think it depends how exactly it's implemented. As Tom pointed out in
> > > his message [1], the problem with doing the erasure itself post-commit is
> > > not being able to handle errors. But if the files are renamed durably,
> > > and the erasure happens in a separate process, that could be OK. The
> > > COMMIT may wait for it or not; that's mostly irrelevant, I think.
> >
> > How is requiring a file rename to be completed post-commit any less
> > problematic than the other way?  You still have a non-negligible
> > chance of failure.
>
> I think that the errors of rename(2) listed in [1] either cannot occur here or can be handled.
> What do you think?
>
> [1] http://man7.org/linux/man-pages/man2/rename.2.html
>

I have another idea.
How about managing the status of data files the way the WAL archiver does?
For example:

1. Create a status file "...ready" in the transaction that runs DROP TABLE (without renaming the data file).
2. A background worker scans the directory that holds the status files.
3. Rename the status file to "...progress" when the erase of the data file starts.
4. Rename the status file to "...done" when the erase of the data file finishes.

I think that this is OK because step 1 is not post-commit and the background worker can handle errors during the erase.
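
A minimal sketch in C of the background worker side of this scheme, modelled loosely on how the WAL archiver drives its archive_status directory; the status directory name and the erase helper are assumptions for illustration only:

    /*
     * Minimal sketch: drive the ".ready" -> ".progress" -> ".done" status
     * files from a background worker loop.
     */
    #include <dirent.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* stand-in for running erase_command on the named data file */
    static bool
    erase_one_file(const char *datafile)
    {
        printf("erasing %s\n", datafile);
        return true;
    }

    static void
    erase_pending_files(const char *statusdir)
    {
        DIR            *dir = opendir(statusdir);
        struct dirent  *de;

        if (dir == NULL)
            return;

        while ((de = readdir(dir)) != NULL)
        {
            char        base[512];
            char        ready[1024], progress[1024], done[1024];
            size_t      len = strlen(de->d_name);

            if (len <= 6 || strcmp(de->d_name + len - 6, ".ready") != 0)
                continue;

            /* strip ".ready" to recover the data file name (step 2) */
            snprintf(base, sizeof(base), "%.*s", (int) (len - 6), de->d_name);
            snprintf(ready, sizeof(ready), "%s/%s.ready", statusdir, base);
            snprintf(progress, sizeof(progress), "%s/%s.progress", statusdir, base);
            snprintf(done, sizeof(done), "%s/%s.done", statusdir, base);

            /* step 3: mark the erase as started; step 4: mark it as finished */
            if (rename(ready, progress) == 0 && erase_one_file(base))
                rename(progress, done);
            /* on failure the ".progress" file is left behind for a retry */
        }
        closedir(dir);
    }

    int
    main(void)
    {
        erase_pending_files("pg_erase_status");     /* hypothetical directory */
        return 0;
    }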

Regards,

--
Takanori Asaba





RE: Complete data erasure

From
"asaba.takanori@fujitsu.com"
Date:
Hello,

I was off the point.
I would like to organize the discussion and suggest a feature design.

There are two opinions.
1. COMMIT should not take a long time because errors are more likely to occur.
2. The data area should be released when COMMIT is completed because COMMIT has to be an atomic action.

These opinions are correct.
But it is difficult to satisfy them at the same time.
So I suggest that users have the option to choose.
DROP TABLE works in one of the following two patterns:

1. Rename the data file to "...del" instead of calling ftruncate(fd,0).
  After that, a bgworker scans the directory and runs erase_command.
  (erase_command is a command set by the user, like archive_command;
   for example, shred on Linux.)

2. Run erase_command on the data file immediately before ftruncate(fd,0).
  Wait until it completes, then reply COMMIT to the client.
  After that, processing continues as normal.
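
As a rough illustration of how the server might expand and run erase_command on one file, here is a minimal sketch in C; the %p placeholder (borrowed from archive_command) and the example setting 'shred -u %p' are assumptions, not part of the proposal:

    /*
     * Minimal sketch: substitute %p with the data file path and run the
     * user-configured command, e.g. erase_command = 'shred -u %p'.
     */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static bool
    run_erase_command(const char *erase_command, const char *path)
    {
        char        cmd[2048];
        char       *dst = cmd;
        const char *src;

        for (src = erase_command; *src != '\0' && dst < cmd + sizeof(cmd) - 1; src++)
        {
            if (src[0] == '%' && src[1] == 'p')
            {
                size_t      room = (size_t) (cmd + sizeof(cmd) - 1 - dst);
                size_t      plen = strlen(path);

                if (plen > room)
                    plen = room;
                memcpy(dst, path, plen);
                dst += plen;
                src++;                          /* skip the 'p' */
            }
            else
                *dst++ = *src;
        }
        *dst = '\0';

        /* a non-zero exit becomes the WARNING / keep-the-file case below */
        return system(cmd) == 0;
    }

    int
    main(void)
    {
        if (!run_erase_command("shred -u %p", "base/16384/24576"))
            fprintf(stderr, "erase_command failed; not unlinking the file\n");
        return 0;
    }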

If erase_command fails, the server issues a WARNING and does not request the unlink from the checkpointer.
This is not a security failure, because I think the risk only arises once the data area is returned to the OS.

I will implement pattern 2 first because its behavior is closer to what users expect than pattern 1.
The following concern has been raised about this method.

From Stephen
> We certainly can't run external commands during transaction COMMIT, so
> this can't be part of a regular DROP TABLE.

I think it means that errors from external commands can't be handled.
If so, it's no problem because I determined behavior after error.
Are there any other problems?

Regards,

--
Takanori Asaba





Re: Complete data erasure

From
Tomas Vondra
Date:
On Fri, Apr 10, 2020 at 08:23:32AM +0000, asaba.takanori@fujitsu.com wrote:
>Hello,
>
>I was off the point.
>I would like to organize the discussion and suggest a feature design.
>
>There are two opinions.
>1. COMMIT should not take a long time because errors are more likely to occur.

I don't think it's a matter of commit duration but a question of what to do
in response to errors in the data erasure code - which is something we
can't really rule out if we allow custom scripts to perform the erasure.
If the erasure took very long but couldn't possibly fail, it'd be much
easier to handle than fast erasure failing often.

The difficulty of error-handling is why adding new stuff to commit may
be tricky. Which is why I proposed not to do the failure-prone code in
commit itself, but move it to a separate process.

>2. The data area should be released when COMMIT is completed because COMMIT has to be an atomic action.
>

I don't think "commit is atomic" really implies "data should be released
at commit". This is precisely what makes the feature extremely hard to
implement, IMHO.

Why wouldn't it be acceptable to do something like this?

     BEGIN;
     ...
     DROP TABLE x ERASE;
     ...
     COMMIT;  <-- Don't do data erasure, just add "x" to queue.

     -- wait for another process to complete the erasure
     SELECT pg_wait_for_erasure();

That means we're not running any custom commands / code during commit,
which should (hopefully) make it easier to handle errors.
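
A minimal sketch in C of what such a wait primitive could boil down to; the counter stands in for the real queue, and an actual implementation would sleep on a latch or condition variable instead of busy-polling:

    /*
     * Minimal sketch: pg_wait_for_erasure() as "block until the erasure
     * queue the COMMIT added the relation to is empty".
     */
    #include <unistd.h>

    static volatile int pending_erasures;   /* maintained by the eraser process */

    static void
    wait_for_erasure(void)
    {
        while (pending_erasures > 0)
            usleep(100 * 1000);             /* check again in 100 ms */
    }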

>
>These opinions are correct.
>But it is difficult to satisfy them at the same time.
>So I suggest that users have the option to choose.
>DROP TABLE works in one of the following two patterns:
>
>1. Rename the data file to "...del" instead of calling ftruncate(fd,0).
>  After that, a bgworker scans the directory and runs erase_command.
>  (erase_command is a command set by the user, like archive_command;
>   for example, shred on Linux.)
>
>2. Run erase_command on the data file immediately before ftruncate(fd,0).
>  Wait until it completes, then reply COMMIT to the client.
>  After that, processing continues as normal.
>
>If erase_command fails, the server issues a WARNING and does not request the unlink from the checkpointer.
>This is not a security failure, because I think the risk only arises once the data area is returned to the OS.
>

I think it was already discussed why doing file renames and other
expensive stuff that could fail is a bad idea. And I'm not sure just
ignoring erase_command failures (because that's what WARNING does) is
really appropriate for this feature.

>I will implement pattern 2 first because its behavior is closer to what users expect than pattern 1.
>The following concern has been raised about this method.
>
>From Stephen
>> We certainly can't run external commands during transaction COMMIT, so
>> this can't be part of a regular DROP TABLE.
>
>I think it means that errors from external commands can't be handled.
>If so, it's no problem because I determined behavior after error.
>Are there any other problems?

I'm not sure what you mean by "determined behavior after error"? You
essentially propose to just print a warning and be done with it. But
that means we can simply leave data files with sensitive data on the
disk, which seems ... not great.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Complete data erasure

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I don't think "commit is atomic" really implies "data should be released
> at commit". This is precisely what makes the feature extremely hard to
> implement, IMHO.

> Why wouldn't it be acceptable to do something like this?

>      BEGIN;
>      ...
>      DROP TABLE x ERASE;
>      ...
>      COMMIT;  <-- Don't do data erasure, just add "x" to queue.

>      -- wait for another process to complete the erasure
>      SELECT pg_wait_for_erasure();

> That means we're not running any custom commands / code during commit,
> which should (hopefully) make it easier to handle errors.

Yeah, adding actions-that-could-fail to commit is a very hard sell,
so something like this API would probably have a better chance.

However ... the whole concept of erasure being a committable action
seems basically misguided from here.  Consider this scenario:

    begin;

    create table full_o_secrets (...);

    ... manipulate secret data in full_o_secrets ...

    drop table full_o_secrets erase;

    ... do something that unintentionally fails, causing xact abort ...

    commit;

Now what?  Your secret data is all over the disk and you have *no*
recourse to get rid of it; that's true even at a very low level,
because we unlinked the file when rolling back the transaction.
If the error occurred before getting to "drop table full_o_secrets
erase" then there isn't even any way in principle for the server
to know that you might not be happy about leaving that data lying
around.

And I haven't even spoken of copies that may exist in WAL, or
have been propagated to standby servers by now.

I have no idea what an actual solution that accounted for those
problems would look like.  But as presented, this is a toy feature
offering no real security gain, if you ask me.

            regards, tom lane



Re: Complete data erasure

From
Tomas Vondra
Date:
On Sat, Apr 11, 2020 at 01:56:10PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I don't think "commit is atomic" really implies "data should be released
>> at commit". This is precisely what makes the feature extremely hard to
>> implement, IMHO.
>
>> Why wouldn't it be acceptable to do something like this?
>
>>      BEGIN;
>>      ...
>>      DROP TABLE x ERASE;
>>      ...
>>      COMMIT;  <-- Don't do data erasure, just add "x" to queue.
>
>>      -- wait for another process to complete the erasure
>>      SELECT pg_wait_for_erasure();
>
>> That means we're not running any custom commands / code during commit,
>> which should (hopefully) make it easier to handle errors.
>
>Yeah, adding actions-that-could-fail to commit is a very hard sell,
>so something like this API would probably have a better chance.
>
>However ... the whole concept of erasure being a committable action
>seems basically misguided from here.  Consider this scenario:
>
>    begin;
>
>    create table full_o_secrets (...);
>
>    ... manipulate secret data in full_o_secrets ...
>
>    drop table full_o_secrets erase;
>
>    ... do something that unintentionally fails, causing xact abort ...
>
>    commit;
>
>Now what?  Your secret data is all over the disk and you have *no*
>recourse to get rid of it; that's true even at a very low level,
>because we unlinked the file when rolling back the transaction.
>If the error occurred before getting to "drop table full_o_secrets
>erase" then there isn't even any way in principle for the server
>to know that you might not be happy about leaving that data lying
>around.
>
>And I haven't even spoken of copies that may exist in WAL, or
>have been propagated to standby servers by now.
>
>I have no idea what an actual solution that accounted for those
>problems would look like.  But as presented, this is a toy feature
>offering no real security gain, if you ask me.
>

Yeah, unfortunately the feature as proposed has these weaknesses.

This is why I proposed that a solution based on encryption and throwing
away a key might be more reliable - if you don't have a key, who cares
if the encrypted data file (or parts of it) is still on disk?

It has issues too, though - a query might need a temporary file to do a
sort, a hash join might spill to disk, or something like that. And those
won't be encrypted without some executor changes (e.g. we might propagate
"needs erasure" to temp files, and do erasure when necessary).

I suspect a perfect solution would be so complex it's not feasible in
practice, especially in v1. So maybe the best thing we can do is to
document those limitations, but I'm not sure where to draw the line
between acceptable and unacceptable limitations.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services