Thread: Complete data erasure
Hello hackers,

I want to add a feature that erases data so that it cannot be restored, because this prevents attackers from stealing data from a released data area.

- Background
International security policies require that measures be taken against the above threat. One of them is the "Base Protection Profile for Database Management Systems Version 2.12 (DBMS PP)" [1], based on ISO 15408. If the security is improved in this respect, PostgreSQL is more likely to be adopted by security-conscious procurers such as public agencies.

- Feature
This feature erases a data area just before it is returned to the OS ("erase" here means overwriting the data area to hide its contents), because there is a risk that the data will be restored by attackers if it is returned to the OS without being overwritten. The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed. I want users to be able to customize the erasure method to fit their security policies.

- Implementation
My idea is to add a new parameter erase_command to postgresql.conf. The command that users set in this parameter is executed just before unlink(path) or ftruncate(fd, 0) is called. For example, the command would be shred on Linux and SDelete on Windows.

When erase_command is set, VACUUM does not truncate a file to a non-zero size, because it is safer for users to return the entire file to the OS than to return only part of it. Also, there is no standard tool that overwrites only part of a file. With the above specification, users can easily and safely use this feature with a standard tool that overwrites an entire file, like shred.

Hope to hear your feedback and comments.

[1] https://www.commoncriteriaportal.org/files/ppfiles/pp0088V2b_pdf.pdf
P44 8.1.2

- Threat/Policy
A threat agent may use or manage TSF, bypassing the protection mechanisms of the TSF.

- TOE Security Objectives Addressing the Threat/Policy
The TOE will ensure that any information contained in a protected resource within its Scope of Control is not inappropriately disclosed when the resource is reallocated.

- Rationale
diminishes this threat by ensuring that TSF data and user data is not persistent when resources are released by one user/process and allocated to another user/process.

TOE: Target of Evaluation
TSF: TOE Security Functionality

Regards,

--
Takanori Asaba
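As a concrete illustration of the flow the proposal describes (overwrite first, unlink second), here is a minimal, self-contained POSIX C sketch. The erase_command template, the %p placeholder convention (assumed by analogy with archive_command), and the path in main() are illustrative assumptions only; none of this is actual PostgreSQL code or part of any patch.

    /*
     * Sketch only: run a user-configured erase command on a file just
     * before it would be unlinked.  The "%p" placeholder is an assumed
     * convention, borrowed from archive_command for illustration.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* e.g. erase_command = 'shred -z -u %p' (assumed syntax) */
    static const char *erase_command = "shred -z -u %p";

    static int
    run_erase_command(const char *path)
    {
        char        cmd[4096];
        const char *p = strstr(erase_command, "%p");

        if (p == NULL)
            return -1;              /* no placeholder: refuse to proceed */

        /* splice the path into the template: <before>%p<after> */
        if (snprintf(cmd, sizeof(cmd), "%.*s%s%s",
                     (int) (p - erase_command), erase_command,
                     path, p + 2) >= (int) sizeof(cmd))
            return -1;              /* command too long */

        return system(cmd);         /* exit status reports success/failure */
    }

    static void
    erase_then_unlink(const char *path)
    {
        if (run_erase_command(path) != 0)
        {
            /* handling this failure is the hard part of the proposal */
            fprintf(stderr, "erase_command failed for %s, file kept\n", path);
            return;
        }
        if (unlink(path) != 0)
            perror("unlink");
    }

    int
    main(void)
    {
        erase_then_unlink("/tmp/base/16384/16385");     /* made-up path */
        return 0;
    }

In a real backend this would of course use the server's error reporting and GUC machinery; the sketch only shows the intended ordering and the dependence on the external command's exit status.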
Hello, Asaba-san.

At Wed, 15 Jan 2020 01:31:44 +0000, "asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> wrote in
> Hello hackers,
>
> I want to add the feature to erase data so that it cannot be restored
> because it prevents attackers from stealing data from released data area.
>
> - Background
> International security policies require that measures be taken against the above threat.
> One of them is "Base Protection Profile for Database Management Systems Version 2.12 (DBMS PP)" [1] based on ISO 15408.
> If the security is improved, it will be more likely to be adopted by security-conscious procurers such as public agencies.
>
> - Feature
> This feature erases data area just before it is returned to the OS ("erase" means overwriting the data area to hide its contents here)
> because there is a risk that the data will be restored by attackers if it is returned to the OS without being overwritten.
> The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> I want users to be able to customize the erasure method for their security policies.

shred(1) or wipe(1) doesn't seem to contribute to the objective on journaled or copy-on-write file systems. I'm not sure, but maybe the same can be true for read-modify-write devices like SSDs. I'm not sure about SDelete, but anyway, replacing unlink() with something like 'system("shred")' leads to significant performance degradation.

man 1 wipe says (https://linux.die.net/man/1/wipe) (shred has a similar note):

> NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS (JUNE 2004)
> Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> by default by most Linux distributions. No secure deletion program
> that does filesystem-level calls can sanitize files on such
> filesystems, because sensitive data and metadata can be written to the
> journal, which cannot be readily accessed. Per-file secure deletion is
> better implemented in the operating system.

WAL files contain copies of such sensitive information, and they are not covered by this proposal. Temporary files are not covered, either. If the system wants to remain recoverable after corruption, it must also copy such WAL files to an archive.

Currently there's a discussion, on and off this mailing list, about transparent data encryption, which covers all of the above cases. It is different from the device-level encryption mentioned in the man page. Doesn't that fit the requirement?

https://www.postgresql.org/message-id/CALS%2BJ3-57cL%3Djz_eT9uxiLa8CAh5BE3-HcQvXQBz0ScMjag4Zg%40mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
"asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> writes: > I want to add the feature to erase data so that it cannot be restored > because it prevents attackers from stealing data from released data area. I think this is fairly pointless, unfortunately. Dropping or truncating tables is as much as we can do without making unwarranted assumptions about the filesystem's behavior. You say you want to zero out the files first, but what will that accomplish on copy-on-write filesystems? Even granting that zeroing our storage files is worth something, it's not worth much if there are more copies of the data elsewhere. And any well-run database is going to have backups, or standby servers, or both. There's no way for the primary node to force standbys to erase themselves (and it'd be a different sort of security hazard if there were). Also to the point: if your assumption is that an attacker has access to the storage medium at a sufficiently low level that they can examine previously-deleted files, what's stopping them from reading the data *before* it's deleted? So I think doing anything useful in this area is a bit outside Postgres' remit. You'd need to be thinking at an operating-system or hardware level. regards, tom lane
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "asaba.takanori@fujitsu.com" <asaba.takanori@fujitsu.com> writes:
> > I want to add the feature to erase data so that it cannot be restored
> > because it prevents attackers from stealing data from released data area.
>
> I think this is fairly pointless, unfortunately.

I disagree- it's a feature that's been asked for multiple times and does have value in some situations.

> Dropping or truncating tables is as much as we can do without making
> unwarranted assumptions about the filesystem's behavior. You say
> you want to zero out the files first, but what will that accomplish
> on copy-on-write filesystems?

What about filesystems which are not copy-on-write, though?

> Even granting that zeroing our storage files is worth something,
> it's not worth much if there are more copies of the data elsewhere.

Backups are not our responsibility to worry about, or, at least, it'd be an independent feature if we wanted to add something like this to pg_basebackup, and not something the initial feature would need to worry about.

> And any well-run database is going to have backups, or standby servers,
> or both. There's no way for the primary node to force standbys to erase
> themselves (and it'd be a different sort of security hazard if there
> were).

A user can't "force" PG to do anything, any more than we can "force" a replica to do something, but a user can issue a request to a primary and that primary can then pass that request along to the replica as part of the WAL stream.

> Also to the point: if your assumption is that an attacker has access
> to the storage medium at a sufficiently low level that they can examine
> previously-deleted files, what's stopping them from reading the data
> *before* it's deleted?

This argument certainly doesn't make any sense- who said they had access to the storage medium at a time before the files were deleted? What if they only had access after the files were zeroed? When you consider the lifetime of a storage medium, it's certainly a great deal longer than the length of time that a given piece of sensitive data might reside on it.

> So I think doing anything useful in this area is a bit outside
> Postgres' remit. You'd need to be thinking at an operating-system
> or hardware level.

I disagree entirely. If the operating system and hardware level provide a way for this to work, which is actually rather common when you consider that ext4 is an awfully popular filesystem where this would work just fine with nearly all traditional hardware underneath it, then we're just blocking enabling this capability for no justifiable reason.

Thanks,

Stephen
On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote:
> I disagree- it's a feature that's been asked for multiple times and does
> have value in some situations.
>
> > Dropping or truncating tables is as much as we can do without making
> > unwarranted assumptions about the filesystem's behavior. You say
> > you want to zero out the files first, but what will that accomplish
> > on copy-on-write filesystems?
>
> What about filesystems which are not copy-on-write, though?
>
> > Even granting that zeroing our storage files is worth something,
> > it's not worth much if there are more copies of the data elsewhere.
>
> Backups are not our responsibility to worry about, or, at least, it'd be
> an independent feature if we wanted to add something like this to
> pg_basebackup, and not something the initial feature would need to worry
> about.
>
> > And any well-run database is going to have backups, or standby servers,
> > or both. There's no way for the primary node to force standbys to erase
> > themselves (and it'd be a different sort of security hazard if there
> > were).
>
> A user can't "force" PG to do anything, any more than we can "force" a
> replica to do something, but a user can issue a request to a primary and
> that primary can then pass that request along to the replica as part of
> the WAL stream.
>
> > Also to the point: if your assumption is that an attacker has access
> > to the storage medium at a sufficiently low level that they can examine
> > previously-deleted files, what's stopping them from reading the data
> > *before* it's deleted?
>
> This argument certainly doesn't make any sense- who said they had access
> to the storage medium at a time before the files were deleted? What if
> they only had access after the files were zeroed?
>
> > So I think doing anything useful in this area is a bit outside
> > Postgres' remit. You'd need to be thinking at an operating-system
> > or hardware level.
>
> I disagree entirely. If the operating system and hardware level provide
> a way for this to work, which is actually rather common when you
> consider that ext4 is an awfully popular filesystem where this would work
> just fine with nearly all traditional hardware underneath it, then we're
> just blocking enabling this capability for no justifiable reason.

Not sure. I agree the goal (securely discarding data) is certainly worthwhile, although I suspect it's just one of many things you'd need to care about. That is, there are probably a million other things you'd need to worry about (logs, WAL, CSV files, temp files, ...), so with actually sensitive data I'd expect people to just dump/load the data into a clean system and rebuild the old one (zero drives, ...).

But let's assume it makes sense - is this really the right solution? I think what I'd prefer is encryption + rotation of the keys, which should work properly even on COW filesystems, and whose performance impact is fairly low and amortized. Of course, we've been discussing built-in encryption for quite a bit of time.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On Wed, Jan 15, 2020 at 10:23:22AM -0500, Stephen Frost wrote:
> > I disagree entirely. If the operating system and hardware level provide
> > a way for this to work, which is actually rather common when you
> > consider that ext4 is an awfully popular filesystem where this would work
> > just fine with nearly all traditional hardware underneath it, then we're
> > just blocking enabling this capability for no justifiable reason.
>
> Not sure. I agree the goal (securely discarding data) is certainly
> worthwhile, although I suspect it's just one of many things you'd need to
> care about. That is, there are probably a million other things you'd need
> to worry about (logs, WAL, CSV files, temp files, ...), so with actually
> sensitive data I'd expect people to just dump/load the data into a clean
> system and rebuild the old one (zero drives, ...).

Of course there are other things that one would need to worry about, but saying this isn't useful because PG isn't also scanning your entire infrastructure for CSV files that have this sensitive information isn't a sensible argument. I agree that there are different levels of sensitive data- and for many, many cases what is being proposed here will work just fine, even if that level of sensitive data isn't considered "actually sensitive data" by other people.

> But let's assume it makes sense - is this really the right solution? I
> think what I'd prefer is encryption + rotation of the keys. Which should
> work properly even on COW filesystems, the performance impact is kinda
> low and amortized etc. Of course, we're discussing built-in encryption
> for quite a bit of time.

In some cases that may make sense as an alternative, in other situations it doesn't. In other words, I disagree that what you're proposing is just a different implementation of the same thing (which I'd be happy to discuss); it's an entirely different feature with different trade-offs.

I'm certainly on board with the idea of having table-level and column-level encryption to facilitate an approach like this, but I disagree strongly that we shouldn't have this simple "just overwrite the data a few times" capability because we might, one day, have useful in-core encryption, or even that, if we had it, we should tell everyone to use that instead of providing this capability. I don't uninstall 'shred' when I install 'gpg' on my system.

Thanks,

Stephen
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > But let's assume it makes sense - is this really the right solution? I > think what I'd prefer is encryption + rotation of the keys. Which should > work properly even on COW filesystems, the performance impact is kinda > low and amortized etc. Of course, we're discussing built-in encryption > for quite a bit of time. Yeah, it seems to me that encrypted storage would solve strictly more cases than this proposal does, and it has fewer foot-guns. One foot-gun that had been vaguely bothering me yesterday, but the point didn't quite crystallize till just now, is that the only place that we could implement such file zeroing is post-commit in a transaction that has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely horrid place to be doing anything that could fail, since there's no very good way to recover from an error. It's an even worse place to be doing anything that could take a long time --- if the user loses patience and kills the session, now your problems are multiplied. Right now our risks in that area are confined to leaking files if unlink() fails, which isn't great but it isn't catastrophic either. With this proposal, erroring out post-commit becomes a security failure, if it happens anywhere before we've finished a possibly large amount of zero-writing. I don't want to go there. regards, tom lane
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > But let's assume it makes sense - is this really the right solution? I
> > think what I'd prefer is encryption + rotation of the keys. Which should
> > work properly even on COW filesystems, the performance impact is kinda
> > low and amortized etc. Of course, we're discussing built-in encryption
> > for quite a bit of time.
>
> Yeah, it seems to me that encrypted storage would solve strictly more
> cases than this proposal does, and it has fewer foot-guns.

I still view that as strictly a different solution and one that certainly won't fit in all cases, not to mention that it's a great deal more complicated and we're nowhere close to having it.

> One foot-gun that had been vaguely bothering me yesterday, but the point
> didn't quite crystallize till just now, is that the only place that we
> could implement such file zeroing is post-commit in a transaction that
> has done DROP TABLE or TRUNCATE. Of course post-commit is an absolutely
> horrid place to be doing anything that could fail, since there's no very
> good way to recover from an error. It's an even worse place to be doing
> anything that could take a long time --- if the user loses patience and
> kills the session, now your problems are multiplied.

This is presuming that we make this feature something that can be run in a transaction and rolled back. I don't think there's been any specific expression that there is such a requirement, and you bring up good points that show why providing that functionality would be particularly challenging, but that isn't really an argument against this feature in general.

Thanks,

Stephen
Hello, Horiguchi-san,

Thank you for the comment.

At Wed, 15 Jan 2020 03:46 +0000, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote in
> shred(1) or wipe(1) doesn't seem to contribute to the objective on
> journaled or copy-on-write file systems. I'm not sure, but maybe the
> same can be true for read-modify-write devices like SSDs. I'm not sure
> about SDelete, but anyway, replacing unlink() with something like
> 'system("shred")' leads to significant performance degradation.
>
> man 1 wipe says (https://linux.die.net/man/1/wipe) (shred has a
> similar note):
>
> > NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS (JUNE 2004)
> > Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> > by default by most Linux distributions. No secure deletion program
> > that does filesystem-level calls can sanitize files on such
> > filesystems, because sensitive data and metadata can be written to the
> > journal, which cannot be readily accessed. Per-file secure deletion is
> > better implemented in the operating system.

shred can be used under certain modes of journaled file systems. How about telling users that they must use an appropriate journaling mode if they set shred as erase_command on a journaled file system?

man 1 shred goes on like this:

> In the case of ext3 file systems, the above disclaimer applies (and shred is thus
> of limited effectiveness) only in data=journal mode, which journals file data in
> addition to just metadata. In both the data=ordered (default) and data=writeback
> modes, shred works as usual. Ext3 journaling modes can be changed by adding the
> data=something option to the mount options for a particular file system in the
> /etc/fstab file, as documented in the mount man page (man mount).

As shown above, shred works as usual in both the data=ordered (default) and data=writeback modes. I think data=journal mode is not used in many cases because it degrades performance. Therefore, I think it is enough to document that shred cannot be used in data=journal mode.

Regards,

--
Takanori Asaba
Greetings,

* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> This feature erases data area just before it is returned to the OS ("erase" means overwriting the data area to hide its contents here)
> because there is a risk that the data will be restored by attackers if it is returned to the OS without being overwritten.
> The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.

Looking at this fresh, I wanted to point out that I think Tom's right- we aren't going to be able to reasonably support this kind of data erasure on a simple DROP TABLE or TRUNCATE.

> I want users to be able to customize the erasure method for their security policies.

There's also this- but I think what it means is that we'd probably have a top-level command that basically is "ERASE TABLE blah;" or similar which doesn't operate during transaction commit but instead marks the table as "to be erased" and then perhaps "erasure in progress" and then "fully erased" (or maybe just back to 'normal' at that point). Making those updates will require the command to perform its own transaction management, which is why it can't be in a transaction itself, but it also means that the data erasure process doesn't need to be done during commit.

> My idea is adding a new parameter erase_command to postgresql.conf.

Yeah, I don't think that's really a sensible option or even approach.

> When erase_command is set, VACUUM does not truncate a file size to non-zero
> because it's safer for users to return the entire file to the OS than to return part of it.

There was discussion elsewhere about preventing VACUUM from doing a truncate on a file because of the lock it requires and problems with replicas. I'm not sure where that ended up, but, in general, I don't think this feature and VACUUM should really have anything to do with each other, except for the possible case that a user might be told to configure their system to not allow VACUUM to truncate tables if they care about this case.

As mentioned elsewhere, you do also have to consider that the sensitive data will end up in the WAL and on replicas. I don't believe that means this feature is without use, but it means that users of this feature will also need to understand and be able to address WAL and replicas (along with backups and such too, of course).

Thanks,

Stephen
Hello Stephen,

Thank you for the comment.

From: Stephen Frost <sfrost@snowman.net>
> Greetings,
>
> * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> > This feature erases data area just before it is returned to the OS ("erase" means overwriting the data area to hide its contents here)
> > because there is a risk that the data will be restored by attackers if it is returned to the OS without being overwritten.
> > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
>
> Looking at this fresh, I wanted to point out that I think Tom's right-
> we aren't going to be able to reasonably support this kind of data
> erasure on a simple DROP TABLE or TRUNCATE.
>
> > I want users to be able to customize the erasure method for their security policies.
>
> There's also this- but I think what it means is that we'd probably have
> a top-level command that basically is "ERASE TABLE blah;" or similar
> which doesn't operate during transaction commit but instead marks the
> table as "to be erased" and then perhaps "erasure in progress" and then
> "fully erased" (or maybe just back to 'normal' at that point). Making
> those updates will require the command to perform its own transaction
> management which is why it can't be in a transaction itself but also
> means that the data erasure process doesn't need to be done during
> commit.
>
> > My idea is adding a new parameter erase_command to postgresql.conf.
>
> Yeah, I don't think that's really a sensible option or even approach.

I think erase_command can also manage the state of a table. The exit status of the configured command shows it (0 is "fully erased" or "normal", 1 is "erasure in progress"). erase_command is executed not during a transaction but when unlink() is executed (for example, after a transaction that has done DROP TABLE). I think that this corresponds to "to be erased".

> > When erase_command is set, VACUUM does not truncate a file size to non-zero
> > because it's safer for users to return the entire file to the OS than to return part of it.
>
> There was discussion elsewhere about preventing VACUUM from doing a
> truncate on a file because of the lock it requires and problems with
> replicas. I'm not sure where that ended up, but, in general, I don't
> think this feature and VACUUM should really have anything to do with
> each other except for the possible case that a user might be told to
> configure their system to not allow VACUUM to truncate tables if they
> care about this case.

I think that if ftruncate(fd, 0) is executed in VACUUM, the data area allocated to a file is returned to the OS, so that area must be overwritten.

> As mentioned elsewhere, you do also have to consider that the sensitive
> data will end up in the WAL and on replicas. I don't believe that means
> this feature is without use, but it means that users of this feature
> will also need to understand and be able to address WAL and replicas
> (along with backups and such too, of course).

I see. I can't think of a solution right away, but I will deal with it.

Sorry for my late reply. It takes me time to understand your emails because I'm a beginner. Please point out any mistakes.

Regards,

--
Takanori Asaba
Greetings,

* asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> From: Stephen Frost <sfrost@snowman.net>
> > * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> > > This feature erases data area just before it is returned to the OS ("erase" means overwriting the data area to hide its contents here)
> > > because there is a risk that the data will be restored by attackers if it is returned to the OS without being overwritten.
> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> >
> > Looking at this fresh, I wanted to point out that I think Tom's right-
> > we aren't going to be able to reasonably support this kind of data
> > erasure on a simple DROP TABLE or TRUNCATE.
> >
> > > I want users to be able to customize the erasure method for their security policies.
> >
> > There's also this- but I think what it means is that we'd probably have
> > a top-level command that basically is "ERASE TABLE blah;" or similar
> > which doesn't operate during transaction commit but instead marks the
> > table as "to be erased" and then perhaps "erasure in progress" and then
> > "fully erased" (or maybe just back to 'normal' at that point). Making
> > those updates will require the command to perform its own transaction
> > management which is why it can't be in a transaction itself but also
> > means that the data erasure process doesn't need to be done during
> > commit.
> >
> > > My idea is adding a new parameter erase_command to postgresql.conf.
> >
> > Yeah, I don't think that's really a sensible option or even approach.
>
> I think erase_command can also manage the state of a table.
> The exit status of the configured command shows it (0 is "fully erased" or "normal", 1 is "erasure in progress").
> erase_command is executed not during a transaction but when unlink() is executed.

I really don't see what the advantage of having this be configurable is. In addition, an external command's actions wouldn't be put through the WAL, meaning that replicas would have to be dealt with in some other way beyond regular WAL, and that seems like it'd just be ugly.

> (for example, after a transaction that has done DROP TABLE)

We certainly can't run external commands during transaction COMMIT, so this can't be part of a regular DROP TABLE.

> > > When erase_command is set, VACUUM does not truncate a file size to non-zero
> > > because it's safer for users to return the entire file to the OS than to return part of it.
> >
> > There was discussion elsewhere about preventing VACUUM from doing a
> > truncate on a file because of the lock it requires and problems with
> > replicas. I'm not sure where that ended up, but, in general, I don't
> > think this feature and VACUUM should really have anything to do with
> > each other except for the possible case that a user might be told to
> > configure their system to not allow VACUUM to truncate tables if they
> > care about this case.
>
> I think that if ftruncate(fd, 0) is executed in VACUUM,
> the data area allocated to a file is returned to the OS, so that area must be overwritten.

As I mentioned, there was already talk of making that disallowed in VACUUM, so that would just need to be configured (and be able to be configured) when running in an environment which requires this.

> > As mentioned elsewhere, you do also have to consider that the sensitive
> > data will end up in the WAL and on replicas. I don't believe that means
> > this feature is without use, but it means that users of this feature
> > will also need to understand and be able to address WAL and replicas
> > (along with backups and such too, of course).
>
> I see.
> I can't think of a solution right away, but I will deal with it.

It's something that will need to be understood and dealt with. A simple answer might be "the user must also destroy all WAL and all backups in an approved manner, and ensure all replicas have replayed all WAL or been destroyed".

Thanks,

Stephen
On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
> * asaba.takanori@fujitsu.com (asaba.takanori@fujitsu.com) wrote:
> > I think erase_command can also manage the state of a table.
> > The exit status of the configured command shows it (0 is "fully erased" or "normal", 1 is "erasure in progress").
> > erase_command is executed not during a transaction but when unlink() is executed.
>
> I really don't see what the advantage of having this be configurable is.
> In addition, an external command's actions wouldn't be put through the
> WAL, meaning that replicas would have to be dealt with in some other way
> beyond regular WAL, and that seems like it'd just be ugly.
>
> > (for example, after a transaction that has done DROP TABLE)
>
> We certainly can't run external commands during transaction COMMIT, so
> this can't be part of a regular DROP TABLE.

IMO the best solution would be that the DROP TABLE does everything as usual, but instead of deleting the relfilenode it moves it to some sort of queue. And then a background worker would "erase" these relfilenodes outside the COMMIT.

And yes, we need to do this in a way that works with replicas, i.e. we need to WAL-log it somehow. And it should be done in a way that works when the replica is on a different type of filesystem.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote: > On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote: > >We certainly can't run external commands during transaction COMMIT, so > >this can't be part of a regular DROP TABLE. > > IMO the best solution would be that the DROP TABLE does everything as > usual, but instead of deleting the relfilenode it moves it to some sort > of queue. And then a background worker would "erase" these relfilenodes > outside the COMMIT. That sounds interesting, though I'm a bit worried that it's going to lead to the same kind of complications and difficulty that we have with deleted columns- anything that's working with the system tables will need to see this new "dropped but pending delete" flag. Would we also rename the table when this happens? Or change the schema it's in? Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking. > And yes, we need to do this in a way that works with replicas, i.e. we > need to WAL-log it somehow. And it should to be done in a way that works > when the replica is on a different type of filesystem. I agree it should go through WAL somehow (ideally without needing an actual zero'd or whatever page for every page in the relation), but why do we care about the filesystem on the replica? We don't have anything that's really filesystem specific in WAL replay today and I don't see this as needing to change that.. Thanks, Stephen
On Mon, Feb 03, 2020 at 09:07:09AM -0500, Stephen Frost wrote:
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> > On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
> > > We certainly can't run external commands during transaction COMMIT, so
> > > this can't be part of a regular DROP TABLE.
> >
> > IMO the best solution would be that the DROP TABLE does everything as
> > usual, but instead of deleting the relfilenode it moves it to some sort
> > of queue. And then a background worker would "erase" these relfilenodes
> > outside the COMMIT.
>
> That sounds interesting, though I'm a bit worried that it's going to
> lead to the same kind of complications and difficulty that we have with
> deleted columns- anything that's working with the system tables will
> need to see this new "dropped but pending delete" flag. Would we also
> rename the table when this happens? Or change the schema it's in?
> Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking.

That's not really what I meant - let me explain. When I said DROP TABLE should do everything as usual, that includes catalog changes. I.e. after the commit there would not be any remaining entries in system catalogs or anything like that.

The only thing we'd do differently is that instead of unlinking the relfilenode segments, we'd move the relfilenode to a persistent queue (essentially a regular table used as a queue of relfilenodes). The background worker would watch the queue, and when it gets a new relfilenode it'd "delete" the data and then remove the relfilenode from the queue.

So essentially others would not be able to even see the (now dropped) object, they could create a new object with the same name, etc.

I imagine we might provide a way to wait for the deletion to actually complete (can't do that as part of the DROP TABLE, though), so that people can be sure when the data is actually gone (for scripts etc.). A simple function waiting for the queue to get empty might be enough, I guess, but maybe not.

> > And yes, we need to do this in a way that works with replicas, i.e. we
> > need to WAL-log it somehow. And it should be done in a way that works
> > when the replica is on a different type of filesystem.
>
> I agree it should go through WAL somehow (ideally without needing an
> actual zeroed or whatever page for every page in the relation), but why
> do we care about the filesystem on the replica? We don't have anything
> that's really filesystem specific in WAL replay today and I don't see
> this as needing to change that.

I think this depends on what our requirements are.

My assumption is that when you perform this "secure data erasure" on the primary, you probably also want to erase the data on the replica. But if the instances use different file systems (COW vs. non-COW, ...) the exact thing that needs to happen may be different. Or maybe the replica does not need to do anything, making it a noop? In which case we don't need to WAL-log the exact change for each page; it might even be fine to not WAL-log anything except for the final removal from the queue. I mean, the data is worthless and not used by anyone at this point, so there's no point in replicating it ...

I haven't thought about this very hard. It's not clear what should happen if we complete the erasure on the primary, remove the relfilenode from the queue, and then restart the replica before it finishes the local erasure. The queue (if represented by a simple table) will be replicated, so the replica will forget it still has work to do.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
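A rough sketch of the worker loop described above - relfilenodes parked in a queue at commit, erased later by a separate process - might look like the following plain POSIX C. The queue directory name, the single zero-fill pass, and the polling loop are illustrative assumptions, not a proposed implementation; a real version would be a PostgreSQL background worker and would have to address the WAL-logging and replica questions raised in the message.

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define QUEUE_DIR "pg_erase_queue"   /* hypothetical queue directory */

    /* Overwrite the whole file with zeros and flush it to storage. */
    static int
    overwrite_with_zeros(const char *path)
    {
        char    zeros[8192] = {0};
        off_t   size, off;
        int     fd = open(path, O_WRONLY);

        if (fd < 0)
            return -1;
        size = lseek(fd, 0, SEEK_END);
        if (size < 0 || lseek(fd, 0, SEEK_SET) < 0)
        {
            close(fd);
            return -1;
        }
        for (off = 0; off < size; )
        {
            size_t  chunk = (size - off) < (off_t) sizeof(zeros)
                            ? (size_t) (size - off) : sizeof(zeros);

            if (write(fd, zeros, chunk) != (ssize_t) chunk)
            {
                close(fd);
                return -1;
            }
            off += chunk;
        }
        if (fsync(fd) != 0)          /* the zeros must actually reach storage */
        {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    /* One pass over the queue: erase what we can, keep failures for retry. */
    static void
    process_erase_queue(void)
    {
        DIR            *dir = opendir(QUEUE_DIR);
        struct dirent  *de;
        char            path[4096];

        if (dir == NULL)
            return;
        while ((de = readdir(dir)) != NULL)
        {
            if (de->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "%s/%s", QUEUE_DIR, de->d_name);
            if (overwrite_with_zeros(path) == 0)
                unlink(path);        /* only unlink once the overwrite succeeded */
            else
                fprintf(stderr, "erase failed for %s, will retry\n", path);
        }
        closedir(dir);
    }

    int
    main(void)
    {
        for (;;)
        {
            process_erase_queue();
            sleep(10);               /* polling; a real worker would use a latch */
        }
    }

The key property is that failures only delay the erasure (the entry stays queued) rather than failing a COMMIT; whether zero-filling is an acceptable erasure method on a given filesystem is exactly the question raised earlier in the thread.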
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> That's not really what I meant - let me explain. When I said DROP TABLE
> should do everything as usual, that includes catalog changes. I.e. after
> the commit there would not be any remaining entries in system catalogs
> or anything like that.
>
> The only thing we'd do differently is that instead of unlinking the
> relfilenode segments, we'd move the relfilenode to a persistent queue
> (essentially a regular table used as a queue of relfilenodes). The
> background worker would watch the queue, and when it gets a new
> relfilenode it'd "delete" the data and then remove the relfilenode from
> the queue.
>
> So essentially others would not be able to even see the (now dropped)
> object, they could create a new object with the same name, etc.

That sounds good. I think we can also follow the way the WAL archiver does its job, instead of using a regular table. That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is actually a filesystem directory. Or, it just renames the data files in the original directory by appending some suffix such as ".del". Then, the background worker scans the trash bin or the data directory to erase the file content and delete the file.

The trash bin mechanism may also open the door to restoring mistakenly dropped tables, a feature like Oracle's Flash Drop. The dropping transaction would put the table metadata (system catalog data or DDL) in the trash bin as well as the data file.

> I imagine we might provide a way to wait for the deletion to actually
> complete (can't do that as part of the DROP TABLE, though), so that
> people can be sure when the data is actually gone (for scripts etc.).
> A simple function waiting for the queue to get empty might be enough, I
> guess, but maybe not.

Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed. Can't we really make COMMIT wait for the erasure to complete? Do we have to use an asynchronous erasure method with a background worker? For example, COMMIT performs:

1. Writes a commit WAL record, finalizing the system catalog change.
2. Puts the data files in the trash bin or renames them.
3. Erases the file content and deletes the file. This could take a long time.
4. COMMIT replies success to the client.

What concerns me is that the need to erase and delete the data file would be forgotten if the server crashes during step 3. If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays the startup).

> I think this depends on what our requirements are.
>
> My assumption is that when you perform this "secure data erasure" on the
> primary, you probably also want to erase the data on the replica. But if
> the instances use different file systems (COW vs. non-COW, ...) the
> exact thing that needs to happen may be different. Or maybe the replica
> does not need to do anything, making it a noop?

We can guide the use of non-COW file systems on both the primary and standby in the manual.

Regards
Takayuki Tsunakawa
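The cheap step in this "trash bin" scheme - the only thing that has to happen at commit - is a rename. A minimal POSIX C sketch of that step, assuming the trash directory sits on the same filesystem as the data file and using the ".del" suffix suggested above (the directory name and paths are made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* fsync a directory so a rename inside it survives a crash */
    static int
    fsync_dir(const char *dir)
    {
        int     fd = open(dir, O_RDONLY);
        int     rc;

        if (fd < 0)
            return -1;
        rc = fsync(fd);
        close(fd);
        return rc;
    }

    /*
     * Move a relation file into the trash bin by appending ".del" instead
     * of unlinking it.  rename(2) is atomic within one filesystem, so this
     * is the only step that has to succeed around commit time; the actual
     * overwrite happens later in a worker process.
     */
    static int
    move_to_trash_bin(const char *relpath, const char *trashdir)
    {
        const char *name = strrchr(relpath, '/');
        char        newpath[4096];

        name = name ? name + 1 : relpath;
        if (snprintf(newpath, sizeof(newpath), "%s/%s.del", trashdir, name)
            >= (int) sizeof(newpath))
            return -1;

        if (rename(relpath, newpath) != 0)
            return -1;

        /* persist the new directory entry before the commit is acknowledged */
        return fsync_dir(trashdir);
    }

    int
    main(void)
    {
        /* made-up paths, for illustration only */
        return move_to_trash_bin("base/16384/16385", "pg_trash_bin");
    }

Note that rename(2) fails with EXDEV across filesystems, so a real trash directory would have to live inside the same tablespace; the source directory would also need an fsync. Both points are assumptions this sketch glosses over.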
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
> What concerns me is that the need to erase and delete the data file would be forgotten if the server crashes during step 3.
> If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays the startup).

I suspect erasing and deleting the data file at startup is rather unacceptable. Under normal conditions we can query table foo while table bar is being erased. However, if a crash happens and the postmaster erases table bar at startup, we can't execute any queries until the erasure is finished (and it could take a long time). I'm afraid someone will complain about that.

Can we erase the table after startup when a crash happens, and also implement a function that returns whether the erasure of a specified table is completed or not? Users who want to know whether the erasure is completed can use that function and make their tasks wait until the erasure is done, and other users can execute queries while the erasure is in progress.

I have another point I want to discuss. In the current specification, every table will be completely erased if erase_command is set. Security-conscious systems might want to delete all data completely, but isn't there also the case of data that doesn't need such care? In that case, someone would want to choose which tables are erased completely and which tables are only dropped, which could be achieved by adding an option to "DROP TABLE" or implementing an "ERASE TABLE" command.

--
Yoshikazu Imai
On Tue, Feb 04, 2020 at 12:53:44AM +0000, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> > The only thing we'd do differently is that instead of unlinking the
> > relfilenode segments, we'd move the relfilenode to a persistent queue
> > (essentially a regular table used as a queue of relfilenodes). The
> > background worker would watch the queue, and when it gets a new
> > relfilenode it'd "delete" the data and then remove the relfilenode
> > from the queue.
> >
> > So essentially others would not be able to even see the (now dropped)
> > object, they could create a new object with the same name, etc.
>
> That sounds good. I think we can also follow the way the WAL archiver
> does its job, instead of using a regular table. That is, when the
> transaction that performed DROP TABLE commits, it puts the data files
> in the "trash bin," which is actually a filesystem directory. Or, it
> just renames the data files in the original directory by appending some
> suffix such as ".del". Then, the background worker scans the trash bin
> or the data directory to erase the file content and delete the file.

Yeah, that could work, I guess.

> The trash bin mechanism may also open the door to restoring mistakenly
> dropped tables, a feature like Oracle's Flash Drop. The dropping
> transaction would put the table metadata (system catalog data or DDL)
> in the trash bin as well as the data file.

That seems like a very different feature, and I doubt this is the right way to implement that. That would require much more infrastructure than just moving the file to a separate dir.

> > I imagine we might provide a way to wait for the deletion to actually
> > complete (can't do that as part of the DROP TABLE, though), so that
> > people can be sure when the data is actually gone (for scripts etc.).
> > A simple function waiting for the queue to get empty might be enough,
> > I guess, but maybe not.
>
> Agreed, because the user should expect the disk space to be available
> after DROP TABLE has been committed. Can't we really make COMMIT
> wait for the erasure to complete? Do we have to use an asynchronous
> erasure method with a background worker? For example, COMMIT performs:

I think it depends on how exactly it's implemented. As Tom pointed out in his message [1], the problem with doing the erasure itself in post-commit is not being able to handle errors. But if the files are renamed durably, and the erasure happens in a separate process, that could be OK. The COMMIT may wait for it or not; that's mostly irrelevant I think.

[1] https://www.postgresql.org/message-id/9104.1579107235%40sss.pgh.pa.us

> 1. Writes a commit WAL record, finalizing the system catalog change.
> 2. Puts the data files in the trash bin or renames them.
> 3. Erases the file content and deletes the file. This could take a long time.
> 4. COMMIT replies success to the client.

I don't think the COMMIT has to wait for (3) - it might, of course, but for some use cases it may be better to just commit and let the bgworker do the work, and then allow checking if it completed.

> What concerns me is that the need to erase and delete the data file
> would be forgotten if the server crashes during step 3. If so,
> postmaster can do the job at startup, just like it deletes temporary
> files (although it delays the startup).

Startup seems like a pretty bad place to do this stuff. There may be a lot of data to erase, making recovery very long.

> > I think this depends on what our requirements are.
> >
> > My assumption is that when you perform this "secure data erasure" on
> > the primary, you probably also want to erase the data on the replica.
> > But if the instances use different file systems (COW vs. non-COW,
> > ...) the exact thing that needs to happen may be different. Or maybe
> > the replica does not need to do anything, making it a noop?
>
> We can guide the use of non-COW file systems on both the primary and
> standby in the manual.

I don't see how that solves the issue. I think it's quite useful to be able to use different filesystems for primary/replica. And we may even know how to securely erase data on both, in which case I don't see a point in not allowing such configurations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I think it depends on how exactly it's implemented. As Tom pointed out in
> his message [1], the problem with doing the erasure itself in post-commit
> is not being able to handle errors. But if the files are renamed durably,
> and the erasure happens in a separate process, that could be OK. The
> COMMIT may wait for it or not; that's mostly irrelevant I think.

How is requiring a file rename to be completed post-commit any less problematic than the other way? You still have a non-negligible chance of failure.

> > 1. Writes a commit WAL record, finalizing the system catalog change.
> > 2. Puts the data files in the trash bin or renames them.
> > 3. Erases the file content and deletes the file. This could take a long time.
> > 4. COMMIT replies success to the client.
>
> I don't think the COMMIT has to wait for (3) - it might, of course, but
> for some use cases it may be better to just commit and let the
> bgworker do the work, and then allow checking if it completed.

This doesn't seem like a path that will lead to success. The fundamental point here is that COMMIT has to be an atomic action --- or if it isn't, failure partway through has to lead to a database crash & restart, which isn't very pleasant, especially if WAL replay of the commit after the restart re-encounters the same error.

Up to now, we've sort of looked the other way with respect to failures of file unlinks post-commit, reasoning that the worst that will happen is disk space leakage from no-longer-referenced files that we failed to unlink. (Which is bad, certainly, but not catastrophic; it's immaterial to database semantics.) This patch basically needs to raise the level of guarantee that exists in this area, or it won't do what it says on the tin. But I've not seen any indication that we know how to do that in a workable way.

			regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
> Up to now, we've sort of looked the other way with respect to failures
> of file unlinks post-commit, reasoning that the worst that will happen
> is disk space leakage from no-longer-referenced files that we failed to
> unlink. (Which is bad, certainly, but not catastrophic; it's immaterial
> to database semantics.) This patch basically needs to raise the level of
> guarantee that exists in this area, or it won't do what it says on the
> tin. But I've not seen any indication that we know how to do that in a
> workable way.

Hmm, the error case is a headache. Even if the bgworker does the erasure, it could hit the same error repeatedly when the file system or disk is broken, causing repeated I/O that may hamper performance.

Do we have no better choice than to leave it up to the user to erase the file content, like the following?

https://docs.oracle.com/en/database/oracle/oracle-database/19/asoag/general-considerations-of-using-transparent-data-encryption.html#GUID-F02C9CBF-0374-408B-8655-F7531B681D41

--------------------------------------------------
Oracle Database Advanced Security Guide
7 General Considerations of Using Transparent Data Encryption
Managing Security for Plaintext Fragments

You should remove old plaintext fragments that can appear over time.

Old plaintext fragments may be present for some time until the database overwrites the blocks containing such values. If privileged operating system users bypass the access controls of the database, then they might be able to directly access these values in the data file holding the tablespace.

To minimize this risk:

1. Create a new tablespace in a new data file.
You can use the CREATE TABLESPACE statement to create this tablespace.

2. Move the table containing encrypted columns to the new tablespace.
You can use the ALTER TABLE.....MOVE statement. Repeat this step for all of the objects in the original tablespace.

3. Drop the original tablespace.
You can use the DROP TABLESPACE tablespace INCLUDING CONTENTS KEEP DATAFILES statement. Oracle recommends that you securely delete data files using platform-specific utilities.

4. Use platform-specific and file system-specific utilities to securely delete the old data file. Examples of such utilities include shred (on Linux) and sdelete (on Windows).
--------------------------------------------------

Regards
Takayuki Tsunakawa
On Tue, 4 Feb 2020 at 09:53, tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> > The only thing we'd do differently is that instead of unlinking the
> > relfilenode segments, we'd move the relfilenode to a persistent queue
> > (essentially a regular table used as a queue of relfilenodes). The
> > background worker would watch the queue, and when it gets a new
> > relfilenode it'd "delete" the data and then remove the relfilenode from
> > the queue.
> >
> > So essentially others would not be able to even see the (now dropped)
> > object, they could create a new object with the same name, etc.
>
> That sounds good. I think we can also follow the way the WAL archiver does its job, instead of using a regular table. That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is actually a filesystem directory. Or, it just renames the data files in the original directory by appending some suffix such as ".del". Then, the background worker scans the trash bin or the data directory to erase the file content and delete the file.
>
> The trash bin mechanism may also open the door to restoring mistakenly dropped tables, a feature like Oracle's Flash Drop. The dropping transaction would put the table metadata (system catalog data or DDL) in the trash bin as well as the data file.
>
> > I imagine we might provide a way to wait for the deletion to actually
> > complete (can't do that as part of the DROP TABLE, though), so that
> > people can be sure when the data is actually gone (for scripts etc.).
> > A simple function waiting for the queue to get empty might be enough, I
> > guess, but maybe not.
>
> Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed. Can't we really make COMMIT wait for the erasure to complete? Do we have to use an asynchronous erasure method with a background worker? For example, COMMIT performs:
>
> 1. Writes a commit WAL record, finalizing the system catalog change.
> 2. Puts the data files in the trash bin or renames them.
> 3. Erases the file content and deletes the file. This could take a long time.
> 4. COMMIT replies success to the client.
>
> What concerns me is that the need to erase and delete the data file would be forgotten if the server crashes during step 3. If so, postmaster can do the job at startup, just like it deletes temporary files (although it delays the startup).

Please note that we need to erase files not only when dropping or truncating tables but also when aborting a transaction that created a new table. If the user wants to be sure the data is actually erased, they need to wait for rollbacks as well, whether that is a ROLLBACK command issued by the user or an error during the transaction, etc.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Stephen,

From: Stephen Frost <sfrost@snowman.net>
> I disagree- it's a feature that's been asked for multiple times and does
> have value in some situations.

I'm rethinking the need for this feature, although I think that it improves security. You said that this feature has value in some situations. Could you tell me about those situations?

Regards,

--
Takanori Asaba
Greetings,

From: asaba.takanori@fujitsu.com <asaba.takanori@fujitsu.com>
> Hello Stephen,
>
> From: Stephen Frost <sfrost@snowman.net>
> > I disagree- it's a feature that's been asked for multiple times and does
> > have value in some situations.
>
> I'm rethinking the need for this feature, although I think that it improves security.
> You said that this feature has value in some situations.
> Could you tell me about those situations?

I think that the use case is to ensure that no data remains. This feature will give users peace of mind, because there is a risk of leakage as long as data remains.

I think that there are some things users worry about. For example, there is a possibility that attackers decrypt encrypted data, even if it takes years. Or some users may be concerned about disk management in cloud environments. These concerns will be resolved if users can erase data themselves.

I think that this feature is valuable, so I would appreciate your continued cooperation.

Regards,

--
Takanori Asaba
Hello Tom, From: Tom Lane <tgl@sss.pgh.pa.us> > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > > I think it depends how exactly it's implemented. As Tom pointed out in > > his message [1], we can't do the erasure itself in the post-commit is > > not being able to handle errors. But if the files are renamed durably, > > and the erasure happens in a separate process, that could be OK. The > > COMMIT may wayt for it or not, that's mostly irrelevant I think. > > How is requiring a file rename to be completed post-commit any less > problematic than the other way? You still have a non-negligible > chance of failure. I think that errors of rename(2) listed in [1] cannot occur or can be handled. What do you think? [1] http://man7.org/linux/man-pages/man2/rename.2.html Regards, -- Takanori Asaba
Hello Tom,

From: asaba.takanori@fujitsu.com <asaba.takanori@fujitsu.com>
> Hello Tom,
>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> > > I think it depends on how exactly it's implemented. As Tom pointed out in
> > > his message [1], the problem with doing the erasure itself in post-commit
> > > is not being able to handle errors. But if the files are renamed durably,
> > > and the erasure happens in a separate process, that could be OK. The
> > > COMMIT may wait for it or not; that's mostly irrelevant I think.
> >
> > How is requiring a file rename to be completed post-commit any less
> > problematic than the other way? You still have a non-negligible
> > chance of failure.
>
> I think that the errors of rename(2) listed in [1] cannot occur or can be handled.
> What do you think?
>
> [1] http://man7.org/linux/man-pages/man2/rename.2.html

I have another idea. How about managing the status of data files like the WAL archiver does? For example:

1. Create a status file "...ready" in the transaction that has DROP TABLE (do not rename the data file).
2. A background worker scans the directory that has the status files.
3. Rename the status file to "...progress" when the erasure of the data file starts.
4. Rename the status file to "...done" when the erasure of the data file has finished.

I think that this is OK because step 1 is not post-commit and the background worker can handle errors during the erasure.

Regards,

--
Takanori Asaba
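A POSIX C sketch of those status-file transitions, modeled on the WAL archiver's .ready/.done protocol. Only the .ready -> .progress -> .done transitions come from the steps above; the status directory name, helper names, and the relfilenode in main() are made up for illustration.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define STATUS_DIR "pg_erase_status"    /* hypothetical directory */

    /* Step 1: the dropping transaction marks the file as ready to be erased. */
    static int
    mark_ready(const char *relfilenode)
    {
        char    path[4096];
        int     fd;

        snprintf(path, sizeof(path), "%s/%s.ready", STATUS_DIR, relfilenode);
        fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        return close(fd);
    }

    /* Steps 3 and 4: the background worker advances the status via rename(2). */
    static int
    advance_status(const char *relfilenode, const char *from, const char *to)
    {
        char    oldpath[4096], newpath[4096];

        snprintf(oldpath, sizeof(oldpath), "%s/%s.%s", STATUS_DIR, relfilenode, from);
        snprintf(newpath, sizeof(newpath), "%s/%s.%s", STATUS_DIR, relfilenode, to);
        return rename(oldpath, newpath);
    }

    int
    main(void)
    {
        mark_ready("16385");                            /* at DROP TABLE */
        advance_status("16385", "ready", "progress");   /* worker starts erasing */
        /* ... overwrite the data file here, retrying on failure ... */
        advance_status("16385", "progress", "done");    /* erasure finished */
        return 0;
    }

As with the archiver, a failed erasure simply leaves the ".progress" file in place for the worker to pick up again, which is the error-handling property the message is after.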
Hello,

I was off the point. I want to organize the discussion and suggest a feature design.

There are two opinions:

1. COMMIT should not take a long time, because the longer it takes the more likely errors are to occur.
2. The data area should be released when COMMIT completes, because COMMIT has to be an atomic action.

Both opinions are reasonable, but it is difficult to satisfy them at the same time, so I suggest giving users the option to choose. DROP TABLE would work in one of the following two patterns:

1. Rename the data file to "...del" instead of calling ftruncate(fd, 0).
   After that, a background worker scans the directory and runs erase_command.
   (erase_command is a command set by the user, like archive_command;
   for example, shred on Linux.)

2. Run erase_command on the data file immediately before ftruncate(fd, 0).
   Wait until it completes, then reply COMMIT to the client.
   After that, processing is the same as usual.

If erase_command fails, issue a WARNING and do not ask the checkpointer to unlink the file. This is not a security failure, because the risk only arises when the data area is returned to the OS.

I will implement pattern 2 first because it is closer to the existing user experience than pattern 1. This method has been pointed out as follows.

From Stephen
> We certainly can't run external commands during transaction COMMIT, so
> this can't be part of a regular DROP TABLE.

I take this to mean that errors from external commands can't be handled. If so, that is no problem, because I have determined the behavior after an error. Are there any other problems?

Regards,

--
Takanori Asaba
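As a rough illustration of pattern 2 above, here is a standalone C sketch that substitutes the file path into a user-supplied erase_command (using a "%p" placeholder, by analogy with archive_command's "%p") and reports failure the way a WARNING would. The setting name, the placeholder, and the shred default are assumptions for the example only.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical setting, modeled on archive_command; "%p" is the file path */
static const char *erase_command = "shred %p";

/*
 * Run the user's erase command on one data file.  Returns 0 on success;
 * on failure the caller would emit a WARNING and skip the unlink request,
 * as described above.
 */
int
run_erase_command(const char *path)
{
    char        cmd[2048];
    const char *src = erase_command;
    char       *dst = cmd;

    /* naive "%p" substitution, enough for a sketch */
    while (*src && dst < cmd + sizeof(cmd) - 1)
    {
        if (src[0] == '%' && src[1] == 'p')
        {
            size_t      n = strlen(path);

            if (dst + n >= cmd + sizeof(cmd) - 1)
                break;
            memcpy(dst, path, n);
            dst += n;
            src += 2;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';

    if (system(cmd) != 0)
    {
        /* in the server this would be ereport(WARNING, ...) */
        fprintf(stderr, "WARNING: erase_command failed for \"%s\"\n", path);
        return -1;
    }
    return 0;
}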
On Fri, Apr 10, 2020 at 08:23:32AM +0000, asaba.takanori@fujitsu.com wrote:
>Hello,
>
>I was off the point. I want to organize the discussion and suggest a feature design.
>
>There are two opinions:
>
>1. COMMIT should not take a long time, because the longer it takes the more likely errors are to occur.

I don't think it's a matter of commit duration but a question of what to do in response to errors in the data erasure code - which is something we can't really rule out if we allow custom scripts to perform the erasure. If the erasure took very long but couldn't possibly fail, it'd be much easier to handle than fast erasure that fails often. The difficulty of error handling is why adding new stuff to commit may be tricky, and it is why I proposed not to do the failure-prone code in commit itself but to move it to a separate process.

>2. The data area should be released when COMMIT completes, because COMMIT has to be an atomic action.
>

I don't think "commit is atomic" really implies "data should be released at commit". This is precisely what makes the feature extremely hard to implement, IMHO.

Why wouldn't it be acceptable to do something like this?

    BEGIN;
    ...
    DROP TABLE x ERASE;
    ...
    COMMIT;  <-- Don't do data erasure, just add "x" to queue.

    -- wait for another process to complete the erasure
    SELECT pg_wait_for_erasure();

That means we're not running any custom commands / code during commit, which should (hopefully) make it easier to handle errors.

>Both opinions are reasonable, but it is difficult to satisfy them at the same time, so I suggest giving users the option to choose. DROP TABLE would work in one of the following two patterns:
>
>1. Rename the data file to "...del" instead of calling ftruncate(fd, 0).
>   After that, a background worker scans the directory and runs erase_command.
>   (erase_command is a command set by the user, like archive_command;
>   for example, shred on Linux.)
>
>2. Run erase_command on the data file immediately before ftruncate(fd, 0).
>   Wait until it completes, then reply COMMIT to the client.
>   After that, processing is the same as usual.
>
>If erase_command fails, issue a WARNING and do not ask the checkpointer to unlink the file. This is not a security failure, because the risk only arises when the data area is returned to the OS.
>

I think it was already discussed why doing file renames and other expensive stuff that could fail is a bad idea. And I'm not sure just ignoring erase_command failures (because that's what WARNING does) is really appropriate for this feature.

>I will implement pattern 2 first because it is closer to the existing user experience than pattern 1. This method has been pointed out as follows.
>
>From Stephen
>> We certainly can't run external commands during transaction COMMIT, so
>> this can't be part of a regular DROP TABLE.
>
>I take this to mean that errors from external commands can't be handled. If so, that is no problem, because I have determined the behavior after an error. Are there any other problems?

I'm not sure what you mean by "determined the behavior after an error"? You essentially propose to just print a warning and be done with it. But that means we can simply leave data files with sensitive data on the disk, which seems ... not great.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
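One way to read "just add "x" to queue" above is that the commit-time step must be nothing more than a durable append to a queue that a separate worker consumes. Below is a minimal standalone C sketch of such an append; the queue file name and the one-path-per-line format are assumptions for illustration, and pg_wait_for_erasure() would then simply wait until the worker has removed the entry.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Append one relation file path to a durable erasure queue.  The append
 * is cheap and easy to make reliable, which is the point: the expensive,
 * failure-prone erasure itself happens later in another process.
 */
int
enqueue_for_erasure(const char *queuefile, const char *relpath)
{
    char        line[1024];
    int         fd;
    int         len;

    len = snprintf(line, sizeof(line), "%s\n", relpath);
    if (len < 0 || (size_t) len >= sizeof(line))
        return -1;              /* path too long for this sketch */

    fd = open(queuefile, O_WRONLY | O_APPEND | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, line, (size_t) len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}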
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I don't think "commit is atomic" really implies "data should be released
> at commit". This is precisely what makes the feature extremely hard to
> implement, IMHO.

> Why wouldn't it be acceptable to do something like this?

>     BEGIN;
>     ...
>     DROP TABLE x ERASE;
>     ...
>     COMMIT;  <-- Don't do data erasure, just add "x" to queue.

>     -- wait for another process to complete the erasure
>     SELECT pg_wait_for_erasure();

> That means we're not running any custom commands / code during commit,
> which should (hopefully) make it easier to handle errors.

Yeah, adding actions-that-could-fail to commit is a very hard sell,
so something like this API would probably have a better chance.

However ... the whole concept of erasure being a committable action
seems basically misguided from here.  Consider this scenario:

    begin;

    create table full_o_secrets (...);

    ... manipulate secret data in full_o_secrets ...

    drop table full_o_secrets erase;

    ... do something that unintentionally fails, causing xact abort ...

    commit;

Now what?  Your secret data is all over the disk and you have *no*
recourse to get rid of it; that's true even at a very low level,
because we unlinked the file when rolling back the transaction.
If the error occurred before getting to "drop table full_o_secrets
erase" then there isn't even any way in principle for the server
to know that you might not be happy about leaving that data lying
around.

And I haven't even spoken of copies that may exist in WAL, or
have been propagated to standby servers by now.

I have no idea what an actual solution that accounted for those
problems would look like.  But as presented, this is a toy feature
offering no real security gain, if you ask me.

            regards, tom lane
On Sat, Apr 11, 2020 at 01:56:10PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I don't think "commit is atomic" really implies "data should be released
>> at commit". This is precisely what makes the feature extremely hard to
>> implement, IMHO.
>
>> Why wouldn't it be acceptable to do something like this?
>
>>     BEGIN;
>>     ...
>>     DROP TABLE x ERASE;
>>     ...
>>     COMMIT;  <-- Don't do data erasure, just add "x" to queue.
>
>>     -- wait for another process to complete the erasure
>>     SELECT pg_wait_for_erasure();
>
>> That means we're not running any custom commands / code during commit,
>> which should (hopefully) make it easier to handle errors.
>
>Yeah, adding actions-that-could-fail to commit is a very hard sell,
>so something like this API would probably have a better chance.
>
>However ... the whole concept of erasure being a committable action
>seems basically misguided from here.  Consider this scenario:
>
>    begin;
>
>    create table full_o_secrets (...);
>
>    ... manipulate secret data in full_o_secrets ...
>
>    drop table full_o_secrets erase;
>
>    ... do something that unintentionally fails, causing xact abort ...
>
>    commit;
>
>Now what?  Your secret data is all over the disk and you have *no*
>recourse to get rid of it; that's true even at a very low level,
>because we unlinked the file when rolling back the transaction.
>If the error occurred before getting to "drop table full_o_secrets
>erase" then there isn't even any way in principle for the server
>to know that you might not be happy about leaving that data lying
>around.
>
>And I haven't even spoken of copies that may exist in WAL, or
>have been propagated to standby servers by now.
>
>I have no idea what an actual solution that accounted for those
>problems would look like.  But as presented, this is a toy feature
>offering no real security gain, if you ask me.
>

Yeah, unfortunately the feature as proposed has these weaknesses.

This is why I proposed that a solution based on encryption and throwing away the key might be more reliable - if you don't have the key, who cares whether the encrypted data file (or parts of it) is still on disk?

It has issues too, though - a query might need a temporary file to do a sort, a hash join might spill to disk, or something like that. Those files won't be encrypted without some executor changes (e.g. we might propagate a "needs erasure" flag to temp files and erase them when necessary).

I suspect a perfect solution would be so complex that it's not feasible in practice, especially in v1. So maybe the best thing we can do is document those limitations, but I'm not sure where to draw the line between acceptable and unacceptable limitations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
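To illustrate the parenthetical idea of propagating a "needs erasure" flag to temporary files, here is a minimal standalone C sketch. The struct, the flag, and the shred invocation are hypothetical names invented for the example; they do not correspond to existing executor code.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * A spill file remembers whether the data it holds came from a relation
 * that requires erasure; cleanup then chooses between a plain unlink()
 * and the secure erase path.
 */
typedef struct SpillFile
{
    char        path[1024];
    int         needs_erasure;  /* inherited from the source relation */
} SpillFile;

void
cleanup_spill_file(const SpillFile *sf)
{
    if (sf->needs_erasure)
    {
        char        cmd[2048];

        /* "shred -u" overwrites and then removes the file */
        snprintf(cmd, sizeof(cmd), "shred -u %s", sf->path);
        if (system(cmd) != 0)
            fprintf(stderr, "WARNING: could not erase \"%s\"\n", sf->path);
    }
    else
        unlink(sf->path);
}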