Re: Complete data erasure - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Complete data erasure
Date:
Msg-id: 20200204212900.ppe2v534shphfxlv@development
In response to: RE: Complete data erasure ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses: Re: Complete data erasure
List: pgsql-hackers
On Tue, Feb 04, 2020 at 12:53:44AM +0000, tsunakawa.takay@fujitsu.com wrote:
>From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> That's not really what I meant - let me explain. When I said DROP
>> TABLE should do everything as usual, that includes catalog changes.
>> I.e. after the commit there would not be any remaining entries in
>> system catalogs or anything like that.
>>
>> The only thing we'd do differently is that instead of unlinking the
>> relfilenode segments, we'd move the relfilenode to a persistent queue
>> (essentially a regular table used as a queue of relfilenodes). The
>> background worker would watch the queue, and when it gets a new
>> relfilenode it'd "delete" the data and then remove the relfilenode
>> from the queue.
>>
>> So essentially others would not be able to even see the (now dropped)
>> object, they could create new objects with the same name etc.
>
>That sounds good. I think we can also follow the way the WAL archiver
>does its job, instead of using a regular table. That is, when the
>transaction that performed DROP TABLE commits, it puts the data files
>in the "trash bin," which is actually a filesystem directory. Or, it
>just renames the data files in the original directory by appending some
>suffix such as ".del". Then, the background worker scans the trash bin
>or the data directory to erase the file content and delete the file.
>

Yeah, that could work, I guess.

>The trash bin mechanism may open up the possibility of restoring
>mistakenly dropped tables, a feature like Oracle's Flashback Drop. The
>dropping transaction puts the table metadata (system catalog data or
>DDL) in the trash bin as well as the data file.
>

That seems like a very different feature, and I doubt this is the right
way to implement it. It would require much more infrastructure than
just moving the file to a separate directory.
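To make the rename-then-erase handoff concrete, here is a minimal sketch in plain C. Everything in it is a hypothetical illustration, not PostgreSQL internals: the commit path durably renames the segment with a ".del" suffix, and a background worker later overwrites the content with zeros and unlinks the file. The function names (`mark_for_erasure`, `erase_and_unlink`) are invented for this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Commit path: durably hand the file over to the background worker by
 * renaming it with a ".del" suffix. rename() is atomic within a
 * filesystem, so a crash leaves either the old or the new name. */
static int mark_for_erasure(const char *path, char *renamed, size_t len)
{
    snprintf(renamed, len, "%s.del", path);
    return rename(path, renamed);
}

/* Background worker: overwrite the content with zeros, flush, unlink.
 * (A real "secure erase" may need more than one overwrite pass and is
 * filesystem-dependent; this only shows the control flow.) */
static int erase_and_unlink(const char *path)
{
    char zeros[4096] = {0};
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    off_t len = lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);

    for (off_t done = 0; done < len; ) {
        size_t chunk = (len - done) < (off_t) sizeof(zeros)
                       ? (size_t)(len - done) : sizeof(zeros);
        if (write(fd, zeros, chunk) != (ssize_t) chunk) {
            close(fd);
            return -1;
        }
        done += chunk;
    }
    fsync(fd);          /* make sure the overwrite reaches storage */
    close(fd);
    return unlink(path);
}
```

Note that renaming within the same directory (rather than moving to a separate trash-bin directory) avoids any cross-filesystem issues with rename().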
>
>> I imagine we might provide a way to wait for the deletion to actually
>> complete (can't do that as part of the DROP TABLE, though), so that
>> people can be sure when the data is actually gone (for scripts etc.).
>> A simple function waiting for the queue to get empty might be enough,
>> I guess, but maybe not.
>
>Agreed, because the user should expect the disk space to be available
>after DROP TABLE has been committed. Can't we really make the COMMIT
>wait for the erasure to complete? Do we have to use an asynchronous
>erasure method with a background worker? For example, COMMIT performs:
>

I think it depends how exactly it's implemented. As Tom pointed out in
his message [1], we can't do the erasure itself in the post-commit
phase, because that code is not able to handle errors. But if the files
are renamed durably, and the erasure happens in a separate process,
that could be OK. The COMMIT may wait for it or not, that's mostly
irrelevant I think.

[1] https://www.postgresql.org/message-id/9104.1579107235%40sss.pgh.pa.us

>1. Writes a commit WAL record, finalizing the system catalog change.
>2. Puts the data files in the trash bin or renames them.
>3. Erases the file content and deletes the file. This could take a long time.
>4. COMMIT replies success to the client.
>

I don't think the COMMIT has to wait for (3) - it might, of course, but
for some use cases it may be better to just commit and let the bgworker
do the work. And then allow checking if it completed.

>What I'm concerned about is that the need to erase and delete the data
>file would be forgotten if the server crashes during step 3. If so,
>postmaster can do the job at startup, just like it deletes temporary
>files (although it delays the startup.)
>

Startup seems like a pretty bad place to do this stuff. There may be a
lot of data to erase, making recovery very long.

>
>> I think this depends on what our requirements are.
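The "simple function waiting for the queue to get empty" could look roughly like this. This is a sketch under the assumption that the queue is simply the set of "*.del" files remaining in a directory; the names (`pending_erasures`, `wait_for_erasure`) and the polling interval are invented, and a real implementation would expose this as a SQL-callable function rather than a filesystem scan on the client side.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Count files still waiting for the background worker, i.e. entries
 * in the directory whose names end in ".del". Returns -1 on error. */
static int pending_erasures(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);
        if (len > 4 && strcmp(e->d_name + len - 4, ".del") == 0)
            n++;                /* still queued for erasure */
    }
    closedir(d);
    return n;
}

/* Poll until the queue drains or we give up, so scripts can be sure
 * the data is actually gone after DROP TABLE has committed. */
static bool wait_for_erasure(const char *dir, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (pending_erasures(dir) == 0)
            return true;
        usleep(100 * 1000);     /* 100 ms between polls */
    }
    return false;
}
```

A SQL-level wrapper around something like this would let clients block until their dropped data has really been erased, without making COMMIT itself wait.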
>>
>> My assumption is that when you perform this "secure data erasure" on
>> the primary, you probably also want to erase the data on the replica.
>> But if the instances use different file systems (COW vs. non-COW,
>> ...) the exact thing that needs to happen may be different. Or maybe
>> the replica does not need to do anything, making it a no-op?
>
>We can guide the use of non-COW file systems on both the primary and
>standby in the manual.
>

I don't see how that solves the issue. I think it's quite useful to be
able to use different filesystems for primary/replica. And we may even
know how to securely erase data on both, in which case I don't see a
point in not allowing such configurations.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services