Re: Complete data erasure - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Complete data erasure
Msg-id: 20200204212900.ppe2v534shphfxlv@development
In response to: RE: Complete data erasure ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)

On Tue, Feb 04, 2020 at 12:53:44AM +0000, tsunakawa.takay@fujitsu.com
wrote:
>From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> That's not really what I meant - let me explain. When I said DROP
>> TABLE should do everything as usual, that includes catalog changes.
>> I.e. after the commit there would not be any remaining entries in
>> system catalogs or anything like that.
>>
>> The only thing we'd do differently is that instead of unlinking the
>> relfilenode segments, we'd move the relfilenode to a persistent queue
>> (essentially a regular table used as a queue of relfilenodes). The
>> background worker would watch the queue, and when it gets a new
>> relfilenode it'd "delete" the data and then remove the relfilenode
>> from the queue.
>>
>> So essentially others would not even be able to see the (now dropped)
>> object, and they could create a new object with the same name etc.
>
>That sounds good.  I think we can also follow the way the WAL archiver
>does its job, instead of using a regular table.  That is, when the
>transaction that performed DROP TABLE commits, it puts the data files
>in the "trash bin," which is actually a filesystem directory.  Or, it
>just renames the data files in the original directory by appending some
>suffix such as ".del".  Then, the background worker scans the trash bin
>or the data directory to erase the file content and delete the file.
>

Yeah, that could work, I guess.
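For illustration, the rename step could look roughly like this (a minimal sketch in Python rather than the backend's C, with a made-up `TRASH_SUFFIX` name; the key point is fsyncing the directory so the rename survives a crash):

```python
import os

TRASH_SUFFIX = ".del"  # hypothetical suffix marking files pending erasure


def move_to_trash(path):
    """Durably rename a dropped relation's data file so a background
    worker can find it and erase it later, even after a crash."""
    trash_path = path + TRASH_SUFFIX
    os.rename(path, trash_path)
    # fsync the containing directory so the rename itself is durable
    dirfd = os.open(os.path.dirname(trash_path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
    return trash_path
```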

>The trash bin mechanism may open up the possibility of restoring
>mistakenly dropped tables, a feature like Oracle's Flashback Drop.  The
>dropping transaction puts the table metadata (system catalog data or
>DDL) in the trash bin as well as the data file.
>

That seems like a very different feature, and I doubt this is the right
way to implement that. That would require much more infrastructure than
just moving the file to a separate dir.

>
>> I imagine we might provide a way to wait for the deletion to actually
>> complete (can't do that as part of the DROP TABLE, though), so that
>> people can be sure when the data is actually gone (for scripts etc.).
>> A simple function waiting for the queue to get empty might be enough,
>> I guess, but maybe not.
>
>Agreed, because the user should expect the disk space to be available
>after DROP TABLE has been committed.  Can't we make the COMMIT wait
>for the erasure to complete?  Do we have to use an asynchronous
>erasure method with a background worker?  For example, COMMIT performs:
>

I think it depends on how exactly it's implemented. As Tom pointed out
in his message [1], we can't do the erasure itself in post-commit,
because that code cannot handle errors. But if the files are renamed
durably, and the erasure happens in a separate process, that could be
OK. Whether the COMMIT waits for it or not is mostly irrelevant, I
think.

[1] https://www.postgresql.org/message-id/9104.1579107235%40sss.pgh.pa.us
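A sketch of what that separate erasure process might do with a renamed file (Python for illustration, hypothetical function name; a single zero-overwrite pass is an assumption here, since what counts as "secure" erasure depends on the filesystem and the storage layer):

```python
import os


def erase_file(trash_path, chunk=1 << 20):
    """Overwrite the file's contents with zeros, flush to disk, then
    unlink it.  One zero pass is a simplification; the real policy
    would depend on the filesystem (COW vs. non-COW) and storage."""
    size = os.path.getsize(trash_path)
    with open(trash_path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(b"\0" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    os.unlink(trash_path)
```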

>1. Writes a commit WAL record, finalizing the system catalog change.
>2. Puts the data files in the trash bin or renames them.
>3. Erases the file content and deletes the file. This could take a long time.
>4. COMMIT replies success to the client.
>

I don't think the COMMIT has to wait for (3) - it might, of course, but
for some use cases it may be better to just commit and let the bgworker
do the work. And then allow checking whether it completed.
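Checking for completion could then amount to testing whether any files with the ".del" suffix (as suggested earlier in the thread) remain. An illustrative sketch, with hypothetical function names:

```python
import glob
import os
import time


def erasure_pending(data_dir):
    """Return the list of .del files still awaiting erasure."""
    return glob.glob(os.path.join(data_dir, "**", "*.del"), recursive=True)


def wait_for_erasure(data_dir, timeout=60.0, interval=0.5):
    """Poll until no .del files remain, raising TimeoutError if the
    background worker hasn't finished within the timeout."""
    deadline = time.monotonic() + timeout
    while erasure_pending(data_dir):
        if time.monotonic() >= deadline:
            raise TimeoutError("data erasure still in progress")
        time.sleep(interval)
```

In the real system this would more likely be a SQL-callable function, so scripts can block on it right after DROP TABLE commits.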

>What concerns me is that the need to erase and delete the data file
>would be forgotten if the server crashes during step 3.  If so, the
>postmaster can do the job at startup, just like it deletes temporary
>files (although that would delay the startup).
>

Startup seems like a pretty bad place to do this stuff. There may be a
lot of data to erase, making recovery very long.

>
>> I think this depends on what our requirements are.
>>
>> My assumption is that when you perform this "secure data erasure" on
>> the primary, you probably also want to erase the data on the replica.
>> But if the instances use different file systems (COW vs. non-COW,
>> ...) the exact thing that needs to happen may be different. Or maybe
>> the replica does not need to do anything, making it noop?
>
>We can recommend in the manual that non-COW file systems be used on
>both the primary and the standby.
>

I don't see how that solves the issue. I think it's quite useful to be
able to use different filesystems for the primary and the replica. And
we may even know how to securely erase data on both, in which case I
see no reason not to allow such configurations.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


