RE: Complete data erasure - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Complete data erasure
Msg-id TY2PR01MB50849087BA692135DE58FE87FE030@TY2PR01MB5084.jpnprd01.prod.outlook.com
In response to Re: Complete data erasure  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses RE: Complete data erasure
Re: Complete data erasure
Re: Complete data erasure
List pgsql-hackers
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> That's not really what I meant - let me explain. When I said DROP TABLE
> should do everything as usual, that includes catalog changes. I.e. after
> the commit there would not be any remaining entries in system catalogs
> or anything like that.
>
> The only thing we'd do differently is that instead of unlinking the
> relfilenode segments, we'd move the relfilenode to a persistent queue
> (essentially a regular table used as a queue of relfilenodes). The
> background worker would watch the queue, and when it gets a new
> relfilenode it'd "delete" the data and then remove the relfilenode from
> the queue.
>
> So essentially others would not be able to even see the (now dropped)
> object, they could create new object with the same name etc.

That sounds good.  I think we can also follow the way the WAL archiver does its job, instead of using a regular table.
That is, when the transaction that performed DROP TABLE commits, it puts the data files in the "trash bin," which is
actually a filesystem directory.  Or, it just renames the data files in the original directory by appending some suffix
such as ".del".  Then, the background worker scans the trash bin or the data directory to erase the file content and
delete the file.
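A minimal sketch of the rename-and-scan idea, assuming a hypothetical ".del" suffix and a flat data directory (the real relfilenode layout is more involved, and the overwrite policy is only illustrative):

```python
import os

DEL_SUFFIX = ".del"  # hypothetical marker for files awaiting erasure

def mark_for_erasure(path):
    """At DROP TABLE commit: rename the data file in place instead of unlinking it."""
    trashed = path + DEL_SUFFIX
    os.rename(path, trashed)
    return trashed

def erase_and_unlink(path):
    """Background worker: overwrite the content with zeros, then delete the file."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)
        f.flush()
        os.fsync(f.fileno())  # ensure the overwrite reaches disk before unlinking
    os.remove(path)

def background_worker_pass(data_dir):
    """One scan of the data directory, erasing every marked file it finds."""
    for name in os.listdir(data_dir):
        if name.endswith(DEL_SUFFIX):
            erase_and_unlink(os.path.join(data_dir, name))
```

A single zero pass is only a placeholder; a real "secure erasure" policy may require several overwrite patterns, and on a COW file system overwriting in place does not destroy the old blocks anyway, which is the point raised later in this thread.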

The trash bin mechanism may also open the door to restoring mistakenly dropped tables, a feature like Oracle's
Flash Drop.  The dropping transaction would put the table metadata (system catalog data or DDL) in the trash bin along with
the data file.


> I imagine we might provide a way to wait for the deletion to actually
> complete (can't do that as part of the DROP TABLE, though), so that
> people can be sure when the data is actually gone (for scripts etc.).
> A simple function waiting for the queue to get empty might be enough, I
> guess, but maybe not.

Agreed, because the user should expect the disk space to be available after DROP TABLE has been committed.  Can't we
really make the COMMIT wait for the erasure to complete?  Do we have to use an asynchronous erasure method with a
background worker?  For example, COMMIT would perform:

1. Writes a commit WAL record, finalizing the system catalog change.
2. Puts the data files in the trash bin or renames them.
3. Erases the file content and deletes the file.  This could take a long time.
4. Replies success to the client.
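Spelled out as code, that synchronous variant might look like the following sketch (the WAL and the client reply are simulated; all names are made up for illustration):

```python
import os

def commit_with_erasure(wal, dropped_paths):
    """Sketch of a COMMIT that erases dropped files before replying to the client."""
    wal.append("COMMIT")                 # 1. commit record: catalog change is final
    trashed = []
    for path in dropped_paths:           # 2. move the files into the "trash" state
        renamed = path + ".del"
        os.rename(path, renamed)
        trashed.append(renamed)
    for path in trashed:                 # 3. erase content and delete; can be slow
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.write(b"\x00" * size)
            f.flush()
            os.fsync(f.fileno())
        os.remove(path)
    return "COMMIT"                      # 4. success reply to the client
```

The downside is visible in the structure: for a large table, step 4, and therefore the client, waits for the entire overwrite to finish.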

The concern is that the need to erase and delete the data file would be forgotten if the server crashes
during step 3.  If so, the postmaster can do the job at startup, just like it deletes temporary files (although that delays
the startup).
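That startup cleanup could be a single directory scan, sketched here under a hypothetical ".del" suffix convention (the `erase` callback stands in for whatever overwrite-and-unlink routine is used):

```python
import os

def cleanup_after_crash(data_dir, erase):
    """At postmaster startup: finish erasures that a crash interrupted.

    Any file still carrying the ".del" suffix was renamed before the crash
    but never erased, so it is safe to erase and remove it now.
    """
    leftovers = [n for n in os.listdir(data_dir) if n.endswith(".del")]
    for name in leftovers:
        erase(os.path.join(data_dir, name))
    return len(leftovers)  # how many interrupted erasures were completed
```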


> I think this depends on what our requirements are.
>
> My assumption is that when you perform this "secure data erasure" on the
> primary, you probably also want to erase the data on the replica. But if
> the instances use different file systems (COW vs. non-COW, ...) the
> exact thing that needs to happen may be different. Or maybe the replica
> does not need to do anything, making it noop?

We can recommend using non-COW file systems on both the primary and the standby in the manual.


Regards
Takayuki Tsunakawa



