Re: Complete data erasure - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Complete data erasure
Msg-id: 20200203173038.wawcyw2rzmihj45z@development
In response to: Re: Complete data erasure (Stephen Frost <sfrost@snowman.net>)
Responses: RE: Complete data erasure
List: pgsql-hackers
On Mon, Feb 03, 2020 at 09:07:09AM -0500, Stephen Frost wrote:
>Greetings,
>
>* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:
>> >We certainly can't run external commands during transaction COMMIT, so
>> >this can't be part of a regular DROP TABLE.
>>
>> IMO the best solution would be that the DROP TABLE does everything as
>> usual, but instead of deleting the relfilenode it moves it to some sort
>> of queue. And then a background worker would "erase" these relfilenodes
>> outside the COMMIT.
>
>That sounds interesting, though I'm a bit worried that it's going to
>lead to the same kind of complications and difficulty that we have with
>deleted columns - anything that's working with the system tables will
>need to see this new "dropped but pending delete" flag. Would we also
>rename the table when this happens? Or change the schema it's in?
>Otherwise, your typical DROP IF EXISTS / CREATE could end up breaking.
>

That's not really what I meant - let me explain. When I said DROP TABLE
should do everything as usual, that includes catalog changes. I.e. after
the commit there would not be any remaining entries in system catalogs
or anything like that.

The only thing we'd do differently is that instead of unlinking the
relfilenode segments, we'd move the relfilenode to a persistent queue
(essentially a regular table used as a queue of relfilenodes). The
background worker would watch the queue, and when it gets a new
relfilenode it'd "delete" the data and then remove the relfilenode from
the queue.

So essentially others would not be able to even see the (now dropped)
object, and they could create a new object with the same name etc.

I imagine we might provide a way to wait for the deletion to actually
complete (can't do that as part of the DROP TABLE, though), so that
people can be sure when the data is actually gone (for scripts etc.).
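To make the proposed flow concrete, here is a minimal simulation of the
idea in Python (all names here are hypothetical illustrations, not
actual PostgreSQL internals): DROP TABLE commits its catalog changes as
usual, but instead of unlinking the relfilenode it enqueues it, and a
background worker erases the data asynchronously.

```python
import queue
import threading

# Hypothetical sketch of the proposed design. The in-memory queue stands
# in for the persistent queue table; the thread stands in for the
# background worker process.

erase_queue = queue.Queue()  # persistent queue of relfilenodes to erase
erased = []                  # relfilenodes whose data has been "erased"

def drop_table(relfilenode):
    # Catalog changes happen at commit as usual (omitted here); the only
    # difference is that the relfilenode is queued instead of unlinked.
    erase_queue.put(relfilenode)

def background_worker():
    # Watches the queue; "erases" each relfilenode's data, then removes
    # the entry from the queue.
    while True:
        relfilenode = erase_queue.get()
        erased.append(relfilenode)  # stands in for overwriting + unlinking
        erase_queue.task_done()

def wait_for_erasure():
    # The "wait for the deletion to actually complete" helper suggested
    # above, so scripts can be sure the data is really gone.
    erase_queue.join()

worker = threading.Thread(target=background_worker, daemon=True)
worker.start()

drop_table(16384)   # DROP TABLE returns as soon as the commit is done
drop_table(16395)
wait_for_erasure()  # blocks until the background worker catches up
print(erased)
```

The key property the sketch shows: the caller of drop_table() never
blocks on the erasure itself, only an explicit wait_for_erasure() does.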
A simple function waiting for the queue to get empty might be enough, I
guess, but maybe not.

>> And yes, we need to do this in a way that works with replicas, i.e. we
>> need to WAL-log it somehow. And it should be done in a way that works
>> when the replica is on a different type of filesystem.
>
>I agree it should go through WAL somehow (ideally without needing an
>actual zero'd or whatever page for every page in the relation), but why
>do we care about the filesystem on the replica? We don't have anything
>that's really filesystem specific in WAL replay today and I don't see
>this as needing to change that.
>

I think this depends on what our requirements are. My assumption is that
when you perform this "secure data erasure" on the primary, you probably
also want to erase the data on the replica. But if the instances use
different file systems (COW vs. non-COW, ...) the exact thing that needs
to happen may be different.

Or maybe the replica does not need to do anything, making it a noop? In
that case we don't need to WAL-log the exact change for each page; it
might even be fine to not WAL-log anything except for the final removal
from the queue. I mean, the data is worthless and not used by anyone at
this point, so there's no point in replicating it ...

I haven't thought about this very hard. It's not clear what should
happen if we complete the erasure on the primary, remove the relfilenode
from the queue, and then restart the replica before it finishes the
local erasure. The queue (if represented by a simple table) will be
replicated, so the replica will forget it still has work to do.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services