Thread: [GENERAL] Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files
[GENERAL] Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files
From
Andreas Joseph Krogh
Date:
Hi.
After struggeling with storing LOs in the database and making efficient backups/restore-routines I've decided to move LOs out of the DB and use a "filename"-column containing an UUID referencing the file one the filesystem. Having this schema I need a robust book-keeping mechanism so I know when I can delete files on the filesystem when no longer referenced from the database.
In an email-program I store the whole email-body as a binary in a file and store all header-info in an email_message table (for this example's sake).
I have a book-keeping table :
create table origo_email_message_file_operation( filename VARCHAR not null, operation VARCHAR not NULL, PRIMARY KEY (filename, operation) );
And a email_message table holding the messages, with a "filename"-column referencing the BLOB.
For DELETE I use this routine:
deleteData(fileName, messageId) {
startTX()
deleteEmail() // Deletes the entry in email_message table
INSERT INTO origo_email_message_file_operation(filename, operation) VALUES('3b1d18ae-7b54-c055-1016-d928daec7294','DELETE');
deleteEmail(messageId)
commitTX()
}
So for DELETE-operations the file on disk isn't deleted by the main-program, but by a cleanup-job which runs as a cron-job and the inspects origo_email_message_file_operation for DELETE-entries and then delete the referenced files, then removes the DELETE-entries from origo_email_message_file_operation.
INSERT is like this:
insertData(fileName) {
startTX()
// First, register the INSERT in case it fails
INSERT INTO origo_email_message_file_operation(filename, operation)
VALUES('3b1d18ae-7b54-c055-1016-d928daec7294','INSERT');
commitTX()
startTX()
insertEmail() // Inserts the entry in email_message table
DELETE FROM origo_email_message_file_operation WHERE filename = '3b1d18ae-7b54-c055-1016-d928daec7294' AND operation = 'INSERT';
// If this commits there is no entry left in origo_email_message_file_operation and we're all good
commitTX()
}
UPDATE is implemented as INSERT + DELETE.
The challenge with this is what to do if INSERT rolls back. If INSERT rolls back then we end up with an INSERT-entry in origo_email_message_file_operation with no corresponding "filename"-entry in email_message. But I fail to se how a cleanup job can know the difference between such an INSERT-entry in origo_email_message_file_operation caused by ROLLBACK and and INSERT-entry caused by an in-progress insertEmail() operation.
Does anyone have a robust mechanism for cleaning up files in such scenarios?
Thanks.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963