Thread: [GENERAL] Storing LOs outside the database and having a proper cleanup-mechanism to prevent dangling files

Hi.
 
After struggling with storing LOs in the database and with making efficient backup/restore routines, I've decided to move LOs out of the DB and use a "filename" column containing a UUID that references the file on the filesystem. With this schema I need a robust bookkeeping mechanism so I know when a file on the filesystem is no longer referenced from the database and can be deleted.
 
In an email program I store the whole email body as a binary file on disk and all header info in an email_message table (for this example's sake).
 
I have a bookkeeping table:
 
CREATE TABLE origo_email_message_file_operation(
    filename VARCHAR NOT NULL,
    operation VARCHAR NOT NULL,
    PRIMARY KEY (filename, operation)
);
And an email_message table holding the messages, with a "filename" column referencing the BLOB file on disk.
 
For DELETE I use this routine:
 
deleteData(fileName, messageId) {
    startTX()
    deleteEmail(messageId) // Deletes the entry in email_message table
    // :fileName is the file's UUID, e.g. '3b1d18ae-7b54-c055-1016-d928daec7294'
    INSERT INTO origo_email_message_file_operation(filename, operation)
    VALUES(:fileName, 'DELETE');
    commitTX()
}
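A minimal runnable sketch of deleteData, assuming Python's sqlite3 module as a stand-in for PostgreSQL and an email_message table with hypothetical `id` and `filename` columns. It illustrates that removing the message row and recording the DELETE bookkeeping entry commit or roll back together; `delete_data` is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; email_message columns are hypothetical.
SCHEMA = """
CREATE TABLE email_message (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE origo_email_message_file_operation (
    filename  TEXT NOT NULL,
    operation TEXT NOT NULL,
    PRIMARY KEY (filename, operation)
);
"""


def delete_data(conn: sqlite3.Connection, file_name: str, message_id: int) -> None:
    """Delete the message row and queue its file for deletion in one transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM email_message WHERE id = ?", (message_id,))
        conn.execute(
            "INSERT INTO origo_email_message_file_operation (filename, operation) "
            "VALUES (?, 'DELETE')",
            (file_name,),
        )
        # The file on disk is not touched here; the cleanup job removes it later.
```

If inserting the bookkeeping entry fails, the `with conn:` block also rolls back the message deletion, so a row is never removed without its file being queued for cleanup.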
 
So for DELETE operations the file on disk isn't deleted by the main program but by a cleanup job, which runs as a cron job, inspects origo_email_message_file_operation for DELETE entries, deletes the referenced files and then removes those DELETE entries from origo_email_message_file_operation.
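The cron job could look roughly like this. It is a sketch under stated assumptions: sqlite3 stands in for PostgreSQL, files are stored under their UUID in a hypothetical `storage_dir`, and `cleanup_deleted_files` is a hypothetical name.

```python
import os
import sqlite3

# Assumption: SQLite stands in for PostgreSQL.
OP_TABLE = """
CREATE TABLE origo_email_message_file_operation (
    filename  TEXT NOT NULL,
    operation TEXT NOT NULL,
    PRIMARY KEY (filename, operation)
);
"""


def cleanup_deleted_files(conn: sqlite3.Connection, storage_dir: str) -> int:
    """Hypothetical cron-job body: unlink files queued for DELETE, then drop the entries.

    Returns the number of files actually removed from disk.
    """
    pending = conn.execute(
        "SELECT filename FROM origo_email_message_file_operation "
        "WHERE operation = 'DELETE'"
    ).fetchall()
    removed = 0
    for (file_name,) in pending:
        try:
            os.remove(os.path.join(storage_dir, file_name))
            removed += 1
        except FileNotFoundError:
            # Already gone, e.g. an earlier run crashed after unlinking but
            # before clearing the entry; just clear the entry below.
            pass
        # Clear the entry only after the file is gone, so a crash here is harmless.
        with conn:
            conn.execute(
                "DELETE FROM origo_email_message_file_operation "
                "WHERE filename = ? AND operation = 'DELETE'",
                (file_name,),
            )
    return removed
```

Unlinking the file before deleting its entry keeps the job idempotent: if it dies in between, the next run finds the file already missing and only clears the entry. INSERT entries are left alone.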
 
 
 
INSERT works like this:
 
insertData(fileName) {
    startTX()
    // First, register the INSERT so a trace remains if the second transaction fails
    // :fileName is the file's UUID, e.g. '3b1d18ae-7b54-c055-1016-d928daec7294'
    INSERT INTO origo_email_message_file_operation(filename, operation)
    VALUES(:fileName, 'INSERT');
    commitTX()

    startTX()
    insertEmail() // Inserts the entry in email_message table
    DELETE FROM origo_email_message_file_operation
    WHERE filename = :fileName AND operation = 'INSERT';
    // If this commits, no entry is left in origo_email_message_file_operation and we're all good
    commitTX()
}
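A sketch of insertData with the two transactions made explicit, again with sqlite3 standing in for PostgreSQL. The routine above does not show where the file itself is written; writing it between the two transactions is an assumption here, as are the `insert_data` name, the `storage_dir` directory and the `message_id`/`body` parameters.

```python
import os
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; email_message columns are hypothetical.
SCHEMA = """
CREATE TABLE email_message (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE origo_email_message_file_operation (
    filename  TEXT NOT NULL,
    operation TEXT NOT NULL,
    PRIMARY KEY (filename, operation)
);
"""


def insert_data(conn, storage_dir, file_name, message_id, body: bytes) -> None:
    # TX1: commit the INSERT entry before touching the disk, so a failure
    # at any later point leaves a trace for the cleanup job.
    with conn:
        conn.execute(
            "INSERT INTO origo_email_message_file_operation (filename, operation) "
            "VALUES (?, 'INSERT')",
            (file_name,),
        )

    # Assumption: the file is written between the two transactions.
    with open(os.path.join(storage_dir, file_name), "wb") as f:
        f.write(body)

    # TX2: insert the message row and clear the INSERT entry atomically.
    # If this rolls back, the INSERT entry and the file are left behind.
    with conn:
        conn.execute(
            "INSERT INTO email_message (id, filename) VALUES (?, ?)",
            (message_id, file_name),
        )
        conn.execute(
            "DELETE FROM origo_email_message_file_operation "
            "WHERE filename = ? AND operation = 'INSERT'",
            (file_name,),
        )
```

If the second transaction fails (for example on a duplicate `id`), it rolls back, but the INSERT entry committed by the first transaction and the file on disk both remain.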
 
UPDATE is implemented as INSERT + DELETE.
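One possible reading of "INSERT + DELETE" for UPDATE, sketched with sqlite3 as a stand-in for PostgreSQL: the new file goes through the INSERT bookkeeping, and the old file is queued for DELETE in the same transaction that repoints the row. Updating the `filename` column in place, generating the new name with `uuid.uuid4()`, and the `update_data` name are assumptions, not taken from the post.

```python
import os
import sqlite3
import uuid

# Assumption: SQLite stands in for PostgreSQL; email_message columns are hypothetical.
SCHEMA = """
CREATE TABLE email_message (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL
);
CREATE TABLE origo_email_message_file_operation (
    filename  TEXT NOT NULL,
    operation TEXT NOT NULL,
    PRIMARY KEY (filename, operation)
);
"""


def update_data(conn, storage_dir, message_id, new_body: bytes) -> str:
    """Replace a message's file: INSERT the new one, queue the old one for DELETE."""
    # Raises TypeError if the message does not exist (fine for a sketch).
    old_name = conn.execute(
        "SELECT filename FROM email_message WHERE id = ?", (message_id,)
    ).fetchone()[0]
    new_name = str(uuid.uuid4())

    # INSERT half, TX1: register the new file before writing it.
    with conn:
        conn.execute(
            "INSERT INTO origo_email_message_file_operation VALUES (?, 'INSERT')",
            (new_name,),
        )
    with open(os.path.join(storage_dir, new_name), "wb") as f:
        f.write(new_body)

    # TX2: repoint the row, clear the INSERT entry and queue the old file
    # for DELETE, all in one transaction.
    with conn:
        conn.execute(
            "UPDATE email_message SET filename = ? WHERE id = ?",
            (new_name, message_id),
        )
        conn.execute(
            "DELETE FROM origo_email_message_file_operation "
            "WHERE filename = ? AND operation = 'INSERT'",
            (new_name,),
        )
        conn.execute(
            "INSERT INTO origo_email_message_file_operation VALUES (?, 'DELETE')",
            (old_name,),
        )
    return new_name
```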
 
 
The challenge with this is what to do if the INSERT rolls back. If the second transaction rolls back, we end up with an INSERT entry in origo_email_message_file_operation with no corresponding "filename" entry in email_message. But I fail to see how a cleanup job can tell the difference between such an INSERT entry left behind by a ROLLBACK and an INSERT entry belonging to an insertEmail() operation that is still in progress.
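The ambiguity can be reproduced in a few lines, with sqlite3 as a stand-in for PostgreSQL and hypothetical helper names. One insert has committed its first transaction and not yet run the second; another ran the second and rolled back. Looking only at the tables, the two leftover INSERT entries are indistinguishable.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; email_message columns are hypothetical.
SCHEMA = """
CREATE TABLE email_message (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
CREATE TABLE origo_email_message_file_operation (
    filename TEXT NOT NULL, operation TEXT NOT NULL,
    PRIMARY KEY (filename, operation)
);
"""


def register_insert(conn, file_name):
    # First transaction of insertData: commit the INSERT entry.
    with conn:
        conn.execute(
            "INSERT INTO origo_email_message_file_operation VALUES (?, 'INSERT')",
            (file_name,),
        )


def finish_insert(conn, file_name, message_id):
    # Second transaction of insertData.
    with conn:
        conn.execute("INSERT INTO email_message VALUES (?, ?)", (message_id, file_name))
        conn.execute(
            "DELETE FROM origo_email_message_file_operation "
            "WHERE filename = ? AND operation = 'INSERT'",
            (file_name,),
        )


def what_cleanup_sees(conn):
    # Everything a cleanup job can query: INSERT entries with no message row.
    return conn.execute(
        "SELECT op.filename FROM origo_email_message_file_operation op "
        "LEFT JOIN email_message m ON m.filename = op.filename "
        "WHERE op.operation = 'INSERT' AND m.id IS NULL ORDER BY op.filename"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO email_message VALUES (1, 'existing')")

register_insert(conn, "in-flight")      # second transaction not started yet
register_insert(conn, "rolled-back")
try:
    finish_insert(conn, "rolled-back", 1)  # duplicate id: rolls back
except sqlite3.IntegrityError:
    pass

print(what_cleanup_sees(conn))  # prints [('in-flight',), ('rolled-back',)]
```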
 
Does anyone have a robust mechanism for cleaning up files in such scenarios? 
 
Thanks.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963