Thread: Unloading a table consistently
Hi,

I will have a log table which, once a day or so, is copied to a file (for movement to a data warehouse), and the log table emptied. For performance, the log table on the production system has no indexes, and is write-only. (The unload process is the only reader.)

To unload it, I will be doing:

   BEGIN;
   SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
   COPY log TO 'filename-path';
   TRUNCATE log;
   COMMIT;

My understanding is that I need the SERIALIZABLE isolation level so that the COPY and the TRUNCATE see exactly the same view of the table. Obviously, I don't want to lose data by having the TRUNCATE delete records that appeared while the COPY was executing.

Is that correct? Is there a better way to handle this kind of thing that I'm missing?

Thanks!
-- Xof
Christophe <xof@thebuild.com> writes:
> I will have a log table which, once a day or so, is copied to a file
> (for movement to a data warehouse), and the log table emptied. For
> performance, the log table on the production system has no indexes,
> and is write-only. (The unload process is the only reader.)

> To unload it, I will be doing:

> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> COPY log TO 'filename-path';
> TRUNCATE log;
> COMMIT;

> My understanding is that I need the SERIALIZABLE isolation level so
> that the COPY and TRUNCATE see exactly the same view of the table.

No, the SERIALIZABLE bit is useless if not actually counterproductive here. What you'll actually need is an exclusive lock:

   BEGIN;
   LOCK TABLE log;
   COPY log TO 'filename-path';
   TRUNCATE log;
   COMMIT;

to prevent anyone from inserting new rows during the COPY. Otherwise any such rows would be lost at the TRUNCATE. You might be able to get away with a lock that only locks out writers and not readers (offhand I think IN SHARE MODE is the right thing), but from your description that would be a useless refinement anyway.

If you can't tolerate locking out writers for that long, you won't be able to use TRUNCATE. The operation I think you were imagining is

   BEGIN;
   SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
   COPY log TO 'filename-path';
   DELETE FROM log;
   COMMIT;
   VACUUM log;  -- highly recommended at this point

Here, the serializable mode is exactly what you want to guarantee that the DELETE deletes the same rows the COPY copied. This is a great deal less efficient than TRUNCATE, but it's secure for concurrent insertions, which TRUNCATE is definitely not.

			regards, tom lane
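[For reference, the share-mode refinement Tom mentions offhand could be sketched like this; this is an untested variant, not part of his recommendation, and note that TRUNCATE itself still takes an ACCESS EXCLUSIVE lock at the point it runs:]

```sql
-- Sketch only: SHARE mode conflicts with writers (ROW EXCLUSIVE)
-- but not with plain readers, so SELECTs could proceed during the COPY.
-- TRUNCATE still escalates to ACCESS EXCLUSIVE when it executes.
BEGIN;
LOCK TABLE log IN SHARE MODE;
COPY log TO 'filename-path';
TRUNCATE log;
COMMIT;
```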
On May 3, 2008, at 9:56 AM, Tom Lane wrote:
> This is a great deal less efficient than TRUNCATE, but it's secure for
> concurrent insertions, which TRUNCATE is definitely not.

Exactly my question; thank you!
-- Xof
Re: Unloading a table consistently
From: ptjm@news-reader-radius.uniserve.com (Patrick TJ McPhee)
In article <9478.1209833817@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:

% If you can't tolerate locking out writers for that long, you won't
% be able to use TRUNCATE. The operation I think you were imagining is
%
% BEGIN;
% SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
% COPY log TO 'filename-path';
% DELETE FROM log;
% COMMIT;
% VACUUM log;  -- highly recommended at this point

How about something along the lines of

   BEGIN;
   ALTER TABLE log RENAME TO log_old;
   CREATE TABLE log(...);
   COMMIT;

   BEGIN;
   LOCK TABLE log_old;
   COPY log_old TO 'filename-path';
   DROP TABLE log_old;
   COMMIT;

I believe this will keep the writers writing while keeping the efficiency of truncating.
--
Patrick TJ McPhee
North York  Canada
ptjm@interlog.com
Patrick TJ McPhee wrote:
> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME TO log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK TABLE log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the efficiency
> of truncating.

It's almost a pity that there's no

   TRUNCATE TABLE log MOVE DATA TO log_copy;

or similar; ie with two identical table definitions `log' and `log_copy', swap the backing file from `log' to `log_copy' before truncating `log'. `log_copy' could be a new temp table created with CREATE TEMPORARY TABLE ... LIKE.

This sort of thing doesn't seem to come up all that much, though.

--
Craig Ringer
On May 3, 2008, at 9:29 PM, Patrick TJ McPhee wrote:
> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME TO log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK TABLE log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the
> efficiency of truncating.

Brilliant.
On Sat, 2008-05-03 at 09:11 -0700, Christophe wrote:
> ...I will have a log table which, once a day or so, is copied to a
> file (for movement to a data warehouse), and the log table emptied.
> Is there a better way to handle this kind of thing that I'm missing?

Anyone care to comment about whether a table partition might be better here? Could an insert rule on a table partition automatically create the inherited table, if needed, and insert there?

Thanks,
Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
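[For reference, a minimal sketch of the inheritance-style partitioning Reece is asking about, in the PostgreSQL 8.x idiom; table and column names are illustrative. A rule cannot create the child table on demand, so the day's child has to be created ahead of time, e.g. by a scheduled job:]

```sql
CREATE TABLE log (ts timestamptz NOT NULL, msg text);

-- One child per day, created in advance by a cron job or similar.
CREATE TABLE log_20080503 (
    CHECK (ts >= '2008-05-03' AND ts < '2008-05-04')
) INHERITS (log);

-- Redirect inserts for that day into the child.
CREATE RULE log_insert_20080503 AS
    ON INSERT TO log
    WHERE NEW.ts >= '2008-05-03' AND NEW.ts < '2008-05-04'
    DO INSTEAD INSERT INTO log_20080503 VALUES (NEW.*);

-- Unloading a finished day is then just COPY + DROP of that child,
-- without blocking writers on the current day's partition.
```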