Thread: Unloading a table consistently

Unloading a table consistently

From
Christophe
Date:
Hi,

I will have a log table which, once a day or so, is copied to a file
(for movement to a data warehouse), and the log table emptied.  For
performance, the log table on the production system has no indexes,
and is write-only.  (The unload process is the only reader.)

To unload it, I will be doing:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COPY log TO 'filename-path';
TRUNCATE log;
COMMIT;

My understanding is that I need the SERIALIZABLE isolation level so
that the COPY and TRUNCATE see exactly the same view of the table.
Obviously, I don't want to lose data by having the TRUNCATE delete
records that appeared while the COPY was executing.  Is that
correct?  Is there a better way to handle this kind of thing that I'm
missing?

Thanks!
-- Xof

Re: Unloading a table consistently

From
Tom Lane
Date:
Christophe <xof@thebuild.com> writes:
> I will have a log table which, once a day or so, is copied to a file
> (for movement to a data warehouse), and the log table emptied.  For
> performance, the log table on the production system has no indexes,
> and is write-only.  (The unload process is the only reader.)

> To unload it, I will be doing:

> BEGIN;
> SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> COPY log TO 'filename-path';
> TRUNCATE log;
> COMMIT;

> My understanding is that I need the SERIALIZABLE isolation level so
> that the COPY and TRUNCATE see exactly the same view of the table.

No, the SERIALIZABLE bit is useless if not actually counterproductive
here.  What you'll actually need is an exclusive lock:

BEGIN;
LOCK TABLE log;
COPY log TO 'filename-path';
TRUNCATE log;
COMMIT;

to prevent anyone from inserting new rows during the COPY.  Otherwise
any such rows would be lost at the TRUNCATE.

You might be able to get away with a lock that only locks out writers
and not readers (offhand I think IN SHARE MODE is the right thing),
but from your description that would be a useless refinement anyway.

If you can't tolerate locking out writers for that long, you won't
be able to use TRUNCATE.  The operation I think you were imagining is

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COPY log TO 'filename-path';
DELETE FROM log;
COMMIT;
VACUUM log;    -- highly recommended at this point

Here, the serializable mode is exactly what you want to guarantee
that the DELETE deletes the same rows the COPY copied.  This is
a great deal less efficient than TRUNCATE, but it's secure for
concurrent insertions, which TRUNCATE is definitely not.

            regards, tom lane

Re: Unloading a table consistently

From
Christophe
Date:
On May 3, 2008, at 9:56 AM, Tom Lane wrote:
> This is
> a great deal less efficient than TRUNCATE, but it's secure for
> concurrent insertions, which TRUNCATE is definitely not.

Exactly my question; thank you!

-- Xof

Re: Unloading a table consistently

From
ptjm@news-reader-radius.uniserve.com (Patrick TJ McPhee)
Date:
In article <9478.1209833817@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote:

% If you can't tolerate locking out writers for that long, you won't
% be able to use TRUNCATE.  The operation I think you were imagining is
%
% BEGIN;
% SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
% COPY log TO 'filename-path';
% DELETE FROM log;
% COMMIT;
% VACUUM log;    -- highly recommended at this point

How about something along the lines of

BEGIN;
ALTER TABLE log RENAME to log_old;
CREATE TABLE log(...);
COMMIT;

BEGIN;
LOCK table log_old;
COPY log_old TO 'filename-path';
DROP TABLE log_old;
COMMIT;

I believe this will keep the writers writing while keeping the efficiency
of truncating.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: Unloading a table consistently

From
Craig Ringer
Date:
Patrick TJ McPhee wrote:

> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME to log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK table log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the efficiency
> of truncating.

It's almost a pity that there's no

TRUNCATE TABLE log MOVE DATA TO log_copy;

or similar; ie with two identical table definitions `log' and `log_copy'
swap the backing file from `log' to `log_copy' before truncating `log'.

`log_copy' could be a new temp table created with CREATE TEMPORARY TABLE
... LIKE.

This sort of thing doesn't seem to come up all that much, though.

--
Craig Ringer

Re: Unloading a table consistently

From
Christophe
Date:
On May 3, 2008, at 9:29 PM, Patrick TJ McPhee wrote:
> How about something along the lines of
>
> BEGIN;
> ALTER TABLE log RENAME to log_old;
> CREATE TABLE log(...);
> COMMIT;
>
> BEGIN;
> LOCK table log_old;
> COPY log_old TO 'filename-path';
> DROP TABLE log_old;
> COMMIT;
>
> I believe this will keep the writers writing while keeping the
> efficiency
> of truncating.

Brilliant.

Re: Unloading a table consistently

From
Reece Hart
Date:
On Sat, 2008-05-03 at 09:11 -0700, Christophe wrote:
I will have a log table which, once a day or so, is copied to a file  
(for movement to a data warehouse), and the log table emptied.
...
Is there a better way to handle this kind of thing that I'm  
missing?

Anyone care to comment about whether a table partition might be better here? Could an insert rule on a table partition automatically create the inherited table, if needed, and insert there?

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0