Re: TRUNCATE - Mailing list pgsql-hackers

From Joel Burton
Subject Re: TRUNCATE
Date
Msg-id JGEPJNMCKODMDHGOBKDNGEPICNAA.joel@joelburton.com
In response to Re: TRUNCATE  ("Rod Taylor" <rbt@zort.ca>)
List pgsql-hackers
> I still highly recommend that it be a drop foreign key, grab data,
> truncate, import data, reapply foreign key (which will double check
> your work) as I believe data and schema integrity should be high goals
> of PostgreSQL (myself, anyway).

I agree that they should be high goals.
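For reference, the procedure you describe comes out to roughly the
following on a reasonably recent version (a sketch only; the table,
column, constraint, and file names are made up, and COPY here uses a
server-side path):

    -- Sketch of the drop-FK / truncate / reload / re-add-FK workflow.
    -- All names here are hypothetical.
    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    COPY customers TO '/tmp/customers.txt';      -- grab the data
    TRUNCATE customers;
    -- ... clean up /tmp/customers.txt outside the database ...
    COPY customers FROM '/tmp/customers.txt';    -- re-import the cleaned data
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);  -- recheck the RI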

> However, I'd like to know what you're doing, i.e., why this method is
> the fastest and easiest way.

It's easier than dropping and recreating the rules by hand, which takes a
bit of trouble. (If there were an easy way in pg_dump or in psql to get the
text of just the rules/triggers/RI declarations for a table, that would be
simpler than picking them out of the rest of the table definition in
pg_dump output.)
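The closest thing I know of covers rules alone: the pg_rules view.
Something like this sketch pulls just the rule text for one table (the
table name is made up; it doesn't help with triggers or RI):

    -- Rule definitions for a single (hypothetical) table; rules only,
    -- not triggers or FK constraints.
    SELECT rulename, definition
      FROM pg_rules
     WHERE tablename = 'customers';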

It's easier than a full-database dump/fix/restore because sometimes
(hopefully now only historically :) ) pg_dump wasn't a perfect tool: for a
while, it would drop RI statements, or occasionally have a hard time
recreating a view, etc. Plus, of course, with a large database, it can take
quite a while to process.

A limited-to-that-table dump/fix/restore can be a problem because of the
RI interrelationships among tables. If there were an easier way to dump
information about a table so that I could restore the RI that other tables
have on it, that might be a solution.
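On a version new enough to have the pg_constraint catalog, something like
this sketch would at least list the foreign keys that other tables have
pointing at a given table (the table name is made up):

    -- Foreign keys on other tables that reference 'customers'
    -- (assumes pg_constraint and the regclass cast are available).
    SELECT conrelid::regclass AS referencing_table, conname
      FROM pg_constraint
     WHERE contype = 'f'
       AND confrelid = 'customers'::regclass;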

> Given a dataset, how much (%age wise) do you generally modify when you
> clean it up? And what is the general dataset size (half million
> records?).

More often than not, I'm working with complex tables and a fairly small
number of rows: perhaps 30 fields x 10,000 records.

> I'm making the assumption you almost never delete data (primary-key
> wise), otherwise foreign-keyed data may no longer align.  I'm also
> making the assumption you're either the sole user of the database, or
> have a long period where the database is not in use (overnight?).

No, I wouldn't delete things. I don't want to bypass RI; I just don't want
to have to deal with dropping and recreating all the rules every time I
need to clean up some data.

In most cases, yes, I can either take the db offline for an hour or ensure
that there will be no writes to it.

> What do you use to clean it up?  Custom script for each job?  Regular
> expressions?  Simple spreadsheet like format filling in numbers?
> Complete dump and replace of the data?

Generally, I'm pulling the data into a text file and using regexes or
spreadsheet tools to clean it up. Some of this could be done inside the
database (through plperl or plpython or such), but it's often easier with
full text manipulation, emacs, etc.
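Concretely, that usually looks something like this in psql (a sketch; the
table and file names are made up, and \copy runs client-side):

    \copy customers to 'customers.txt'
    -- ... fix up customers.txt with emacs / regexes / a spreadsheet ...
    TRUNCATE customers;
    \copy customers from 'customers.txt'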

Sometimes, though, I'm just cleaning out test data. For example: often,
I'll create a table where records can't be deleted without logging
information into another table (via rule or trigger; I usually prohibit
deletions from the log table altogether). I'll put some fake records in,
delete a few, check the logging data, and later, when I want to delete the
fake data (and the fake logging data), I'll use TRUNCATE. The only way to
do this with a normal DELETE would be to drop these rules/triggers, delete,
and re-create them, which is more of a pain than I'd like.
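To make that concrete, the setup is roughly this sketch (the table and
rule names are made up):

    -- Hypothetical tables: deletes from "widgets" get logged to "widgets_log".
    CREATE TABLE widgets     (id integer PRIMARY KEY, name text);
    CREATE TABLE widgets_log (id integer, name text, deleted_at timestamp);

    -- Log every deletion...
    CREATE RULE log_widget_delete AS ON DELETE TO widgets
        DO INSERT INTO widgets_log VALUES (OLD.id, OLD.name, now());

    -- ...and disallow ordinary deletes from the log itself.
    CREATE RULE no_log_delete AS ON DELETE TO widgets_log
        DO INSTEAD NOTHING;

    -- DELETE FROM widgets writes to widgets_log via the rule;
    -- TRUNCATE widgets; TRUNCATE widgets_log; clears out the test data
    -- without having to drop and re-create the rules.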

Given that only the owner of a table can truncate it, I'm not too worried
about the security of TRUNCATE: the owner is the person who should
understand the ramifications of TRUNCATE vs. DELETE. Having it either emit
a warning when triggers/rules/RI are present, or (better) requiring a FORCE
parameter in that case, might make others feel safer, though.

- J.

Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant


