Re: TRUNCATE - Mailing list pgsql-hackers
From: Joel Burton
Subject: Re: TRUNCATE
Msg-id: JGEPJNMCKODMDHGOBKDNGEPICNAA.joel@joelburton.com
In response to: Re: TRUNCATE ("Rod Taylor" <rbt@zort.ca>)
List: pgsql-hackers
> I still highly recommend that it be a drop foreign key, grab data,
> truncate, import data, reapply foreign key (which will double-check
> your work), as I believe data and schema integrity should be high
> goals of PostgreSQL (myself, anyway).

I agree that they should be high goals.

> However, I'd like to know what you're doing, i.e., why this method is
> the fastest and easiest way.

It's easier than dropping and recreating rules, because that takes a bit
of trouble. (If there were an easy way in pg_dump or in psql to get the
text of just the rules/triggers/RI declarations for a table, that would
be simpler than pulling them out of the rest of the pg_dump output.)
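For concreteness, the cycle being recommended looks roughly like this.
This is only a sketch: the table and constraint names ("orders"
referencing "customers") are made up, and it assumes ALTER TABLE ...
DROP/ADD CONSTRAINT handles the foreign key:

    ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;
    COPY orders TO '/tmp/orders.txt';    -- grab data (server-side file)
    TRUNCATE orders;
    -- ... clean up /tmp/orders.txt outside the database ...
    COPY orders FROM '/tmp/orders.txt';  -- import the fixed data
    ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey
        FOREIGN KEY (customer_id) REFERENCES customers (id);
        -- re-adding the constraint rechecks every imported row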
It's easier than a full-database dump/fix/restore because sometimes
(hopefully now historically :) ) pg_dump wasn't a perfect tool: for a
while it would drop RI statements, or occasionally have a hard time
recreating a view, etc. Plus, of course, with a large database it can
take quite a while to process.

A limited-to-that-table dump/fix/restore can be a problem because of the
interrelationships of RI among tables. If there were an easier way to
dump information about a table so that I could restore the RI that other
tables have on it, that might be a solution.
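Something along these lines against the system catalogs would come
close. This is a rough, untested sketch: "mytable" is a placeholder, and
it assumes the pg_constraint/pg_trigger/pg_rules catalogs and the
pg_get_constraintdef()/pg_get_triggerdef() functions:

    -- constraints on the table, plus FKs that other tables have on it
    SELECT conrelid::regclass AS on_table, conname,
           pg_get_constraintdef(oid) AS definition
      FROM pg_constraint
     WHERE conrelid  = 'mytable'::regclass
        OR confrelid = 'mytable'::regclass;

    -- user-level triggers on the table
    SELECT pg_get_triggerdef(oid)
      FROM pg_trigger
     WHERE tgrelid = 'mytable'::regclass AND NOT tgisinternal;

    -- rewrite rules on the table
    SELECT definition FROM pg_rules WHERE tablename = 'mytable';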
> Given a dataset, how much (%age-wise) do you generally modify when you
> clean it up? And what is the general dataset size (half a million
> records?).

More often than not, I'm working with complex tables and a fairly small
number of rows: perhaps 30 fields x 10,000 records.

> I'm making the assumption you almost never delete data (primary-key
> wise), otherwise foreign-keyed data may no longer align. I'm also
> making the assumption you're either the sole user of the database, or
> have a long period where the database is not in use (overnight?).

No, I wouldn't delete things. I don't want to bypass RI, just not have
to deal with removing and recreating all the rules every time I need to
clean up some data. In most cases, yes, I can either take the db offline
for an hour or ensure that there will be no writes to it.

> What do you use to clean it up? A custom script for each job? Regular
> expressions? A simple spreadsheet-like format, filling in numbers? A
> complete dump and replace of the data?

Generally, I'm pulling the data into a text file and using regexes or
spreadsheet tools to clean it up. Some of this could be done in the
database (through plperl or plpython or such), but it's often easier
with full text manipulation, emacs, etc.

Sometimes, though, I'm just cleaning out test data. For example, I'll
often create a table whose records can't be deleted without logging
information going into another table (via a rule or trigger; I usually
prohibit deletions from the log table altogether). I'll put some fake
records in, delete a few, check the logging data, and later, when I want
to delete the fake data (and the fake logging data), I'll use TRUNCATE.
I could only do this with a normal DELETE by dropping those
rules/triggers, deleting, and re-creating them, which is more of a pain
than I'd like.
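The setup looks something like this (a sketch with made-up names:
deletions from "people" get logged, and the log itself can't be cleared
with DELETE):

    CREATE TABLE people (id int PRIMARY KEY, name text);
    CREATE TABLE people_del_log (id int, name text, deleted_at timestamp);

    -- log every row deleted from "people"
    CREATE RULE log_people_delete AS ON DELETE TO people
        DO INSERT INTO people_del_log VALUES (OLD.id, OLD.name, now());

    -- prohibit deletions from the log table
    CREATE RULE no_log_delete AS ON DELETE TO people_del_log
        DO INSTEAD NOTHING;

    -- DELETE FROM people_del_log;   now silently does nothing, but
    -- TRUNCATE people_del_log;      still clears out the fake data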
Given that only the owner of a table can truncate it, I'm not too
worried about the security of truncate: the owner is the person who
would understand the ramifications of truncate vs. delete. Having it
either emit a warning when there are triggers/rules/RI on the table, or
(better) requiring a FORCE parameter to truncate when there are any,
might make others feel safer, though.

- J.

Joel BURTON | joel@joelburton.com | joelburton.com | aim: wjoelburton
Knowledge Management & Technology Consultant