On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote:
> A database I am currently using is built and updated periodically
> from a flat CSV file (the situation is rather unfortunate, but that's
> all I have right now). The schema I use is more complex than the flat
> file, so I follow a process to populate the tables with the data from
> the file. First I slurp the whole file into one temporary table whose
> columns correspond to the columns in the file. Then I DELETE all the
> existing rows from the tables in the schema and perform a series of
> queries on that temporary table to INSERT and UPDATE rows in the
> tables that are in the schema. Then I DELETE the data from the
> temporary table. I do it this way, rather than trying to synchronize
> it, because of the inconsistencies and redundancies in the flat file.
>
> There is more than one problem with this, but the largest is that I
> would like to perform this whole database rebuild within one
> transaction, so other processes that need to access the database can do
> so without noticing the disturbance. However, performing this set of
> events (besides populating the temporary table) within a single
> transaction takes a long time--over an hour in some cases.
>
> What are some suggestions to help improve performance when replacing
> one set of data in a schema with another?
Why not rebuild the entire thing in a separate "build" schema, then do
only the cheap operations, like copying the tables over, inside the
transaction block:
BEGIN;
TRUNCATE table1;  -- TRUNCATE is transaction-safe as of 7.4
TRUNCATE table2;
-- reset any sequences you feel you need to have reset
INSERT INTO table1 SELECT * FROM build.table1;  -- SELECT INTO would fail: table1 already exists
INSERT INTO table2 SELECT * FROM build.table2;
COMMIT;
-- VACUUM cannot run inside a transaction block, so analyze after the COMMIT
VACUUM ANALYZE table1;
VACUUM ANALYZE table2;
I haven't tried this method exactly, but doing the expensive build in a
separate schema and then only cheap operations, like copying the tables
into the working schema, should minimize the amount of time you spend
inside the transaction block.
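For the build phase itself, something along these lines might work. This
is only a rough sketch: the staging table, column names, and file path
are all hypothetical, and COPY's CSV mode assumes PostgreSQL 8.0 or
later.

CREATE SCHEMA build;
-- staging table mirrors the columns of the flat file
CREATE TABLE build.staging (col_a text, col_b text, col_c text);
-- bulk-load the CSV (server-side path; pre-8.0 would need a plain
-- delimiter-based COPY instead)
COPY build.staging FROM '/path/to/flatfile.csv' WITH CSV;
-- resolve the inconsistencies and redundancies while nobody is waiting
CREATE TABLE build.table1 AS
  SELECT DISTINCT col_a, col_b FROM build.staging;
CREATE TABLE build.table2 AS
  SELECT DISTINCT col_a, col_c FROM build.staging;

Once the build schema looks right, the short transaction above swaps the
results in, and you can DROP SCHEMA build CASCADE (or just truncate its
tables) until the next load.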
*I don't know what effect this will have on the performance of the
whole process, though.*
Sean