Re: Building a database from a flat file - Mailing list pgsql-sql

From Sean Davis
Subject Re: Building a database from a flat file
Date
Msg-id 8014dca0db521733863bb7967ce4be95@mail.nih.gov
Whole thread Raw
In response to Building a database from a flat file  ("Casey T. Deccio" <ctdecci@sandia.gov>)
Responses Re: Building a database from a flat file  ("Casey T. Deccio" <ctdecci@sandia.gov>)
List pgsql-sql
On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote:

> A database I am currently using is built and updated periodically from 
> a
> flat csv file (The situation is rather unfortunate, but that's all I
> have right now).  The schema I use is more complex than the flat file,
> so I follow a process to populate the tables with the data from the
> file.  First I slurp the whole file into one temporary table, whose
> columns correspond to the columns in the file.  Then I DELETE all the
> existing rows from the tables in the schema and perform a series of
> queries on that table to INSERT and UPDATE rows in the tables that are
> in the schema.  Then I DELETE the data from the temporary table.  I do
> it this way, rather than trying to synchronize it, because of the
> inconsistencies and redundancies in the flat file.
>
> There is more than one problem with this, but the largest is that I
> would like to perform this whole database rebuild within one
> transaction, so other processes that need to access the database can do
> so without noticing the disturbance.  However, performing this set of
> events (besides populating the temporary table) within a single
> transaction takes a long time--over an hour in some cases.
>
> What are some suggestions to help improve performance with replacing 
> one
> set of data in a schema with another?

Why not rebuild the entire thing in a separate "build" schema then do 
only the stuff like copying tables inside the transaction block:

BEGIN;

truncate table1;
truncate table2;

--reset any sequences you feel you need to have reset;

select * into table1 from build.table1;
select * into table2 from build.table2;

vacuum analyze table1;
vacuum analyze table2;

COMMIT;

I haven't tried this method exactly, but building in a separate schema 
(expensive) and then doing cheap operations like copying the table into 
the working schema should minimize the amount of time you spend inside 
the transaction block.

**I don't know what effect this will have on performance of the whole 
process, though**.

Sean



pgsql-sql by date:

Previous
From: Sean Davis
Date:
Subject: Re: how to speed up these queries ?
Next
From: Sean Davis
Date:
Subject: Re: how to speed up these queries ?