Thread: Building a database from a flat file
A database I am currently using is built and updated periodically from a flat csv file (The situation is rather unfortunate, but that's all I have right now). The schema I use is more complex than the flat file, so I follow a process to populate the tables with the data from the file. First I slurp the whole file into one temporary table, whose columns correspond to the columns in the file. Then I DELETE all the existing rows from the tables in the schema and perform a series of queries on that table to INSERT and UPDATE rows in the tables that are in the schema. Then I DELETE the data from the temporary table. I do it this way, rather than trying to synchronize it, because of the inconsistencies and redundancies in the flat file. There is more than one problem with this, but the largest is that I would like to perform this whole database rebuild within one transaction, so other processes that need to access the database can do so without noticing the disturbance. However, performing this set of events (besides populating the temporary table) within a single transaction takes a long time--over an hour in some cases. What are some suggestions to help improve performance with replacing one set of data in a schema with another? Casey
On Mar 2, 2005, at 7:15 PM, Casey T. Deccio wrote: > A database I am currently using is built and updated periodically from > a > flat csv file (The situation is rather unfortunate, but that's all I > have right now). The schema I use is more complex than the flat file, > so I follow a process to populate the tables with the data from the > file. First I slurp the whole file into one temporary table, whose > columns correspond to the columns in the file. Then I DELETE all the > existing rows from the tables in the schema and perform a series of > queries on that table to INSERT and UPDATE rows in the tables that are > in the schema. Then I DELETE the data from the temporary table. I do > it this way, rather than trying to synchronize it, because of the > inconsistencies and redundancies in the flat file. > > There is more than one problem with this, but the largest is that I > would like to perform this whole database rebuild within one > transaction, so other processes that need to access the database can do > so without noticing the disturbance. However, performing this set of > events (besides populating the temporary table) within a single > transaction takes a long time--over an hour in some cases. > > What are some suggestions to help improve performance with replacing > one > set of data in a schema with another? Why not rebuild the entire thing in a separate "build" schema then do only the stuff like copying tables inside the transaction block: BEGIN; truncate table1; truncate table2; --reset any sequences you feel you need to have reset; select * into table1 from build.table1; select * into table2 from build.table2; vacuum analyze table1; vacuum analyze table2; COMMIT; I haven't tried this method exactly, but building in a separate schema (expensive) and then doing cheap operations like copying the table into the working schema should minimize the amount of time you spend inside the transaction block. **I don't know what effect this will have on performance of the whole process, though**. Sean
Hi, Casey, Casey T. Deccio schrieb: > There is more than one problem with this, but the largest is that I > would like to perform this whole database rebuild within one > transaction, so other processes that need to access the database can do > so without noticing the disturbance. However, performing this set of > events (besides populating the temporary table) within a single > transaction takes a long time--over an hour in some cases. > > What are some suggestions to help improve performance with replacing one > set of data in a schema with another? - Create the new date in another schema, and then simply rename those two schemas for "switch over" - Create the new data in differently named tables, and then simply rename all the old and new tables for "switch over". - Have two different set of tables (maybe two identical schemas), and let your application work on a set of views. Then you can change the views via "create or replace view" for switch over. Markus -- Markus Schaber | Dipl. Informatiker | Software Development GIS Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
On Thu, 2005-03-03 at 06:23 -0700, Sean Davis wrote: > Why not rebuild the entire thing in a separate "build" schema then do > only the stuff like copying tables inside the transaction block: > Building everything in the separate "build" schema works great, but it is the DELETE (TRUNCATE won't work when foreign keys point to the table) that took the most time in the transaction. However, renaming the schemas, as suggested in other posts, worked great. Casey
On Thu, 2005-03-03 at 08:28 -0700, Markus Schaber wrote: > - Create the new date in another schema, and then simply rename those > two schemas for "switch over" > This worked very well. I created another schema ("build") and populated the tables within build. Then: BEGIN; ALTER SCHEMA public RENAME TO public_old; ALTER SCHEMA build RENAME TO public; COMMIT; /* remove data from tables in build schema... */ It was very quick and seamless. Question: is there an "easy" way to duplicate an existing schema (tables, functions, sequences, etc.)--not the data; only the schema? This way, I would only need to modify one schema (public) to make changes, and the build schema could be created each time as a duplicate of the public schema. Maintenance would be much simpler. > - Create the new data in differently named tables, and then simply > rename all the old and new tables for "switch over". > This probably would work too, but there may be problems with foreign keys in renaming the tables one at a time (unless deferrable is used). To avoid any mess, the previous one works well. > - Have two different set of tables (maybe two identical schemas), and > let your application work on a set of views. Then you can change the > views via "create or replace view" for switch over. > > Markus Casey
Hi, Casey, Casey T. Deccio schrieb: > Question: is there an "easy" way to duplicate an existing schema > (tables, functions, sequences, etc.)--not the data; only the schema? > This way, I would only need to modify one schema (public) to make > changes, and the build schema could be created each time as a duplicate > of the public schema. Maintenance would be much simpler. I do not know about schemas, but for tables you can "CREATE TABLE alpha (LIKE beta)". >>- Create the new data in differently named tables, and then simply >>rename all the old and new tables for "switch over". > This probably would work too, but there may be problems with foreign > keys in renaming the tables one at a time (unless deferrable is used). > To avoid any mess, the previous one works well. AFAIK, the foreign key relations are adopted when a table is renamed, they stick to the same table disregarding name changes. Markus -- Markus Schaber | Dipl. Informatiker | Software Development GIS Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
Casey T. Deccio wrote: > Question: is there an "easy" way to duplicate an existing schema > (tables, functions, sequences, etc.)--not the data; only the schema? > This way, I would only need to modify one schema (public) to make > changes, and the build schema could be created each time as a duplicate > of the public schema. Maintenance would be much simpler. > check the docs for pg_dump (-s) for doing structural dumps of your schema. -- _______________________________ This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately. _______________________________
On Mar 3, 2005, at 1:37 PM, Casey T. Deccio wrote: > On Thu, 2005-03-03 at 08:28 -0700, Markus Schaber wrote: >> - Create the new date in another schema, and then simply rename those >> two schemas for "switch over" >> > > This worked very well. I created another schema ("build") and > populated > the tables within build. Then: > > BEGIN; > ALTER SCHEMA public RENAME TO public_old; > ALTER SCHEMA build RENAME TO public; > COMMIT; > /* remove data from tables in build schema... */ > > It was very quick and seamless. > > Question: is there an "easy" way to duplicate an existing schema > (tables, functions, sequences, etc.)--not the data; only the schema? > This way, I would only need to modify one schema (public) to make > changes, and the build schema could be created each time as a duplicate > of the public schema. Maintenance would be much simpler. Why not just create a dump of your schema (without data), drop the schema, rebuild it from the dump and then populate as normal. Then do the name switch. Sean
Hi, Casey, Casey T. Deccio schrieb: > Building everything in the separate "build" schema works great, but it > is the DELETE (TRUNCATE won't work when foreign keys point to the table) Just as an additional info: You could temporarily drop the foreing key constraints, run TRUNCATE, and recreate the constraints. Markus -- Markus Schaber | Dipl. Informatiker | Software Development GIS Fight against software patents in EU! http://ffii.org/ http://nosoftwarepatents.org/
On 2005-03-03, Markus Schaber <schabi@logix-tt.com> wrote: > - Create the new date in another schema, and then simply rename those > two schemas for "switch over" This kind of thing looks superficially attractive but it has some serious problems if you try and do it while there is other database activity. Specifically, cached plans for functions and prepared statements will not be invalidated by the rename, and will still refer to the old tables, so you may have inconsistent results. > - Have two different set of tables (maybe two identical schemas), and > let your application work on a set of views. Then you can change the > views via "create or replace view" for switch over. Same problem applies here. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services