Thread: Suggestions for a data-warehouse migration routine
I've ported enough of my company's database to Postgres to make warehousing
on PG a real possibility. I thought I would toss my data migration
architecture ideas out for the list to shoot apart.

1. Script on production server dumps the production database (MSSQL) to a
   set of delimited text files.
2. Script on production server moves the files via FTP to a Postgres
   database server.
3. File Alteration Monitor trigger on the PG server executes a script when
   the last file is transferred.
4. Script on PG server drops the target database (by issuing a "dropdb"
   command).
5. Script on PG server re-creates the target database ("createdb" command).
6. Script on PG server re-creates the tables.
7. Script on PG server issues COPY commands to import the data.
8. Script on PG server indexes the tables.
9. Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates the needed reporting functions.
12. Vacuum analyze?

My question revolves around the drop/create for the database. Is there a
significant downside to this approach? I'm taking this approach because it
is simpler from a scripting point of view to simply start from scratch on
each warehouse update. If I do not drop the database, I would need to
delete the contents of each table and drop all indexes prior to the
COPY/data import. My assumption is that all the table deletes and index
drops would be more expensive than just dropping/re-creating the entire
database.

Also, is the vacuum analyze step needed on a freshly minted database where
the indexes have all been newly created?

Thanks in advance for all feedback.
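P.S. For concreteness, here is roughly what I have in mind for the PG-side
script covering steps 4-12. This is only a sketch; the database name, data
directory, file suffix, delimiter, and SQL file names below are placeholders,
not my actual ones.

  #!/bin/sh
  # Rebuild the warehouse from the freshly transferred dump files.
  # All names and paths below are placeholders.
  set -e

  DB=warehouse
  DATADIR=/var/lib/warehouse/incoming

  dropdb "$DB" || true              # step 4: drop the old copy (ignore
                                    #         the error on the first run)
  createdb "$DB"                    # step 5: re-create it empty

  psql -d "$DB" -f create_tables.sql        # step 6: table definitions

  # step 7: bulk-load each delimited file with COPY.  COPY FROM a file
  # reads it as the server process, so the files must be readable by it.
  for f in "$DATADIR"/*.txt; do
      tbl=`basename "$f" .txt`
      psql -d "$DB" -c "COPY $tbl FROM '$f' WITH DELIMITER '|'"
  done

  psql -d "$DB" -f create_indexes.sql       # step 8
  psql -d "$DB" -f build_reporting.sql      # steps 9-11: reporting tables,
                                            # their indexes, and functions
  psql -d "$DB" -c "ANALYZE"                # step 12 (hence my question)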
Richard Rowell wrote:
> I've ported enough of my company's database to Postgres to make
> warehousing on PG a real possibility. I thought I would toss my data
> migration architecture ideas out for the list to shoot apart.
>
> 1. Script on production server dumps the production database (MSSQL) to
>    a set of delimited text files.
> 2. Script on production server moves the files via FTP to a Postgres
>    database server.
> 3. File Alteration Monitor trigger on the PG server executes a script
>    when the last file is transferred.
> 4. Script on PG server drops the target database (by issuing a "dropdb"
>    command).
> 5. Script on PG server re-creates the target database ("createdb" command).
> 6. Script on PG server re-creates the tables.
> 7. Script on PG server issues COPY commands to import the data.
> 8. Script on PG server indexes the tables.
> 9. Script on PG server builds de-normalized reporting tables.
> 10. Script on PG server indexes the reporting tables.
> 11. Script on PG server creates the needed reporting functions.
> 12. Vacuum analyze?
>
> My question revolves around the drop/create for the database. Is there a
> significant downside to this approach? I'm taking this approach because
> it is simpler from a scripting point of view to simply start from
> scratch on each warehouse update. If I do not drop the database, I would
> need to delete the contents of each table and drop all indexes prior to
> the COPY/data import. My assumption is that all the table deletes and
> index drops would be more expensive than just dropping/re-creating the
> entire database.

I believe you are correct. If you are going to completely wipe the
database, just drop it and re-create it; deleting rows is much slower than
dropping objects. (One of the uses of partitioning is that you can simply
drop one of the partition tables rather than deleting its rows.) Dropping
the whole db also skips any foreign-key checks, etc.

> Also, is the vacuum analyze step needed on a freshly minted database
> where the indexes have all been newly created?
>
> Thanks in advance for all feedback.

ANALYZE is needed, since you haven't gathered any statistics yet, so the
planner doesn't really know how many rows each table has. VACUUM probably
isn't needed, since a freshly loaded database has no dead rows to reclaim.

John
=:->
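P.S. In command form that boils down to something like the following; the
database and table names here are made up, not from your setup.

  # Update planner statistics after the load and index builds.  A plain
  # ANALYZE is enough; VACUUM has nothing to clean up after a fresh COPY.
  psql -d warehouse -c "ANALYZE"

  # If you later keep the database around and partition the large tables
  # instead, clearing out old data becomes a cheap drop rather than a
  # slow DELETE, e.g.:
  #   psql -d warehouse -c "DROP TABLE sales_2004q4"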
Quoting Richard Rowell <richard@bowmansystems.com>:

> I've ported enough of my company's database to Postgres to make
> warehousing on PG a real possibility. I thought I would toss my data
> migration architecture ideas out for the list to shoot apart.
> [...]

Not much feedback required. Yes, dropping the entire database is faster and
simpler. If your database is small enough that you can rebuild it from
scratch every time, go for it.

Yes, vacuum analyze is required; creating indexes alone does not create
statistics. I'd also dump an extract of pg_stat[io_]user_(tables|indexes)
after each load to see how index usage and table load change over time.

--
"Dreams come true, not free."  -- S.Sondheim, ITW
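P.S. Something along these lines would do for the statistics extract; this
is only a sketch, and the database name and output file names are arbitrary.

  # Append a snapshot of table and index usage counters after each rebuild;
  # comparing successive snapshots shows which indexes actually get used.
  psql -d warehouse -At -F ',' -c "
      SELECT now(), relname, seq_scan, idx_scan,
             n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_user_tables
      ORDER BY relname" >> table_stats.csv

  psql -d warehouse -At -F ',' -c "
      SELECT now(), relname, indexrelname, idx_scan, idx_tup_read
      FROM pg_stat_user_indexes
      ORDER BY relname, indexrelname" >> index_stats.csv

Note that since you drop the database on every cycle, the counters start
from zero again after each rebuild, so each snapshot covers only that
cycle's activity.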